BIG Data Interviews

Interviews with prominent Big Data experts, including vice presidents, founders and CEOs of several major players, as well as leading academics in the area. These interviews were conducted with the EU BIG Project, which will develop a number of white papers and roadmaps on Big Data (http://www.big-project.eu/).

Alon Halevy, Research Scientist at Google

In the interview, Alon drew upon his work on Google Fusion Tables, which allows users to upload and store their datasets. A collection of technologies that are not necessarily new, but are now beginning to work at scale, is having an impact. These include reconciling entities (saying X and Y are the same thing), resolving schema and ontology differences, extracting high-quality data from the web, large ontology-based datasets (typically built from Wikipedia), and crowdsourcing computation.

Alon Halevy leads the Structured Data research team at Google Research. Prior to that, Alon was a professor of Computer Science at the University of Washington, where he started the UW CSE Database Group in 1998 and worked in the field of data integration and web data.

 

Jeni Tennison, Technical Director of the Open Data Institute

Jeni discussed how open data can be found and combined to serve decision making. A key technology of interest that Jeni pointed out was the discovery of datasets distributed across the internet, along with tools that automate this discovery.

Within the wider UK public sector, Jeni Tennison contributed to the early linked data work on data.gov.uk, helping to engineer new standards for the publication of statistics as linked data; building APIs for geographic, transport and education data; and supporting the publication of public sector organograms as open data. She continues her work within the UK's public sector as a member of both the UK Government Linked Data Group and the Open Data User Group.

 

Usman Haque, Founder of Pachube and Director of the Urban Projects Division at COSM

Usman mostly covered a community-oriented view of Big Data acquisition, which he says is very important if citizens and communities are to fully engage with important issues in the world. Key here is that a community can overcome any deficiencies (errors or heterogeneities) by creating its own specific tools.

Usman Haque has worked extensively with interactive environments over the years. He founded Pachube, a web platform for building internet-connected devices, buildings and environments and for storing, sharing and discovering real-time sensor, energy and environmental data; Pachube was acquired by LogMeIn in 2011. Later, Usman took part in launching the COSM.com platform, where he headed up urban projects dealing with data, sensors and the Internet of Things.

 

Prasanna Lal Das, Lead Program Officer, Controllers, World Bank

The interview covered how he sees Big Data helping to tackle poverty and proactively address corruption by supporting decision making based on real-time data. On behalf of Prasanna, we would like to stress that the opinions provided in the interview should not be regarded as those of an expert in poverty or anti-corruption measures.

Prasanna Lal Das is a senior program officer in the Office of the Controller at the World Bank. He is a content strategist and knowledge management (KM) practitioner with experience in journalism, computer games design, and management consulting.

 

Hjalmar Gislason, Founder of DataMarket.com

Hjalmar Gislason is the founder and CEO of DataMarket.com. In this interview, Hjalmar covers data visualization and data modelling via semantics. He believes that ease of use is crucial to success and that a lot of technologies, such as the Semantic Web stack, are over-engineered. According to him, there is high demand for the "democratization of semantic technologies": making everything accessible through a web browser, which includes dealing with legacy versions of Internet Explorer (IE).

DataMarket helps business users find and understand data, and data providers to efficiently publish their data and reach new audiences. DataMarket.com provides access to thousands of data sets holding hundreds of millions of facts and figures from a wide range of public and private data providers including the United Nations, the World Bank, Eurostat and the Economist Intelligence Unit. The data portal allows this data to be searched, visualized, compared and downloaded in a single place in a standard, unified manner.

 

Andraz Tori, Founder and CTO of Zemanta

In this interview, Andraz mainly covers the Hadoop framework, explains why it has been successful, and offers interesting remarks on why the US currently seems to do better than Europe in Big Data technologies.

Andraz Tori is the CTO and a co-founder of Zemanta, a five-year-old company that performs semantic analysis of text to power a personal writing assistant and general-purpose recommendations. In terms of Big Data, Andraz characterizes Zemanta as "small data" inside Big Data. The company works with terabytes of compressed data and runs CPU-intensive operations.

 

Jim Webber, Chief Scientist at Neo Technology

Neo Technology brings the power of graph databases to a wide variety of clients. Neo4j, the world's leading graph database, has the largest ecosystem of partners and developers and tens of thousands of successful deployments. From websites adding social capabilities, to telcos providing personalized customer services, to innovative bioinformatics research, organizations adopt graph databases to quickly model and query connected data.

 

Francois Bancilhon, CEO of Data Publica

Data Publica is a young French startup with a group of data developers responsible for producing datasets: custom datasets based on customer specifications and off-the-shelf datasets based on market demand. Producing data involves identifying the data sources, extracting the data, turning the raw data into structured data and, finally, delivering it to the customer.

 

Richard Benjamins, Director of Business Intelligence at Telefónica Digital

Richard Benjamins is the Director of Business Intelligence at Telefónica Digital and a coordinator of multiple global projects in the Big Data analytics area.

 

Frank van Harmelen, Professor in AI at the Free University of Amsterdam

The talk covers two large areas: a number of technologies that need to be deployed before large-scale reasoning is possible, and then large-scale reasoning itself.

Frank van Harmelen is a Dutch Computer Scientist and Professor in Knowledge Representation & Reasoning in the AI department at the Vrije Universiteit Amsterdam.

 

Alek Kolcz, Research Scientist at Twitter

Alek Kolcz is a data scientist at Twitter, where he works in the User Personalization and Recommender Systems group, particularly on the user modeling side. His involvement with Big Data centres on data analysis at Twitter, a large real-time social networking service with an extremely high rate of updates and high daily information flows.

 

Sören Auer, Senior Scientist at the University of Leipzig

Sören Auer is one of the leading experts in Semantic Web technologies, particularly in the Linked Data area. Sören coordinates the EU-FP7-ICT project LOD2 - Creating Knowledge out of Interlinked Data, which aims to support the lifecycle of Linked Data on the Web, including information extraction, authoring, quality analysis, visualization and user interfaces for exploring Linked Data, among others. Sören is a member of the Leipzig research group that works on well-known projects such as DBpedia.

 

Jim Hendler, Professor at Rensselaer Polytechnic Institute

James Alexander Hendler is an artificial intelligence researcher at Rensselaer Polytechnic Institute, USA, and one of the originators of the Semantic Web.

 

Ricardo Baeza-Yates

The main themes Ricardo suggested investing in are:

A) What he called Hadoop++: the ability to handle graphs with trillions of edges, as MapReduce doesn't scale well for graphs;

B) Stream data mining: the ability to handle streams of large volumes of data. Handling lots of data in a 'reasonable' amount of time is key for Ricardo, for example being able to carry out offline computations within a week rather than a year.

An additional point of interest for Ricardo was personalisation and its relation to privacy: rather than personalising based on user data, we should personalise around user tasks. More details in the interview!

Ricardo Baeza-Yates is VP of Research for Europe and Latin America, leading the Yahoo! Research labs in Barcelona, Spain and Santiago, Chile, and also supervising the lab in Haifa, Israel. Until 2005 he was the director of the Center for Web Research at the Department of Computer Science of the Engineering School of the University of Chile; and ICREA Professor and founder of the Web Research Group at the Dept. of Information and Communication Technologies of Universitat Pompeu Fabra in Barcelona, Spain.

 

Big Data Analysis Interview with Bill Thompson, Head of Partnership Development, BBC Archives

According to Bill Thompson, the term Big Data as a general label should be viewed sceptically, as to his mind there is nothing fundamentally new in computer science terms. However, he does agree that there are certain technologies that should be invested in. In particular, the EU should invest from a public service point of view to counteract large companies that will focus purely on profitable areas.

He also gave two analogies for outcomes to avoid: first, computer science education in UK schools suffering because MS Office was adopted; and second, big pharma not investing in cures for malaria.

Bill Thompson, Head of Partnership Development within the BBC Archives Development group, is an English technology journalist, commentator and writer, best known for his weekly column in the Technology section of BBC News Online and his appearances on Click, a radio show on the BBC World Service.