Open data exploration – Pilot testing by SYNYO

In the third pilot use case of the SmartDataLake project, SYNYO applied selected components of the SmartDataLake toolkit to complement the end-to-end workflow of processing and analysing open data and turning it into insights and actionable information. The work focused primarily on datasets describing the European research landscape, e.g., projects funded under the H2020 programme and the organisations participating in them, with an emphasis on applying key SmartDataLake functionalities, including:

– Working with the data flexibly and interactively using RAW.
– Extending the core dataset by joining it with other independent datasets through entity resolution.
– Analysing and evaluating the data with support from the SciNeM, LOCI and SimSearch components.
– Integrating the resulting insights into an end-user-facing solution.
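As an illustration of the entity-resolution step, the following minimal Python sketch matches organisation records from two sources by normalising their names and comparing them with `difflib.SequenceMatcher` from the standard library. It is not the SmartDataLake implementation: the helper names `normalise` and `resolve`, the list of legal-form suffixes, the 0.85 threshold and the sample records are all hypothetical assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes ignored during matching.
LEGAL_FORMS = {"gmbh", "ag", "ltd", "sa", "srl", "bv"}


def normalise(name: str) -> str:
    """Lower-case the name, split it into word tokens and drop legal-form suffixes."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def resolve(left: dict, right: dict, threshold: float = 0.85) -> list:
    """Return (left_id, right_id, score) for the best match of each left record.

    Every left record is compared with every right record (no blocking),
    and a pair is kept only if its best similarity reaches the threshold.
    """
    matches = []
    for left_id, left_name in left.items():
        key = normalise(left_name)
        best_id, best_score = None, 0.0
        for right_id, right_name in right.items():
            score = SequenceMatcher(None, key, normalise(right_name)).ratio()
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            matches.append((left_id, best_id, best_score))
    return matches


# Hypothetical sample records from two independent sources.
projects_orgs = {"p1": "Example Analytics GmbH",
                 "p2": "Technische Universität Wien",
                 "p3": "Acme Research Ltd"}
registry_orgs = {"r1": "Example Analytics",
                 "r2": "Technische Universitaet Wien",
                 "r3": "Example Labs"}

for left_id, right_id, score in resolve(projects_orgs, registry_orgs):
    print(left_id, "->", right_id, f"{score:.2f}")
```

Here `p1` and `p2` are linked to `r1` and `r2` despite the differing legal-form suffix and spelling, while `p3` stays unmatched. A real pipeline would typically add blocking so that not every pair of records has to be compared.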

While the first two points made working with the data easier and more flexible, e.g., by providing an SQL-like interface to raw data sources, the analysis made it possible to evaluate the relevance of the data, for example by:

– Ranking individual entities based on their interactions within the network as a whole.
– Identifying communities of closely related entities.
– Comparing entities based on their similarity.
– Applying the above in a geographical context and over time.
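The ranking and grouping ideas listed above can be sketched on a small co-participation graph, where two organisations are linked if they took part in the same project and the edge weight counts their shared projects. The sketch uses a plain weighted PageRank for ranking and connected components as a very crude stand-in for community detection; both are generic techniques chosen for illustration and are not claimed to be how SciNeM ranks entities or detects communities. The project data and organisation identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_graph(projects: dict) -> dict:
    """Build a weighted, undirected graph; edge weight = number of shared projects."""
    graph = defaultdict(lambda: defaultdict(int))
    for members in projects.values():
        for a, b in combinations(sorted(set(members)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def pagerank(graph: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Weighted PageRank by power iteration.

    Only organisations with at least one collaboration appear in the graph,
    so every node has outgoing weight and no dangling-node handling is needed.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            total = sum(graph[v].values())
            for u, weight in graph[v].items():
                new_rank[u] += damping * rank[v] * weight / total
        rank = new_rank
    return rank


def connected_groups(graph: dict) -> list:
    """Group organisations that are reachable from each other (depth-first search)."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                group.add(v)
                stack.extend(graph[v])
        groups.append(group)
    return groups


# Hypothetical projects and their participating organisations.
projects = {"P1": ["A", "B", "C"], "P2": ["A", "D"],
            "P3": ["A", "B"], "P4": ["E", "F"]}

graph = collaboration_graph(projects)
ranking = sorted(pagerank(graph).items(), key=lambda item: -item[1])
print(ranking)
print(connected_groups(graph))
```

Organisation `A`, which shares the most projects with others, receives the highest score, and the graph splits into the groups `{A, B, C, D}` and `{E, F}`. Proper community detection (e.g., modularity-based methods) would also separate densely connected regions inside a single component, which connected components cannot do.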

The main insights from the analysis were then selected and integrated into an end-user-facing application, either by persisting the derived information or by integrating the components and their configuration into the application directly. The application enables users to navigate the European research landscape, using their own knowledge as a starting point. Specifically, it reflects the capabilities described above:

– Ranking organisations based on their activity in the European research landscape.
– Identifying communities of organisations based on how frequently they collaborate.
– Finding similar organisations, projects and topics to identify potential future collaborations.
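Similarity-based suggestions of this kind can be illustrated with Jaccard similarity over the sets of topics associated with each organisation. This is only an assumption for the example: the similarity measures used by SimSearch are not described here, and the topic profiles and helper names `jaccard` and `most_similar` are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of topics two profiles have in common (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target: str, profiles: dict, k: int = 2) -> list:
    """Return the k organisations whose topic sets are closest to the target's."""
    scores = [(other, jaccard(profiles[target], topics))
              for other, topics in profiles.items() if other != target]
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Hypothetical topic profiles, e.g. derived from project descriptions.
profiles = {
    "OrgA": {"smart cities", "open data", "AI"},
    "OrgB": {"open data", "AI", "privacy"},
    "OrgC": {"energy", "smart grids"},
    "OrgD": {"smart cities", "open data"},
}

for org, score in most_similar("OrgA", profiles):
    print(org, round(score, 2))
```

For `OrgA`, the sketch suggests `OrgD` (two of three distinct topics shared, about 0.67) ahead of `OrgB` (0.5), while `OrgC`, with no overlapping topics, would rank last.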

Overall, the SmartDataLake toolkit made it easier for users to navigate the research landscape and enabled more relevant, similarity-based suggestions.