The overall goal of SmartDataLake is to design, develop and evaluate novel approaches and techniques for extreme-scale analytics over Big Data Lakes, facilitating the journey from raw data to actionable insights.
In a nutshell, SmartDataLake will pursue the following objectives:
- Objective 1: Virtualized and Adaptive Data Access – We will facilitate efficient direct access to heterogeneous data; enable cross-format query optimization; and accommodate data types and workloads that are not known a priori or change over time.
- Objective 2: Automated and Adaptive Data Storage Tiering – To reduce hardware and operational costs, we will offer techniques that exploit different storage tiers; design a transparent multi-tier data storage layer; and construct and store different representations of the same data in different tiers.
- Objective 3: Smart Data Discovery, Exploration and Mining – To support the user in exploring and making sense of the Data Lake’s contents, we will enable an entity-centric view and organization of the data, and will provide a suite of mining operations over Heterogeneous Information Networks.
- Objective 4: Monitor Changes and Assess their Impact – We will treat the Data Lake as a live data ecosystem, enabling the detection and monitoring of changes as well as the incremental update of analysis results.
- Objective 5: Support the Data Scientist in the Loop – To leverage the data scientist’s intuition and domain knowledge, we will provide scalable and interactive visualizations for different types of data (spatial, temporal, graph) and will enable filtering, aggregating, ranking and summarizing information across multiple dimensions.
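To make the tiering idea of Objective 2 concrete, the toy sketch below shows one possible policy: items live in a fast "hot" tier and are demoted to a cheaper "cold" tier when they are accessed least often, and promoted back on access. This is an illustrative assumption, not the SmartDataLake design; all names (`TieredStore`, `hot_capacity`, etc.) are hypothetical.

```python
class TieredStore:
    """Toy two-tier store: demotes least-accessed items to a cold tier.

    Hypothetical sketch of access-frequency-based tiering; not a
    SmartDataLake API.
    """

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = {}    # fast, expensive tier (e.g. SSD / memory)
        self.cold = {}   # slow, cheap tier (e.g. HDD / object storage)
        self.hits = {}   # access count per key, drives tier placement

    def put(self, key, value):
        self.hits.setdefault(key, 0)
        self.hot[key] = value
        self._rebalance()

    def get(self, key):
        self.hits[key] = self.hits.get(key, 0) + 1
        if key in self.hot:
            return self.hot[key]
        # Promote an item from the cold tier back to the hot tier.
        value = self.cold.pop(key)
        self.hot[key] = value
        self._rebalance()
        return value

    def _rebalance(self):
        # Demote the least-accessed hot items while over capacity.
        while len(self.hot) > self.hot_capacity:
            coldest = min(self.hot, key=lambda k: self.hits[k])
            self.cold[coldest] = self.hot.pop(coldest)
```

A real multi-tier layer would of course also weigh representation choice (Objective 2 mentions storing different representations of the same data in different tiers), cost models, and workload drift; the sketch only captures the promote/demote mechanism.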