Visual Analytics in SmartDataLake

The visual analytics layer of SmartDataLake comprises two main components, namely the Visual Analytics Engine and the Visual Explorer.

The Visual Analytics Engine interfaces directly with the data virtualization and mining components of the SmartDataLake toolkit, preparing their results for visualization and parameter tuning. By exposing its services via a REST API, the Visual Analytics Engine provides the foundation for the included visual analytics applications. Moreover, the Visual Analytics Engine services can be used as a convenient way to access different components of the SmartDataLake toolkit through a common interface, e.g., to build customized visualizations or visual analytics solutions. The REST API is documented following the OpenAPI specification and can be accessed in the official GitHub repository of the project.

The Visual Explorer provides a graphical user interface that bundles multiple visual analytics applications in the context of SmartDataLake, enabling knowledge generation on the data and algorithms of SmartDataLake. It supports various tasks that arise when exploring large, heterogeneous data collections, such as data profiling, descriptive analytics, similarity search, hierarchical graph exploration, and entity resolution. Its modular interface is structured by tabs that follow a typical analysis workflow, which makes it easy to extend in the future. The provided framework can incorporate both external and tightly integrated visual analytics applications.

One of the main supported functionalities is data profiling, which is typically the first step when dealing with a new dataset. We support this task in the SmartDataLake visual analytics layer with the V-Plots application (short video presentation available here). A combined chart, called a “V-Plot,” provides a task-driven solution for descriptive analysis over numerical data columns. An interactive guide allows users to enter their preferences and goals for the analysis. The V-Plots component translates these into a selection of chart types suited to the analysis goals, and the most suitable chart types are then automatically combined into a V-Plot. In SmartDataLake, we have extended V-Plots with connections to different databases, allowing the user to inspect schema information and add numerical columns to the V-Plot directly.
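The step from analysis goals to chart types can be illustrated with a minimal sketch. It is not the V-Plots implementation: the goal names, chart types, suitability scores, and the functions rank_chart_types and select_for_combined_chart are all hypothetical, chosen only to show how weighted user goals could be turned into a ranked selection of charts.

```python
# Hypothetical suitability scores (0..1) of chart types per analysis goal.
# The values are illustrative assumptions, not taken from V-Plots.
SUITABILITY = {
    "histogram":   {"distribution": 0.9, "outliers": 0.3, "summary_statistics": 0.2},
    "box_plot":    {"distribution": 0.5, "outliers": 0.9, "summary_statistics": 0.8},
    "violin_plot": {"distribution": 0.8, "outliers": 0.4, "summary_statistics": 0.5},
    "bar_chart":   {"distribution": 0.2, "outliers": 0.1, "summary_statistics": 0.6},
}


def rank_chart_types(goal_weights):
    """Score each chart type by the weighted sum of its suitability per goal.

    goal_weights maps a goal name to the importance the user assigned to it.
    Returns (chart_type, score) pairs, best first; ties are broken by name.
    """
    scores = {}
    for chart, per_goal in SUITABILITY.items():
        scores[chart] = sum(
            weight * per_goal.get(goal, 0.0)
            for goal, weight in goal_weights.items()
        )
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


def select_for_combined_chart(goal_weights, max_charts=2):
    """Pick the highest-ranked chart types to combine into one chart."""
    ranked = rank_chart_types(goal_weights)
    return [chart for chart, _ in ranked[:max_charts]]
```

For example, a user who mainly cares about the distribution of a column and somewhat about outliers would call select_for_combined_chart({"distribution": 1.0, "outliers": 0.5}); under the assumed scores, this selects the histogram and the violin plot.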

Moreover, building upon the similarity search functionality of SmartDataLake, we provide a visual analytics application that tackles the challenge of real-time parameter optimization for time-consuming offline algorithms. Our application enables users to specify and refine search parameters in an interactive exploration loop. The search results are visualized such that relationships and clusters in the data become evident. To enable real-time exploration, we support the user by overlaying speculative results, which indicate the approximate outcome of an action before it is actually executed.
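The idea of a speculative result can be sketched as follows. The text does not state how SmartDataLake computes its speculative results, so this example substitutes a simple, commonly used approximation: running the same weighted top-k search on a random sample of the data to obtain a fast preview. All names (distance, top_k, speculative_top_k) and the attribute weights as the tunable parameters are assumptions made for illustration.

```python
import random


def distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length vectors."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


def top_k(query, items, weights, k):
    """Exact search: ids of the k items closest to the query.

    items maps an item id to its numeric attribute vector.
    """
    ranked = sorted(items, key=lambda item_id: distance(query, items[item_id], weights))
    return ranked[:k]


def speculative_top_k(query, items, weights, k, sample_size, seed=0):
    """Approximate preview: run the same search on a random sample only.

    This is cheaper than the exact search and can be shown to the user
    while (or before) the full computation runs. The result may differ
    from the exact one whenever true neighbours are missing in the sample.
    """
    rng = random.Random(seed)
    sample_ids = rng.sample(sorted(items), min(sample_size, len(items)))
    sample = {item_id: items[item_id] for item_id in sample_ids}
    return top_k(query, sample, weights, k)
```

In an interactive loop, the preview from speculative_top_k would be overlaid immediately after the user changes a weight, and replaced by the output of top_k once the full search has finished. When sample_size covers the whole dataset, both functions return the same result.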