Approximate Query Processing

The Query Approximation Layer (QAL), developed by the Eindhoven University of Technology, lets users obtain approximate results with error guarantees for SQL aggregation queries. QAL introduces a novel adaptive approximate processing engine that builds the synopses expected to maximize future throughput. More specifically, QAL generates multiple approximate plans for each SQL query, chooses and executes the plan whose synopses are expected to be reused most by future queries, and stores the generated synopses in a warehouse. QAL has three main modules: the query planner, which generates the logical and physical plans; the query evaluator, which chooses between the plans; and the synopsis warehouse, which stores the synopses. Unlike other state-of-the-art approximate query processing (AQP) engines, which rely only on samples, QAL also leverages sketches (e.g., Count-min sketch with dyadic ranges), which lets it propose a wider variety of execution plans for approximate queries.
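
To make the sketch-based synopsis mentioned above more concrete, here is a minimal Python sketch of a Count-min sketch with dyadic ranges. It is an illustration of the general technique, not QAL's Scala implementation: the class names, the default width and depth, and the hash family are all assumptions chosen for brevity. One Count-min sketch is kept per dyadic level, so a range count can be answered by summing a logarithmic number of point estimates.

```python
import random


class CountMinSketch:
    """Count-min sketch over non-negative integer keys (illustrative only)."""

    _PRIME = (1 << 61) - 1  # modulus for the (a*x + b) mod p hash family

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]
        # One (a, b) pair per row; values are assumptions, not tuned.
        self.coeffs = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(depth)
        ]

    def _slot(self, row, key):
        a, b = self.coeffs[row]
        return ((a * key + b) % self._PRIME) % self.width

    def add(self, key, count=1):
        for row in range(len(self.rows)):
            self.rows[row][self._slot(row, key)] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum over the
        # rows never underestimates the true count.
        return min(
            self.rows[row][self._slot(row, key)]
            for row in range(len(self.rows))
        )


class DyadicRangeSketch:
    """One Count-min sketch per dyadic level over the domain [0, 2**bits)."""

    def __init__(self, bits, width=256, depth=4, seed=0):
        self.bits = bits
        self.levels = [
            CountMinSketch(width, depth, seed + level)
            for level in range(bits + 1)
        ]

    def add(self, value, count=1):
        # At level k, the value belongs to the dyadic interval value >> k.
        for level, sketch in enumerate(self.levels):
            sketch.add(value >> level, count)

    def range_count(self, lo, hi):
        """Estimate how many added values lie in [lo, hi], inclusive.

        Assumes 0 <= lo <= hi < 2**bits.
        """
        total, level = 0, 0
        hi += 1  # work with the half-open interval [lo, hi)
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and step past it
                total += self.levels[level].estimate(lo)
                lo += 1
            if hi & 1:  # hi - 1 is a left child: take it
                hi -= 1
                total += self.levels[level].estimate(hi)
            lo >>= 1
            hi >>= 1
            level += 1
        return total
```

For example, after calling `add(v)` for a stream of values on `DyadicRangeSketch(bits=7)`, `range_count(10, 19)` returns an estimate that is at least the true number of values in that range. The overestimate shrinks as `width` grows.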

Synopses are defined as physical operators and injected into the execution plan. QAL pushes the synopses down in the execution plan, which both reduces the number of records to be processed and increases the reusability of the synopses. The efficiency of QAL depends heavily on how accurately future queries are predicted. To address this, QAL uses two approaches: (a) window-based query prediction and (b) an ML predictive model. The second approach is a novel technique that forecasts likely future SQL queries. Our preliminary results show that the proposed adaptive AQP performs significantly better than state-of-the-art AQP.
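
The window-based idea can be illustrated with a small Python sketch. The text does not describe QAL's actual scoring function, so everything here is an assumption made for illustration: the `WindowPredictor` and `choose_plan` names, the string synopsis keys, and the simple rule of scoring a plan by how often its synopses appeared in the most recent queries. A real evaluator would also need to weigh execution cost and the requested error guarantees.

```python
from collections import deque


class WindowPredictor:
    """Estimate future synopsis demand from the last `window` queries."""

    def __init__(self, window=50):
        # Each entry is the set of synopsis keys one past query could use.
        self.recent = deque(maxlen=window)

    def observe(self, synopsis_keys):
        self.recent.append(frozenset(synopsis_keys))

    def expected_uses(self, synopsis_key):
        # Assumption: the recent past is a proxy for the near future.
        return sum(1 for keys in self.recent if synopsis_key in keys)


def choose_plan(plans, predictor):
    """Pick the plan whose synopses were used most often in the window.

    `plans` maps a (hypothetical) plan name to the synopsis keys that the
    plan would build. Ties go to the alphabetically first plan name.
    """

    def score(name):
        return sum(predictor.expected_uses(key) for key in set(plans[name]))

    return max(sorted(plans), key=score)
```

Given two candidate plans for the same query, one built on a sample and one on a sketch, `choose_plan` returns whichever plan's synopses the recent workload touched more often. Queries older than the window stop influencing the choice.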

QAL is implemented in Scala 2.11.12 and runs on top of Apache Spark 2.4.3. The source code is available in the SmartDataLake GitHub repository under the directory named QAL. All functionality is in a single project, which has been built and tested with Java JDK 1.8 and SBT 1.3.13. The final component is available as a Docker image hosting a Spark instance and as a stand-alone jar file.
