SimSearch v0.2

Version 0.2 of the SimSearch component is now available on our GitHub repository. SimSearch is a Java library for combined similarity search over multi-faceted entities, i.e., datasets with attributes of different types (textual, numerical, spatial and temporal). Queries support multi-attribute similarity search for data exploration, applying an appropriate similarity measure to each attribute (e.g., Jaccard or Euclidean). The library builds an index suited to each attribute type and uses these indices at query time to retrieve the k-nearest neighbors for each criterion. The per-attribute results are then aggregated to obtain the final top-k matches. Attribute values may come from diverse data sources, including CSV files, tables in a PostgreSQL database or JSON data hosted in Elasticsearch. SimSearch can be deployed either as a standalone Java application or as a RESTful web service. More information can be found in this related publication. Below is a video presenting a brief demonstration of this component.
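To make the idea of scoring each attribute with its own measure and then aggregating into a top-k list more concrete, here is a minimal sketch in plain Java. It is not the SimSearch API: the class `SimSearchSketch`, the `Entity` and `Match` types, the weights, and the sample data are all hypothetical names chosen for illustration. It also swaps the per-attribute indices for a brute-force scan over every entity, and it assumes a simple weighted sum as the aggregation step, which may differ from what the library does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: hypothetical names, brute-force scan instead of indices.
public class SimSearchSketch {

    // Hypothetical entity with one textual attribute (keyword set) and one numerical attribute.
    static class Entity {
        final String id;
        final Set<String> keywords;
        final double value;
        Entity(String id, Set<String> keywords, double value) {
            this.id = id;
            this.keywords = keywords;
            this.value = value;
        }
    }

    // An entity id paired with its aggregated similarity score.
    static class Match {
        final String id;
        final double score;
        Match(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Maps the one-dimensional Euclidean distance |a - b| to a similarity in (0, 1].
    static double numericSimilarity(double a, double b) {
        return 1.0 / (1.0 + Math.abs(a - b));
    }

    // Scores every entity on both attributes, combines the scores with an
    // assumed weighted sum, and returns the k best matches in descending order.
    static List<Match> topK(List<Entity> data, Set<String> queryKeywords, double queryValue,
                            double textWeight, double numWeight, int k) {
        List<Match> scored = new ArrayList<>();
        for (Entity e : data) {
            double score = textWeight * jaccard(e.keywords, queryKeywords)
                         + numWeight * numericSimilarity(e.value, queryValue);
            scored.add(new Match(e.id, score));
        }
        scored.sort((x, y) -> Double.compare(y.score, x.score));
        return new ArrayList<>(scored.subList(0, Math.min(k, scored.size())));
    }

    // Small made-up dataset used by the usage example below.
    static List<Entity> sampleData() {
        return Arrays.asList(
            new Entity("a", new HashSet<>(Arrays.asList("coffee", "bar")), 4.0),
            new Entity("b", new HashSet<>(Arrays.asList("coffee", "bakery")), 2.0),
            new Entity("c", new HashSet<>(Arrays.asList("museum")), 4.5));
    }

    public static void main(String[] args) {
        Set<String> queryKeywords = new HashSet<>(Arrays.asList("coffee", "bar"));
        for (Match m : topK(sampleData(), queryKeywords, 4.5, 0.5, 0.5, 2)) {
            System.out.println(m.id + " " + m.score);
        }
    }
}
```

With equal weights and k = 2, the query above ranks entity `a` first (exact keyword match, close value) and entity `c` second (no keyword overlap, identical value). This shows how a weighted aggregation can surface results that are strong on only one criterion.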
