Business-to-business (B2B) portfolio recommendation is one of the pilot use cases in SmartDataLake, led by SpazioDati. Given a company's list of clients, the objective is to find other potential clients (called “leads”) for that company.
SpazioDati is a data company focused on collecting and enriching datasets about Italian companies, from both official and unofficial sources. It has developed a novel approach to this task in two stages: first, analyzing the companies that make up the current client base to understand their profiles and features; then, based on that analysis, creating and tuning a Machine Learning model that suggests new leads. The data scientists and ML engineers who perform portfolio recommendation follow a multi-step process that takes a list of companies (the current client base) as input and goes through data cleansing, entity matching, data analysis, feature exploration and engineering, and model training and evaluation.
SmartDataLake provided several tools that fit nicely into this general workflow:
– Ingesting the input dataset and matching it with the existing companies dataset. Once the input file is uploaded to RAW, its type inference functionality gives an idea of the data type of each column. The Visual Explorer can provide a high-level view of the input dataset. sHINER can assist in matching the input companies with those in the companies dataset.
– Analyzing the customers. A deeper analysis of the input companies is needed to identify which features to use in the ML model. All the SmartDataLake tools that give insights into the data, or make this process more efficient, are useful here: the Visual Explorer to get a first sense of the dataset; QAL to run several fast, exploratory queries; RAW/Proteus to perform complex join queries.
– Feature exploration and feature engineering. The data scientist has to select the features of the model, run it, evaluate the results, and tweak the features and the model until a satisfactory result is obtained. RAW/Proteus are once again handy for querying the data easily and efficiently, while the similar companies suggested by SimSearch queries help enrich the positive and negative classes used to train the model.
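To make two of the steps above more concrete, the following is a minimal sketch of entity matching and of enriching the positive class with similar companies. It does not use sHINER or SimSearch, whose interfaces are not described here: fuzzy name matching with Python's standard difflib module and Jaccard similarity over categorical features are swapped in as simple stand-ins. All company names, identifiers, features, the list of legal-form suffixes, and the 0.85 threshold are hypothetical values chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical legal-form suffixes to ignore when comparing company names.
LEGAL_FORMS = {"srl", "spa", "snc", "sas"}


def normalize(name: str) -> str:
    """Lowercase a name, strip punctuation, and drop legal-form tokens."""
    text = name.lower().replace(".", "")
    text = "".join(ch if ch.isalnum() else " " for ch in text)
    return " ".join(tok for tok in text.split() if tok not in LEGAL_FORMS)


def match_company(name, reference, threshold=0.85):
    """Return the id of the most similar reference company, or None."""
    target = normalize(name)
    if not target:
        return None
    best_id, best_score = None, 0.0
    for company_id, ref_name in reference.items():
        score = SequenceMatcher(None, target, normalize(ref_name)).ratio()
        if score > best_score:
            best_id, best_score = company_id, score
    return best_id if best_score >= threshold else None


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_positives(clients, features, k=2):
    """Return up to k non-client companies most similar to any client."""
    if not clients:
        return []
    scored = [
        (max(jaccard(features[c], features[p]) for p in clients), c)
        for c in features
        if c not in clients
    ]
    # Highest similarity first; ties broken by company id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [c for score, c in scored[:k] if score > 0]


# Hypothetical reference dataset (id -> registered name) and feature sets.
reference = {
    "IT001": "Acme S.p.A.",
    "IT002": "Rossi Costruzioni SRL",
    "IT003": "Bianchi Software S.r.l.",
    "IT004": "Verdi Logistica SpA",
}
features = {
    "IT001": {"sector:manufacturing", "region:lombardia", "size:medium"},
    "IT002": {"sector:construction", "region:lazio", "size:small"},
    "IT003": {"sector:software", "region:trentino", "size:small"},
    "IT004": {"sector:manufacturing", "region:lombardia", "size:large"},
    "IT005": {"sector:software", "region:trentino", "size:medium"},
}

# Hypothetical uploaded client list, with a typo and an unknown company.
uploaded = ["ACME SPA", "Bianchi Sofware srl", "Unknown Co"]

matched = {name: match_company(name, reference) for name in uploaded}
clients = {cid for cid in matched.values() if cid is not None}
extra_positives = expand_positives(clients, features, k=2)
```

In this sketch, unmatched input rows map to None so they can be reviewed by hand rather than silently dropped, and the companies returned by expand_positives are only candidates for the positive class: a data scientist would still need to validate them before training.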
Overall, SmartDataLake proved very useful, both because it reduced time and manual effort and because it provided new tools (BRS, LOCI, SimSearch) that open new business opportunities beyond the portfolio recommendation task.