Link Prediction with PathLearn

PathLearn is an open-source component for link prediction in Heterogeneous Information Networks (HINs). It predicts links by modeling the effect of every path that exists between a pair of nodes. Instead of assigning a discrete type to each path, PathLearn models each path individually and assigns it a continuous value that quantifies its effect on the formation of a link. Given a pair of nodes, PathLearn finds all paths between them up to a given length and computes a value for each path as a function of the types and features of all the path’s nodes and edges. It then aggregates the values of all the paths into a single score for the probability that an edge exists between the two nodes. Ranking candidate pairs by this score identifies the pairs most likely to be connected.
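The scoring procedure can be sketched in plain Python. This is a minimal, hypothetical illustration, not PathLearn's code. The toy graph, the per-type weights (`WEIGHTS`, `BIAS`), the sigmoid path value and the noisy-OR aggregation are all assumptions, and node and edge features are left out. Path enumeration uses a small depth-first search so the example has no dependencies. NetworkX offers the same enumeration step as `networkx.all_simple_paths(G, source, target, cutoff=max_len)`.

```python
import math
from collections import defaultdict

# Hypothetical toy HIN: every node has a type and every edge has a type.
NODE_TYPES = {
    "alice": "author", "bob": "author", "carol": "author",
    "p1": "paper", "p2": "paper", "kdd": "venue",
}
EDGES = [
    ("alice", "p1", "writes"), ("bob", "p1", "writes"),
    ("bob", "p2", "writes"), ("carol", "p2", "writes"),
    ("p1", "kdd", "published_in"), ("p2", "kdd", "published_in"),
]
# Hypothetical "learned" parameters: one weight per node or edge type, plus a bias.
WEIGHTS = {"author": 0.2, "paper": 0.5, "venue": -0.3,
           "writes": 0.8, "published_in": 0.1}
BIAS = -1.0


def build_adjacency(edges):
    """Undirected adjacency list that remembers each edge's type."""
    adj = defaultdict(list)
    for u, v, etype in edges:
        adj[u].append((v, etype))
        adj[v].append((u, etype))
    return adj


def simple_paths(adj, source, target, max_len):
    """Yield (nodes, edge_types) for every simple source-target path with at most max_len edges."""
    stack = [(source, [source], [])]
    while stack:
        node, nodes, etypes = stack.pop()
        if node == target:
            yield nodes, etypes
            continue
        if len(etypes) == max_len:
            continue
        for nbr, etype in adj[node]:
            if nbr not in nodes:  # keep the path simple (no repeated nodes)
                stack.append((nbr, nodes + [nbr], etypes + [etype]))


def path_value(nodes, etypes):
    """Assumed path value: sigmoid of the bias plus the weights of every type on the path."""
    z = BIAS + sum(WEIGHTS[NODE_TYPES[n]] for n in nodes) + sum(WEIGHTS[t] for t in etypes)
    return 1.0 / (1.0 + math.exp(-z))


def link_score(adj, u, v, max_len=4):
    """Assumed noisy-OR aggregation of the values of all paths between u and v."""
    no_link = 1.0
    for nodes, etypes in simple_paths(adj, u, v, max_len):
        no_link *= 1.0 - path_value(nodes, etypes)
    return 1.0 - no_link


adj = build_adjacency(EDGES)
for pair in [("alice", "carol"), ("bob", "carol"), ("alice", "kdd")]:
    print(pair, round(link_score(adj, *pair), 4))
```

Noisy-OR keeps the score in [0, 1] and lets a single strong path raise it. The length bound matters: in this toy graph, "alice" and "carol" are joined only by 4-edge paths, so with `max_len=3` their score is 0.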

PathLearn is implemented in Python using PyTorch and NetworkX, and its source code is available on GitHub. The implementation provides the four functions needed to use the model: preprocessing, training, testing and prediction. It also includes a REST API and a graphical user interface.
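The training, testing and prediction stages can be sketched in the same spirit. Again, this is a hypothetical sketch, not PathLearn's API. It assumes preprocessing has already turned each node pair into a list of paths, each written as its sequence of node and edge types. It also assumes a sigmoid path value `v_p = sigmoid(z_p)` and noisy-OR aggregation `s = 1 - prod(1 - v_p)`. Under these assumptions the binary cross-entropy gradient for a pair with label `y` is `dL/dz_p = v_p * ((1 - y) - y * (1 - s) / s)`, which the loop applies by hand. A PyTorch-based implementation would obtain the gradient through autograd instead. The function names `train`, `evaluate` and `predict` are illustrative only.

```python
import math
from collections import defaultdict

# Hypothetical preprocessed data: each sample is the list of paths found between
# one node pair, and each path is the sequence of its node and edge types.
AUTHOR_PAPER_AUTHOR = ["author", "writes", "paper", "writes", "author"]
VIA_VENUE = ["author", "writes", "paper", "published_in", "venue",
             "published_in", "paper", "writes", "author"]
SAMPLES = [[AUTHOR_PAPER_AUTHOR], [AUTHOR_PAPER_AUTHOR, VIA_VENUE], [VIA_VENUE], []]
LABELS = [1, 1, 0, 0]  # 1 = the pair is linked, 0 = it is not


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def path_values(paths, weights, bias):
    # Assumed path value: sigmoid of a bias plus one weight per type on the path.
    return [sigmoid(bias + sum(weights[t] for t in path)) for path in paths]


def aggregate(values):
    # Assumed noisy-OR aggregation: the pair is unlinked only if every path "fails".
    no_link = 1.0
    for v in values:
        no_link *= 1.0 - v
    return 1.0 - no_link


def predict(paths, weights, bias):
    """Link probability for one pair, given its preprocessed paths."""
    return aggregate(path_values(paths, weights, bias))


def train(samples, labels, epochs=200, lr=0.1):
    """Per-sample (stochastic) gradient descent on binary cross-entropy."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for paths, y in zip(samples, labels):
            values = path_values(paths, weights, bias)
            s = aggregate(values)
            for path, v in zip(paths, values):
                # dL/dz for this path under noisy-OR over sigmoid path values.
                grad_z = v * ((1 - y) - y * (1 - s) / s)
                bias -= lr * grad_z
                for t in path:
                    weights[t] -= lr * grad_z
    return weights, bias


def evaluate(samples, labels, weights, bias, threshold=0.5):
    """Fraction of pairs classified correctly at the given threshold."""
    hits = sum((predict(p, weights, bias) >= threshold) == (y == 1)
               for p, y in zip(samples, labels))
    return hits / len(labels)


weights, bias = train(SAMPLES, LABELS)
print("training accuracy:", evaluate(SAMPLES, LABELS, weights, bias))
print("score via venue only:", round(predict([VIA_VENUE], weights, bias), 4))
```

A pair with no paths always scores 0 here, so it can only ever be predicted as unlinked. This is a direct consequence of scoring pairs purely through their connecting paths.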