Histogram-Based Gradient Boosting Trees for Efficient Graph Learning with Wasserstein Embeddings

[Paper] [GitHub]

Molecular property prediction is a task faced in a variety of high-impact medical and biochemical fields, from drug discovery to the development of diagnostic screening tools to the synthesis of biologics. Though molecular property prediction has traditionally been a difficult task for machine learning models, recent advances in graph-based techniques have yielded significant performance increases. This paper contributes to this growing field at the intersection of biology, chemistry, and deep learning by offering an improved pipeline for predicting whether a given molecule is able to inhibit the replication of HIV. We build on previous work that utilizes Wasserstein graph embeddings and optimize an end-to-end pipeline to achieve near state-of-the-art performance on the Open Graph Benchmark’s ogbg-molhiv dataset using a fraction of the model parameters as the current best model.