Tsinghua University recently hosted the finals of the first-ever Global AI Drug R&D Algorithm Competition, an esteemed event that garnered interest from both academia and industry. Following intense competition among 878 teams from universities, research institutions, and companies worldwide, the collaborative team of IceKredit and Nanjing University stood out, securing the prestigious third-place prize.
This groundbreaking competition, jointly sponsored by Baidu PaddlePaddle, the School of Pharmacy at Tsinghua University, and Lingang Laboratory, received strong support from the Chinese Pharmaceutical Association and other key organizations. A panel of renowned experts and scholars in the biopharmaceutical field contributed their expertise, acting as the competition’s distinguished jury.
The competition drew 1,105 participants in 878 teams worldwide, who made a total of 6,080 algorithm submissions. After three months of preliminary and semi-final rounds, the IceKredit and Nanjing University team reached the finals, where it faced 14 strong teams from institutions including Microsoft Research Asia, the Shanghai Institute of Materia Medica of the Chinese Academy of Sciences, Shanghai Jiao Tong University, Zhejiang University, and Xi’an Jiaotong University. The finals consisted of on-site defenses and in-depth discussions covering each team’s approach to the competition problems, core theory, data analysis and processing, and algorithmic solutions. The IceKredit-Nanjing University team finished in third place.
The collaboration between IceKredit and the Christopher J. Butch research group at Nanjing University began with the preliminary round. There, the team tried a range of conventional machine learning methods, such as Bayesian docking, SVM, LightGBM, and GBDT, along with deep neural network models including Transformer-CNN, GCN, and D-MPNN. To predict enzyme activity, the team tested several molecular representations, including 3D molecular conformation data, graph features, and fingerprint-based descriptors such as Morgan fingerprints. Notably, the SVM model trained on Morgan fingerprints was a standout performer in predicting enzyme activity.
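The article does not say which kernel or fingerprint settings the team's SVM used. As one illustration of how a kernel method can compare Morgan fingerprints, the minimal sketch below computes Tanimoto similarities between fingerprints represented as sets of "on" bit indices and assembles them into a kernel matrix. The toy fingerprints and the function names are hypothetical. In practice the fingerprints would come from a cheminformatics toolkit, and the matrix could be handed to an SVM implementation that accepts a precomputed kernel.

```python
from typing import FrozenSet, List, Sequence

# A fingerprint is modeled as the set of bit positions that are switched on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        # Two empty fingerprints: treated as identical in this sketch (a choice).
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def tanimoto_kernel_matrix(fps: Sequence[Fingerprint]) -> List[List[float]]:
    """Pairwise similarity matrix, usable as a precomputed kernel."""
    return [[tanimoto(x, y) for y in fps] for x in fps]


# Hypothetical toy fingerprints, not real molecules.
toy_fingerprints = [
    frozenset({1, 5, 9}),
    frozenset({1, 5, 12}),
    frozenset({40, 41}),
]
kernel = tanimoto_kernel_matrix(toy_fingerprints)
```

Each entry of `kernel` lies between 0 and 1, and molecules sharing more substructure bits score closer to 1, which is the sense in which a fingerprint-based model treats them as similar.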
In the semi-final round, participants faced the complex task of predicting molecular activity in Caco-2 cell experiments. The IceKredit-Nanjing University team developed a feature fusion method that enhanced the GEM baseline model. It introduced a new MFP encoder structure so that graph features and global attention over molecular structure were combined with the local structural information within molecules captured by Morgan fingerprints. The result was a more complete molecular representation that improved the model’s predictive and classification performance. This approach reduced the risk of overfitting and improved overall generalization. The team also introduced dropout and split the training and validation sets by molecular scaffold, so that the model was validated on molecules whose scaffolds it had not seen during training.
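The scaffold-based split described above can be sketched as follows. This is a minimal illustration rather than the team's code. It assumes each molecule's scaffold has already been computed as a string, for example a Murcko scaffold SMILES from a cheminformatics toolkit. The function name, the 20% validation fraction, and the largest-groups-first ordering are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def scaffold_split(
    scaffolds: Sequence[str], valid_fraction: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Split molecule indices so that no scaffold appears in both sets."""
    # Group molecule indices by their scaffold string.
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    # Place the largest scaffold groups first; ties are broken by first index
    # so the result is deterministic.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    train_capacity = (1.0 - valid_fraction) * len(scaffolds)
    train: List[int] = []
    valid: List[int] = []
    for group in ordered:
        # A whole group goes to one side, never split across both.
        if len(train) + len(group) <= train_capacity:
            train.extend(group)
        else:
            valid.extend(group)
    return train, valid


# Hypothetical scaffold labels for ten molecules.
toy_scaffolds = ["A", "A", "A", "B", "B", "C", "D", "A", "B", "E"]
train_idx, valid_idx = scaffold_split(toy_scaffolds)
```

Because whole scaffold groups are assigned to one side, every validation molecule has a core structure absent from training, which gives a stricter estimate of generalization than a random split.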
Since March 2022, IceKredit and Nanjing University have been collaborating on AI applications in the medical domain. Their work focuses on computer-aided drug molecule design, combining artificial intelligence, molecular dynamics simulation, and computational biology with traditional chemistry and biology laboratory work. The result is a faster discovery process for potential drug molecules.
In just one year, the partnership has produced notable results. Apart from its competition award, the team recently published an SCI-indexed paper titled “Improving Drug Discovery with a Hybrid Deep Generative Model Using Reinforcement Learning Trained on a Bayesian Docking Approximation.” The drug discovery method it describes combines a deep generative model with reinforcement learning. Using approximate docking scores predicted by a Bayesian regression model, the method generates new compounds whose docking scores are 10-20% better than those of similarly sized molecules, while running 130 times faster than conventional docking methods. The approach holds promise for efficiently discovering novel chemical structures with potential therapeutic applications.