Science Success Story

Rapid ID of Potential Anti-COVID-19 Agents Powered by XSEDE

Bridges-AI identifies more than 20,000 compounds with possible anti-virus activity, thousands of times faster than earlier methods

By Ken Chiacchia, Pittsburgh Supercomputing Center


COVID protease with an inhibitor molecule (light blue) in the active site.

The COVID-19 pandemic has shown that speed can be as important as quality in medical research.

A team from Carnegie Mellon University has developed a new computational pipeline for greatly speeding up identification of possible anti-COVID candidates using artificial intelligence (AI) on the XSEDE-allocated Bridges-AI system at the Pittsburgh Supercomputing Center (PSC). They used this tool to screen about five billion chemical compounds to select a small number of candidates for combating the disease, thousands of times faster than possible with previous methods.

Why It's Important

The success, so far, of the rapid COVID-19 immunization effort serves to hammer home a lesson we'd all already learned. Namely, in some medical research scenarios, speed is every bit as important as accuracy.

Workflow that the AI uses to screen candidate anti-virus agents for COVID-19.

Scientists are also still searching for medications that can disrupt the SARS-CoV-2 virus's life cycle. Such drugs would help people who are infected before they can get vaccinated and those who can't be vaccinated for medical reasons, and they would serve as a backstop for the vaccine. That effort, too, needs to be fast as well as good. One method for finding new COVID-19 drugs is to simulate the interactions of candidate molecules with the target proteins that the virus needs to infect people. This saves the prohibitive time and expense of lab-testing every candidate by allowing scientists to test only the most promising. But the standard method of simulating large proteins with candidate drugs depends on the complex rules of quantum chemistry. This takes enormous computing power. It typically requires weeks to test a library of molecules.

Olexandr Isayev of Carnegie Mellon University wondered whether it would be possible to use the power of AI to supercharge that search. Working with colleagues at the University of North Carolina Chapel Hill, where he began the effort, and the University of Florida, he turned to XSEDE to make it work.

As computational chemists, when COVID happened we tried to think of what we could do to help. One idea, and what we've been doing in the past few years, is using AI for drug discovery. The traditional way is mostly a physics-based method to predict the binding between small molecules and proteins. And this is relatively slow. It still takes hours per compound. You use extremely large machines and it takes you days or weeks to test a library of compounds; so, you have a limited throughput.— Olexandr Isayev, Carnegie Mellon University

How XSEDE Helped

A flavor of AI called deep neural networks (DNN) has been incredibly successful in many applications. DNNs running on graphics processing units, or GPUs, have fueled a revolution in the ability of AI to recognize objects in images, read texts, etc. But the quantum chemistry of large molecules relies on a lot more than what they look like. The information needed is much more complicated than a simple image. Work by other scientists has created quantum-based neural network potentials (NNPs) that can make accurate predictions for specific combinations of molecules. These tools are tens of thousands of times faster than classical quantum computations. But they have no ability to generalize. Present them with another set of molecules, and they'd be nearly useless.

Computational approaches are widely used to generate 3D models and conformers of molecules used in pharmaceutical research. Common applications include conformational analysis for selected compounds, 3D-QSAR, lead optimization and ligand-based virtual screening (LBVS) or receptor structure-based virtual screening (SBVS). A crucial component is preparation of electronic libraries for virtual screening since the outcome depends on the quality of the compounds and their conformers. The quality of the computed conformers, their geometries and conformational energetics have been ongoing concerns, especially in the context of unreliability and poor quality of existing methods.— Olexandr Isayev, Carnegie Mellon University

Isayev and his colleagues had an idea to simulate the molecules in a way that avoided the massive quantum computations. Their plan would speed the computation by offloading the complexity onto a large database of molecular characteristics. It would require both the speed of GPUs and massive data handling capability. The Bridges-AI system at PSC was perfect for the work. For one thing, it was designed from the outset for leading-edge speed in GPU-based AI training. It also offers Big Data capabilities through its confederation with the larger Bridges platform. The scientists gained access to Bridges-AI through an allocation from XSEDE via the COVID-19 HPC Consortium.

Bridges-AI has these state-of-the-art NVIDIA Tesla V100 chips. Those are critical. Also, in terms of data passing, the architecture of Bridges-AI is perfect. We interacted with Shawn Brown, PSC's Director, and a couple of admins at PSC; the technical challenge was to orchestrate these workflows. The fantastic relationship we have with [XSEDE support staff] was extremely important to the success of this project.—Olexandr Isayev, Carnegie Mellon University

The scientists used Bridges-AI to train and test their tool, called ANI (short for ANAKIN-ME, Accurate NurAl networK engINe for Molecular Energies). First, a large library of molecules was parameterized using AI-accelerated quantum mechanical methods to prepare it for subsequent structure-based automated virtual screening. Then a second AI model was trained to predict interactions between drugs known to be effective against SARS-CoV-2 and three target proteins: two from the virus and one from human cells. By trial and error, the pipeline pruned connections between "layers" of neurons inside the neural networks, each reacting to specific characteristics of the molecules, until it reproduced the known interactions. Then the team tested the model against another set of known drugs, this time without the model "knowing" the answers ahead of time. Going back and forth between training and testing, the scientists honed the program's ability to predict interactions accurately.
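The train-then-test cycle described above can be illustrated with a deliberately tiny sketch. It is not the team's ANI code and uses no neural network: a single-number cutoff stands in for the model, and the data, the function names (`split_holdout`, `fit_threshold`, `accuracy`) and the "descriptor" values are all hypothetical. The point is only the pattern of fitting on known answers and then scoring on records the model never saw.

```python
import random

# Hypothetical toy data: (descriptor value, binds_target) pairs.
# Real inputs would be far richer than one number per molecule.
KNOWN = [(0.1, False), (0.2, False), (0.3, False), (0.4, False),
         (0.6, True), (0.7, True), (0.8, True), (0.9, True)]


def split_holdout(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(cut, records):
    """Fraction of records where 'descriptor >= cut' matches the known label."""
    return sum((x >= cut) == label for x, label in records) / len(records)


def fit_threshold(train):
    """Stand-in for training: try each training value as the cutoff
    and keep the one that reproduces the most known labels."""
    return max((x for x, _ in train), key=lambda cut: accuracy(cut, train))


if __name__ == "__main__":
    train, test = split_holdout(KNOWN)
    cut = fit_threshold(train)
    print("cutoff learned from training set:", cut)
    print("accuracy on training set:", accuracy(cut, train))
    print("accuracy on held-out set:", accuracy(cut, test))
```

The held-out accuracy is the number that matters here: a model can always be tuned until it reproduces its training answers, so only records it has not seen show whether it generalizes.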

In a final computational step, the team used their AI on several databases containing about five billion chemical compounds, including antiviral compounds and FDA-approved or investigational drugs. In just a day, ANI narrowed the field by predicting which were most likely to interfere with the target proteins in a way that would block infection. Top hits from the screening will be used for experimental validation in the partner labs. 
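To give a sense of scale, the sketch below works out the average throughput implied by "about five billion compounds" in "a day," and shows one common way to keep only the top-scoring entries from a library too large to hold in memory. It is an illustration under stated assumptions, not the team's pipeline: the compound names, the scores and the helper names (`top_hits`, `required_rate`, `toy_library`) are hypothetical, and in the real workflow the scores would come from the trained model.

```python
import heapq


def required_rate(n_compounds, seconds):
    """Average compounds per second needed to finish in the given time."""
    return n_compounds / seconds


def top_hits(scored_compounds, k):
    """Return the k best (name, score) pairs from an iterable of any length.

    heapq.nlargest consumes the iterable one item at a time and keeps
    only about k candidates, so the library can be streamed.
    """
    return heapq.nlargest(k, scored_compounds, key=lambda pair: pair[1])


def toy_library():
    """Hypothetical stand-in for a stream of model-scored compounds."""
    yield ("cmpd-A", 0.12)
    yield ("cmpd-B", 0.57)
    yield ("cmpd-C", 0.91)
    yield ("cmpd-D", 0.33)


if __name__ == "__main__":
    per_second = required_rate(5_000_000_000, 24 * 60 * 60)
    print(f"about {per_second:,.0f} compounds per second, on average")
    print(top_hits(toy_library(), 2))
```

Five billion compounds in 24 hours averages out to roughly 58,000 predictions per second across the whole system, which is why a method that takes hours per compound could not be used for a screen of this size.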

Isayev and his colleagues made datasets of structures and properties of the most promising anti-COVID agents (20,000 antiviral compounds and FDA-approved drugs) freely available to the research community here. They also entered the European Union's COVID Challenge project here. In the next step of that competition, several groups of lab scientists are working on synthesizing the compounds and testing them in virus-related assays. The work was also recognized by an Editors' Choice Award from HPCWire, a leading publication in the high-performance computing field, for "Best Use of High-Performance Data Analytics & Artificial Intelligence" during the virtual 2020 International Conference for High Performance Computing, Networking, Storage and Analysis (SC20).

Isayev's team is also preparing a paper on the work for submission to a peer-reviewed journal. You can read earlier papers on their development of ANI here and here.

One major advantage of AI methods is their flexibility in solving other chemical problems. In the future, the team intends to apply the AI pipeline to other medical and industrial processes unrelated to the SARS-CoV-2 virus.


At a Glance:

  • COVID-19 has shown that speed can be as important as quality in medical research.

  • A team from Carnegie Mellon University has developed a new computational pipeline for greatly speeding up identification of possible anti-SARS-CoV-2 virus candidates using artificial intelligence.

  • Using the XSEDE-allocated Bridges-AI system, they screened about five billion chemical compounds to select a small number of candidates for combating the disease, thousands of times faster than possible with previous methods.