Science Success Story

MIT Researchers Use Machine Learning to Advance Computational Chemistry

National Science Foundation Supercomputing Resources Simulate Complex Models

By Kimberly Mann Bruch, SDSC Communications

Multi-reference character of 3,165 structures as evaluated by two of the 15 diagnostics used by experts in the field, nHOMO[MP2] (top left) and C0² (top right). The bottom panels show all 15 diagnostics projected with uniform manifold approximation and projection (UMAP), with the bottom and top 10% of the two metrics shown as solid blue and red circles, respectively. This machine learning approach makes it possible to predict multi-reference character and to determine whether computationally inexpensive techniques such as density functional theory (DFT) are sufficient. Credit: Kulik et al., MIT.

Computational chemistry is a challenging arena for machine learning, but a team of researchers from the Massachusetts Institute of Technology (MIT) may have made that challenge easier to meet.

Thanks to allocations from the National Science Foundation's (NSF) Extreme Science and Engineering Discovery Environment (XSEDE), they developed an artificial intelligence (AI) approach to detect electron correlation, the interaction between a system's electrons. Electron correlation is vital in quantum chemistry but expensive to calculate.

AI-based methods show promise in making its detection much more tractable while increasing the throughput of such computations, that is, the number of materials that can be analyzed.

Using Comet at the San Diego Supercomputer Center at UC San Diego and Bridges at the Pittsburgh Supercomputing Center, Professor Heather Kulik and her MIT colleagues developed several artificial neural network models, described in papers published in the Journal of Chemical Theory and Computation and the Journal of Physical Chemistry Letters. This work could help accelerate the discovery of a range of new materials through predictive modeling.

"In these two papers, we first developed supervised models to predict high-quality, high-cost diagnostics of strong correlation at low computational cost," said Kulik, a computational chemist and chemical engineering professor at MIT. "We overcame the fact that diagnostics seldom agree to build a consensus-based classifier model, so we used various low-cost and predicted high-cost [diagnostics] as inputs to the virtual adversarial training of an artificial neural network model in what we believe to be the first semi-supervised learning model applied to computational chemistry."
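
To make the idea of a consensus label more concrete, here is a minimal Python sketch of one simple way to combine diagnostics that disagree: a majority vote over per-diagnostic cutoffs. The diagnostic names, cutoff values, and the voting rule are hypothetical assumptions chosen for illustration; the MIT team's published classifier is a trained neural network, not this rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    """A hypothetical strong-correlation diagnostic with a cutoff value."""
    name: str
    threshold: float
    higher_means_multireference: bool

    def flags(self, value: float) -> bool:
        """Return True if this diagnostic flags the structure as multi-reference."""
        if self.higher_means_multireference:
            return value > self.threshold
        return value < self.threshold


def consensus_label(values: dict[str, float], diagnostics: list[Diagnostic]) -> bool:
    """Majority vote: True if more than half of the diagnostics flag the structure."""
    votes = sum(d.flags(values[d.name]) for d in diagnostics)
    return votes > len(diagnostics) / 2


# Hypothetical diagnostics and cutoffs, for illustration only.
DIAGNOSTICS = [
    Diagnostic("diag_a", threshold=0.05, higher_means_multireference=True),
    Diagnostic("diag_b", threshold=0.90, higher_means_multireference=False),
    Diagnostic("diag_c", threshold=0.10, higher_means_multireference=True),
]

structure = {"diag_a": 0.08, "diag_b": 0.95, "diag_c": 0.12}
print(consensus_label(structure, DIAGNOSTICS))  # two of three flag it -> True
```

A vote like this tolerates one dissenting diagnostic, which is the intuition behind building a consensus when individual diagnostics seldom agree.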

The simulations showed that strong correlation can be present in some, but not all, of the molecules typically explored during high-throughput materials screening. This allowed the researchers to identify when more affordable computational models would be predictive.
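
In a screening campaign, that kind of prediction can act as a triage step. The sketch below assumes a model that outputs a probability of multi-reference character and routes each structure either to DFT or to a more expensive multi-reference method. The function name, the 0.5 cutoff, and the structure IDs are hypothetical, not part of the MIT workflow.

```python
def choose_method(p_multireference: float, cutoff: float = 0.5) -> str:
    """Route a structure to a cheap or an expensive electronic structure method.

    p_multireference is a (hypothetical) model's predicted probability that the
    structure has strong, multi-reference correlation; the cutoff is an assumption.
    """
    if not 0.0 <= p_multireference <= 1.0:
        raise ValueError("p_multireference must be a probability in [0, 1]")
    return "multireference" if p_multireference >= cutoff else "DFT"


# Hypothetical screening batch: structure ID -> predicted probability.
batch = {"mol_001": 0.12, "mol_002": 0.87, "mol_003": 0.49}
plan = {name: choose_method(p) for name, p in batch.items()}
print(plan)
```

Only the structures flagged as multi-reference would incur the high cost, which is where the throughput gain described above would come from.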

By using machine learning, the models predicted strong correlation at a much lower computational cost than conventional methods, potentially accelerating the search for materials in a range of applications, such as drug-like compounds for treating diseases or new materials for better batteries.

"This type of machine learning model is uniquely suited to this multi-stage approach because it is robust to noisy or erroneous inputs," explained Fang Liu, an NSF Molecular Sciences Software Institute fellow and a co-author of both papers. "We used many theoretical chemistry codes to conduct our studies, and that would not have been possible without Comet and Bridges."

The team's workflow, MultirefPredict, interfaced with at least three electronic structure codes and used both central processing units (CPUs) and graphics processing units (GPUs) on Comet and Bridges.

"Due to our complex requirements, having resources where we could set up workflows to run in an interoperable manner with different codes was very helpful for us," said Kulik, who also teaches a course on XSEDE resources. "Using XSEDE supercomputers firsthand allowed me to think about ways I can teach students who may just be learning computational chemistry to complement their experimental research for ways that they can use not only now but in the future." 

This research was primarily supported by the U.S. Department of Energy (DE-SC0018096). Access to Comet and Bridges was provided by XSEDE (TG-CHE140073).

At a Glance