Science Success Story

Dynamic Domains in the Spike Protein of SARS-CoV-2

Using XSEDE supercomputing resources, researchers identify unique regions of the spike protein key to understanding variants


By Faith Singer, Texas Advanced Computing Center (TACC)


Figure 1: (A) Spike protein structure with generated movements highlighted by gray arrows. The spike protein is colored according to different functional units (B).

For many people, COVID-19 has highlighted the role of science in fighting disease around the world. From a young scientist's perspective, the thrill of figuring out something that no one has done before, and producing answers that society urgently needs, is an unmatched experience.

"Researching biomedical systems can be very complex and is always exciting. But to know that your results may have a direct impact on a major world issue makes this work even more thrilling and immensely gratifying," said Genevieve Kunkel, a graduate student at the Tarakanova Lab at the University of Connecticut (UConn). "I'm interested in using traditional, fundamental principles from physics and engineering to understand diseases."

In summer 2020, Kunkel and her colleague Mohammed Madani, another graduate student in the lab, started gathering information on the SARS-CoV-2 virus under the direction of Professor Anna Tarakanova. Like everyone worldwide, they wanted to understand the virus as new variants began to emerge – Alpha, Delta, Omicron.

The graduate students are co-authors of a paper published in the Biophysical Journal in November 2021. The other co-authors are Simon J. White, Paulo H. Verardi, and Anna Tarakanova. The research required substantial supercomputing power, which the team obtained through the NSF-funded Extreme Science and Engineering Discovery Environment (XSEDE).

"We were trying to find a way of resolving the protein's dynamics in a fast, targeted way – gaining a better understanding of spike protein mutations that can be applied to more aggressive variants," Kunkel said. "That's how we arrived at the process of 'normal mode analysis,' which is the method we used to resolve protein dynamics, that in turn allows us to identify dynamic domains – regions key to spike protein function. We also looked at resolving thermal stability and protein longevity. If we learn how to control these factors, we can provide insights for future vaccine design."

Graduate student Genevieve Kunkel (left) and Professor Anna Tarakanova (right) of Tarakanova Lab at the University of Connecticut (UConn).

"To evaluate the thermal stability of spike protein mutations in a fast and accurate manner, we also built a machine learning-based tool using the XSEDE-allocated Stampede2 supercomputer to train our model," Madani said. "Access to the supercomputer was critical for our work. These types of simulations would not be possible on our local machines."

Why It's Important

The findings in this study allow the researchers to make recommendations about the design of future SARS-CoV-2 spike protein variants for effective immunogens that trigger neutralizing antibodies to hinder virus activity. The integrated computational approach they used can be applied to optimize vaccine design and predict the antibody responses elicited by SARS-CoV-2 variants.

The researchers studied key regions associated with specific dynamic mechanisms – such as the movement of the receptor-binding domain (RBD), a key part of the spike protein that allows the virus to dock to receptors on human cells, enter those cells, and initiate infection.

"The mechanisms we saw from the combined research of the project offered some insights into what types of mutations may be able to stabilize or destabilize certain regions of the spike protein to alter RBD motions so antibodies can recognize it," Kunkel said. "This is important for identifying disease mechanisms in later variants or for vaccine design."

The region of the coronavirus spike that sits outside the viral membrane and is susceptible to being recognized by the immune system. The wiggly part (upper right) is the receptor binding domain – this is the key region that interacts with receptors on the surface of our cells. It's also an important antibody target. The position of this domain is important both for antibodies to be able to recognize it (this wiggle appears to be enough to confuse the immune system) and for the virus to bind to cells and invade them. Credit: Tarakanova Lab/UConn

In addition, the methods used in this study – a combination of normal mode analysis (an approach to extract the most biologically relevant motions experienced by the molecules) and dynamic domain analysis – look at large numbers of different variants at one time.

"This was key to the research," Kunkel said. "It's useful for the continual development of treatments because it allows researchers to quickly resolve and compare the different movements of many different spike protein variants, which is more essential now that we're dealing with many variants all over the world."
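For readers curious about what normal mode analysis involves, here is a minimal sketch in Python. It is not the team's code: the article does not say which model or software they used, so this example assumes a simple elastic network model (residues treated as points joined by identical springs) and uses randomly generated toy coordinates rather than a real spike protein structure. The function names, the cutoff distance, and the spring constant are all illustrative choices.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N stiffness (Hessian) matrix of a toy elastic network.

    Assumption: every pair of points closer than `cutoff` is joined by a
    spring of constant `gamma`. Real studies choose these values carefully.
    """
    n = coords.shape[0]
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > cutoff ** 2:
                continue  # too far apart: no spring between i and j
            block = -gamma * np.outer(d, d) / dist2
            # Off-diagonal blocks couple residues i and j.
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal blocks accumulate the negative of the couplings.
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def normal_modes(coords, n_modes=3, cutoff=15.0):
    """Return the lowest-frequency internal modes of the network.

    The first six eigenvectors are rigid-body translations and rotations
    (eigenvalue ~0) and are skipped; this assumes the network is connected
    and rigid, which holds for the toy example below.
    """
    eigvals, eigvecs = np.linalg.eigh(anm_hessian(coords, cutoff))
    vals = eigvals[6:6 + n_modes]
    # Reshape each mode into one 3D displacement vector per point.
    modes = eigvecs[:, 6:6 + n_modes].T.reshape(n_modes, -1, 3)
    return vals, modes


# Toy input: 12 random points standing in for residue positions.
rng = np.random.default_rng(0)
coords = rng.normal(scale=5.0, size=(12, 3))
vals, modes = normal_modes(coords, n_modes=3, cutoff=100.0)

# Per-point displacement magnitude in the slowest mode: larger values
# mark the parts of the toy structure that move the most.
mobility = np.linalg.norm(modes[0], axis=1)
print(mobility.round(3))
```

In a sketch like this, the slowest modes capture the large, collective motions of the structure. Grouping points that move together in those modes is one plausible route toward the kind of dynamic domains the researchers describe, though their actual procedure may differ.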

How XSEDE Helped

The researchers used XSEDE allocations on TACC's Stampede2 supercomputer and the center's Ranch data storage system for this study.

"When you're looking at 10-20 proteins or more, it's better to employ a supercomputer to accelerate the simulations," Kunkel said. "Another component of the work, the thermal stability predictor, was developed using Stampede2 – this is a machine-learning predictor, and we needed many core hours of computational power to train the model."

The Ranch storage system was used to archive each protein they studied.

Anna Tarakanova, their professor, said: "I started using XSEDE about 10 years ago as a graduate student at MIT. I've been using XSEDE continuously, first on my own, then with my students. Most of the work I do wouldn't be possible without XSEDE. It's been a hugely useful resource."

Some of the key results from the study are as follows:

  • By comparing the dynamic signatures of different spike proteins, the researchers unearthed key differences between spike protein variants (both naturally-occurring variants and engineered proteins used in vaccine design research). In identifying mutational effects on key functional regions, they began to understand how this research could be used to help create customized spike proteins for future immunogen design.
  • The researchers developed a comprehensive antigen map of the human body's immune response, linking how dynamic signatures of different spike protein variants coincide with key antibody binding regions. This will help scientists understand how effective a protein variant may be at neutralizing the virus. Up until this body of research, there were no resources with comprehensive antigenic binding information parsed out based on protein dynamics.

Tarakanova concluded: "The computational methods we used are transferable and can be applied more broadly – not just to SARS-CoV-2, but to any other types of viruses that may arise in the future."

This work was supported by the University of Connecticut Office of the Vice President for Research COVID-19 Rapid Start Funding Program. This work utilized the Extreme Science and Engineering Discovery Environment (XSEDE), supported by National Science Foundation grant number ACI-1548562. XSEDE resources Stampede2 and Ranch at the Texas Advanced Computing Center were used through allocation TG-MCB180008.

At A Glance


  • Using XSEDE supercomputing resources, researchers from the University of Connecticut have identified unique regions of the spike protein key to understanding variants.
  • The study was published in the Biophysical Journal in November 2021.
  • The researchers combined two primary methods – normal mode analysis (extracting the most biologically relevant motions experienced by the molecules) and dynamic domain analysis (identifying regions key to spike protein function) – which together let them examine large numbers of different variants at one time.
  • The findings will allow researchers to make recommendations about the design of future SARS-CoV-2 spike protein variants for effective immunogens that trigger neutralizing antibodies to hinder virus activity.
  • The researchers used XSEDE allocations on TACC's Stampede2 supercomputer and the Ranch data storage system for this study.