XSEDE-Allocated Resources Power Population Genomics Study
Comet, Jetstream, Bridges Run Millions of Genomic Simulations Assisted by the Open Science Grid
|Three modeled demographic histories of the Ashkenazi Jewish (AJ) population as outlined in the Molecular Biology and Evolution journal. The Null Model has no substructure within the AJ population, and was found to be highly unlikely. The Substructure model has a population split between Eastern and Western Ashkenazi Jews and one common admixture event from Europeans; the researchers found this model to be the best fit in their study. The Substructure with Differential Admixture model was favored, showing separate admixture events from the Europeans. Populations are labeled as E (European); AJ (Ashkenazi Jews); EAJ (Eastern Ashkenazi Jews); WAJ (Western Ashkenazi Jews); J (Sephardic/Mizrahi Jews); ME (Middle Eastern). Credit: A. Gladstein and M. Hammer, 2019|
By Kimberly Mann Bruch, SDSC Communications
Using multiple supercomputers allocated by the National Science Foundation's XSEDE (Extreme Science and Engineering Discovery Environment) program, as well as the Open Science Grid, researchers at the University of Arizona recently published the results of a detailed population genomics study. The study, which involved running millions of genomic simulations, represents a substantial breakthrough in understanding the demographic history of Ashkenazi Jews.
The study, published in a recent issue of Molecular Biology and Evolution, focused on Ashkenazi Jewish subpopulations from Eastern and Western/Central Europe and what led to their genetic differentiation.
"Using computational capabilities provided by XSEDE, we were able to perform millions of genomic simulations at an unprecedented chromosome-size scale, allowing us to use large chunks of the genome to learn about the population's history," said Ariella Gladstein, first author of the study. "The Ashkenazi Jews have always been studied as one population and we showed that their two sub-populations correspond to cultural differences. Specifically, we learned the differentiation between the two groups is primarily concerned with population growth with the Eastern Ashkenazi."
|Ariella Gladstein, former PhD student in Ecology and Evolutionary Biology at the University of Arizona. Credit: University of Arizona University Information Technology Services|
Gladstein, currently at the University of North Carolina at Chapel Hill, along with her PhD advisor, Michael Hammer of the University of Arizona, developed models that encompassed population structure, population size changes, and gene flow. Their goal was to keep their models simple while providing novel insight.
Population Growth Differences
"Ashkenazi Jews are highly studied because they are more likely than the general population to carry genetic mutations that result in specific genetic diseases," said Hammer. "The genetic differentiation that we found in existence since 400 years ago could be attributed to more extreme population growth in the Eastern Ashkenazi Jews than the Western Ashkenazi Jews, and could have implications for medical genetics research in the Ashkenazim."
To run the simulations of their models, the researchers used several XSEDE-allocated supercomputers and resources, including Comet at the San Diego Supercomputer Center (SDSC) at UC San Diego, Jetstream at Indiana University, and Bridges at the Pittsburgh Supercomputing Center. Gladstein worked closely with affiliates of Research Bazaar Arizona, a group that helps researchers with computational problems, including Blake Joyce, assistant director of research computing at the University of Arizona; Julian Pistorius, a software engineer at the University of Arizona; and Mats Rynge, a computer scientist at SciTech at the University of Southern California Information Sciences Institute.
"The Arizona Research Bazaar helped me get my simulation code computationally feasible," said Gladstein. "Without their help in optimizing the code, we calculated it would have taken just over 18 years to run all the simulations."
Specifically, the team helped tie together the simulations that were computed on Comet, Jetstream, and Bridges via the Open Science Grid (OSG), a multi-disciplinary research partnership and XSEDE resource funded by the NSF and the U.S. Department of Energy, to create the final models for the study.
According to Joyce, more than 273,000 core-hours were consumed on the three XSEDE-allocated supercomputers and almost 7.3 million via the OSG over a period of six months. In all, the two-year project consumed some 15 million core-hours including resources at the University of Arizona and the University of Wisconsin.
"We used an approach called Approximate Bayesian Computation (ABC) to determine the most likely scenario of Ashkenazi history," said Gladstein. "In ABC one essentially does lots of simulations of various models, compares the data generated by the simulations to real data, and uses the simulations that produce data most similar to the real data to create posterior distributions that can be used to choose the best model and infer its parameter values."
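The general idea Gladstein describes can be illustrated with a minimal rejection-style ABC sketch. This is not the study's code: the two toy models ("narrow" and "wide"), the Gaussian simulator, the uniform priors, and the names `simulate_summary` and `abc_rejection` are all assumptions made purely for illustration, standing in for far more complex genomic simulations.

```python
import math
import random
import statistics

# Two hypothetical toy models standing in for competing demographic models.
# Each one turns a parameter (a mean) into data with a model-specific spread.
MODEL_SPREAD = {"narrow": 1.0, "wide": 3.0}


def simulate_summary(model, mean, rng, n=50):
    """Simulate n draws under `model`; return summary statistics (mean, variance)."""
    draws = [rng.gauss(mean, MODEL_SPREAD[model]) for _ in range(n)]
    return statistics.fmean(draws), statistics.variance(draws)


def abc_rejection(observed, n_sims=5000, keep_frac=0.02, seed=0):
    """Rejection ABC with model choice over the toy models above."""
    rng = random.Random(seed)
    models = sorted(MODEL_SPREAD)
    results = []
    for _ in range(n_sims):
        model = rng.choice(models)        # assumed uniform prior over models
        mean = rng.uniform(0.0, 10.0)     # assumed uniform prior on the parameter
        summary = simulate_summary(model, mean, rng)
        # Distance between simulated and observed summary statistics.
        results.append((math.dist(summary, observed), model, mean))

    # Keep only the simulations whose summaries are closest to the real data.
    results.sort(key=lambda r: r[0])
    accepted = results[: max(1, int(n_sims * keep_frac))]

    # Approximate posterior model probabilities: share of accepted runs per model.
    model_post = {
        m: sum(1 for _, am, _ in accepted if am == m) / len(accepted)
        for m in models
    }
    best = max(model_post, key=model_post.get)
    # Approximate posterior sample of the parameter under the chosen model.
    params = [p for _, am, p in accepted if am == best]
    return model_post, best, params
```

Calling `abc_rejection(observed=(5.0, 9.0))` returns the approximate model probabilities, the preferred toy model, and the accepted parameter values for that model. In practice, the choice of summary statistics, distance measure, and acceptance fraction strongly affects the result, and each simulation is far more expensive than this toy version, which is why such studies lean on large-scale computing.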
Going forward, the researchers hope to include more details in the demographic models and apply the method to other populations. Gladstein is currently working on using deep learning, trained on simulated data, to answer similar population genomics questions.
Researchers were also provided computing time and expertise from the University of Arizona's Research Data Center, the University of Wisconsin's Center for High Throughput Computing, and CyVerse, an NSF-funded project to design, deploy, and expand a national cyberinfrastructure for life sciences research.
Access to computational resources was granted under XSEDE award BIO170028. XSEDE is supported by the National Science Foundation under award 1548562. The Open Science Grid is supported by NSF award 1148698 and the U.S. Department of Energy's Office of Science.