Science Success Story


RNA Atlas Assembles Comprehensive Knowledge on Human Transcriptome

XSEDE resources help identify, validate new RNA molecules for the study of disease

By Molly Chiu, Baylor College of Medicine / Faith Singer, TACC

 

Pavel Sumazin, Associate Professor in pediatrics–oncology, Baylor College of Medicine; Member of the Dan L Duncan Comprehensive Cancer Center.

Researchers at Ghent University, Amsterdam University of Medicine, National Chiao Tung University, UNSW Sydney, Illumina, and the Baylor College of Medicine have built one of the most comprehensive catalogs of the human transcriptome ever. By combining complementary sequencing techniques, they have deepened our understanding of the function of known ribonucleic acid (RNA) molecules and discovered thousands of new RNAs.

Why It's Important

RNA is a nucleic acid present in all living cells. Its principal role is to act as a messenger carrying instructions from DNA for controlling the synthesis of proteins.

The team's research, published in Nature Biotechnology in June 2021, is the result of more than five years of work to further unravel the complexity of the human transcriptome. A better understanding of the human transcriptome is essential to study disease processes and uncover novel genes that may serve as therapeutic targets or biomarkers.

How XSEDE Helped

The researchers relied on Extreme Science and Engineering Discovery Environment (XSEDE) resources to prove that these genes play a role in cells and tissues and are not merely byproducts of other cellular processes.

"Over the past three years we've received generous allocations of computing time on the XSEDE-allocated Stampede2 supercomputer," said Pavel Sumazin, an associate professor in pediatrics–oncology at Baylor College of Medicine and member of the Dan L Duncan Comprehensive Cancer Center. "We used Stampede2 to predict the function of thousands of genes that were never before identified. The validation of these predictions verified that these genes—including thousands of uncharacterized single-exon long non-coding RNAs, which were previously categorized as junk RNAs—are important regulators of key pathways in multiple human cells and tissues."

 "This analysis was computationally intensive because it required computing distance and delta distance correlations for many billions of gene pairs and triplets, respectively, including the creation of null distributions to evaluate significance," Sumazin said.
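To give a sense of why this kind of analysis is expensive, the following is a minimal sketch of a distance correlation between two expression profiles, with a permutation-based null distribution to assess significance. It is an illustration only and not the RNA Atlas pipeline: the function names, the choice of a permutation null, and the toy data are assumptions made here, and the "delta distance correlation" used for gene triplets is not reproduced because the article does not define it.

```python
import numpy as np


def _centered_distances(x):
    """Double-centered matrix of pairwise absolute differences of a 1-D sample."""
    d = np.abs(x[:, None] - x[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays of equal length."""
    a = _centered_distances(np.asarray(x, dtype=float))
    b = _centered_distances(np.asarray(y, dtype=float))
    dcov2 = (a * b).mean()                            # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())  # product of distance std devs
    if denom == 0.0:
        return 0.0  # one profile is constant, so no dependence can be measured
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Observed distance correlation and a p-value from a shuffled-sample null."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = distance_correlation(x, y)
    # Shuffling y breaks any real association with x while keeping its distribution.
    null = np.array([distance_correlation(x, rng.permutation(y)) for _ in range(n_perm)])
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p)


if __name__ == "__main__":
    # Toy data (hypothetical): a nonlinear dependence that Pearson correlation misses.
    samples = np.linspace(-1.0, 1.0, 50)
    gene_a = samples
    gene_b = samples ** 2
    dcor, p = permutation_p_value(gene_a, gene_b, n_perm=200)
    print(f"distance correlation = {dcor:.3f}, permutation p-value = {p:.4f}")
```

Each pair in this sketch costs one n-by-n distance matrix per profile, plus the same work again for every permutation, which suggests why scaling to billions of gene pairs calls for a supercomputer.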


The transcriptome is the sum of all RNA molecules that are transcribed from the DNA strands that make up our genome. However, there is not a one-for-one relationship. First, each cell and tissue type has a unique transcriptome, with varying RNA production and composition, including tissue-specific RNAs. Second, not all RNAs are transcribed from typical protein-coding genes. Many of our RNA molecules are not used as a template to build proteins; they originate from what was once called junk DNA, long sequences of DNA with unknown functions.

These non-coding RNAs (ncRNAs) come in all kinds of shapes and sizes: short, long, and even circular RNAs. Many of them even lack the polyA tail, a stretch of adenine nucleotides that is typical of protein-coding RNAs.

"There have been other projects to catalog our transcriptome but the RNA-Atlas project is unique because of the applied sequencing methods," said Pieter Mestdagh, professor at the Center for Medical Genetics at Ghent University. "Not only did we look at the transcriptome of as many as 300 human cell and tissue types but, most importantly, we did so with three complementary sequencing technologies, one aimed at small RNAs, one aimed at polyadenylated (polyA) RNAs, and a technique called total RNA sequencing."

This last sequencing technology led to the discovery of thousands of novel non-coding RNA genes, including a novel class of non-polyadenylated single-exon genes and many new circular RNAs. By combining and comparing the results of the different sequencing methods, the researchers were able to define, for every measured RNA transcript, its abundance in the different cells and tissues, whether it has a polyA tail (for some genes this appears to differ from cell type to cell type), and whether it is linear or circular.

TACC's Stampede2 supercomputer is the flagship system of the NSF-funded Extreme Science and Engineering Discovery Environment (XSEDE). Stampede2 is listed third among academic systems in the U.S. on the Top500 list (June 2021).

Moreover, the consortium searched for and found important clues to the function of some of the ncRNAs. By looking at the abundance of different RNAs in different cell types, they found correlations that indicate regulatory functions, and they could determine whether this regulation happens at the transcriptional level (by preventing or stimulating transcription of protein-coding genes) or at the post-transcriptional level (e.g., by breaking down RNAs).

"By combining all data in one comprehensive catalog, we have created a new valuable resource for biomedical scientists around the world studying disease processes," Sumazin said. "The age of RNA therapeutics is swiftly rising – we've all witnessed the impressive creation of RNA vaccines, and already the first medicines that target RNA are used in the clinic. I'm sure we'll see lots more of these therapies in the next years and decades."

The XSEDE allocation for this research is MCB180203.

 

 

At a Glance:

  • Researchers internationally have built one of the most comprehensive catalogs of the human transcriptome ever.
  • By combining complementary sequencing techniques, they have deepened our understanding of the function of known ribonucleic acid (RNA) molecules and discovered thousands of new RNAs.
  • The research, published in Nature Biotechnology, is the result of more than five years of work to further unravel the complexity of the human transcriptome, which is essential to study disease processes and uncover novel genes that may serve as therapeutic targets or biomarkers.
  • The researchers relied on XSEDE resources to prove that these genes play a role in cells and tissues and are not merely byproducts of other cellular processes.