Science Stories

Discover science successes with XSEDE. Read feature stories where XSEDE has helped researchers in their work.


De Novo Assembly of Lucina pectinata Genome using Ion Torrent Reads

Ingrid M. Montes-Rodríguez, Comprehensive Cancer Center, University of Puerto Rico

While many shellfish genomes have recently been assembled with an eye toward aquaculture and human food sources, the clam Lucina pectinata holds scientific interest for a very different reason. Native to sulfide-rich mud flats, the organism has formed a symbiotic relationship with microbes that both protect the clam against the toxicity of hydrogen sulfide and require that chemical for their normal metabolism. As part of its co-evolution with its endosymbiont, L. pectinata has developed a version of hemoglobin, Hb-I, that has preferential affinity for H2S rather than oxygen. Unfortunately, to date the first exon of the Hb-I gene has not been successfully assembled.

In an attempt to obtain the full sequence of Hb-I and identify other adaptations to the mollusk's unique environment, Ingrid M. Montes-Rodríguez and colleagues at several campuses of the University of Puerto Rico, together with XSEDE Extended Collaborative Support Service experts at the Pittsburgh Supercomputing Center (PSC), used the XSEDE-allocated Blacklight and Bridges systems at PSC to conduct de novo assemblies of L. pectinata whole-genome DNA sequences generated with the Ion Torrent next-generation sequencing technology. The group compared how well the MIRA4 and SPAdes assembly tools accomplished this task. In addition, they tested whether the genome of the distantly related Lottia gigantea (giant owl limpet) could serve as a reference to guide the assembly.

The 12.4 GB of data and 80 million "reads" (DNA fragments to be assembled into the whole genome) initially generated by the sequencer were not particularly large by whole-genome-assembly standards, but the large amount of repetitive DNA proved a challenge to both assemblers. Overall, the MIRA assembler (run on the now-retired Blacklight) performed better than SPAdes (run on the 3-TB "large memory" nodes of Bridges), assembling more bases (1.3 versus 0.5 million) and identifying more exons (73,521 vs. 31,927) and mRNA species (65,984 vs. 27,302). However, SPAdes produced larger "scaffolds" (successfully assembled sequences of DNA) than did MIRA: 1.1 million DNA bases vs. 0.1 million bases in scaffolds longer than 10,000 bases. Each method also produced unique information not acquired by the other (3,009 unique matches to the L. gigantea proteome for MIRA and 1,671 for SPAdes).

Unfortunately, neither method captured exon 1 of Hb-I. The group concluded that the Ion Torrent technology poses unique challenges in eukaryotic assembly and that routinely using multiple assemblers may be advisable. They reported their results in a peer-reviewed paper at the PEARC17 conference in New Orleans in July 2017.
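The scaffold comparison above counts only the bases that fall in scaffolds longer than 10,000 bases. As a rough illustration of how such a figure could be computed from an assembler's FASTA output, here is a minimal Python sketch. The function names, the toy sequences and the choice of FASTA as input are assumptions made for illustration; this is not the group's actual analysis pipeline.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {name: sequence} dictionary."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {key: "".join(chunks) for key, chunks in parts.items()}


def bases_in_long_scaffolds(scaffolds: Dict[str, str], min_length: int = 10_000) -> int:
    """Sum the lengths of all scaffolds strictly longer than min_length."""
    return sum(len(seq) for seq in scaffolds.values() if len(seq) > min_length)


# Toy data only; these are not sequences from the study.
toy_fasta = [
    ">scaffold_1",
    "ACGT" * 3000,      # 12,000 bases
    ">scaffold_2",
    "ACGT" * 1000,      # 4,000 bases (below the cutoff)
    ">scaffold_3",
    "GATTACA" * 2000,   # 14,000 bases
]
scaffolds = parse_fasta(toy_fasta)
print(bases_in_long_scaffolds(scaffolds))  # 26000
```

Running the same function over the scaffold files produced by two assemblers would give directly comparable totals, in the spirit of the 1.1 million vs. 0.1 million bases reported for SPAdes and MIRA.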

Figure 1: Representation of the genes identified in the L. pectinata assembly by either SPAdes (blue) or MIRA (red), classified by their predicted role in cellular function: A, cellular components; B, molecular functions; or C, biological processes. Note that MIRA consistently outperformed SPAdes in the number of annotations (landmarks identified in the genome).