PIs in University of California, Davis active in the last 90 days
Allocations with low numbers of SUs (10,000 or fewer) are usually educational allocations, startup allocations, or extensions. Allocations with fewer than 10 SUs are usually used for storage purposes.


Name | Project Title | Teragrid Resource | Discipline | Board Type | Base Allocation
Hajar Amini | Construction of De novo transcriptome assembly to identify candidate pathway involved in the production of medicinal compounds in Ferula assafoetida | IU/TACC Jetstream | Biological Sciences | Startup | 300,000
" | " | IU/TACC Storage (Jetstream Storage) | " | " | 4,000
Sebastian Bender | Soil biodiversity and ecosystem functioning in agricultural systems | IU/TACC Jetstream | Ecological Studies | Startup | 50,000
" | " | IU/TACC Storage (Jetstream Storage) | " | " | 100
Alan Bennett | Genomic Characterization of Plant Growth Promoting Microbes from a unique maize landrace | IU/TACC Jetstream | Biological Sciences | Startup | 50,000
" | " | IU/TACC Storage (Jetstream Storage) | " | " | 2,000
C. Titus Brown | UC Davis GGG 201b: lab section | IU/TACC Jetstream | Biological Sciences | Educational | 46,100
Nancy Chen | Recombination rate variation in a wild population of Florida Scrub-Jays | PSC Regular Memory (Bridges) | Biological Sciences | Startup | 50,000
" | " | PSC Large Memory Nodes (Bridges Large) | " | " | 1,000
" | " | PSC Storage (Bridges Pylon) | " | " | 500
Lisa Cohen | Application for Jetstream startup allocations for running dammit de novo transcriptome annotation pipeline software | IU/TACC Jetstream | Biological Sciences | Startup | 100,000
Roland Faller | Direct phase equilibrium simulation of NIPAM oligomers in water and optimization of the potential | XStream/Stanford University GPU Supercomputer (Cray CS-Storm, Intel Ivy-Bridge, NVIDIA K80) | Chemistry | Startup | 5,000
Melissa Kardish | The role of microbiota in mediating local adaptation and plant influence on ecosystem function in a marine foundation species, Zostera marina | SDSC Dell Cluster with Intel Haswell Processors (Comet) | Ecological Studies | Startup | 50,000
" | " | SDSC Medium-term disk storage (Data Oasis) | " | " | 500
Louise Kellogg | CIG Science Gateway and Community Codes for the Geodynamics Community | SDSC Dell Cluster with Intel Haswell Processors (Comet) | Geophysics | Research | 500,000
" | " | TACC Dell/Intel Knights Landing, Skylake System (Stampede2) | " | " | 85,608
" | " | SDSC Comet GPU Nodes (Comet GPU) | " | " | 15,000
" | " | TACC Long-term tape Archival Storage (Ranch) | " | " | 10,000
" | " | SDSC Medium-term disk storage (Data Oasis) | " | " | 10,000
Yong Jae Lee | Large-scale Video Object Detection | PSC Storage (Bridges Pylon) | Robotics and Machine Intelligence | Startup | 15,000
" | " | PSC Bridges GPU (Bridges GPU) | " | " | 6,250
John Naliboff | Testing the scalability and numerical efficiency of long-term tectonic models of continental extension | SDSC Dell Cluster with Intel Haswell Processors (Comet) | Geophysics | Startup | 50,000
" | " | SDSC Medium-term disk storage (Data Oasis) | " | " | 500
" | Numerical simulations of 3D fault system development during continental extension | SDSC Dell Cluster with Intel Haswell Processors (Comet) | " | Research | 1,000,000
" | " | SDSC Medium-term disk storage (Data Oasis) | " | " | 500
Elias Oziolor | Genomic mechanisms underlying the collapse and lack of recovery of Prince William Sound herring | IU/TACC Jetstream | Environmental Biology | Startup | 100,000
N Tessa Pierce | Assessment of RNA editing over Doryteuthis opalescens development | IU/TACC Jetstream | Genetics and Nucleic Acids | Startup | 100,000
" | " | IU/TACC Storage (Jetstream Storage) | " | " | 8,000
Yundi Quan | surveying binary superconducting hydrides using ab-initio methods | PSC Regular Memory (Bridges) | Physics | Startup | 85,968
" | " | SDSC Dell Cluster with Intel Haswell Processors (Comet) | " | " | 50,000
" | " | IU/TACC Jetstream | " | " | 8,980
" | " | TACC Data Analytics System (Wrangler) | " | " | 1,000
" | " | PSC Storage (Bridges Pylon) | " | " | 1,000
" | " | PSC Large Memory Nodes (Bridges Large) | " | " | 1,000
" | " | SDSC Medium-term disk storage (Data Oasis) | " | " | 500
" | " | TACC Long-term tape Archival Storage (Ranch) | " | " | 500
NAVNEET RAI | Prediction of cellular state using deep neural networks | Open Science Grid (OSG) | Biological and Critical Systems | Startup | 102,050
" | " | PSC Regular Memory (Bridges) | " | " | 50,000
" | " | SDSC Comet GPU Nodes (Comet GPU) | " | " | 2,500
" | " | PSC Storage (Bridges Pylon) | " | " | 1,000
" | " | SDSC Medium-term disk storage (Data Oasis) | " | " | 1,000
Anandkumar Surendrarao | Genome assembly, annotation and characterization of Fusarium strains to understand their evolution in the context of chickpea pathogenicity | TACC Dell/Intel Knights Landing, Skylake System (Stampede2) | Biological Sciences | Startup | 1,600
" | " | TACC Long-term tape Archival Storage (Ranch) | " | " | 500
Ilias Tagkopoulos | Automatic knowledge base construction and hypothesis generation: antibiotic resistance mechanisms for Escherichia coli | Open Science Grid (OSG) | Biological Sciences | Startup | 200,000
" | " | SDSC Dell Cluster with Intel Haswell Processors (Comet) | " | " | 50,000
" | " | SDSC Comet GPU Nodes (Comet GPU) | " | " | 5,000
" | " | SDSC Medium-term disk storage (Data Oasis) | " | " | 1,000
Dean Tantillo | MECHANISMS OF BIOORGANIC AND ORGANOMETALLIC CYCLIZATION REACTIONS | SDSC Dell Cluster with Intel Haswell Processors (Comet) | Organic and Macromolecular Chemistry | Research | 1,910,603
" | " | TACC Dell/Intel Knights Landing, Skylake System (Stampede2) | " | " | 48,979
" | " | SDSC Medium-term disk storage (Data Oasis) | " | " | 500
" | " | TACC Long-term tape Archival Storage (Ranch) | " | " | 500
Igor Vorobyov | Elucidation of molecular mechanisms of sex-dependent pro-arrhythmia through hERG block by drugs and steroid hormones | SDSC Dell Cluster with Intel Haswell Processors (Comet) | Biophysics | Startup | 50,000
" | " | XStream/Stanford University GPU Supercomputer (Cray CS-Storm, Intel Ivy-Bridge, NVIDIA K80) | " | " | 5,000
" | " | SDSC Comet GPU Nodes (Comet GPU) | " | " | 2,500
" | " | SDSC Medium-term disk storage (Data Oasis) | " | " | 1,000
" | Atomistic simulations to elucidate molecular mechanisms of drug- and hormone-induced pro-arrhythmia proclivities | SDSC Dell Cluster with Intel Haswell Processors (Comet) | " | Research | 1,503,790
" | " | SDSC Comet GPU Nodes (Comet GPU) | " | " | 154,733
" | " | TACC Dell/Intel Knights Landing, Skylake System (Stampede2) | " | " | 26,038
" | " | SDSC Medium-term disk storage (Data Oasis) | " | " | 22,500
" | " | TACC Long-term tape Archival Storage (Ranch) | " | " | 500
Andrew Wetzel | Simulating the Local Group | TACC Dell/Intel Knights Landing, Skylake System (Stampede2) | Astronomical Sciences | Research | 116,674
" | " | TACC Long-term tape Archival Storage (Ranch) | " | " | 30,000
Vladimir Yarov-Yarovoy | State-dependent drug modulation of sodium channels | Open Science Grid (OSG) | Biophysics | Startup | 200,000
" | " | PSC Regular Memory (Bridges) | " | " | 50,000
" | " | LSU Cluster (superMIC) | " | " | 50,000
" | " | XStream/Stanford University GPU Supercomputer (Cray CS-Storm, Intel Ivy-Bridge, NVIDIA K80) | " | " | 5,000
" | " | PSC Bridges GPU (Bridges GPU) | " | " | 2,500
" | " | PSC Storage (Bridges Pylon) | " | " | 1,000

Project Abstract

Construction of De novo transcriptome assembly to identify candidate pathway involved in the production of medicinal compounds in Ferula assafoetida

PI: Hajar Amini



Ferula assafoetida is an important source of oleo-gum-resins such as asafoetida, which is used therapeutically against conditions such as inflammation, neurological disorders, digestive disorders, rheumatism, headache, arthritis and dizziness. It is therefore important to determine the biological properties of the oleo-gum-resin compounds isolated from F. assafoetida. However, in spite of the known medicinal attributes of compounds from F. assafoetida, most of these compounds, as well as the enzymes involved in their biosynthesis, remain uncharacterized at the molecular level. We therefore decided to evaluate the transcriptome and metabolome of different tissues of F. assafoetida to identify candidate mechanisms and pathways involved in the production of some important medicinal compounds. This proposal requests resources from the Jetstream cloud for the purpose of assembling the transcriptome of Ferula from RNA-Seq reads generated from three different plant species and from four different tissues. The de novo transcriptome assembly will be constructed using Trinity after the reads have been subjected to quality trimming and digital normalization. De novo assembly is a highly memory-intensive and time-consuming process, and it involves several iterations of running the assembler with different k-mer sizes until an optimum assembly is generated. Once the assembly is constructed, it will be assessed using a variety of tools such as Transrate and BUSCO. The final part of the analysis will be annotating the assembled transcriptome using the dammit software. Currently I am using the High Performance Computing cluster at UC Davis for the initial assembly, but there is a long wait time to start any kind of analysis on the cluster. Allocation of resources on the public Jetstream cloud and persistent storage will allow us to explore this data set further and to run the whole pipeline easily and rapidly.
Our results from this analysis will facilitate studies on the functions of genes involved in the secondary metabolite biosynthesis pathway in other medicinal plants. Furthermore, the information about the metabolic pathways in this transcriptome is very valuable for understanding the biosynthesis of oleo-gum-resin, such as the place where it is produced or the tissue that transfers it to other parts. Resources Request Information: In order to achieve these goals, I request the following: 100,000 SUs; an s1.xxlarge (44 CPUs, 120 GB memory, 480 GB disk) instance; 1 TB of external volume space for storing my raw RNA-Seq reads as well as all the outputs and intermediate files generated from

Project Abstract

Soil biodiversity and ecosystem functioning in agricultural systems

PI: Sebastian Bender



Soils are among the most species rich habitats on Earth, and are of fundamental importance for terrestrial ecosystems. It is increasingly being recognized that human land-use, such as intensive agricultural land management, has adverse effects on soil biota and their diversity. Moreover, recent research findings suggest that soil organisms are key players for ecosystem functioning and, hence, determine the ecosystem services delivered by soils. Therefore, reductions in soil biodiversity induced by human land-use may also lead to a decline in ecosystem functioning. First evidence for this has been generated in model systems in greenhouse experiments showing that reductions in soil biodiversity lead to a decline of several ecosystem functions simultaneously (i.e. ecosystem multifunctionality), but field-based evidence for this relationship is rare. A detailed understanding of the factors and processes determining ecosystem-service delivery is, however, of pivotal importance for human well-being. In this project, the effect of agricultural management intensity on soil biota and ecosystem multifunctionality will be investigated in a range of fields differing in management intensity across Northern California. Moreover, it will be tested whether the removal of soil organisms has stronger effects in natural and extensively managed ecosystems as compared to intensively managed systems. It is hypothesized that natural ecosystems possess a high capacity for internal self-regulation, provided by soil organisms. With increasing land-use intensity, the capacity of ecosystems for internal self-regulation is reduced, as these systems comprise lower soil biodiversity and depend on external resource inputs (e.g. fertilizers). Assessments of soil biodiversity and ecosystem functioning will be complemented with state-of-the-art metagenomic analyses to analyse the functional capacities of soils and to identify potential indicator species for the respective land-use types. 
This project will provide important basic information on the role played by soil biodiversity in ecosystem functioning and how this is affected by land-use intensity.

Project Abstract

Genomic Characterization of Plant Growth Promoting Microbes from a unique maize landrace

PI: Alan Bennett



Exploring maize biodiversity near the center of domestication reveals unorthodox biological traits such as the secretion of a polysaccharide-rich mucilage from the aerial roots of an isolated variety of maize (corn). The distinguishing characteristic of this mucilage that makes it a strong candidate for innovation in food science technology is the abundance of fucose sugar residues woven into the secreted complex carbohydrates. This hallmark feature is of particular interest based on recent studies of human milk oligosaccharides, which confirmed fucose as both a key component and a functional glyconutrient that bolsters the immune system. Metagenomics analysis of aerial roots secreting mucilage has permitted us to identify bacterial species from the Bacteroidetes and Proteobacteria phyla that possess genetic biodiversity in carbohydrate-acting enzymes. Moreover, we isolated pure cultures of microorganisms from the aerial root mucilage microbiota and corroborated the information found in the previous metagenomics study. We hypothesize that through the use of bioinformatics tools on the XSEDE high performance cloud computing platform, the analysis of whole genome and expression sequence data from over 600 microbial isolates of this unique biological source will yield an array of metabolic tools that may be repurposed for innovation in probiotic systems engineering to benefit human health.

Project Abstract

UC Davis GGG 201b: lab section

PI: C. Titus Brown



Prokaryotic and eukaryotic genomes. Experimental strategies and analytical challenges of modern genomics research and the theory and mechanics of data analysis. Structural, functional, and comparative genomics. Related issues in bioinformatics. In this course, we run 10 practical computational labs and have three associated homeworks. The labs cover software install, shell-level data analysis, Jupyter Notebook & RStudio-based visualization, etc.

Project Abstract

Recombination rate variation in a wild population of Florida Scrub-Jays

PI: Nancy Chen



Meiotic recombination plays an important role in determining levels of genetic diversity in eukaryotic genomes. Understanding the causes and consequences of variation in recombination rates is therefore crucial for predicting how populations respond to selection and for studying genotype-phenotype associations. However, our knowledge of the factors governing recombination rate variation in natural populations remains limited. We propose to investigate the recombination landscape and individual variation in recombination rates using extensive pedigree and genomic data in a wild population of Florida Scrub-Jays (Aphelocoma coerulescens). We will build a high-density linkage map using CRIMAP to estimate individual recombination rates across the genome for males and females, then test for environmental and genetic factors associated with recombination rate variation. A detailed linkage map for the Florida Scrub-Jay will also provide insights into the avian recombination landscape and serve as an important resource for studies of evolution.

Project Abstract

Application for Jetstream startup allocations for running dammit de novo transcriptome annotation pipeline software

PI: Lisa Cohen



This proposal is requesting resources on the IU/TACC Jetstream platform for annotating de novo transcriptome assemblies from the Marine Microbial Eukaryotic Transcriptome Sequencing Project [1]. A previous allocation (TG-BIO160028) was used to assemble and evaluate 678 transcriptomes from the MMETSP [2]. Assembled contigs must now be annotated against curated databases to predict the identity of proteins the sequences are encoding. Annotation has historically been a bottleneck step in the process of developing de novo reference transcriptomes for non-model species because many steps are required to align the translated transcript sequences to multiple databases each with a different format, then condense the output information and alignment e-value scores into useable annotation output files. The annotation pipeline software program, “dammit” [3] was written and named to address the frustration with the current status of annotation software. It outputs results in useful formats (.csv, .gff3, .fasta) required for downstream analyses, such as differential expression and functional annotation. Dammit uses the curated Pfam [4], Rfam [5], and OrthoDB [6] protein databases, and (if available) custom species-specific protein databases, as evidence for annotating translated contigs from the de novo transcriptome assembly. While dammit is easy to use, it is difficult to install because it requires several programs (BUSCO, Transdecoder, last, hmmer) and accompanying Python dependencies. Jetstream elastic cloud computing infrastructure is the ideal environment for installing and using this annotation tool because it does not rely on administrative privileges for software installation. 
Since unannotated de novo transcriptome assembly files are relatively small to upload (MB in size) and dammit output results are also relatively small to download (MB in size), instances can be launched from images with pre-installed software, used to run memory-intensive processes, then terminated without the need for persistent storage or compute beyond the time needed to run the software. We will develop materials specifically for using dammit on Jetstream [7] to annotate 678 transcriptome assemblies from the MMETSP. These materials will then be tested in a workshop setting at the UC Davis Data Intensive Biology Summer Institute (DIBSI) run by Dr. Titus Brown. A public Jetstream image will be built for others to use the dammit transcriptome annotation pipeline, similar to the available public image for genome annotation, MAKER1154and_Friends.
References:
[1] Keeling PJ et al. 2014. The marine microbial eukaryote transcriptome sequencing project (MMETSP): Illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLOS Biology. 12(6): e1001889.
[2] Cohen, Lisa; Alexander, Harriet; Brown, C. Titus (2017): Marine Microbial Eukaryotic Transcriptome Sequencing Project, re-assemblies. Figshare. https://doi.org/10.6084/m9.figshare.3840153.v6
[3] Scott, C. 2016. dammit: an open and accessible de novo transcriptome annotator. (in prep.) Source code: https://github.com/camillescott/dammit, User manual: http://dammit.readthedocs.io/en/latest/
[4] Finn RD et al. 2016. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44:D279–85.
[5] Gardner PP et al. 2009. Rfam: updates to the RNA families database. Nucleic Acids Res. 37:D136–40.
[6] Zdobnov EM et al. 2017. OrthoDB v9.1: cataloging evolutionary and functional annotations for animal, fungal, plant, archaeal, bacterial and viral orthologs. Nucleic Acids Res. 45:D744–9.
[7] Workflow tutorial: http://rnaseq-workshop-2017.readthedocs.io/en/latest/dammit_annotation.html

Project Abstract

Direct phase equilibrium simulation of NIPAM oligomers in water and optimization of the potential

PI: Roland Faller



N-isopropylacrylamide-based polymers (PNIPAM) are among the best-studied thermoresponsive materials. They can be used in a wide range of applications, including catalysis, sensors, enzyme encapsulation and drug delivery, which makes it very desirable to understand the molecular behavior of PNIPAM. In water, PNIPAM shows a lower critical solution temperature (LCST) at 305 K and a conformational transition of single chains at the same temperature. Below this temperature PNIPAM is completely soluble in water, but above the LCST water and PNIPAM separate into two pure phases. In recent years this behavior has been simulated in atomistic molecular dynamics (MD) simulations to gain a deeper understanding of the mechanisms leading to the phase separation. Because the molecular mechanisms are very complex, to date the correct phase behavior has not been simulated without an error in the LCST. For better results in MD simulations, a modification of the potential for PNIPAM can be introduced such that the LCST is shifted to the experimentally observed value. The objective of this work is to adapt a modification of the potential such that it fits the experimentally observed data. To achieve this goal, MD simulations of oligo-NIPAM using the Amber94 + TIP3P force fields will be performed. The parameter for the modification of the potential will then be fitted to match the experimental results. For this purpose, experimental results for NIPAM oligomers (ONIPAM), synthesized at the “Leibniz-Institut für Interaktive Materialien” (DWI), are available. Thus a model that can simulate the real LCST of ONIPAM will be developed.

Project Abstract

The role of microbiota in mediating local adaptation and plant influence on ecosystem function in a marine foundation species, Zostera marina

PI: Melissa Kardish



A growing body of research suggests that microbiota interact with plants and animals to alter host fitness and disease resistance. Furthermore, microbiome composition can vary among host genotypes and environments, and may contribute to observed variation in host phenotype. Individual variation in phenotype within key species, such as foundation plant species or keystone consumers, affects the structure and functioning of entire ecosystems, providing a potentially important mechanism by which microbiomes contribute to the functioning of macroscopic ecosystems. However, few experiments test causal links between host phenotype and microbiome composition, and, outside of a few model systems, virtually no studies examine the cascading effects of variation in a host’s microbiome on communities or ecosystems. I conducted a series of reciprocal transplants of the marine angiosperm Zostera marina (eelgrass) and have sequenced the V4-V5 region of the 16S rRNA gene of bacteria associated with leaves, roots, and adjacent sediment. This will allow me to examine the sources of natural variation in the eelgrass microbiome, and the potential consequences of microbiome composition for host fitness, host local adaptation, and the effect of eelgrass on ecosystem structure and functioning. To accomplish this analysis, I would like to use QIIME on the Gordon Computing Cluster to assist in the processing of 16S data from these transplants as well as from temporal data.

Project Abstract

CIG Science Gateway and Community Codes for the Geodynamics Community

PI: Louise Kellogg



The Computational Infrastructure for Geodynamics (CIG), an NSF cyberinfrastructure facility, aims to enhance the capabilities of the geodynamics community by developing software that can be used to address a range of challenging problems in geophysics. CIG supports code development and benchmarking, user training, and new users by providing small allocations of computation time along with user support for CIG codes. CIG supports these efforts in the following areas of activity: mantle dynamics, seismic wave propagation, geodynamo, and crustal and lithospheric dynamics on both million-year and earthquake time-scales. These efforts have resulted in successful allocation requests by our community and the involvement of international researchers in benchmarking the next generation of geodynamo codes, all of which were enabled by our community allocation.

Project Abstract

Large-scale Video Object Detection

PI: Yong Jae Lee



Visual object detection is a fundamental problem in computer vision, and has broad applicability in numerous fields including AI, defense, medicine, and agriculture. While there has been a long history of research in detecting objects in static images, there has been relatively little research in detecting objects in videos. However, cameras on robots, surveillance systems, unmanned vehicles, wearable devices, etc., receive videos and not static images. Thus, for these systems to recognize the key objects and their interactions, it is critical that they be equipped with accurate video object detectors. In this project, we propose a novel machine perception framework for detecting objects in video. The key contribution is a deep recurrent spatial temporal memory network that models the long-term temporal dependencies of an object's appearance and motion. We have begun development of the model and have initial results on the ImageNet-VID dataset, which contains 5000+ videos across 30 object categories. We would like to improve the approach in both speed and accuracy, and to evaluate it on a larger-scale dataset (e.g., the YouTube-BB dataset, which has 240,000 videos). The compute requirements to do so are beyond the current resources available at UC Davis. Our model currently takes 0.2 seconds to process each frame, which by itself is not slow, but this quickly adds up when we have to process, e.g., 240,000 videos each with thousands of frames. Thus, we would like to parallelize the work on a GPU cluster. We are requesting 2500 GPU hours on the Bridges GPU (PSC) cluster and 15 terabytes of storage on the PSC Pylon.
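The way per-frame cost adds up can be sketched with a short calculation. The 0.2 seconds per frame and the 240,000 videos come from the abstract; the frames-per-video values are hypothetical, since the abstract only says "thousands of frames", and the function name is illustrative.

```python
SECONDS_PER_FRAME = 0.2   # per-frame processing time stated in the abstract
NUM_VIDEOS = 240_000      # approximate YouTube-BB size stated in the abstract


def gpu_hours(num_videos: int, frames_per_video: int, seconds_per_frame: float) -> float:
    """Serial GPU time, in hours, to process every frame exactly once."""
    total_seconds = num_videos * frames_per_video * seconds_per_frame
    return total_seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical frames-per-video values; the abstract gives no exact figure.
    for frames in (500, 1_000, 2_000):
        hours = gpu_hours(NUM_VIDEOS, frames, SECONDS_PER_FRAME)
        print(f"{frames:>5} frames/video -> {hours:,.0f} GPU hours")
```

Under the assumed 1,000 frames per video, a single pass over the dataset takes roughly 13,333 GPU hours, or about 556 days on one GPU, which illustrates why the work has to be spread across many GPUs.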

Project Abstract

Testing the scalability and numerical efficiency of long-term tectonic models of continental extension

PI: John Naliboff



This request for a startup allocation on the XSEDE-supported cluster Comet follows preliminary scaling tests on Comet and prior work on the Stampede1 cluster related to my research at the Computational Infrastructure for Geodynamics (CIG). As a project scientist at CIG, my work centers on developing, testing and applying the finite-element code ASPECT to simulations of long-term tectonic deformation (viscous and brittle behavior) in the solid earth. ASPECT is built on the open source finite element library deal.II, which provides massive scalability across 10^3-10^4 cores, adaptive mesh-refinement capabilities and robust linear and non-linear solvers. To date, strong and weak scaling tests with ASPECT have been performed on a wide range of clusters, including Stampede1, Lonestar, HLRN (Berlin) and many additional smaller clusters. These scaling results have been published in multiple peer-reviewed articles and are also contained in the CIG proposal for computing on Stampede1: “CIG Science Gateway and Community Codes for the Geodynamics Community” (TG-EAR080022N). Here, I am applying for a startup allocation of 50,000 core-hours on Comet to perform additional scaling tests with ASPECT and test the relative efficiency of different model configurations (non-linear solver tolerances, linear solver schemes, etc.) for 3-D simulations of continental extension. These simulations of continental extension are built on extensive 2-D and 3-D sensitivity tests for relatively small model sizes (< 10^7 degrees of freedom) and a limited (< 5) number of large 3-D simulations (> 10^8 degrees of freedom) run on Stampede1. Through a small trial allocation (1000 SUs) on Comet, I have performed strong and weak scaling tests on up to 96 cores for models that range from ~60,000 to ~16,000,000 degrees of freedom. This trial allocation will be used in part for scaling tests that examine models with up to 10^9 degrees of freedom run across hundreds or thousands of cores (up to 1536).
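For readers unfamiliar with the terminology, the two kinds of scaling test mentioned above are usually summarized as parallel efficiencies. The sketch below uses the standard definitions; the function names and all timings are hypothetical and are not results from the ASPECT tests described in the abstract.

```python
def strong_scaling_efficiency(base_cores: int, base_time: float,
                              cores: int, time: float) -> float:
    """Fixed problem size: ideally, run time falls in proportion to core count."""
    return (base_cores * base_time) / (cores * time)


def weak_scaling_efficiency(base_time: float, time: float) -> float:
    """Problem size grows with core count: ideally, run time stays constant."""
    return base_time / time


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    strong = strong_scaling_efficiency(base_cores=24, base_time=100.0,
                                       cores=96, time=30.0)
    weak = weak_scaling_efficiency(base_time=100.0, time=110.0)
    print(f"strong scaling efficiency: {strong:.2f}")
    print(f"weak scaling efficiency:   {weak:.2f}")
```

An efficiency of 1.0 means ideal scaling; values well below 1.0 indicate that adding cores is no longer paying off for that problem size.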
Notably, these scaling tests are based on relatively simple models that only require using linear solvers and do not contain large variations in material parameters. To ensure the code scales efficiently on Comet for models using a non-linear rheology and large (orders of magnitude) variations in material properties, I will perform a second series of scaling tests with a model setup derived from earlier simulations of continental breakup. While these two series of scaling tests will likely require on the order of 10-20 thousand core-hours, the models are only run for a small number (1-2) of time steps. In contrast, the simulations of continental breakup require thousands of time steps, during which the dynamics can change significantly. To ensure that the predicted scaling behavior extends throughout the model duration, the remaining core-hours (30-40 thousand) will be used to run one large simulation to completion. This estimate is based on the preliminary 3-D simulation of continental extension (~10^8 degrees of freedom) run on Stampede1. As an example, one model required 10.333 hours and 960 cores (~9920 core-hours) to run for 25% of the simulation time planned for future models. The results of the scaling tests and trial simulation outlined above will form the basis of a proposal requesting further computing time on Comet for a series of production runs. If further details regarding ASPECT or the planned scaling tests are required, I will provide them promptly. Thank you for considering this startup allocation request of 50,000 core-hours; I look forward to hearing from you.
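The core-hour estimate in this abstract can be reproduced with a few lines of arithmetic. The wall time, core count and 25% fraction are taken from the text; the linear extrapolation to a full run is an assumption of this sketch (the abstract itself notes that the dynamics can change over the course of a run), and the function names are illustrative.

```python
def core_hours(wall_hours: float, cores: int) -> float:
    """Core-hours consumed by a run: wall-clock hours times cores used."""
    return wall_hours * cores


def extrapolate_full_run(partial_core_hours: float, fraction_completed: float) -> float:
    """Assumes cost grows linearly with simulated time (an assumption, not a given)."""
    return partial_core_hours / fraction_completed


if __name__ == "__main__":
    partial = core_hours(wall_hours=10.333, cores=960)   # figures from the abstract
    full = extrapolate_full_run(partial, fraction_completed=0.25)
    print(f"partial run: {partial:,.0f} core-hours")
    print(f"full run (linear extrapolation): {full:,.0f} core-hours")
```

The partial run works out to about 9,920 core-hours, and the linear extrapolation to about 39,700 core-hours, at the upper end of the 30-40 thousand core-hours set aside for the large simulation.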

Project Abstract

Numerical simulations of 3D fault system development during continental extension

PI: John Naliboff



Deformation of Earth's tectonic plates (e.g., lithospheric plates) reflects the forces driving plate motion and the rheological response of the plates to the applied forces. In both current and previously active plate boundaries, the geologic record and modern observations clearly indicate that the style of lithospheric deformation can evolve significantly due to strong feedbacks between lithospheric structure, rheology and plate driving forces. To elucidate how temporal feedbacks between plate driving forces and the lithosphere's non-linear rheology influence observed deformation patterns, numerical simulations of lithospheric deformation are commonly used to systematically investigate the effects of key parameters. While numerous lithospheric deformation studies have provided significant insight into a wide range of tectonic processes, most studies are restricted to 2D and rely on qualitative comparisons due to the computational cost of high-resolution 3D simulations and a scarcity of 3D structural data sets that track fault development through time. Here, I propose to use the state-of-the-art finite-element code ASPECT to perform high-resolution 3D simulations of fault development during the initial phase of continental extension, where distributed faulting occurs. The simulations will systematically explore how fault system characteristics vary with the rate and magnitude of strain weakening, which are known to have a significant effect on lithospheric deformation patterns but remain loosely constrained by experimental work and observations. To provide further constraints, the results of these simulations will be compared statistically with an ongoing study of normal fault evolution in the North Sea.

Project Abstract

Genomic mechanisms underlying the collapse and lack of recovery of Prince William Sound herring

PI: Elias Oziolor



The Exxon Valdez oil spill in 1989 preceded the collapse of the Prince Williams Sound (PWS) herring fishery in 1993. However, the causes of the collapse and the striking lack of recovery of PWS herring over the past twenty-five years remains a mystery. The emergence of disease pathogens around the time of the collapse and persistence thereof may offer a clue. The research proposed here aims to test hypotheses on the effects of oil exposure, the effects of disease challenge, and their interaction on herring health and fitness. The overall goal of this proposal is to determine whether there are functional connections that link the PWS herring collapse and lack of recovery with disease impacts and the oil spill. I hypothesize that comparative physiology and transcriptomics from laboratory experiments provide evidence that oil and pathogens were potential contributors to the collapse and lack of recovery of PWS herring. At this time no reference Pacific herring genome exists and the transcriptome remains publicly unavailable warranting assembly of a novel reference to understand gene regulatory changes. I aim to create publicly available genomic resources for Pacific herring by building a high quality reference transcriptome as a template to test hypotheses. In order to create gene annotations, I have performed IsoSeq using SMRT sequencing (PacBio) to generate full-length transcripts from RNA extracted from multiple tissues and developmental time-points of adult and larval Pacific herring. To date, the UC Davis Genome Center DNA Technologies Core has generated and sequenced 2 SMRT libraries using PacBio Sequel technology. Each SMRT cell generated about 30,000,000 reads. I am requesting access to the computing power of XSEDE to implement a de novo assembly of a reference transcriptome using the SMRTAnalysis v2.3.0 Iso-Seq pipeline. 
This pipeline includes an error-correction algorithm (Arrow) and iterative clustering of isoforms into gene families using COGENT (Coding genome reconstruction tool) v2.0 to output high-quality, full-length isoform consensus sequences. Genes will be defined according to top BLAST-search hits of assembled transcripts against known sequences in the respective databases (e.g., NCBI non-redundant). I will test the completeness of the transcriptome by also mapping corrected PacBio long reads against the annotated Atlantic herring genome. I also request access to XSEDE to test hypotheses with RNA-seq data from ongoing laboratory experiments. These long-term experiments were initiated in 2017, and we will continue to collect data through fall of 2018. Quantification and comparison of gene-level expression based on high throughput sequencing reads will be carried out by a suite of software programs, including Salmon (to quantify transcript abundance) and DESeq2 (to quantify differential gene expression across biological replicates and between treatments). This study should inform preventative strategies for future oil spills and disease outbreaks and provide genomic resources for an economically and culturally important species.

Project Abstract

Assessment of RNA editing over Doryteuthis opalescens development

PI: N Tessa Pierce



This proposal is requesting resources from the IU/TACC Jetstream cloud system for the purpose of assessing differential RNA editing over development in the California market squid, Doryteuthis opalescens. Recent work has identified extensive RNA editing of coding sequences as a unique characteristic of adult coleoid cephalopods, and suggested that it may contribute to the neural and behavioral plasticity that characterizes these animals [1]. Preliminary identification of putative transcriptome-wide editing sites and RNA editing enzymes in Doryteuthis opalescens developmental transcriptomes suggests a role for RNA editing in developmental plasticity as well. To elucidate this role, I will analyze corresponding RNA and DNA Illumina data from Doryteuthis opalescens to investigate the prevalence of RNA editing across a time course of replicated samples from six time points ranging from early development until hatching. [1] Liscovitch-Brauer, Noa, Shahar Alon, Hagit T. Porath, Boaz Elstein, Ron Unger, Tamar Ziv, Arie Admon, Erez Y. Levanon, Joshua JC Rosenthal, and Eli Eisenberg. "Trade-off between transcriptome plasticity and genome evolution in cephalopods." Cell 169, no. 2 (2017): 191-202.

Project Abstract

Surveying binary superconducting hydrides using ab-initio methods

PI: Yundi Quan



Discovery of superconductivity in pressurized hydrogen sulfides has stimulated renewed interest in hydrides. Much research effort has been focused on structure prediction, while the issue of numerical convergence with respect to k- and q-mesh is often neglected. In this project, we carry out systematic ab-initio calculations based on Wannier function interpolation of electronic and lattice degrees of freedom to study all the binary hydrides discovered so far under various pressures. We aim to build a database of highly accurate electronic structures and phonon spectra of existing binary compounds to help understand possible connections between the Tc of a hydride and its various physical properties.

Project Abstract

Prediction of cellular state using deep neural networks

PI: NAVNEET RAI



Accurate prediction of cellular state in new conditions is of significant interest in biology due to its impact on food, medicine and the environment. To capture the complexity of cellular organisms, a predictive model should exhibit a knowledge representation capacity on par with the cell itself. Deep neural networks provide high representation capacity, but their dependence on big datasets is a challenge for biological predictive tasks that rely on OMICS data. Even for the most well studied microbe, Escherichia coli, the largest OMICS compendium contains only 4389 genome-wide profiles for 649 conditions. To circumvent this gap, we use simulation software to generate large, realistic OMICS datasets under various assumed biophysical properties of cellular organisms. This enables exploration of various neural network architectures and helps identify the applicability of each architecture given the circumstances (e.g. organism complexity, data size, etc.). Using such simulation data for the task of predicting steady state gene expression in novel conditions, we developed a novel neural network architecture that outperforms existing models (10%-40% higher PCC) when evaluated on small sub-organisms (2-100 genes). To this end, we want to use OSG to help finish the current evaluation and provide evidence for the applicability of our novel approach. For larger models we expect to need GPUs, hence we have requested Comet.
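For readers unfamiliar with the evaluation metric quoted above, the following is a minimal sketch of a Pearson correlation coefficient (PCC) between predicted and observed expression values. It is an illustration only, not the project's code; the function name `pearson_cc` and the example values are hypothetical.

```python
import math


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences.

    Hypothetical helper for illustration; not taken from the project.
    """
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


if __name__ == "__main__":
    # Made-up expression values for three genes in one held-out condition.
    print(pearson_cc([0.1, 0.5, 0.9], [0.2, 0.4, 1.0]))
```

A PCC near 1 means the predicted profile rises and falls with the observed one; the metric is insensitive to overall scale and offset, so it is usually reported alongside an error measure.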

Project Abstract

Genome assembly, annotation and characterization of Fusarium strains to understand their evolution in the context of chickpea pathogenicity

PI: Anandkumar Surendrarao



Fusarium oxysporum is a fungal pathogen with a very broad host range. It can associate with animals, including humans, and also plants, including both gymnosperms and angiosperms. One of the plant species that is adversely affected is chickpea, Cicer spp. The cultivated C. arietinum crop can experience losses of up to 100% due to wilt caused by F. oxysporum forma specialis ciceris (FOC). A major goal of our research group at UC Davis is to re-domesticate chickpea using wild germplasm. One of the agronomic traits of interest that we wish to introduce into new varieties is Fusarium resistance. This requires understanding of Fusarium genomics and evolution by itself, and also in the context of chickpea genomics and co-evolution. Towards this goal, 290 strains of FOC were collected from a wide range of geographies in Ethiopia. Currently I am using Farm, a High Performance Computing cluster at UC Davis, for my bioinformatic needs. However, my user account limits me to no more than 7 jobs at a time, due to RAM availability limits. This is a major computational bottleneck that a STAMPEDE allocation can help overcome quickly and efficiently. Results from my Fusarium genome analyses will serve many scientific goals: understand evolution of core and accessory genomes across these 290 pathovars, extrapolate these findings to other forma specialis strains of Fusarium oxysporum (that infect other plant hosts), and inform breeding strategies for development of Fusarium resistant chickpea varieties, locally adapted to various growing regions of the world. The STAMPEDE allocation will be used to perform typical genomics analysis steps, including but not limited to Illumina read quality control and pre-processing, de novo genome assembly, gene prediction, genome annotation, orthology determination, core/accessory genome prediction, gene flow analyses, and correlating pathogen phenotypes with structural and single nucleotide variants (i.e. GWAS).

Project Abstract

Automatic knowledge base construction and hypothesis generation: antibiotic resistance mechanisms for Escherichia coli

PI: Ilias Tagkopoulos



Antibiotic resistance is one of the leading threats to global health, food security and development today. The construction of a cohesive knowledge base for antibiotic resistance that can be a source for machine learning methods will make broad impacts in the field and eventually enable an artificial intelligence (AI) system to automate knowledge discovery in unprecedentedly efficient and unbiased ways. We are building a knowledge base for the E. coli antibiotic resistance mechanisms that is specifically structured for efficient machine learning. The constructed knowledge base is a set of tuples in graph format, where nodes represent entities and edges represent relations, with a confidence score provided for each tuple. It integrates information from existing antibiotic resistance databases (CARD and PATRIC), gene-regulatory relation databases (EcoCyc and RegulonDB), high-throughput profiles and manual curation of missing information from literature, if any. Because they are interdependent, we are quantifying the confidence of each tuple as well as the confidence of its supporting evidence by measuring one against the other iteratively until convergence is reached. In parallel, we are building prior models that will learn from the ever-growing knowledge graph both to defend against unreliable outside information and to perform automatic hypothesis generation, which we will experimentally validate in our multidisciplinary lab. We would like to leverage the OSG, Comet GPU, and Data Oasis to both evaluate and further optimize the deep learning models that we are currently developing by applying hyper-parameter optimization through large-scale grid search.
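The tuple-with-confidence structure described in this abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the project's schema: the class `Triple`, the helper functions, and the placeholder entity names (`gene_A`, `antibiotic_X`, and so on) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge-base tuple: two entity nodes joined by a relation edge."""
    subject: str       # entity node (placeholder names only)
    relation: str      # edge label
    obj: str           # entity node
    confidence: float  # score in [0, 1] attached to the whole tuple


def build_graph(triples):
    """Index tuples by subject node as an adjacency list."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.relation, t.obj, t.confidence))
    return dict(graph)


def filter_confident(triples, threshold):
    """Keep only tuples whose confidence meets the threshold."""
    return [t for t in triples if t.confidence >= threshold]


if __name__ == "__main__":
    # Placeholder tuples; the names and scores carry no biological meaning.
    kb = [
        Triple("gene_A", "confers_resistance_to", "antibiotic_X", 0.9),
        Triple("gene_B", "represses", "gene_A", 0.4),
    ]
    print(build_graph(filter_confident(kb, 0.5)))
```

Storing the score on the tuple rather than on a node keeps each assertion independently re-scorable, which is what an iterative confidence update of the kind the abstract mentions would need.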

Project Abstract

MECHANISMS OF BIOORGANIC AND ORGANOMETALLIC CYCLIZATION REACTIONS

PI: Dean Tantillo



The focus of the research proposed herein, a renewal of CHE030089N, is to apply modern quantum chemical methods to the elucidation of molecular mechanisms of organic chemical reactions that are used in the synthesis and biosynthesis of polycyclic organic molecules. During this award period, we focus on reactions for which we suspect non-statistical dynamics effects play important roles. Consequently, we will focus our efforts on direct/ab initio molecular dynamics calculations, which are the most time-consuming calculations we carry out (other routine calculations will be carried out in-house).

Project Abstract

Elucidation of molecular mechanisms of sex-dependent pro-arrhythmia through hERG block by drugs and steroid hormones

PI: Igor Vorobyov



Common and sometimes fatal heart rhythm disorders such as long QT syndrome (LQTS) have been linked to mutations in cardiac ion channels as well as unwanted drug interactions with those proteins. Female sex has been shown to be an independent risk factor for both inherited and acquired LQTS as well as associated arrhythmias, tentatively correlated with differential levels of sex hormones (estradiol, progesterone and testosterone) playing opposite roles in proclivities for heart rhythm disturbances. There is a critical need to understand cardiac ion channel modulation by drugs and/or sex hormones at the molecular level to develop safer and more effective therapeutics. We will focus on drug and/or hormone interactions with the human ether-a-go-go (hERG) potassium channel (KV11.1), a major contributor to cardiac action potential repolarization and an anti-target for diverse drug molecules. We propose atomistic modeling and simulation approaches to compute binding affinities of hormones and LQTS-inducing drugs such as dofetilide as a first step. A recent cryo-EM structure of hERG will be used for those studies. We will use quantum mechanical (QM) calculations using Gaussian software to develop and/or validate drug and hormone force field parameters. Drug and hormone binding to hERG will be tested using both long unbiased drug/hormone “flooding” as well as multi-window restrained umbrella sampling (US) molecular dynamics (MD) simulations using NAMD. Therefore we request the following XSEDE allocations: Comet (SDSC): 50,000 CPU hours (to be used for QM and US MD simulations); XStream (Stanford): 5,000 GPU hours (or Comet GPU when it becomes available) for “flooding” MD. These estimates are based on benchmarks on our local resources (small 10-node GeForce GTX 1080 / Xeon E5-2620 GPU/CPU cluster and workstations). Equivalent resource substitution or using Open Science Grid up to an allowed maximum is a reasonable alternative for this project as well.
The proposed allocation will be used for a few of the runs described above to provide preliminary scientific (including feasibility and convergence) and benchmarking data for a larger scientific allocation request to be submitted in the near future.

Project Abstract

Atomistic simulations to elucidate molecular mechanisms of drug- and hormone-induced pro-arrhythmia proclivities

PI: Igor Vorobyov



The human voltage-gated potassium channel Kv11.1, encoded by the hERG gene, conducts the key repolarizing K+ current (IKr) in cardiomyocytes. Many drugs are known to block IKr, which can lead to an acquired long QT syndrome, the standard clinical ECG-based indicator for an increased risk of ventricular arrhythmias such as Torsades de Pointes (TdP). In fact, hERG block is a common side effect for drugs and drug candidates, leading to their withdrawal from the market. However, hERG blockers have very different proclivities for arrhythmogenesis and thus cardiac safety profiles. We hypothesize that the fundamental mode of drug interaction, derived from the unique structure activity relationship of a drug, determines the resultant effects on cardiac electrical activity in cells and tissue. In addition, female sex has been shown to be an independent risk factor for acquired LQTS and TdP, and while our recent work has shown that drugs and estrogen can coexist in the hERG pore, the molecular mechanisms of these interactions and their effects remain unknown. Multiple unbiased and restrained molecular dynamics (MD) simulations on XSEDE CPU and GPU resources, reaching tens of microseconds in total, are ideally suited to explore the atomic-scale basis of these effects. Now is also the best time for these studies, since high-resolution cryo-EM structures of an open state hERG and a closed state of a homologous rEAG channel have recently become available. We have already begun testing hERG open state stability through long unbiased MD simulations using our local resources and an active DE Shaw Anton 2 allocation. Also, we have been working on homology models of the hERG inactivated state through structural modeling with ROSETTA supported by electrophysiological measurements on mutant channels by our experimental collaborators.
Multiple MD simulations will be crucially important for the validation of those models by considering different applied voltages, as well as channel mutants that are known to preferentially stabilize hERG in distinct conformational states. Accurate empirical force field parameters for several cardiac-safe and pro-arrhythmic hERG-blocking drugs have also been developed in our laboratory using both local resources and a startup XSEDE allocation on Comet at SDSC. We will be validating those parameters and computing drug membrane partitioning thermodynamics and permeation rates using umbrella sampling MD simulations for drug/hydrated lipid membrane systems. This information will be used to perform multi-microsecond drug and hormone “flooding” unbiased and umbrella sampling all-atom MD simulations with hERG in distinct conformational states. We will also investigate the effects of drug ionization states as well as applied voltage, which can all modulate drug binding affinities and thus their pro-arrhythmia proclivities, in the hope of obtaining an accurate molecular picture of channel state-dependent drug binding and egress pathways along with corresponding energetics. This information will allow us to predict pro-arrhythmia determinants at the molecular level and will help to develop new pharmaceuticals with improved cardiac safety profiles.

Project Abstract

Simulating the Local Group

PI: Andrew Wetzel



A wealth of exciting ongoing/upcoming observational projects are targeted to near-field cosmology and galactic archaeology in the Local Group, by measuring stellar populations and phase-space distribution of stars in/around the Milky Way (MW), Andromeda (M31), and their satellite dwarf galaxies at unprecedented levels (for example, Hubble Space Telescope, SDSS-APOGEE, the Dark Energy Survey, Gaia, and LSST). These observational campaigns are revolutionizing our understanding of galaxy formation as well as the nature of dark matter on the smallest cosmological scales. However, interpreting and understanding these results, including making predictions for upcoming observations, requires ultra-high-resolution cosmological simulations, which can resolve structure on 1 pc scales, and which include the necessary physics of hydrodynamics, star formation, and feedback, all carefully targeted to the environment of the Local Group. We request a renewal allocation to continue our ultra-high-resolution simulations of galaxy evolution, star formation, and stellar feedback, with which we will study the physics of the interstellar medium (ISM), the formation of stars, stellar feedback, galaxy formation, and the cosmological distribution of dark matter, with new physics and unprecedented resolution. This renewal will allow us to build on the significant numerical and physical advances that our previous XSEDE research allocations have enabled, to run a targeted suite of simulations to understand the Local Group, comprising the MW, M31, the Large Magellanic Cloud (LMC), and numerous satellite dwarf galaxies. Each of our simulated systems will be resolved with > 200 million particles and followed self-consistently over its entire history to the present day in live cosmological settings carefully matched to the Local Group environment. Our proposed simulations will address a wide array of timely scientific questions.
For galaxies like the MW and M31, we will study in detail (1) gas accretion, angular momentum transport, and its role in disk formation, including the impact of close pairs of galaxies like the MW and M31, (2) stellar migration and chemical mixing within the disk, and (3) the impact of massive satellites/subhalos on kinematic heating of the disk. Our simulations also span the dynamic range needed to model the satellite dwarf galaxies that are observed around the MW and M31, including the relevant baryonic physics to predict the properties of their observed stars: from massive satellites like the LMC with Mstar = 2 × 10^9 M⊙ to faint dwarf galaxies with Mstar ∼ 10^5 M⊙. Because they are so faint and dark-matter dominated, such “dwarf” galaxies represent a key frontier field for testing (1) the Cold Dark Matter (CDM) paradigm of cosmology, (2) the epoch of reionization, and (3) the most extreme regimes of galaxy formation. In this renewal, we request 180,000 SUs (node-hours) on Stampede2 to run a suite of simulations targeted to the environment of the Local Group. Specifically, we will run a realization of a Local Group-like pair of MW- and M31-like galaxies (120,000 SUs) and a realization of a MW-like galaxy with an LMC-like satellite (60,000 SUs). To compare with observations of the Local Group, we must run each simulation across its entire formation history to the present day (z = 0).

Project Abstract

State-dependent drug modulation of sodium channels

PI: Vladimir Yarov-Yarovoy



The goal of this project is to study the molecular mechanisms of voltage-gated sodium (Nav) channel gating and modulation using molecular dynamics (MD) simulations. Our proposal will take advantage of several recent breakthroughs in the field of Nav channel structure: (1) a cryo-electron microscopy (cryoEM) structure of the first eukaryotic Nav channel (with pore-forming domain in the closed state and voltage-sensing domains in either activated or intermediate state); (2) new X-ray structures of bacterial Nav channels (with pore-forming domain in its open state); and (3) stable ion-conductive open state models of a bacterial Nav channel, which we generated using Rosetta computational modeling software and MD simulations. We propose to simulate our new Rosetta structural models for human Nav channels in open, closed and inactivated states. This will help reveal the molecular mechanisms of channel activation and inactivation. Experimental studies have identified structural regions forming the binding sites of small molecule inhibitors on Nav channels, yet the molecular mechanisms of modulation remain unclear. The proposed simulations on XSEDE supercomputers will significantly advance our basic knowledge of Nav channel gating and modulation, providing new understanding that may lead to novel therapeutics for neurological, muscular and cardiac diseases.