COVID-19 HPC Consortium

HPC Resources available to fight COVID-19

The COVID-19 HPC Consortium encompasses computing capabilities from some of the most powerful and advanced computers in the world. We hope to empower researchers around the world to accelerate understanding of the COVID-19 virus and the development of treatments and vaccines to help address infections. Consortium members manage a range of computing capabilities that span from small clusters to some of the very largest supercomputers in the world.

Preparing your COVID-19 HPC Consortium Request

To request access to resources of the COVID-19 HPC Consortium, you must prepare a description, no longer than three pages, of your proposed work. To ensure your request is directed to the appropriate resource(s), your description should include the following sections. Do not include any proprietary information in proposals, since your request will be reviewed by staff from a number of consortium sites. It is expected that teams who receive Consortium access will publish their results in the open scientific literature. All supported projects will have the name of the principal investigator, project title and project abstract posted to the COVID-19 HPC Consortium web site.

The proposals will be evaluated on the following criteria:

  • Potential benefits for COVID-19 response
  • Feasibility of the technical approach
  • Need for high-performance computing
  • High-performance computing knowledge and experience of the proposing team
  • Estimated computing resource requirements 

A. Scientific/Technical Goal

Describe how your proposed work contributes to our understanding of COVID-19 and/or improves the nation's ability to respond to the pandemic.

  • What is the scientific/technical goal?
  • What is the plan and timetable for getting to the goal?
  • What is the expected period of performance (one week to three months)?
  • Where do you plan to publish your results, and on what timeline?

B. Estimate of Compute, Storage and Other Resources

To the extent possible, provide an estimate of the scale and type of the resources needed to complete the work. The links in the Resources section are available to help you answer this question.

  • Are there specific computing architectures or systems that are most appropriate (e.g., GPUs, large memory, large core counts on a shared-memory node, etc.)?
  • Approximately how much computing support will this effort require in terms of core, node, or GPU hours? (A simple worked estimate is sketched after this list.)
  • How distributed can the computation be, and can it be split across multiple HPC systems?
  • Can this workload execute in a cloud environment? 
  • Describe the storage needs of the project.
  • Does your project require access to any public datasets? If so, please describe these datasets and how you intend to use them.
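
For the compute estimate, a back-of-the-envelope calculation is usually sufficient. The Python sketch below uses entirely hypothetical run counts and per-run costs; substitute benchmark numbers from your own application to arrive at the node-hour and GPU-hour figures reviewers look for.

```python
# Hypothetical back-of-the-envelope estimate for Section B.
# All values are placeholders; replace them with benchmarks from your own runs.

def node_hours(runs: int, nodes_per_run: int, hours_per_run: float) -> float:
    """Total node-hours = number of runs x nodes per run x wall-clock hours per run."""
    return runs * nodes_per_run * hours_per_run

def gpu_hours(runs: int, nodes_per_run: int, gpus_per_node: int, hours_per_run: float) -> float:
    """Total GPU-hours for a campaign on GPU-accelerated nodes."""
    return runs * nodes_per_run * gpus_per_node * hours_per_run

if __name__ == "__main__":
    # Example campaign: 200 simulation runs, each on 4 nodes (6 GPUs per node) for 12 hours.
    print(f"Node-hours: {node_hours(200, 4, 12):,.0f}")   # 9,600
    print(f"GPU-hours:  {gpu_hours(200, 4, 6, 12):,.0f}") # 57,600
```

Stating the assumptions behind the estimate (number of runs, nodes per run, wall-clock time per run) also helps reviewers match your request to an appropriate system.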

C. Support Needs

Describe whether collaboration or support from staff at the National labs, Commercial Cloud providers, or other HPC facilities will be essential, helpful, or unnecessary. Estimates of necessary application support are very helpful. Teams should also identify any restrictions that might apply to the project, such as export-controlled code, ITAR restrictions, proprietary data sets, regional location of compute resources, or HIPAA restrictions. 

D. Team and Team Preparedness

Summarize your team's qualifications and readiness to execute the project.

  • What is the expected lead time before you can begin the simulation runs?
  • What systems have you recently used and how big were the simulation runs?
  • Given that some resources are at federal facilities with restrictions, please provide a list of the team members who will require accounts on resources, along with their citizenship.

Document Formatting

While readability is of greatest importance, documents must satisfy the following minimum requirements. Documents that conform to NSF proposal format guidelines will satisfy these requirements.

  • Margins: Documents must have 2.5-cm (1-inch) margins at the top, bottom, and sides.
  • Fonts and Spacing: The type size used throughout the documents must conform to the following three requirements:
    • Use one of the following typefaces:
      • Arial, Courier New, or Palatino Linotype at a font size of 10 points or larger;
      • Times New Roman at a font size of 11 points or larger; or
      • Computer Modern family of fonts at a font size of 11 points or larger.
    • Type density must be no more than 15 characters per 2.5 cm (1 inch).
    • No more than 6 lines may appear within a vertical space of 2.5 cm (1 inch).
  • A font size of less than 10 points may be used for mathematical formulas or equations, figures, table or diagram captions, and when using a Symbol font to insert Greek letters or special characters. PIs are cautioned, however, that the text must still be readable.

  • Page Numbering: Page numbers should be included in each file by the submitter. Page numbering is not provided by XRAS.
  • File Format: XRAS accepts only PDF files.

Submitting your COVID-19 HPC Consortium request 

  1. Create an XSEDE portal account
    • Go to https://portal.xsede.org/
    • If you already have an XSEDE account, click "Sign In" at the upper right; otherwise, click "Create Account" to create one.
    • To create an account, basic information will be required (name, organization, degree, address, phone, email). 
    • Email verification will be necessary to complete the account creation.
    • Set your username and password.
    • After your account is created, be sure you're logged into https://portal.xsede.org/
    • IMPORTANT: Each individual should have their own XSEDE account; it is against policy to share user accounts.
  2. Go to the allocation request form
    • Follow this link to go directly to the submission form.
    • Or to navigate to the request form:
      • Click the "Allocations" tab in the XSEDE User Portal,
      • Then select "Submit/Review Request."
      • Select the "COVID-19 HPC Consortium" opportunity.
    • Select "Start a New Submission."
  3. Complete your submission
    • Provide the data required by the form. Fields marked with a red asterisk are required to complete a submission.
    • The most critical screens are the Personnel, Title/Abstract, and Resources screens.
      • On the Personnel screen, one person must be designated as the Principal Investigator (PI) for the request. Other individuals can be added as co-PIs or Users (but they must have XSEDE accounts).
      • On the Title/Abstract screen, all fields are required.
      • On the Resources screen…
        • Enter "n/a" in the "Disclose Access to Other Compute Resources" field (to allow the form to be submitted).
        • Then, select "COVID-19 HPC Consortium" and enter 1 in the Amount Requested field. 
    • On the Documents screen, select "Add Document" to upload your 3-page document. Select "Main Document" or "Other" as the document Type.
      • Only PDF files can be accepted.
    • You can ignore the Grants and Publications sections. However, you are welcome to enter any supporting agency awards, if applicable.
    • On the Submit screen, select "Submit Request." If necessary, correct any errors and submit the request again.

Resources available for COVID-19 HPC Consortium request 

U.S. Department of Energy (DOE) Advanced Scientific Computing Research (ASCR)
Supercomputing facilities at DOE offer some of the most powerful resources for scientific computing in the world. The Argonne Leadership Computing Facility (ALCF), the Oak Ridge Leadership Computing Facility (OLCF), and the National Energy Research Scientific Computing Center (NERSC) may be used for modeling and simulation coupled with machine and deep learning techniques to study a range of areas, including examining underlying protein structure, classifying the evolution of the virus, understanding mutation, uncovering important differences and similarities with the 2002-2003 SARS virus, searching for potential vaccine and antiviral compounds, and simulating the spread of COVID-19 and the effectiveness of countermeasure options.

 

Oak Ridge Summit | 200 PF, 4608 nodes, IBM POWER9/NVIDIA Volta

Summit System

2 x IBM POWER9 CPUs per node
42 TF per node
6 x NVIDIA Volta GPUs per node
512 GB DDR4 + 96 GB HBM2 (GPU memory) per node
1600 GB NVMe local storage per node
2 x Mellanox EDR IB adapters (100 Gbps per adapter)
250 PB, 2.5 TB/s, IBM Spectrum Scale storage

 

Argonne Theta | 11.69 PF, 4292 nodes, Intel Knights Landing

1 x Intel KNL 7230 per node, 64 cores per CPU
192 GB DDR4, 16 GB MCDRAM memory per node
128 GB local storage per node
Aries dragonfly network
10 PB Lustre + 1 PB IBM Spectrum Scale storage
Full details available at: https://www.alcf.anl.gov/alcf-resources

 

National Energy Research Scientific Computing Center (NERSC)

NERSC Cori | 32 PF, 12,056 Intel Xeon Phi and Xeon nodes
9,668 nodes, each with one 68-core Intel Xeon Phi Processor 7250 (KNL)
96 GB DDR4 and 16 GB MCDRAM memory per KNL node
2,388 nodes, each with two 16-core Intel Xeon Processor E5-2698 v3 (Haswell)
128 GB DDR4 memory per Haswell node
Cray Aries dragonfly high speed network
30 PB Lustre file system and 1.8 PB Cray DataWarp flash storage
Full details at: https://www.nersc.gov/systems/cori/

U.S. DOE National Nuclear Security Administration (NNSA)

Established by Congress in 2000, NNSA is a semi-autonomous agency within the U.S. Department of Energy responsible for enhancing national security through the military application of nuclear science. NNSA resources at Lawrence Livermore National Laboratory (LLNL), Los Alamos National Laboratory (LANL), and Sandia National Laboratories (SNL) are being made available to the COVID-19 HPC Consortium.

Lawrence Livermore + Los Alamos + Sandia | 32.2 PF, 7375 nodes, IBM POWER8/9, Intel Xeon
  • LLNL Lassen
    • 23 PFLOPS, 788 compute nodes, IBM Power9/NVIDIA Volta GV100
    • 28 TF per node
    • 2 x IBM POWER9 CPUs (44 cores) per node
    • 4 x NVIDIA Volta GPUs per node
    • 256 GB DDR4 + 64 GB HBM2 (GPU memory) per node
    • 1600 GB NVMe local storage per node
    • 2 x Mellanox EDR IB (100Gb/s per adapter)
    • 24 PB storage
  • LLNL Quartz
    • 3.2 PF, 3004 compute nodes, Intel Broadwell
    • 1.2 TF per node
    • 2 x Intel Xeon E5-2695 CPUs (36 cores) per node
    • 128 GB memory per node
    • 1 x Intel Omni-Path IB (100Gb/s)
    • 30 PB storage (shared with other clusters)
  • LLNL Pascal
    • 0.9 PF, 163 compute nodes, Intel Broadwell CPUs/NVIDIA Pascal P100
    • 11.6 TF per node
    • 2 x Intel Xeon E5-2695 CPUs (36 cores) per node
    • 2 x NVIDIA Pascal P100 GPUs per node
    • 256 GB memory + 32 GB HBM2 (GPU memory) per node
    • 1 x Mellanox EDR IB (100Gb/s)
    • 30 PB storage (shared with other clusters) 
  • LLNL Ray
    • 1.0 PF, 54 compute nodes, IBM Power8/NVIDIA Pascal P100
    • 19 TF per node
    • 2 x IBM Power8 CPUs (20 cores) per node
    • 4 x NVIDIA Pascal P100 GPUs per node
    • 256 GB + 64 GB HBM2 (GPU memory) per node
    • 1600 GB NVMe local storage per node
    • 2 x Mellanox EDR IB (100Gb/s per adapter)
    • 1.5 PB storage
  • LLNL Surface
    • 506 TF, 158 compute nodes, Intel Sandy Bridge/NVIDIA Kepler K40m
    • 3.2 TF per node
    • 2 x Intel Xeon E5-2670 CPUs (16 cores) per node
    • 3 x NVIDIA Kepler K40m GPUs per node
    • 256 GB memory + 36 GB GDDR5 (GPU memory) per node
    • 1 x Mellanox FDR IB (56Gb/s)
    • 30 PB storage (shared with other clusters)
  • LLNL Syrah
    • 108 TF, 316 compute nodes, Intel Sandy Bridge
    • 0.3 TF per node
    • 2 x Intel Xeon E5-2670 CPUs (16 cores) per node
    • 64 GB memory per node
    • 1 x QLogic IB (40Gb/s)
    • 30 PB storage (shared with other clusters)
  • LANL Snow
    • 445 TF, 368 compute nodes, Intel Broadwell
    • 1.2 TF per node
    • 2 x Intel Xeon E5-2695 CPUs (36 cores) per node
    • 128 GB memory per node
    • 1 x Intel Omni-Path IB (100Gb/s)
    • 15.2 PB storage
  • LANL Badger
    • 790 TF, 660 compute nodes, Intel Broadwell
    • 1.2 TF per node
    • 2 x Intel Xeon E5-2695 CPUs (36 cores) per node
    • 128 GB memory per node
    • 1 x Intel Omni-Path IB (100Gb/s)
    • 15.2 PB storage
Rensselaer Polytechnic Institute
The Rensselaer Polytechnic Institute (RPI) Center for Computational Innovations is solving problems for next-generation research through the use of massively parallel computation and data analytics. The center supports researchers, faculty, and students across a diverse spectrum of disciplines. RPI is making its Artificial Intelligence Multiprocessing Optimized System (AiMOS) available to the COVID-19 HPC Consortium. AiMOS is an 8-petaflop IBM Power9/Volta supercomputer configured to enable users to explore new AI applications.

 

RPI AiMOS | 11.1 PF, 252 nodes POWER9/Volta

2 x IBM POWER9 CPU per node, 20 cores per CPU
6 x NVIDIA Tesla GV100 per node
32 GB HBM per GPU
512 GB DRAM per node
1.6 TB NVMe per node
Mellanox EDR InfiniBand
11 PB IBM Spectrum Scale storage

MIT/Massachusetts Green HPC Center (MGHPCC)
MIT is contributing two HPC systems to the COVID-19 HPC Consortium. The MIT Supercloud, a 7-petaflop Intel x86/NVIDIA Volta HPC cluster, is designed to support research projects that require significant compute, memory, or big data resources. Satori is a 2-petaflop scalable AI-oriented hardware resource for research computing at MIT, composed of 64 IBM Power9/Volta nodes. The MIT resources are installed at the Massachusetts Green HPC Center (MGHPCC), which operates as a joint venture between Boston University, Harvard University, MIT, Northeastern University, and the University of Massachusetts.

 

MIT/MGHPCC Supercloud | 6.9 PF, 440 nodes Intel Xeon/Volta

2 x Intel Xeon (18 CPU cores per node)
2 x NVIDIA V100 GPUs per node
32 GB HBM per GPU
Mellanox EDR InfiniBand
3 PB scratch storage

MIT/MGHPCC Satori | 2.0 PF, 64 nodes IBM POWER9/NVIDIA Volta

2 x IBM POWER9 CPUs, 40 cores per node
4 x NVIDIA Volta GPUs per node (256 total)
32 GB HBM per GPU
1.6 TB NVMe per node
Mellanox EDR InfiniBand
2 PB scratch storage

IBM Research WSC

The IBM Research WSC cluster consists of 56 compute nodes, each with two 22-core CPUs and six GPUs, plus seven additional nodes dedicated to management functions. The cluster is intended to be used for the following purposes: client collaboration, advanced research for government-funded projects, advanced research on Converged Cognitive Systems, and advanced research on Deep Learning.

IBM Research WSC | 2.8 PF, 54 nodes IBM POWER9/NVIDIA Volta

  • 54 IBM POWER9 nodes
  • 2 x POWER9 CPUs per node, 22 cores per CPU
  • 6 x NVIDIA V100 GPUs per node (336 total)
  • 512 GB DRAM per node
  • 1.4 TB NVMe per node
  • 2 x Mellanox EDR InfiniBand per node
  • 2 PB IBM Spectrum Scale distributed storage
  • RHEL 7.6
  • CUDA 10.1
  • IBM PowerAI 1.6

U.S. National Science Foundation (NSF)

The NSF Office of Advanced Cyberinfrastructure supports and coordinates the development, acquisition, and provision of state-of-the-art cyberinfrastructure resources, tools, and services essential to the advancement and transformation of science and engineering. By fostering a vibrant ecosystem of technologies and a skilled workforce of developers, researchers, staff, and users, OAC serves the growing community of scientists and engineers across all disciplines. The most capable resources supported by NSF OAC are being made available to support the COVID-19 HPC Consortium.

Frontera | 23.5 PF, 8008 nodes, Intel Xeon

Operated by the Texas Advanced Computing Center (TACC), Frontera provides a balanced set of capabilities that supports both capability and capacity simulation, data-intensive science, visualization, and data analysis, as well as emerging applications in AI and deep learning. Frontera has two computing subsystems, a primary computing system focused on double precision performance, and a second subsystem focused on single-precision streaming-memory computing. 

Comet | 2.76 PF, 2,020 nodes, Intel Xeon, NVIDIA Pascal GPUs

Operated by the San Diego Supercomputer Center (SDSC), Comet is a nearly 3-petaflop cluster designed by Dell and SDSC. It features Intel next-generation processors with AVX2, Mellanox FDR InfiniBand interconnects, and Aeon storage. 

Stampede2 | 18 PF, 4,200 Intel KNL and 1,736 Intel Xeon nodes

Operated by TACC, Stampede2 is a nearly 20-petaflop national HPC resource accessible to thousands of researchers across the country, enabling new computational and data-driven discoveries and advances in science, engineering, research, and education.

Longhorn | 2.85 PF, 194 nodes, IBM POWER9, NVIDIA Volta

Longhorn is a TACC resource built in partnership with IBM to support GPU-accelerated workloads. The power of this system is in its multiple GPUs per node, and it is intended to support sophisticated workloads that require high GPU density and little CPU compute. Longhorn will support double-precision machine learning and deep learning workloads that can be accelerated by GPU-powered frameworks, as well as general purpose GPU calculations.

Bridges | 1.3 PF, 856 nodes, Intel Xeon, NVIDIA K80, Volta GPUs, DGX-2

Operated by the Pittsburgh Supercomputing Center (PSC), Bridges and Bridges-AI provide an innovative HPC and data-analytics system, integrating advanced memory technologies to empower new modalities of artificial intelligence-based computation, bring desktop convenience to HPC, connect to campuses, and express data-intensive scientific and engineering workflows.

Jetstream | 320 nodes, Cloud accessible

Operated by a team led by the Indiana University Pervasive Technology Institute, Jetstream is a configurable large-scale computing resource that leverages both on-demand and persistent virtual machine technology to support a wide array of software environments and services, combining elements of commercial cloud computing with some of the best software available for solving important scientific problems.

Open Science Grid

The Open Science Grid (OSG) is a large virtual cluster of distributed high-throughput computing (dHTC) capacity shared by numerous national labs, universities, and non-profits, with the ability to seamlessly integrate cloud resources as well. The OSG Connect service makes this large distributed system available to researchers, who can individually use up to tens of thousands of CPU cores and hundreds of GPUs, along with significant support from the OSG team. Ideal work consists of computational tasks that can run as numerous independent tasks, each needing 1-8 CPU cores, <8 GB RAM, and <10 GB of input or output data, though these limits can be exceeded significantly by integrating cloud resources and other clusters, including many of those contributing to the COVID-19 HPC Consortium.
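
To make the dHTC model concrete, the sketch below (Python, with purely illustrative task definitions and limits, not an OSG API) partitions a hypothetical parameter sweep into independent tasks and checks each one against the per-task envelope described above.

```python
# Illustrative sketch (not an OSG API): split an embarrassingly parallel sweep
# into independent tasks sized for the per-task envelope described above
# (1-8 CPU cores, <8 GB RAM, <10 GB of combined input and output data).

from dataclasses import dataclass
from itertools import product

@dataclass
class Task:
    params: tuple          # e.g. (ligand_batch, receptor_conformation)
    cpus: int = 1          # 1-8 cores per task
    memory_gb: float = 4.0 # stay under 8 GB
    data_gb: float = 2.0   # stay under 10 GB of input + output

ENVELOPE = {"cpus": 8, "memory_gb": 8.0, "data_gb": 10.0}

def fits_envelope(task: Task) -> bool:
    """True if a task fits the high-throughput per-task limits."""
    return (task.cpus <= ENVELOPE["cpus"]
            and task.memory_gb < ENVELOPE["memory_gb"]
            and task.data_gb < ENVELOPE["data_gb"])

if __name__ == "__main__":
    # Hypothetical sweep: 1,000 ligand batches x 10 receptor conformations.
    tasks = [Task(params=p) for p in product(range(1000), range(10))]
    assert all(fits_envelope(t) for t in tasks)
    print(f"{len(tasks)} independent tasks ready for high-throughput submission")
```

Work structured this way can then be submitted through OSG Connect (for example, via its HTCondor-based submission service), with cloud resources or other consortium clusters brought in when individual tasks need to exceed these limits.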

 

NASA High-End Computing Capability

NASA's High-End Computing Capability (HECC) Portfolio provides world-class high-end computing, storage, and associated services to enable NASA-sponsored scientists and engineers supporting NASA programs to broadly and productively employ large-scale modeling, simulation, and analysis to achieve successful mission outcomes.

NASA's Ames Research Center in Silicon Valley hosts the agency's most powerful supercomputing facilities. To help meet the COVID-19 challenge facing the nation and the world, HECC is offering access to NASA's high-performance computing (HPC) resources for researchers requiring HPC to support their efforts to combat this virus. 

 

NASA Supercomputing Systems | 19.39 PF, 17609 nodes Intel Xeon

AITKEN | 3.69 PF, 1,152 nodes, Intel Xeon
ELECTRA | 8.32 PF, 3,456 nodes, Intel Xeon
PLEIADES | 7.09 PF, 11,207 nodes, Intel Xeon, NVIDIA K40, Volta GPUs
ENDEAVOR | 32 TF, 2 nodes, Intel Xeon
MEROPE | 253 TF, 1792 nodes, Intel Xeon

Amazon Web Services
As part of the COVID-19 HPC Consortium, AWS is offering research institutions and companies technical support and promotional credits for the use of AWS services to advance research on diagnosis, treatment, and vaccine studies to accelerate our collective understanding of the novel coronavirus (COVID-19). Researchers and scientists working on time-critical projects can use AWS to instantly access virtually unlimited infrastructure capacity, and the latest technologies in compute, storage and networking to accelerate time to results. Learn more here.
Microsoft Azure Cloud and High Performance Computing
By expanding our existing AI for Health program, Microsoft will give access to our Azure cloud and High Performance Computing capabilities.  Our team of AI for Health data science experts, whose mission is to improve the health of people and communities worldwide, is also open to collaborations with COVID-19 researchers as they tackle this critical challenge. More broadly, Microsoft's research scientists across the world, spanning computer science, biology, medicine, and public health, will be available to provide advice and collaborate per mutual interest. 

AI for Health web site: https://www.microsoft.com/en-us/ai/ai-for-health
Azure web site: https://azure.microsoft.com/en-us/
Azure HPC web site: https://azure.microsoft.com/en-us/solutions/high-performance-computing/
Hewlett Packard Enterprise
As part of this new effort to attack the novel coronavirus (COVID-19) pandemic, Hewlett Packard Enterprise is committing to providing supercomputing software and applications expertise free of charge to help researchers port, run, and optimize essential applications. Our HPE Artificial Intelligence (AI) experts are collaborating to support the COVID-19 Open Research Dataset and several other COVID-19 initiatives for which AI can drive critical breakthroughs. They will develop AI tools to mine data across thousands of scholarly articles related to COVID-19 and related coronaviruses to help the medical community develop answers to high-priority scientific questions. We encourage researchers to submit any COVID-19 related proposals to the consortium's online portal. More information can be found here: www.hpe.com/us/en/about/covid19/hpc-consortium.html.
Google

Transform research data into valuable insights and conduct large-scale analyses with the power of Google Cloud. As part of the COVID-19 HPC Consortium, Google is providing access to Google Cloud HPC resources for academic researchers.

 


(Revised 31 March 2020)

Key Points
  • Computing resources utilized in research against COVID-19
  • National scientists encouraged to use computing resources
  • How and where to find computing resources