ECSS Symposium Archive

ECSS staff share technical solutions to scientific computing challenges monthly in this open forum.

January 19, 2016

Performance Enhancements to PlascomCM

Presenter(s): Lucas A. Wilson (TACC)
Principal Investigator(s): Daniel Bodony (UIUC)

Presentation Slides

PlascomCM is a Fortran 90 application used to investigate the behavior of compressible, viscous gases, usually in the context of aerospace or mechanical engineering and with a focus on turbulence and generated sound. Recent examples include predicting and controlling the noise produced by high-speed turbulent jets, such as those found on commercial and military aircraft, and a Mach 2.25 turbulent boundary layer grazing a flexible panel, with application to the multi-physics design of future hypersonic vehicles. The discretization of the governing non-linear partial differential equations uses an overset mesh and multiblock approach with locally structured meshes, on which spatial derivatives are approximated with fixed-width stencil-based computations based on finite-difference-like considerations. This talk will highlight ECSS work done over the last 3 years to improve the performance of PlascomCM, with the end goal of efficiently using the Intel Xeon Phi coprocessors on Stampede. Code modifications that have improved caching and enabled vectorization will be highlighted. Further modifications currently being considered to improve performance on Xeon Phi will also be discussed.
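
As a rough illustration of the fixed-width stencil computations described in this abstract, the Python sketch below evaluates a first derivative on a uniform 1D grid two ways: with an explicit index loop, and with contiguous array slices that apply the same stencil to all interior points at once. This is not PlascomCM's Fortran code. The fourth-order central stencil, the function names ddx_loop and ddx_sliced, and the choice to leave the boundary points at zero are all assumptions made for illustration.

```python
import numpy as np

def ddx_loop(f, dx):
    """Reference version: apply the 5-point stencil one grid point at a time."""
    n = len(f)
    df = np.zeros(n)
    for i in range(2, n - 2):
        # Fourth-order central difference: weights (1, -8, 0, 8, -1) / (12 dx)
        df[i] = (f[i - 2] - 8.0 * f[i - 1] + 8.0 * f[i + 1] - f[i + 2]) / (12.0 * dx)
    return df

def ddx_sliced(f, dx):
    """Same stencil written as whole-array slice arithmetic.

    Each slice is a contiguous, unit-stride view of f, so the arithmetic
    runs over all interior points without a Python-level loop.
    """
    df = np.zeros(len(f))
    df[2:-2] = (f[:-4] - 8.0 * f[1:-3] + 8.0 * f[3:-1] - f[4:]) / (12.0 * dx)
    return df

# Usage: differentiate sin(x) and compare with cos(x) on the interior points.
x = np.linspace(0.0, 2.0 * np.pi, 201)
dx = x[1] - x[0]
max_err = np.max(np.abs(ddx_sliced(np.sin(x), dx)[2:-2] - np.cos(x)[2:-2]))
```

The two functions compute the same values. The slice form only shows the general idea of expressing a stencil as unit-stride operations over contiguous data, which is the access pattern that caching and vectorization work typically aims for.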

Apache Airavata and XSEDE Science Gateways

Presenter(s): Suresh Marru (IU)
Principal Investigator(s): Mark Shephard (Rensselaer Polytechnic Institute) Cameron Smith (Rensselaer Polytechnic Institute)

Presentation Slides

The Symposium talk will walk through projects that started as ECSS optimization and code-porting efforts and were later extended to include gateway support. The resulting codes are made available to the community at large through these gateway interfaces. Examples will include PI Prof. Arne Pearlstein's flow-induced vibration simulation gateway and PIs Mark Shephard and Cameron Smith's PHASTA Gateway. The talk will also discuss the use of a multi-tenanted science gateway framework based on Apache Airavata as a starting point and as a way to achieve short-term operational sustainability through externally funded NSF projects. Lastly, we will discuss the reuse of ECSS-contributed extensions across projects.

February 16, 2016

fMRI image registration with AFNI's 3dQwarp

Presenter(s): Junqi Yin (NICS)
Principal Investigator(s): Frank Skidmore (University of Alabama Birmingham)

The Analysis of Functional Neuroimaging (AFNI) software package is widely used in the community for brain MR image analysis. For many types of analysis workflows, one important step is to register a subject's image to a pre-defined template so that different subjects can be compared within a normalized coordinate system. This is especially challenging if the subject has brain atrophy due to a neurological condition such as Parkinson's disease. The 3dQwarp code in AFNI is a non-linear image registration procedure that overcomes the drawbacks of a linear affine transformation. However, the existing OpenMP instrumentation in 3dQwarp is not efficient for small-patch optimization, and the lack of convergence criteria in the iterative algorithm also hurts accuracy. Based on profiling and benchmarking, we have been working on optimizing its OpenMP structure and improving warped image fidelity, so that the results can be used for voxel-to-voxel types of downstream analysis.

ECSS-er Junqi Yin (NICS) will be sharing observations from his work with PI Frank Skidmore (U Alabama Birmingham) on Blacklight and Greenfield to optimize a widely used neuroimaging package called Analysis of Functional Neuroimaging (AFNI).

Boosting molecular dynamics with advanced hardware and algorithms

Presenter(s): Lei Huang (TACC)
Principal Investigator(s): Dr. Doraiswami Ramkrishna (Purdue)

Presentation Slides

There are several open-source packages available for general-purpose molecular dynamics (MD) simulations. However, researchers still need to develop their own MD engines under special circumstances. Dr. Doraiswami Ramkrishna's group at Purdue developed a package for umbrella sampling and molecular dynamics for polymorph prediction. By leveraging the power of Intel Xeon Phi and adopting several advanced algorithms in molecular dynamics, we achieved a ~9x speedup and obtained performance superior to that of LAMMPS.

Lei Huang (TACC) will tell us about his work with PI Doraiswami Ramkrishna (Purdue) to port several advanced algorithms in molecular dynamics to the Xeon Phi on Stampede with a factor of 9 speedup.

March 15, 2016

XDMoD (XD Metrics Service)

Presenter(s): Thomas Furlani (University at Buffalo, SUNY)

Presentation Slides

The University at Buffalo's XDMoD tool provides for the comprehensive management of HPC systems, including the ability to provide performance data for all jobs running on a cluster. Using the XDMoD Job Viewer, system support personnel can readily identify poorly performing jobs - with the end goal of working with the end user to improve performance.

In this presentation, we will begin with a brief PowerPoint presentation on XDMoD with an emphasis on the Job Viewer tab. This will be followed by a live demo that utilizes the Job Viewer within XDMoD to analyze various XSEDE jobs. The demo will be interactive - allowing ECSS staff to help guide the demo.

Link to the recorded presentation.
May require download of proprietary software to view video

The XDMoD team is interested in collecting feedback on usability. David LaVergne is conducting one-on-one interviews with users regarding their use of the current interface (mainly the Usage and Metrics Explorer tabs, as well as the new Job Viewer tab). Since you are user support staff, he would like to know generally what information you find the most (and least) useful, and how well the current interface (and a proposed redesign) supports that. Please contact him if interested; a small gift is involved!

April 19, 2016

How to Tune and Extract Higher Performance with MVAPICH2 Libraries

Presenter(s): Dhabaleswar K.(DK) Panda (Ohio State)

Presentation Slides

The Ohio State University MVAPICH2 libraries support the latest MPI 3.1 standard and deliver high performance, scalability and fault tolerance for high-end computing systems using InfiniBand, Omni-Path, 10-40 GigE/iWARP and RoCE (V1 and V2) networking technologies. The MVAPICH2-GDR library uses novel designs to exploit the cutting-edge GPUDirect technology to provide high performance for MPI applications on systems with NVIDIA GPUs. These libraries have multiple features, parameters and knobs to optimize performance on modern systems. However, many users are not fully aware of all these features and of the available optimization and tuning techniques. This talk aims to address these concerns and provide a set of concrete guidelines that XSEDE users can follow to boost the performance of their applications. We will start with an overview of the MVAPICH2 libraries, their features and their optimized designs. Next, we will provide an in-depth overview of the runtime optimizations and tuning flexibility. We will demonstrate how you can tune and optimize these libraries to fit the needs of your application on a given system. Using a set of "Best Practice" examples, we will highlight the impact of tuning and optimizations on a set of common XSEDE applications including Amber, LULESH, HOOMD-blue, and MILC.

Bio: DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He has published over 350 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, iWARP and RoCE) libraries, designed and developed by his research group, are currently being used by more than 2,550 organizations worldwide (in 79 countries). More than 360,000 downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 10th, 13th and 25th ranked ones) in the TOP500 list. The RDMA packages for Apache Spark, Apache Hadoop and Memcached, together with the OSU HiBD benchmarks from his group, are also publicly available. These libraries are currently being used by more than 160 organizations in 22 countries. More than 15,900 downloads of these libraries have taken place. He is an IEEE Fellow. More details about Prof. Panda are available at

May 17, 2016

Turning on Performance in LAMMPS Molecular Dynamics

Presenter(s): Kent Milfeld (TACC)
Principal Investigator(s): Peter Koenig (Procter & Gamble)

Presentation Slides

LAMMPS is a large-package atomistic and molecular dynamics simulator. Through the Industrial Challenge Program, TACC supported Peter Koenig (PI) in using LAMMPS on the Stampede system. The objective of the program was to create atomistic and particle simulations that could be used to determine micellar properties, in order to confirm and replace experiments that develop rheology models for mixing, filling, and product performance predictions. The presentation will focus on the support work: LAMMPS optimizations, which included a few code changes; a description of adding new Classes (styles) for modifying interactions; and other efforts supporting efficient use of the Stampede system.

Curation en masse: Exploration of the Quality of Video Collections

Presenter(s): Anne Bowen (TACC)
Principal Investigator(s): Alan Bovik (UT)

Presentation Slides

TACC provided support for Alan Bovik (Laboratory for Image and Video Engineering at UT Austin) to assess the use of automatic quality assessment algorithms at a large scale for museum digital video collections. This project involved developing a visual analysis tool and workflow for massive video quality assessment on TACC systems using the BRISQUE algorithm. The presentation will present an overview of the Quality Assessment workflow, and specifically focus on the challenges we encountered with using BRISQUE (and non-referential quality assessment algorithms in general) on museum collections. These challenges prompted the development of the visual analysis tool to assist with interpretation of the results.

June 21, 2016

SeedMe platform: Enabling scriptable data sharing

Presenter(s): Amit Chourasia (SDSC)

Abstract: Most scientific computation and analyses create important transient data and preliminary results. Quick and effective access to and assessment of this data is necessary for efficient use of researchers' time and computation resources, but this process is complicated when a large collaborating team is geographically dispersed and/or some team members do not have direct access to the computation resource and output data. Current methods for sharing and assessing transient data and preliminary results are cumbersome, labor intensive, and largely unsupported by useful tools and procedures. Each research team is forced to create its own ad hoc procedures to push results from system to system, and user to user, to guide the next step in its research.

In this talk we introduce the SeedMe platform, which provides a web-based cyberinfrastructure to enable easy sharing and streaming of transient data and preliminary results directly from computing resources to a variety of platforms, from mobile devices to workstations. The SeedMe platform is open to all researchers and provides web-browser-based as well as scriptable tools for easy integration with ad hoc computation workflows. The talk will also briefly discuss applications and use cases that may be relevant for ECSS and Science Gateway projects.

Biography: Amit Chourasia is a Sr. Visualization Scientist at the San Diego Supercomputer Center (SDSC), UC San Diego. He leads the Visualization group, where his work focuses on the research, development and application of software tools and techniques for visualization. A key area of his work is developing methods to represent data in a visual form that is clear, succinct and accurate (a challenging yet very exciting endeavor). Data sharing is also at the forefront of his interests; to this end he has developed a web-based infrastructure, via the SeedMe project, to fill this important and at times critical gap in the scientific process.

August 16, 2016

Re-presenting Large Image Collections for Data Mining and Analysis

Presenter(s): Paul Rodriguez (SDSC)
Principal Investigator(s): Elizabeth Wuerffel (Valparaiso University) Alison Langmead (University of Pittsburgh)

Presentation Slides

I will discuss two NIP/ECSS projects that both involve primarily image analysis in the context of digital humanities (Image Analysis of Rural Photography, PI Wuerffel; Decomposing Bodies [aka Image Analysis of Bertillon Prison Cards], PI Langmead). Both of them are superficially about taking old B&W photograph collections and 'digitizing' them. In a general sense, the goal is to re-represent the data so that the digital humanist can perform particular socio/historical/artistic/cultural analyses. In a more practical sense, the goal is to extract features from the images and metadata and provide infrastructure support for analysis. Part of our challenge is to line up these two goals.

I will also discuss the technical and programmatic aspects, mostly for my own pieces of the projects. Although the processes, project logistics, and infrastructure are very similar between projects, the actual feature extraction code and data products have little overlap, which is due to the nature of the image data themselves. Feature extraction for both projects primarily involves assembling techniques and tools from open source packages; the trickier aspects require coming up with good strategies for applying techniques, evaluating how well they work on this data, and exploring possible methods that might be useful to the user.

Experiences running Dynamic Traffic Assignment Simulations at scale using HPC Infrastructure

Presenter(s): Amit Gupta (TACC)
Principal Investigator(s): Natalia Ruiz Juri (UT)

Presentation Slides

Dynamic Traffic Assignment (DTA) simulations are an important analysis tool for transportation researchers attempting to model complex interactions between travelers and transportation infrastructure. These simulation frameworks are complex to develop, maintain and extend. VISTA is a widely used transportation simulation framework providing Dynamic Traffic Assignment. I discuss our experiences in scaling VISTA on the Stampede system as an exemplar of how HPC infrastructure and tools can augment analysis workflows in transportation research by significantly speeding up simulation experiments. I also discuss some challenges and tradeoffs in enabling DTA frameworks for use in HPC environments, as well as directions for continuing and future work under ECSS support.

September 20, 2016

Integrating Scientific Tools and Web Portals

Presenter(s): Kevin (Feng) Chen (TACC)
Principal Investigator(s): Carol X. Song (Purdue) Ritu Arora (TACC)

Presentation Slides

Abstract: Diagrid is powered by the HUBzero® software developed at Purdue University. It is specifically designed to help a scientific community share resources and work together. The Diagrid Science as a Service platform allows for easy web-based access to software applications used by thousands of researchers around the world. In today's ECSS symposium, Dr. Kevin Chen will discuss the development of scientific tools leveraging the Diagrid web portal and XSEDE HPC resources.

System-level Checkpoint-Restart with DMTCP

Presenter(s): Jerome Vienne (TACC)
Principal Investigator(s): Gene Cooperman (Northeastern University)

Presentation Slides

DMTCP (Distributed MultiThreaded CheckPointing) is a software package used to checkpoint-restart applications. The primary purpose of checkpointing in HPC is achieving fault tolerance. If a computation fails, whether because of hardware failure or temporary software failure, then the user restarts the computation from a previous checkpoint. This presentation highlights work on an ECSS project with the team that develops DMTCP. The initial purpose of the ECSS project was to provide support to extend the scalability of DMTCP, but it ended up being more than that. During the presentation, I will introduce DMTCP and explain how it can be used to checkpoint-restart and debug a batch session, and to checkpoint OpenSHMEM implementations and large-scale experiments running on InfiniBand clusters. Each of these points raised different challenges that were solved during this ECSS project. This collaboration led to papers presented at XSEDE'16, OpenSHMEM 2016 and IEEE ICPADS 2016.

October 18, 2016

Towards Large-scale Genomics, Transcriptomics, and Metagenomics for All

Presenter(s): Philip Blood (PSC)
Principal Investigator(s): Noushin Ghaffari (Texas A&M) Ping Ma (U. Georgia) James Taylor (Johns Hopkins)

Presentation Slides

Although increasing numbers of researchers in genomics and related disciplines are utilizing advanced cyberinfrastructure for their work, these still represent a relatively small fraction of the biologists who could benefit from access to the latest genomics tools backed by large-scale computing resources. Rapid advances in these fields have caused an explosion of tools and algorithms that present a dizzying array of constantly changing options. Hence, even for scientists who are adept at using advanced computing infrastructure, it is challenging to determine the optimal mix of tools and employ these effectively to analyze large genomic data sets. In this talk I will highlight several XSEDE ECSS projects aimed at tackling aspects of these problems, both through formal ECSS collaborations and the "Novel and Innovative Projects" (NIP) arm of ECSS. These projects include the development of a pipeline for high-quality transcriptome analysis based on well-characterized RNA Sequencing Quality Control (SEQC) datasets, making memory-hungry sequence assembly tools available through the Galaxy XSEDE Gateway, enabling large-scale analysis of human microbiome data, and facilitating the Critical Assessment of Metagenome Interpretation (CAMI).

Petascale DNS Using the Fast Poisson Solver PSH3D

Presenter(s): Darren Adams (NCSA)
Principal Investigator(s): Antonio Ferrante (U Wash)

Presentation Slides

Direct numerical simulation (DNS) of high Reynolds number (Re = O(10^5)) turbulent flows requires computational meshes of O(10^12) grid points. Thus, DNS requires the use of petascale supercomputers. DNS often requires the solution of a Helmholtz (or Poisson) equation for pressure, which constitutes the bottleneck of the solver. We have developed and implemented a parallel solver of the Helmholtz equation in 3D called petascale Helmholtz 3D (PSH3D). The numerical method underlying PSH3D combines a parallel 2D Fast Fourier transform (P2DFFT) and a parallel linear solver (PLS). Our numerical results show that PSH3D scales up to at least 262,144 cores. PSH3D has a peak performance 6× faster than 3D FFT-based methods (e.g., P3DFFT) when used with the partial-global optimization. We have verified that the use of PSH3D with the partial-global optimization in our DNS solver does not reduce the accuracy of the numerical solution when tested for the Taylor-Green vortex flow.
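
The abstract does not describe PSH3D's discretization, boundary conditions, or parallel decomposition. As a minimal serial sketch of one common way to combine a 2D FFT with a linear solver for a Poisson equation, the Python example below assumes a grid that is periodic in x and y with homogeneous Dirichlet conditions in z, discretized with second-order central differences. Under those assumptions, a 2D FFT over x and y decouples the problem into one independent tridiagonal system in z per Fourier mode, which is solved here with the Thomas algorithm. The function names and every numerical choice are hypothetical and are not taken from PSH3D.

```python
import numpy as np

def apply_laplacian(u, hx, hy, hz):
    """Second-order discrete Laplacian: periodic in x and y, u = 0 beyond z ends."""
    lap = (np.roll(u, 1, 0) - 2.0 * u + np.roll(u, -1, 0)) / hx**2
    lap += (np.roll(u, 1, 1) - 2.0 * u + np.roll(u, -1, 1)) / hy**2
    up = np.pad(u, ((0, 0), (0, 0), (1, 1)))  # zero padding = Dirichlet in z
    lap += (up[..., 2:] - 2.0 * u + up[..., :-2]) / hz**2
    return lap

def solve_poisson_fft2_tridiag(f, hx, hy, hz):
    """Solve apply_laplacian(u) = f using FFTs in x, y and tridiagonal solves in z."""
    nx, ny, nz = f.shape
    # Eigenvalues of the periodic second-difference operator for each mode
    lam_x = (2.0 * np.cos(2.0 * np.pi * np.arange(nx) / nx) - 2.0) / hx**2
    lam_y = (2.0 * np.cos(2.0 * np.pi * np.arange(ny) / ny) - 2.0) / hy**2
    f_hat = np.fft.fft2(f, axes=(0, 1))

    # Tridiagonal system per (p, q) mode: off * u[k-1] + diag * u[k] + off * u[k+1]
    diag = -2.0 / hz**2 + lam_x[:, None] + lam_y[None, :]
    off = 1.0 / hz**2

    # Thomas algorithm along z, carried out for all modes at once
    c = np.empty((nx, ny, nz))
    d = np.empty((nx, ny, nz), dtype=complex)
    c[..., 0] = off / diag
    d[..., 0] = f_hat[..., 0] / diag
    for k in range(1, nz):
        denom = diag - off * c[..., k - 1]
        c[..., k] = off / denom
        d[..., k] = (f_hat[..., k] - off * d[..., k - 1]) / denom
    u_hat = np.empty_like(d)
    u_hat[..., -1] = d[..., -1]
    for k in range(nz - 2, -1, -1):
        u_hat[..., k] = d[..., k] - c[..., k] * u_hat[..., k + 1]

    return np.fft.ifft2(u_hat, axes=(0, 1)).real

# Usage: build a right-hand side from a known field, then recover that field.
rng = np.random.default_rng(0)
u_true = rng.standard_normal((8, 6, 5))
rhs = apply_laplacian(u_true, 0.1, 0.2, 0.3)
u_solved = solve_poisson_fft2_tridiag(rhs, 0.1, 0.2, 0.3)
```

In this sketch the transform touches only two of the three directions, and each remaining system in z is independent of the others. How PSH3D distributes the corresponding work across cores is not covered here.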