ECSS Symposium

ECSS staff share technical solutions to scientific computing challenges monthly in this open forum.

The ECSS Symposium allows the more than 70 ECSS staff members to exchange information each month about successful techniques used to address challenging science problems. Tutorials on new technologies may be featured. Two 30-minute, technically focused talks are presented each month and include a brief question-and-answer period. This series is open to everyone.

Symposium coordinates

Day and Time: Third Tuesdays @ 1 pm Eastern / 12 pm Central / 10 am Pacific
Add this event to your calendar.

Webinar (PC, Mac, Linux, iOS, Android): Launch Zoom webinar

iPhone one-tap (US Toll): +16468769923,,114343187# (or) +16699006833,,114343187#

Telephone (US Toll): Dial (for higher quality, dial a number based on your current location):

US: +1 646 876 9923 (or) +1 669 900 6833 (or) +1 408 638 0968

Meeting ID: 114 343 187

Upcoming events are also posted to the Training category of XSEDE News.

Due to the large number of attendees, only the presenters and host broadcast audio. Attendees may submit chat questions to the presenters through a moderator.

Key Points
Monthly technical exchange
ECSS community present
Open to everyone
Tutorials and talks with Q & A

Previous years' ECSS seminars may be accessed through these links:





May 17, 2016

Turning on Performance in LAMMPS Molecular Dynamics

Presenter(s): Kent Milfeld (TACC)
Principal Investigator(s): Peter Koenig (Procter & Gamble)

Presentation Slides

LAMMPS is a large-package atomistic and molecular dynamics simulator. Through the Industrial Challenge Program, TACC supported Peter Koenig (PI) in using LAMMPS on the Stampede system. The object of the program was to create atomistic and particle simulations that could be used to determine micellar properties, in order to confirm and replace experiments that develop rheology models for mixing, filling, and product performance predictions. The presentation will focus on the support work: LAMMPS optimizations, which included a few code changes; a description of adding new classes (styles) for modifying interactions; and other efforts supporting efficient use of the Stampede system.

Curation en masse: Exploration of the Quality of Video Collections

Presenter(s): Anne Bowen (TACC)
Principal Investigator(s): Alan Bovik (UT)

Presentation Slides

TACC provided support for Alan Bovik (Laboratory for Image and Video Engineering at UT Austin) to assess the use of automatic quality assessment algorithms at a large scale for museum digital video collections. This project involved developing a visual analysis tool and workflow for massive video quality assessment on TACC systems using the BRISQUE algorithm. The presentation will give an overview of the quality assessment workflow and focus specifically on the challenges we encountered in using BRISQUE (and non-referential quality assessment algorithms in general) on museum collections. These challenges prompted the development of the visual analysis tool to assist with interpretation of the results.

April 19, 2016

How to Tune and Extract Higher Performance with MVAPICH2 Libraries

Presenter(s): Dhabaleswar K. (DK) Panda (Ohio State)

Presentation Slides

The Ohio State University MVAPICH2 libraries support the latest MPI 3.1 standard and deliver high performance, scalability and fault tolerance for high-end computing systems using InfiniBand, Omni-Path, 10-40 GigE/iWARP and RoCE (V1 and V2) networking technologies. The MVAPICH2-GDR library uses novel designs to exploit the cutting-edge GPUDirect technology to provide high performance for MPI applications on systems with NVIDIA GPUs. These libraries have multiple features, parameters and knobs to optimize performance on modern systems. However, many users are not fully aware of all these features and of the available optimization and tuning techniques. This talk aims to address these concerns and provide a set of concrete guidelines to XSEDE users to boost the performance of their applications. We will start with an overview of the MVAPICH2 libraries and their features and optimized designs. Next, we will provide an in-depth overview of the runtime optimizations and tuning flexibility. We will demonstrate how you can tune and optimize these libraries to fit the needs of your application on a given system. Using a set of "Best Practice" examples, we will highlight the impact of tuning and optimizations on a set of common XSEDE applications including Amber, LULESH, HOOMD-blue, and MILC.

Bio: DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He has published over 350 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, iWARP and RoCE) libraries, designed and developed by his research group, are currently being used by more than 2,550 organizations worldwide (in 79 countries). More than 360,000 downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 10th, 13th and 25th ranked ones) in the TOP500 list. The RDMA packages for Apache Spark, Apache Hadoop and Memcached, together with the OSU HiBD benchmarks from his group, are also publicly available. These libraries are currently being used by more than 160 organizations in 22 countries. More than 15,900 downloads of these libraries have taken place. He is an IEEE Fellow. More details about Prof. Panda are available at

March 15, 2016

XDMoD (XD Metrics Service)

Presenter(s): Thomas Furlani (University at Buffalo, SUNY)

Presentation Slides

The University at Buffalo's XDMoD tool provides for the comprehensive management of HPC systems, including the ability to provide performance data for all jobs running on a cluster. Using the XDMoD Job Viewer, system support personnel can readily identify poorly performing jobs - with the end goal of working with the end user to improve performance.

In this presentation, we will begin with a brief PowerPoint presentation on XDMoD with an emphasis on the Job Viewer tab. This will be followed by a live demo that utilizes the Job Viewer within XDMoD to analyze various XSEDE jobs. The demo will be interactive - allowing ECSS staff to help guide the demo.

Link to the recorded presentation.
May require download of proprietary software to view video

The XDMoD team is interested in collecting feedback on usability. David LaVergne is conducting one-on-one interviews with users regarding their use of the current interface (mainly the Usage and Metrics Explorer tabs, as well as the new Job Viewer tab). Because you work in user support, he would like to know which information you find the most (and least) useful, and how well the current interface (and a proposed redesign) supports that. Please contact him if you are interested; a small gift is involved!

February 16, 2016

fMRI image registration with AFNI's 3dQwarp

Presenter(s): Junqi Yin (NICS)
Principal Investigator(s): Frank Skidmore (University of Alabama Birmingham)

The Analysis of Functional Neuroimaging (AFNI) software package is widely used in the community for brain MR image analysis. For many types of analysis workflows, one important step is to register a subject's image to a pre-defined template so that different subjects can be compared within a normalized coordinate system. This is especially challenging if the subject has brain atrophy due to a neurological condition such as Parkinson's disease. The 3dQwarp code in AFNI is a non-linear image registration procedure that overcomes the drawbacks of a linear affine transformation. However, the existing OpenMP instrumentation in 3dQwarp is not efficient for small-patch optimization, and the lack of convergence criteria in the iterative algorithm also hurts accuracy. Based on profiling and benchmarking, we have been working on optimizing its OpenMP structure and improving warped image fidelity, so that the results can be used for voxel-to-voxel downstream analysis.

ECSS-er Junqi Yin (NICS) will be sharing observations from his work with PI Frank Skidmore (U Alabama Birmingham) on Blacklight and Greenfield to optimize a widely used neuroimaging package called Analysis of Functional Neuroimaging (AFNI).

Boosting molecular dynamics with advanced hardware and algorithms

Presenter(s): Lei Huang (TACC)
Principal Investigator(s): Dr. Doraiswami Ramkrishna (Purdue)

Presentation Slides

There are several open-source packages available for general-purpose molecular dynamics (MD) simulations. However, researchers still need to develop their own MD engines under special circumstances. Dr. Doraiswami Ramkrishna's group at Purdue developed a package for umbrella sampling and molecular dynamics for polymorph prediction. By leveraging the power of Intel Xeon Phi and adopting several advanced algorithms in molecular dynamics, we achieved a ~9x speedup and performance superior to that of LAMMPS.

Lei Huang (TACC) will tell us about his work with PI Doraiswami Ramkrishna (Purdue) to port several advanced algorithms in molecular dynamics to the Xeon Phi on Stampede with a factor of 9 speedup.

January 19, 2016

Performance Enhancements to PlascomCM

Presenter(s): Lucas A. Wilson (TACC)
Principal Investigator(s): Daniel Bodony (UIUC)

Presentation Slides

PlascomCM is a Fortran 90 application that is used to investigate the behavior of compressible, viscous gases, usually in the contexts of aerospace or mechanical engineering and with a focus on turbulence and generated sound. Several recent examples include predicting and controlling the noise produced by high-speed turbulent jets, such as those found on commercial and military aircraft, and a Mach 2.25 turbulent boundary layer grazing a flexible panel, with application to multi-physics design of future hypersonic vehicles. The discretization of the governing non-linear partial differential equations uses an overset mesh and multiblock approach with locally structured meshes, for which spatial derivatives are approximated with fixed-width stencil-based computations based on finite-difference-like considerations. This talk will highlight ECSS work done over the last 3 years to improve the performance of PlascomCM, with the end goal of efficiently using the Intel Xeon Phi coprocessors on Stampede. Code modifications that have improved caching and enabled vectorization will be highlighted. Further modifications that are currently being considered to improve performance on Xeon Phi will also be discussed.

Apache Airavata and XSEDE Science Gateways

Presenter(s): Suresh Marru (IU)
Principal Investigator(s): Mark Shephard (Rensselaer Polytechnic Institute), Cameron Smith (Rensselaer Polytechnic Institute)

Presentation Slides

The Symposium talk will walk through projects that started as ECSS optimization and code porting efforts and were later extended to include gateway support. The resulting codes are made available to the community at large through these gateway interfaces. Examples will include PI Prof. Arne Pearlstein's flow-induced vibration simulation gateway and PIs Mark Shephard and Cameron Smith's PHASTA Gateway. The talk will also discuss the use of a multi-tenanted science gateway framework based on Apache Airavata as a starting point, and how short-term operational sustainability is achieved through externally funded NSF projects. Lastly, we will discuss the reuse of ECSS-contributed extensions across projects.

December 15, 2015

Bridges: Connecting Researchers, Data, and HPC

Presenter(s): Nick Nystrom (PSC)
Principal Investigator(s): Nick Nystrom (PSC)

Presentation Slides

Bridges is a new kind of supercomputer being built at the Pittsburgh Supercomputing Center (PSC) to empower new research communities, bring desktop convenience to supercomputing, expand campus access, and help researchers facing challenges in Big Data to work more intuitively. Funded by a $9.65M NSF award, Bridges consists of tiered, large-shared-memory resources with nodes having 12TB, 3TB, and 128GB each, dedicated nodes for database, web, and data transfer, high-performance shared and distributed data storage, the Spark/Hadoop ecosystem, and powerful new CPUs and GPUs. Bridges is the first production deployment of Intel's new Omni-Path Architecture (OPA) Fabric, which will interconnect its nodes and storage. Bridges emphasizes usability, flexibility, and interactivity. Widely used languages and frameworks such as Java, Python, R, MATLAB, Hadoop, and Spark benefit transparently from large memory and the high-performance OPA fabric. Virtualization enables hosting web services, NoSQL databases, and application-specific environments, and enhances reproducibility. Bridges, allocated through XSEDE, is available at no charge to the open research community. Bridges is also available to industry through PSC's corporate programs.

Design of Experiments and Big Data Analytics for Energy Efficient Buildings

Presenter(s): Pragnesh Patel (NICS)
Principal Investigator(s): Joshua New (ORNL)

Presentation Slides

A central challenge in the domain of energy efficiency is being able to realistically model a specific class of building and scale those classes up to the entire United States building stock across ASHRAE climate zones, then project how specific retrofits or retrofit packages would maximize return-on-investment for subsidies through federal, state, local, and utility tax incentives, rebates, and loan programs. Nearly all projections regarding energy savings, for any of the plethora of technologies required to address the need for US energy security, are reliant upon accurate models as the central primitive by which to integrate the national impact with meaningful measures of uncertainty, error, variance, and risk. This challenge is compounded by the fact that buildings, unlike cars or planes, are manufactured in the field at the time of construction based on one-off designs with a median lifespan of 73 years. Due to variance in building materials, construction, and equipment (and the necessary flux of these over time), a given building is unlikely to closely resemble the prototypical building class. Therefore, each building needs to be modeled individually and precisely to achieve optimal retrofit and construction practices. We have developed a design of experiments for calibrating building energy models that minimizes the number of simulations required while maximizing the statistical resolution of analysis results. Initial statistical analysis of parametric ensembles using techniques such as multiple analysis of variance (MANOVA) and a software infrastructure tying together several machine learning packages (MLSuite) have recently pushed the cutting edge of building energy analysis from about 10 inputs and 12-24 outputs to 156 inputs and 96 outputs.

Improvements to the science-enabling software infrastructure made as part of this project include improving the R code for design of experiments and the R analysis code while quickly instantiating R on every parallel node/core, and integrating the EnergyPlus code for large-scale simulation runs with the OpenDIEL workflow system, along with pre- and post-processing data analysis codes.

October 20, 2015

SoyKB pipeline on XSEDE - an overview

Presenter(s): Mats Rynge (USC/ISI)
Principal Investigator(s): Dong Xu (University of Missouri, Columbia)

Presentation Slides

The Soybean Knowledge Base project is conducting resequencing of more than 1000 soybean germplasm lines using Illumina paired-end sequencing for multiple projects, selected for major traits including oil, protein, soybean cyst nematode (SCN) resistance, abiotic stress resistance (drought, heat and salt) and root system architecture. In this talk we discuss how SoyKB uses XSEDE for the sequencing pipeline and how ECSS helped create the Pegasus workflow for the pipeline. We will also discuss our current effort of transitioning from TACC Stampede to TACC Wrangler.

September 15, 2015

Asteroseismic Modeling Portal

Presenter(s): Haiying Xu (NCAR)
Principal Investigator(s): Travis Metcalfe (Space Science Institute)

Presentation Slides

The Asteroseismic Modeling Portal (AMP) is a community facility that allows astronomers to derive the fundamental properties of sun-like stars from observations of their natural vibrations. The underlying science code uses a parallel genetic algorithm to match the observations with standard theoretical models of stars. In the first five years of the project, AMP was applied to more than 100 stars observed by NASA's Kepler mission, yielding a uniform set of stellar properties that have been used to study the structure and evolution of stars and their planetary systems. Through the AMP gateway, more than 100 users around the world can easily submit jobs, retrieve results, and even analyze the performance of the source code. Over 8 years of operation, AMP has submitted 30,424 jobs and used 18,795,892 SUs. XSEDE/ECSS objectives include updating the OS and related software on the servers, and optimization of the parallel performance of the AMP 2.0 science code by TACC staff.
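
To illustrate the general idea of matching observations to a model with a genetic algorithm, here is a minimal sketch. It is not the AMP science code: the two-parameter toy model (an offset plus evenly spaced values), the parameter bounds, and all function names are assumptions made up for this example. The fitness evaluations inside each generation are independent of one another, which is the part a parallel implementation would typically distribute.

```python
import random

def misfit(params, observed):
    """Sum of squared differences between the toy model and the observations."""
    offset, spacing = params
    return sum((offset + spacing * n - obs) ** 2 for n, obs in enumerate(observed))

def evolve(observed, bounds, pop_size=40, generations=60, seed=0):
    """Tiny genetic algorithm: elitism, uniform crossover, bounded Gaussian mutation."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Each candidate is scored independently of the others.
        population.sort(key=lambda p: misfit(p, observed))
        best_per_generation.append(misfit(population[0], observed))
        parents = population[: pop_size // 2]
        children = [population[0]]  # elitism: keep the best candidate unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]  # uniform crossover
            child = [min(hi, max(lo, g + rng.gauss(0.0, 0.02 * (hi - lo))))
                     for g, (lo, hi) in zip(child, bounds)]      # mutation, clipped to bounds
            children.append(child)
        population = children
    best = min(population, key=lambda p: misfit(p, observed))
    return best, best_per_generation

# Hypothetical "observations": ten evenly spaced values generated from known parameters.
observed = [100.0 + 5.0 * n for n in range(10)]
best, history = evolve(observed, bounds=[(0.0, 200.0), (0.0, 10.0)])
```

Because the best candidate is always carried over unchanged, the best misfit recorded in `history` can never get worse from one generation to the next.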

August 18, 2015


Presenter(s): Shiquan Su (NICS)
Principal Investigator(s): Robert Sean Norman (University of South Carolina), Atsuko Tanaka (University of Wisconsin-Madison), Chao Fu (University of Wisconsin-Madison)

Presentation Slides

In the first project, the PI from the University of South Carolina developed a bioinformatic pipeline for analyzing millions of DNA and cDNA sequences. The major computational workload comes from querying a large database with the BLAST tools. Shiquan will present how he helped the PI reorganize the database file into multiple sub-databases (more than 50) and implemented the advanced host selection feature of the Stampede batch system in the PI's job script. The improved workflow shortens the turnaround time of the PI's job by up to 80%.
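
The abstract does not say how the database was divided, so the following is only a sketch of one plausible way to split a sequence collection into sub-databases of roughly equal size. The FASTA parsing, the greedy balancing rule, and the function names are assumptions for illustration; each resulting part would then be formatted and queried as its own database.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def split_balanced(records, n_parts):
    """Greedy split: take the longest sequences first and put each one
    into whichever part currently holds the fewest residues."""
    parts = [[] for _ in range(n_parts)]
    sizes = [0] * n_parts
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        target = sizes.index(min(sizes))
        parts[target].append((header, seq))
        sizes[target] += len(seq)
    return parts, sizes

# Small made-up example with four sequences split into two parts.
example = """>seq1
ACGTACGTAC
>seq2
ACGT
>seq3
ACGTAC
>seq4
AC
"""
records = read_fasta(example)
parts, sizes = split_balanced(records, 2)
```

Balancing by total sequence length rather than by record count keeps the per-part query time more even when sequence lengths vary widely.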

In the second project, Dr. Atsuko Tanaka of the University of Wisconsin-Madison studies lifetime utility: she simulates clients' behavior and matches the simulated outcomes to the observed data with respect to wage profile and asset accumulation over the life cycle. This is an ongoing project. Dr. Tanaka is actively developing the home-grown code, which has the potential to become the starting point of a community code. Shiquan works closely with Dr. Tanaka to optimize the serial version of her code so that it uses the resources on Stampede efficiently. In this talk, Shiquan discusses the parallelization treatments implemented in Dr. Tanaka's code. Shiquan provided a module that uses MPI to unfold the deeply nested loop structure (more than 15 levels) in the main program. Also, at Dr. Tanaka's specific request, Shiquan applied the loop-collapse feature available in OpenMP 3.0 and later to collapse multiple loop spaces in the core subroutine and exploit the parallelism within a Stampede node.
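
The idea behind unfolding a nested loop can be sketched as follows: treat the whole nest as one linear iteration space, hand each MPI rank a contiguous block of it, and convert each linear index back into per-loop indices. This is an illustrative sketch only, not the module written for the project; the loop extents and rank count are hypothetical, and in a real MPI program the rank and number of ranks would come from the communicator rather than a Python loop.

```python
from math import prod

def unflatten(flat_index, shape):
    """Convert one linear index into one index per nested loop (row-major order)."""
    indices = []
    for extent in reversed(shape):
        flat_index, i = divmod(flat_index, extent)
        indices.append(i)
    return tuple(reversed(indices))

def rank_range(total, n_ranks, rank):
    """Contiguous [start, stop) block of the flattened iteration space for one rank."""
    base, extra = divmod(total, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# Hypothetical extents standing in for a much deeper loop nest.
shape = (3, 4, 5)
total = prod(shape)
n_ranks = 4  # assumption: would be the communicator size in a real run

# Simulate the work assignment that each rank would compute for itself.
work = {
    rank: [unflatten(k, shape) for k in range(*rank_range(total, n_ranks, rank))]
    for rank in range(n_ranks)
}
```

Flattening first means the load balance depends only on the total iteration count, not on the extent of the outermost loop. The OpenMP collapse clause applies the same idea to threads within a node.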

Large-shared-memory supercomputing for game-theoretic analysis with fine-grained abstractions, and novel tree search algorithms.

Presenter(s): John Urbanic (PSC)
Principal Investigator(s): Tuomas Sandholm (Carnegie Mellon)

Presentation Slides

John Urbanic (PSC) will discuss the optimization of the poker bot that recently competed in the first "Brain vs. AI" no-limit Texas Hold'em tournament, the first time that a poker program has competed against the top pros. John's work was in optimizing the Tuomas Sandholm group's algorithm for Blacklight, the world's largest shared memory platform, at PSC. John will discuss the project in general, the specific optimizations that were used to make the poker bot competitive, and of course the results – which will shortly be televised.

June 16, 2015

A Short Story of Efficiently Using Two Open-Source Applications on Stampede

Presenter(s): Ritu Arora (TACC)

Presentation Slides

This presentation will summarize two challenges, and their solutions, related to running the DROID (Digital Record Object Identification) tool and the FLASH astrophysics code on a large number of nodes on Stampede.

DROID is a software tool developed by The National Archives to perform automated batch identification of file formats. It is written in Java and works well when only one copy of it is run on a node. PI Jessica Trelogan from the Institute of Classical Archaeology at UT Austin has been using DROID as part of her workflow for managing a large archaeological data collection. It would take her more than 2 days to extract metadata from about 4.3 TB of data using DROID on a local server. Since the process of culling and reorganizing the data collection is iterative, the metadata extraction using DROID needs to be done often. The goal of the ECSS project with PI Trelogan was to provide support in leveraging Stampede for parts of her workflow, including DROID, so that the overall time taken to complete all the steps in the workflow is reduced. The main challenge in using DROID on Stampede was executing multiple copies of it in parallel on different nodes in batch mode. An overview of this challenge and its solution strategy will be discussed during this presentation.

In another project, a copy of the FLASH astrophysics code was optimized so that it does striped I/O on the Lustre File System. This project was proposed after it was found that a user had overloaded the Lustre servers (which eventually became unresponsive) while running FLASH on 7000+ cores. The problem was related to the step that reads a checkpoint file. An overview of the problem and its solution will be included in this talk.
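
To show why striping helps when many processes read one large file, the sketch below models a simple round-robin striping layout, in which consecutive fixed-size chunks of a file are placed on successive storage targets. The stripe size, stripe counts, and checkpoint size are hypothetical values chosen for illustration, not the settings used in the project.

```python
def stripe_target(offset, stripe_size, stripe_count):
    """Index (0 .. stripe_count - 1) of the storage target holding this byte offset."""
    return (offset // stripe_size) % stripe_count

def bytes_per_target(read_size, stripe_size, stripe_count):
    """Bytes served by each storage target for a read of read_size bytes from offset 0."""
    totals = [0] * stripe_count
    offset = 0
    while offset < read_size:
        chunk = min(stripe_size, read_size - offset)
        totals[stripe_target(offset, stripe_size, stripe_count)] += chunk
        offset += chunk
    return totals

MiB = 1024 * 1024
checkpoint_size = 64 * MiB  # hypothetical checkpoint file size

# With a stripe count of 1, a single target serves the whole file.
unstriped = bytes_per_target(checkpoint_size, 1 * MiB, 1)

# With a stripe count of 8, the same read is spread evenly over 8 targets.
striped = bytes_per_target(checkpoint_size, 1 * MiB, 8)
```

In this model, every process reading an unstriped checkpoint hits the same target, while a striped file divides the same traffic across several targets.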

Optimization of Text Processing for the WordFlare Knowledge Graph

Presenter(s): Robert Sinkovits (SDSC)
Principal Investigator(s): Michael Douma (IDEA)

Presentation Slides

The goal of the WordFlare project is to create a tablet-based app to engage K-12 and lifelong learners in exploring language and knowledge. The app is based on a massive thesaurus and features dynamic visualizations of word relationships. Approximately 9% of the content is human-curated, while the other 91% is derived using computational methods executed on XSEDE resources. In this talk, I will describe the steps taken to accelerate two key steps in the automated text processing – optimization of the Latent Dirichlet Allocation (LDA) algorithm and the development of a fast method to simultaneously search for large numbers of words in a corpus. The speedups we obtain are highly problem dependent, ranging from 1.5-2.2x for the LDA algorithm and up to 1500x for the word search when using a large reference dictionary (e.g. the 400K words found in Wiktionary).
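
The abstract does not describe the search method itself. As a hedged illustration of why searching for many words at once can be far faster than searching for them one at a time, the sketch below compares a naive approach (one full scan of the text per dictionary word) with a single pass that tokenizes the text once and checks each token against a hash set. The function names, the sample text, and the tokenization rule are assumptions for this example.

```python
import re
from collections import Counter

def count_words_naive(corpus, dictionary):
    """Scan the whole corpus once per dictionary word: cost grows with
    (corpus length) x (number of words)."""
    text = corpus.lower()
    return {w: len(re.findall(r"\b" + re.escape(w) + r"\b", text)) for w in dictionary}

def count_words_single_pass(corpus, dictionary):
    """Tokenize the corpus once and test each token against a set:
    cost grows with corpus length, almost independent of dictionary size."""
    wanted = set(dictionary)
    tokens = re.findall(r"[a-z]+", corpus.lower())
    counts = Counter(t for t in tokens if t in wanted)
    return {w: counts.get(w, 0) for w in dictionary}

# Made-up sample text and word list; both functions agree on plain alphabetic text.
corpus = "The quick brown fox jumps over the lazy dog. The dog sleeps."
dictionary = ["the", "dog", "cat"]

naive = count_words_naive(corpus, dictionary)
fast = count_words_single_pass(corpus, dictionary)
```

With a reference dictionary of several hundred thousand words, the naive version repeats the corpus scan several hundred thousand times, so the gap between the two approaches widens as the dictionary grows.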
