ECSS Symposium Archive

ECSS staff share technical solutions to scientific computing challenges monthly in this open forum.

August 18, 2015


Presenter(s): Shiquan Su (NICS)
Principal Investigator(s): Robert Sean Norman (University of South Carolina) Atsuko Tanaka (University of Wisconsin-Madison) Chao Fu (University of Wisconsin-Madison)

Presentation Slides

In the first project, the PI from the University of South Carolina developed a bioinformatics pipeline for analyzing millions of DNA and cDNA sequences. The major computational workload comes from querying a large database with the BLAST tools. Shiquan will present how he helped the PI reorganize the database file into more than 50 sub-databases and implement the advanced host selection feature of the Stampede batch system in the PI's job script. The improved workflow shortens the turnaround time of the PI's jobs by up to 80%.
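As a rough illustration of the database reorganization described above, the following sketch splits a set of FASTA records into a fixed number of size-balanced groups. It is not the PI's pipeline: the function names (`read_fasta`, `split_records`), the greedy balancing rule, and the toy data are all assumptions made for this example. In practice each group would be written to its own file and formatted as a separate BLAST database.

```python
def read_fasta(lines):
    """Yield (header, sequence) records from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, n_parts):
    """Greedily place records into n_parts bins, balancing total sequence length.

    Hypothetical helper: longest records are placed first, each into the
    currently smallest bin, so the bins end up with similar total sizes.
    """
    bins = [[] for _ in range(n_parts)]
    sizes = [0] * n_parts
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        i = sizes.index(min(sizes))  # index of the smallest bin so far
        bins[i].append((header, seq))
        sizes[i] += len(seq)
    return bins, sizes


# Toy input standing in for a large sequence database (illustrative only).
demo = """>a
ACGTACGT
>b
ACG
>c
ACGTA
>d
AC
""".splitlines()

bins, sizes = split_records(list(read_fasta(demo)), 2)
for i, (group, size) in enumerate(zip(bins, sizes)):
    print(i, size, [header for header, _ in group])
```

Balancing by total sequence length rather than by record count is one plausible choice here, since search time against a sub-database tends to grow with its size.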

In the second project, Dr. Atsuko Tanaka of the University of Wisconsin-Madison studies lifetime utility: she simulates clients' behavior and matches the simulated outcomes to the observed data with respect to wage profile and asset accumulation over the life cycle. This is an ongoing project. Dr. Tanaka is actively developing home-grown codes, which have the potential to become the starting point of a community code. Shiquan works closely with Dr. Tanaka to optimize the serial version of her codes so that they efficiently use the powerful resources on Stampede. In this talk, Shiquan discusses the multiple parallelization treatments implemented in Dr. Tanaka's code. Shiquan provided a module that uses MPI to unfold the deeply nested loop structure (more than 15 layers) in the main program. Also, at Dr. Tanaka's specific request, Shiquan applied the new loop-collapse feature in OpenMP 3.0+ to collapse multiple loop spaces in the core subroutine and exploit the parallelism within a Stampede node.
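The idea of unfolding a nested loop can be sketched as follows: the nest is treated as one flat index space, each worker receives a contiguous block of flat indices, and every flat index is converted back into per-loop indices. This is a minimal serial sketch, not the module written for Dr. Tanaka's code. The names `unflatten` and `rank_range`, the three-level loop shape, and the rank count are assumptions for illustration, and the ranks are simulated with an ordinary loop instead of real MPI processes.

```python
def unflatten(flat, shape):
    """Convert a flat index into per-loop indices (last loop varies fastest)."""
    idx = []
    for extent in reversed(shape):
        flat, r = divmod(flat, extent)
        idx.append(r)
    return tuple(reversed(idx))


def rank_range(total, n_ranks, rank):
    """Return the contiguous block [start, stop) of flat indices owned by rank.

    The first (total % n_ranks) ranks get one extra iteration each, so block
    sizes differ by at most one.
    """
    base, extra = divmod(total, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# A small 3-level nest standing in for the 15+ levels in the real code.
shape = (2, 3, 4)
total = 1
for extent in shape:
    total *= extent

n_ranks = 5  # assumed number of workers; under MPI this is the communicator size
for rank in range(n_ranks):
    start, stop = rank_range(total, n_ranks, rank)
    first = unflatten(start, shape)
    last = unflatten(stop - 1, shape)
    print(f"rank {rank}: flat [{start}, {stop}) covers {first} .. {last}")
```

Within a node, the OpenMP `collapse` clause applies the same flattening to a perfectly nested set of loops, so the combined iteration space, not just the outermost loop, is divided among threads.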

Large-shared-memory supercomputing for game-theoretic analysis with fine-grained abstractions, and novel tree search algorithms.

Presenter(s): John Urbanic (PSC)
Principal Investigator(s): Tuomas Sandholm (Carnegie Mellon)

Presentation Slides

John Urbanic (PSC) will discuss the optimization of the poker bot that recently competed in the first "Brain vs. AI" no-limit Texas Hold'em tournament, the first time that a poker program has competed against the top pros. John's work was in optimizing the Tuomas Sandholm group's algorithm for Blacklight, the world's largest shared memory platform, at PSC. John will discuss the project in general, the specific optimizations that were used to make the poker bot competitive, and of course the results, which will shortly be televised.

September 15, 2015

Asteroseismic Modeling Portal

Presenter(s): Haiying Xu (NCAR)
Principal Investigator(s): Travis Metcalfe (Space Science Institute)

Presentation Slides

The Asteroseismic Modeling Portal (AMP) is a community facility that allows astronomers to derive the fundamental properties of sun-like stars from observations of their natural vibrations. The underlying science code uses a parallel genetic algorithm to match the observations with standard theoretical models of stars. In the first five years of the project, AMP was applied to more than 100 stars observed by NASA's Kepler mission, yielding a uniform set of stellar properties that have been used to study the structure and evolution of stars and their planetary systems. Through the AMP gateway, more than 100 users around the world can easily submit jobs, retrieve results, and even analyze the performance of the source codes. Over its 8 years of operation, AMP has submitted 30,424 jobs and consumed 18,795,892 SUs. XSEDE/ECSS objectives include updating OS and related software of the servers, and optimizing parallel performance of AMP 2.0 science code on by TACC staff.

October 20, 2015

SoyKB pipeline on XSEDE - an overview

Presenter(s): Mats Rynge (USC/ISI)
Principal Investigator(s): Dong Xu (University of Missouri, Columbia)

Presentation Slides

The Soybean Knowledge Base (SoyKB) project is conducting resequencing of more than 1000 soybean germplasm lines using Illumina paired-end sequencing for multiple projects, selected for major traits including oil, protein, soybean cyst nematode (SCN) resistance, abiotic stress resistance (drought, heat and salt) and root system architecture. In this talk we discuss how SoyKB uses XSEDE for the sequencing pipeline and how ECSS helped create the Pegasus workflow for the pipeline. We will also discuss our current effort to transition from TACC Stampede to TACC Wrangler.

December 15, 2015

Bridges: Connecting Researchers, Data, and HPC

Presenter(s): Nick Nystrom (PSC)
Principal Investigator(s): Nick Nystrom (PSC)

Presentation Slides

Bridges is a new kind of supercomputer being built at the Pittsburgh Supercomputing Center (PSC) to empower new research communities, bring desktop convenience to supercomputing, expand campus access, and help researchers facing challenges in Big Data to work more intuitively. Funded by a $9.65M NSF award, Bridges consists of tiered, large-shared-memory resources with nodes having 12TB, 3TB, and 128GB each, dedicated nodes for database, web, and data transfer, high-performance shared and distributed data storage, the Spark/Hadoop ecosystem, and powerful new CPUs and GPUs. Bridges is the first production deployment of Intel's new Omni-Path Architecture (OPA) Fabric, which will interconnect its nodes and storage. Bridges emphasizes usability, flexibility, and interactivity. Widely used languages and frameworks such as Java, Python, R, MATLAB, Hadoop, and Spark benefit transparently from large memory and the high-performance OPA fabric. Virtualization enables hosting of web services, NoSQL databases, and application-specific environments, and enhances reproducibility. Bridges, allocated through XSEDE, is available at no charge to the open research community. Bridges is also available to industry through PSC's corporate programs.

Design of Experiments and Big Data Analytics for Energy Efficient Buildings

Presenter(s): Pragnesh Patel (NICS)
Principal Investigator(s): Joshua New (ORNL)

Presentation Slides

A central challenge in the domain of energy efficiency is being able to realistically model a specific class of building and scale those classes up to the entire United States building stock across ASHRAE climate zones, then project how specific retrofits or retrofit packages would maximize return-on-investment for subsidies through federal, state, local, and utility tax incentives, rebates, and loan programs. Nearly all projections regarding energy savings, for any of the plethora of technologies required to address the need for US energy security, rely on accurate models as the central primitive by which to integrate the national impact with meaningful measures of uncertainty, error, variance, and risk. This challenge is compounded by the fact that buildings, unlike cars or planes, are manufactured in the field at the time of construction based on one-off designs with a median lifespan of 73 years. Due to variance of building materials, construction, and equipment (and the necessary flux of these over time), a given building is unlikely to closely resemble the prototypical building class. Therefore, each building needs to be modeled individually and precisely to achieve optimal retrofit and construction practices. We have developed a design of experiments for calibrating building energy models, which minimizes the number of simulations required while maximizing the statistical resolution of analysis results. Initial statistical analysis of parametric ensembles using techniques such as multivariate analysis of variance (MANOVA) and a software infrastructure tying together several machine learning packages (MLSuite) have recently pushed the cutting edge of building energy analysis from about 10 inputs and 12-24 outputs to 156 inputs and 96 outputs.
Improvements to the science-enabling software infrastructure made as part of this project include improving the R code for design of experiments and the R analysis code while quickly instantiating R on every parallel node/core, and integrating the EnergyPlus code for large-scale simulation runs with the OpenDIEL workflow system, along with pre- and post-processing data analysis codes.

January 20, 2015

Genomics Calculations as an Outreach Strategy

Presenter(s): Hugh Nicholas (PSC) Alexander Ropelewski (PSC)

Presentation Slides

Genomics calculations are a large part of the calculations carried out at the XSEDE centers, and they have been an effective tool for attracting many minority scientists and teaching them how to use XSEDE resources effectively for complete genome sequences, transcriptomes, and studies of individual genes. We have delivered a series of two-week workshops and longer internships that have been useful in training minority scientists and students to carry out this range of calculations on a variety of biological systems in the health and agricultural fields. We have maintained a very successful outreach program based on this type of calculation. This talk describes the program and highlights some individual accomplishments of scientists and students.

February 17, 2015

Migration to Phis and Enhancing Vectorization

Presenter(s): Jim Browne (University of Texas)

Presentation Slides

This talk will cover two topics: selection of execution modes for use of heterogeneous compute nodes which include Intel Xeon Phis and enhancing vectorization in application codes. Each lecture will be about 20 minutes. The lecture on selection of execution modes for heterogeneous compute nodes describes a systematic process for choosing among CPU only, Phi only, symmetric node and offload execution modes. The lecture is based on the Quick Start Guide developed by the Stampede Technology Insertion project. The Quick Start Guide is available from the TACC web site. Enhancement of vectorization is accomplished by combining compiler static analysis with runtime measurements of execution behavior to generate recommendations for adding directives, pragmas and source code changes to increase vectorization. The average gain in performance obtained in the case studies was in excess of 40% for execution on the Xeon Phis. The process for enhanced vectorization is supported by a tool called MACVEC which will soon be available for general use on Stampede.

March 6, 2015

PRACE-XSEDE Interoperability projects

Presenter(s): Morris Riedel (Juelich Supercomputing Centre) Sandra Gesing (Notre Dame University) Shantenu Jha (Rutgers University)

Presentation Slides Gesing Presentation

Presentation Slides Jha Presentation

1) "Smart Data Analytics for Earth Sciences across XSEDE and PRACE", speaker Morris Riedel, Juelich Supercomputing Centre.

The ever-increasing amount of earth science data arising from measurements or computational simulations requires new 'smart data analytics techniques' capable of extracting meaningful findings from 'pure big data'. XSEDE and PRACE both provide excellent resources that enable efficient and effective data analytics when several technical frameworks and data analysis packages are available. While we assessed tools and technologies for a couple of earth science case studies, the scientific case in this particular webinar is driven by one particular earth science analytics use case: automated anomaly/outlier detection in earth science time series datasets, which requires a parallel and scalable clustering algorithm that is able to take advantage of the interoperability between XSEDE and PRACE systems today. As one of the key results of our technology assessment project, we present our parallel and scalable implementation of the Density-based Spatial Clustering of Applications with Noise (DBSCAN) algorithm and show how it can be used across different infrastructures using open standards to decouple architecture from concrete implementations. Solutions will be outlined that can be used today in production if the associated resource allocations in XSEDE and/or PRACE are granted to the user. More details will be presented at the Research Data Alliance (RDA) 5th Plenary Big Data Analytics Group Session in San Diego in March 2015 and at the European Geosciences Union (EGU) 2015 Big Data for Earth Science session in Vienna in April 2015.
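To make the outlier-detection use case concrete, here is a minimal serial sketch of DBSCAN on one-dimensional values. It is not the parallel, scalable implementation presented in the talk: it uses a brute-force neighbor search, and the sample values, `eps`, and `min_pts` are assumptions chosen for illustration. Points that end up in no cluster are labeled -1 and can be read as anomaly candidates.

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Brute-force O(n^2) sketch for 1-D values; a point counts itself
    among its neighbors.
    """
    def neighbors(i):
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


# Illustrative measurements: two dense groups and one isolated value (9.0).
values = [1.0, 1.1, 0.9, 1.2, 9.0, 5.0, 5.1, 4.9, 5.2]
labels = dbscan(values, eps=0.35, min_pts=3)
outliers = [v for v, label in zip(values, labels) if label == -1]
print(labels)
print(outliers)
```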

2) "Unicore Use Case Integration for XSEDE and PRACE", speaker Sandra Gesing, U of Notre Dame

European Team: Molecular Simulation Community represented by MoSGrid, SCI-BUS and ER-flow and the computational radiation physics community represented by the Helmholtz-Zentrum Dresden-Rossendorf (HZDR), US: Center for Research Computing, University of Notre Dame

The project focuses on the integration of two UNICORE use cases for the joint support of XSEDE and PRACE: the first one targets the molecular simulation community, the second one the computational radiation physics community. The project MoSGrid (Molecular Simulation Grid) offers a web-based science gateway supporting the community with various services for quantum chemistry, molecular modeling, and docking applying UNICORE as grid middleware.

The main technical challenge in MoSGrid has been to extend the portal infrastructure for intuitive use of the XSEDE and PRACE infrastructures via UNICORE and the corresponding credentials. The members of the computational radiation physics community involved in this project focus on the generation of advanced laser-driven sources of particle and X-ray beams. They aim to "simulate what is measured" in order to reproduce experimental measurements and connect them to the fundamental plasma processes on the single-particle scale. The main goal was to make the corresponding tools available on both XSEDE and PRACE systems via UNICORE, allowing the exchange of common workflows that can be applied on both infrastructures. The talk will go into detail about the goals, the lessons learned and the accomplished steps.

3) "Interoperable High Throughput Binding Affinity Calculator for Personalised Medicine", speaker Shantenu Jha, Rutgers

European Team: Prof Peter V. Coveney (University College London) and Prof Dieter Kranzlmuller (LRZ/LMU), US: Prof. Shantenu Jha (RADICAL, Rutgers)

To improve the ability to calculate drug-binding affinities, the CCS at UCL has developed the Binding Affinity Calculator (BAC), which allows the building of the patient-specific models required to simulate drug performance, a process that requires a complex series of steps to customise a generic model with patient-specific information and then run the calculations. As part of this XSEDE-PRACE project, BAC has been interfaced with RADICAL-Cybertools to interoperably utilize XSEDE and PRACE resources. We will present some preliminary results.

March 17, 2015

Gateway Building for the Non-Linear Adjoint Coefficient Estimation (NLACE) project

Presenter(s): Lan Zhao (Purdue) Chris Thompson (Purdue)
Principal Investigator(s): Paul Barbone (Boston University)

Presentation Slides

Presenters will discuss work providing a solution for the NLACE (Non-Linear Adjoint Coefficient Estimation) research group to make its biomechanical imaging analysis model available to the community using XSEDE resources. The research has a wide variety of medical applications including brain scanning, bone structure analysis, and cancer detection. The Barbone group created and maintains the NLACE model and needed help with science gateway development. They have an allocation on Gordon, and the ECSS team was able to help them get their model installed there and quickly create an application for utilizing it on DiaGrid, a HubZero-based gateway for hosting scientific applications.

Real-Time Next Generation Sequencing (NGS) in the Classroom using Galaxy

Presenter(s): Josephine Palencia (PSC) Alex Ropelewski (PSC)

Presentation Slides

We present an interesting real-user case scenario supporting 30 Carnegie Mellon University (CMU) Bioinformatics students from three classes performing real-time next generation sequencing (NGS). We describe the system setup, the scaling preparations, the tools and the full workflow, the data and reference files and the lessons learned from the classroom experience.

April 21, 2015

reproducibility@XSEDE: Reporting Back to our Colleagues

Presenter(s): Doug James (TACC) Carlos Rosales (TACC) Nancy Wilkins-Diehr (SDSC)

Presentation Slides

The reproducibility@XSEDE workshop was a full-day event held in conjunction with XSEDE14. The workshop featured an interactive, open-ended, discussion-oriented agenda focused on reproducibility in large-scale computational science. This presentation includes (1) independent reactions to the event by three of the workshop principals; and (2) an open discussion on the topic of reproducibility in general.

May 19, 2015

ECSS experience with non-traditional HPC users

Presenter(s): Junqi Yin (NICS)
Principal Investigator(s): Annette Engel (U. Tenn) Yong Zeng (UMKC)

Mothur is an open source bioinformatics pipeline used for biological sequence analysis that has gained increasing attention in the microbial ecology community. Because a large set of functionalities in Mothur are memory bound, it is well suited for shared memory architectures. I will discuss performance results for several Mothur commands that are popular in operational taxonomic unit analysis, and show that pipeline processes can be accelerated by orders of magnitude.

Real-time Bayesian estimation for financial ultra-high frequency data is plagued with the curse of high dimensionality. Methods have been developed to manage this problem through the use of MPI. By porting to CUDA, I'll show that an adequately equipped GPU workstation can rise to the task, producing reasonably real-time results with actual data from financial markets.

P3DFFT: a scalable open-source solution for Fourier Transforms and other algorithms in three dimensions

Presenter(s): Dmitry Pekurovsky (SDSC)

P3DFFT is an open-source package developed at SDSC. It implements three-dimensional Fourier Transforms and other algorithms in a highly scalable and efficient way. P3DFFT achieves good scaling on hundreds of thousands of compute cores. It has received much interest and use from scientists in diverse fields such as DNS turbulence simulations, astrophysics, oceanography and material science. Recently it has been the subject of an internal ECSS project aimed at making it XSEDE community software. It has been ported, tested and documented on the largest computational systems at XSEDE. Additional features have been added to help widen its impact in the community. In this presentation I will go over the main features of P3DFFT, including the recently added ones, and review how XSEDE users can access it on XSEDE platforms.

June 16, 2015

A Short Story of Efficiently Using Two Open-Source Applications on Stampede

Presenter(s): Ritu Arora (TACC)

Presentation Slides

This presentation will cover a summary of two challenges and solutions related to running the DROID (Digital Record Object Identification) and the FLASH astrophysics code on a large number of nodes on Stampede.
DROID is a software tool developed by The National Archives to perform automated batch identification of file formats. It is written in Java and works well when only one copy of it is run on a node. PI Jessica Trelogan from the Institute of Classical Archaeology at UT Austin has been using DROID as part of her workflow for managing a large archaeological data collection. It would take her more than 2 days to extract metadata from about 4.3 TB of data using DROID on a local server. Since the process of culling and reorganizing the data collection is iterative, the metadata extraction using DROID needs to be done often. The goal of the ECSS project with PI Trelogan was to provide support in leveraging Stampede for parts of her workflow, including DROID, so that the overall time taken to complete all the steps in the workflow is reduced. The main challenge in using DROID on Stampede was executing multiple copies of it in parallel on different nodes in batch mode. An overview of this challenge and its solution strategy will be discussed during this presentation.
In another project, a copy of the FLASH astrophysics code was optimized such that the code does striped I/O on the Lustre File System. This project was proposed after it was found that a user overloaded the Lustre servers (which eventually became unresponsive) while running FLASH on 7000+ cores. The problem was related to the step that involved reading a checkpoint file. An overview of the problem and its solution will be included in this talk.

Optimization of Text Processing for the WordFlare Knowledge Graph

Presenter(s): Robert Sinkovits (SDSC)
Principal Investigator(s): Michael Douma (IDEA)

Presentation Slides

The goal of the WordFlare project is to create a tablet-based app to engage K-12 and lifelong learners in exploring language and knowledge. The app is based on a massive thesaurus and features dynamic visualizations of word relationships. Approximately 9% of the content is human-curated, while the other 91% is derived using computational methods executed on XSEDE resources. In this talk, I will describe the work done to accelerate two key steps in the automated text processing: optimization of the Latent Dirichlet Allocation (LDA) algorithm and the development of a fast method to simultaneously search for large numbers of words in a corpus. The speedups we obtain are highly problem dependent, ranging from 1.5x to 2.2x for the LDA algorithm and reaching up to 1500x for the word search when using a large reference dictionary (e.g. the 400K words found in Wiktionary).
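One common way to search for many words at once is to tokenize the corpus a single time and test each token against a hash set of the reference dictionary, instead of scanning the corpus once per word. The sketch below shows that general idea only; the abstract does not describe the method actually used in the project, and the function name, tokenizer pattern, and sample text are assumptions.

```python
import re
from collections import Counter

# Simple tokenizer for the sketch: runs of lowercase letters and apostrophes.
TOKEN = re.compile(r"[a-z']+")


def count_dictionary_words(corpus_lines, dictionary):
    """Count occurrences of every dictionary word in one pass over the corpus."""
    wanted = frozenset(word.lower() for word in dictionary)
    counts = Counter()
    for line in corpus_lines:
        for token in TOKEN.findall(line.lower()):
            if token in wanted:  # average O(1) membership test
                counts[token] += 1
    return counts


# Illustrative corpus and dictionary (not project data).
corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "A quick thesaurus lookup; the fox again.",
]
hits = count_dictionary_words(corpus, ["fox", "quick", "thesaurus", "zebra"])
print(sorted(hits.items()))
```

With this layout the cost of a pass depends mainly on the corpus size and much less on the number of dictionary words, which is why a single-pass approach tends to pay off most for large reference dictionaries.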