ECSS staff share technical solutions to scientific computing challenges monthly in this open forum.
The ECSS Symposium allows the more than 70 ECSS staff members to exchange information each month about successful techniques used to address challenging science problems. Tutorials on new technologies may be featured. Two 30-minute, technically focused talks are presented each month and include a brief question and answer period. This series is open to everyone.
Day and Time: Third Tuesdays @ 1 pm Eastern / 12 pm Central / 10 am Pacific
Add this event to your calendar.
Webinar (PC, Mac, Linux, iOS, Android): Launch Zoom webinar
iPhone one-tap (US Toll): +16468769923,,114343187# (or) +16699006833,,114343187#
Telephone (US Toll): Dial (for higher quality, dial a number based on your current location):
US: +1 646 876 9923 (or) +1 669 900 6833 (or) +1 408 638 0968
Meeting ID: 114 343 187
Due to the large number of attendees, only the presenters and host broadcast audio. Attendees may submit chat questions to the presenters through a moderator.
February 16, 2016
fMRI image registration with AFNI's 3dQwarp
Presenter(s): Junqi Yin (NICS)
Principal Investigator(s): Frank Skidmore (University of Alabama Birmingham)
The Analysis of Functional Neuroimaging (AFNI) software package is widely used in the community for brain MR image analysis. For many types of analysis workflows, one important step is to register a subject's image to a pre-defined template so that different subjects can be compared within a normalized coordinate system. This is especially challenging if the subject has brain atrophy due to a neurological condition such as Parkinson's disease. The 3dQwarp code in AFNI is a non-linear image registration procedure that overcomes the drawbacks of a linear affine transformation. However, the existing OpenMP instrumentation in 3dQwarp is not efficient for small-patch optimization, and the lack of convergence criteria in the iterative algorithm also hurts accuracy. Based on profiling and benchmarking, we have been working on optimizing its OpenMP structure and improving the fidelity of the warped image, which can then be used for voxel-to-voxel downstream analysis.
ECSS-er Junqi Yin (NICS) will be sharing observations from his work with PI Frank Skidmore (U Alabama Birmingham) on Blacklight and Greenfield to optimize a widely used neuroimaging package called Analysis of Functional Neuroimaging (AFNI).
Boosting molecular dynamics with advanced hardware and algorithms
Presenter(s): Lei Huang (TACC)
Principal Investigator(s): Dr. Doraiswami Ramkrishna (Purdue)
There are several open-source packages available for general-purpose molecular dynamics (MD) simulations. However, researchers still need to develop their own MD engines under special circumstances. Dr. Doraiswami Ramkrishna's group at Purdue developed a package for umbrella sampling and molecular dynamics for polymorph prediction. By leveraging the power of Intel Xeon Phi and adopting several advanced algorithms in molecular dynamics, we achieved a ~9x speedup and performance superior to LAMMPS.
Lei Huang (TACC) will tell us about his work with PI Doraiswami Ramkrishna (Purdue) to port several advanced algorithms in molecular dynamics to the Xeon Phi on Stampede with a factor of 9 speedup.
January 19, 2016
Performance Enhancements to PlascomCM
Presenter(s): Lucas A. Wilson (TACC)
Principal Investigator(s): Daniel Bodony (UIUC)
PlascomCM is a Fortran90 application that is used to investigate the behavior of compressible, viscous gases, usually in the contexts of aerospace or mechanical engineering and with a focus on turbulence and generated sound. Several recent examples include predicting and controlling the noise produced by high-speed turbulent jets, such as those found on commercial and military aircraft, and a Mach 2.25 turbulent boundary layer grazing a flexible panel, with application to multi-physics design of future hypersonic vehicles. The discretization of the governing non-linear partial differential equations uses an overset mesh and multiblock approach with locally structured meshes for which spatial derivatives are approximated with fixed-width stencil-based computations based on finite-difference-like considerations. This talk will highlight ECSS work done over the last 3 years to improve the performance of PlascomCM, with the end goal of efficiently using the Intel Xeon Phi coprocessors on Stampede. Code modifications that have improved caching and enabled vectorization will be highlighted. Further modifications that are currently being considered to improve performance on Xeon Phi will also be discussed.
Apache Airavata and XSEDE Science Gateways
Presenter(s): Suresh Marru (IU)
Principal Investigator(s): Mark Shephard (Rensselaer Polytechnic Institute) Cameron Smith (Rensselaer Polytechnic Institute)
The Symposium talk will walk through projects that started as ECSS optimization and code-porting efforts and were later extended to include gateway support. The resulting codes are made available to the community at large through these gateway interfaces. Examples will include PI Prof. Arne Pearlstein's flow-induced vibration simulation gateway and PIs Mark Shephard and Cameron Smith's PHASTA Gateway. The talk will also discuss the use of a multi-tenanted science gateway framework based on Apache Airavata as a starting point, and how short-term operational sustainability is achieved through externally funded NSF projects. Lastly, we will discuss the reuse of ECSS-contributed extensions across projects.
December 15, 2015
Bridges: Connecting Researchers, Data, and HPC
Presenter(s): Nick Nystrom (PSC)
Principal Investigator(s): Nick Nystrom (PSC)
Bridges is a new kind of supercomputer being built at the Pittsburgh Supercomputing Center (PSC) to empower new research communities, bring desktop convenience to supercomputing, expand campus access, and help researchers facing challenges in Big Data to work more intuitively. Funded by a $9.65M NSF award, Bridges consists of tiered, large-shared-memory resources with nodes having 12TB, 3TB, and 128GB each, dedicated nodes for database, web, and data transfer, high-performance shared and distributed data storage, the Spark/Hadoop ecosystem, and powerful new CPUs and GPUs. Bridges is the first production deployment of Intel's new Omni-Path Architecture (OPA) Fabric, which will interconnect its nodes and storage. Bridges emphasizes usability, flexibility, and interactivity. Widely used languages and frameworks such as Java, Python, R, MATLAB, Hadoop, and Spark benefit transparently from large memory and the high-performance OPA fabric. Virtualization enables hosting web services, NoSQL databases, and application-specific environments and enhances reproducibility. Bridges, allocated through XSEDE, is available at no charge to the open research community. Bridges is also available to industry through PSC's corporate programs.
Design of Experiments and Big Data Analytics for Energy Efficient Buildings
Presenter(s): Pragnesh Patel (NICS)
Principal Investigator(s): Joshua New (ORNL)
A central challenge in the domain of energy efficiency is realistically modeling a specific class of building, scaling those classes up to the entire United States building stock across ASHRAE climate zones, and then projecting how specific retrofits or retrofit packages would maximize return-on-investment for subsidies through federal, state, local, and utility tax incentives, rebates, and loan programs. Nearly all projections regarding energy savings, for any of the plethora of technologies required to address the need for US energy security, are reliant upon accurate models as the central primitive by which to integrate the national impact with meaningful measures of uncertainty, error, variance, and risk. This challenge is compounded by the fact that buildings, unlike cars or planes, are manufactured in the field at the time of construction based on one-off designs with a median lifespan of 73 years. Due to variance of building materials, construction, and equipment (and the necessary flux of these over time), a given building is unlikely to closely resemble the prototypical building class. Therefore, each building needs to be modeled individually and precisely to achieve optimal retrofit and construction practices. We have developed a design of experiments for calibrating building energy models that minimizes the number of simulations required while maximizing the statistical resolution of analysis results. Initial statistical analysis of parametric ensembles using techniques such as multivariate analysis of variance (MANOVA) and a software infrastructure tying together several machine learning packages (MLSuite) have recently pushed the cutting edge of building energy analysis from about 10 inputs and 12-24 outputs to 156 inputs and 96 outputs.
Improvements to the science-enabling software infrastructure made as part of this project include improved R code for design of experiments and for analysis, fast instantiation of R on every parallel node/core, and integration of the EnergyPlus code for large-scale simulation runs with the OpenDIEL workflow system, along with pre- and post-processing data analysis codes.
October 20, 2015
SoyKB pipeline on XSEDE - an overview
Presenter(s): Mats Rynge (USC/ISI)
Principal Investigator(s): Dong Xu (University of Missouri, Columbia)
The Soybean Knowledge Base project (http://soykb.org/) is resequencing more than 1000 soybean germplasm lines using Illumina paired-end sequencing for multiple projects. The lines were selected for major traits including oil, protein, soybean cyst nematode (SCN) resistance, abiotic stress resistance (drought, heat and salt) and root system architecture. In this talk we discuss how SoyKB uses XSEDE for the sequencing pipeline and how ECSS helped create the Pegasus workflow for the pipeline. We will also discuss our current effort to transition from TACC Stampede to TACC Wrangler.
September 15, 2015
Asteroseismic Modeling Portal
Presenter(s): Haiying Xu (NCAR)
Principal Investigator(s): Travis Metcalfe (Space Science Institute)
The Asteroseismic Modeling Portal (AMP) is a community facility that allows astronomers to derive the fundamental properties of sun-like stars from observations of their natural vibrations. The underlying science code uses a parallel genetic algorithm to match the observations with standard theoretical models of stars. In the first five years of the project, AMP was applied to more than 100 stars observed by NASA's Kepler mission, yielding a uniform set of stellar properties that have been used to study the structure and evolution of stars and their planetary systems. Using the AMP gateway, more than 100 users around the world can easily submit jobs, retrieve results and even analyze the performance of the source codes. Over 8 years of operation, AMP has submitted 30,424 jobs and used 18,795,892 SUs. XSEDE/ECSS objectives include updating the OS and related software on the servers, and optimization of the parallel performance of the AMP 2.0 science code by TACC staff.
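To illustrate the idea of matching observations with a genetic algorithm, here is a minimal sketch in Python. It is not the AMP science code: the `misfit` function, the `observed` values, and all parameter choices (population size, mutation width, elitism) are hypothetical stand-ins for a real stellar model comparison. Each fitness evaluation is independent of the others, which is what makes this kind of search a natural candidate for parallel execution.

```python
import random

def evolve(fitness, n_params, pop_size=30, generations=40, seed=0):
    """Minimal genetic algorithm; lower fitness is better. Returns (best, history)."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, 1.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # keep the fitter half as parents
        children = [population[0][:]]           # elitism: carry the best forward unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: take each parameter from one parent at random.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: nudge one randomly chosen parameter.
            gene = rng.randrange(n_params)
            child[gene] += rng.gauss(0.0, 0.05)
            children.append(child)
        population = children
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history

# Hypothetical "observations"; a real application would compare model output to data.
observed = [0.3, 0.7, 0.5]

def misfit(params):
    """Sum of squared differences between candidate parameters and the observations."""
    return sum((p - o) ** 2 for p, o in zip(params, observed))

best, history = evolve(misfit, n_params=len(observed))
```

Because the best individual is always copied into the next generation, the recorded best misfit can never get worse from one generation to the next.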
August 18, 2015
Helping Non-Traditional HPC Users Use XSEDE Resources Efficiently
Presenter(s): Shiquan Su (NICS)
Principal Investigator(s): Robert Sean Norman (University of South Carolina) Atsuko Tanaka (University of Wisconsin-Madison) Chao Fu (University of Wisconsin-Madison)
In the first project, the PI from the University of South Carolina developed a bioinformatics pipeline for analyzing millions of DNA and cDNA sequences. The major computational workload comes from querying a large database with the BLAST tools. Shiquan will present how he helped the PI reorganize the database file into multiple sub-databases (more than 50) and implement the advanced host selection feature of the Stampede batch system in the PI's job script. The improved workflow shortens the turnaround time of the PI's jobs by up to 80%.
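One way to picture the database reorganization is as a load-balancing problem: sequences are distributed over sub-databases so that each holds roughly the same amount of data and the queries against them finish at about the same time. The Python sketch below is an illustration under that assumption only. The function names, the greedy longest-first strategy, and the sample records are all hypothetical and are not taken from the PI's pipeline or from the BLAST tools.

```python
import heapq

def split_into_shards(records, n_shards):
    """Greedily assign (name, sequence) records to n_shards, balancing total length."""
    # Min-heap of (total_length, shard_index): the lightest shard is always on top.
    heap = [(0, i) for i in range(n_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(n_shards)]
    # Placing the longest sequences first gives a tighter balance.
    for name, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, idx = heapq.heappop(heap)
        shards[idx].append((name, seq))
        heapq.heappush(heap, (total + len(seq), idx))
    return shards

def shard_sizes(shards):
    """Total sequence length held by each shard."""
    return [sum(len(seq) for _, seq in shard) for shard in shards]

# Hypothetical records standing in for entries of a sequence database.
lengths = [900, 50, 400, 450, 100, 100]
records = [("seq%d" % i, "A" * n) for i, n in enumerate(lengths)]
shards = split_into_shards(records, 3)
sizes = shard_sizes(shards)
```

Each shard could then be written to its own file and searched by a separate task, with the per-shard results merged afterwards.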
In the second project, Dr. Atsuko Tanaka of the University of Wisconsin-Madison studies lifetime utility: she simulates clients' behavior and matches the simulated outcomes to observed data on wage profiles and asset accumulation over the life cycle. This is an ongoing project. Dr. Tanaka is actively developing home-grown codes, which have the potential to become the starting point of a community code. Shiquan works closely with Dr. Tanaka to optimize the serial version of her codes so that they use the powerful resources on Stampede efficiently. In this talk, Shiquan discusses the multiple parallelization treatments implemented in Dr. Tanaka's code. Shiquan provided a module that uses MPI to unfold the deeply nested loop structure (more than 15 levels) in the main program. Also, at Dr. Tanaka's specific request, Shiquan applied the loop collapse feature available in OpenMP 3.0+ to collapse multiple loop spaces in the core subroutine and exploit the parallelism within a Stampede node.
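The common idea behind unfolding a nested loop for MPI and collapsing loops in OpenMP is to treat the whole nest as a single flat iteration space that can be divided among workers. The sketch below shows that idea in plain Python with simulated ranks. It does not use MPI or OpenMP, and the function names, loop extents, and block distribution are hypothetical choices for illustration, not the module written for Dr. Tanaka's code.

```python
def unflatten(flat_index, extents):
    """Convert a flat iteration number into one index per nested loop (row-major)."""
    indices = []
    for extent in reversed(extents):
        flat_index, i = divmod(flat_index, extent)
        indices.append(i)
    return tuple(reversed(indices))

def iterations_for_rank(rank, n_ranks, extents):
    """Yield the index tuples that one rank would process (contiguous block)."""
    total = 1
    for extent in extents:
        total *= extent
    # Spread the remainder over the first ranks so block sizes differ by at most one.
    base, extra = divmod(total, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    for flat in range(start, stop):
        yield unflatten(flat, extents)

# A three-level nest of sizes 2 x 3 x 4 shared by five simulated ranks.
extents = (2, 3, 4)
n_ranks = 5
work = [list(iterations_for_rank(r, n_ranks, extents)) for r in range(n_ranks)]
```

Because the work is divided over the product of all loop extents rather than over the outermost loop alone, the load stays balanced even when the outer loop has fewer iterations than there are workers.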
Large-shared-memory supercomputing for game-theoretic analysis with fine-grained abstractions, and novel tree search algorithms
Presenter(s): John Urbanic (PSC)
Principal Investigator(s): Tuomas Sandholm (Carnegie Mellon)
John Urbanic (PSC) will discuss the optimization of the poker bot that recently competed in the first "Brain vs. AI" no-limit Texas Hold'em tournament, the first time that a poker program has competed against the top pros. John's work was in optimizing the Tuomas Sandholm group's algorithm for Blacklight, the world's largest shared memory platform, at PSC. John will discuss the project in general, the specific optimizations that were used to make the poker bot competitive, and of course the results, which will shortly be televised.
June 16, 2015
A Short Story of Efficiently Using Two Open-Source Applications on Stampede
Presenter(s): Ritu Arora (TACC)
This presentation will cover a summary of two challenges and solutions related to running the DROID (Digital Record Object Identification) and the FLASH astrophysics code on a large number of nodes on Stampede.
DROID is a software tool developed by The National Archives to perform automated batch identification of file formats. It is written in Java and works well when only one copy of it is run on a node. PI Jessica Trelogan from the Institute of Classical Archaeology at UT Austin has been using DROID as part of her workflow for managing a large archaeological data collection. It would take her more than 2 days to extract metadata from about 4.3 TB of data using DROID on a local server. Since the process of culling and reorganizing the data collection is iterative, the metadata extraction using DROID needs to be done often. The goal of the ECSS project with PI Trelogan was to provide support in leveraging Stampede for parts of her workflow, which includes DROID, so that the overall time taken to complete all the steps in the workflow is reduced. The main challenge in using DROID on Stampede was executing multiple copies of it in parallel on different nodes in batch mode. An overview of this challenge and its solution strategy will be discussed during this presentation.
In another project, a copy of the FLASH astrophysics code was optimized such that the code does striped I/O on the Lustre File System. This project was proposed after it was found that a user overloaded the Lustre servers (which eventually became unresponsive) while running FLASH on 7000+ cores. The problem was related to the step that involved reading a checkpoint file. An overview of the problem and its solution will be included in this talk.
Optimization of Text Processing for the WordFlare Knowledge Graph
Presenter(s): Robert Sinkovits (SDSC)
Principal Investigator(s): Michael Douma (IDEA)
The goal of the WordFlare project is to create a tablet-based app to engage K-12 and lifelong learners in exploring language and knowledge. The app is based on a massive thesaurus and features dynamic visualizations of word relationships. Approximately 9% of the content is human-curated, while the other 91% is derived using computational methods executed on XSEDE resources. In this talk, I will describe the steps taken to accelerate two key steps in the automated text processing – optimization of the Latent Dirichlet Allocation (LDA) algorithm and the development of a fast method to simultaneously search for large numbers of words in a corpus. The speedups we obtain are highly problem dependent, ranging from 1.5-2.2x for the LDA algorithm and up to 1500x for the word search when using a large reference dictionary (e.g. the 400K words found in Wiktionary).
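The abstract does not describe the search method that was developed, so the following is only a generic illustration of why searching for many words at once can be much faster than searching for each word separately. The corpus is tokenized a single time and each token is checked against a hash set, so the cost depends on the corpus size rather than on the number of dictionary words. The function name, tokenization rule, and sample data are hypothetical.

```python
import re
from collections import Counter

def count_dictionary_words(corpus, dictionary):
    """Count occurrences of every dictionary word in a single pass over the corpus."""
    wanted = set(word.lower() for word in dictionary)
    counts = Counter()
    # Tokenize once; each token costs one average O(1) set lookup,
    # so run time does not grow with the number of dictionary words.
    for token in re.findall(r"[a-z']+", corpus.lower()):
        if token in wanted:
            counts[token] += 1
    return counts

corpus = "The quick brown fox jumps over the lazy dog. The dog sleeps."
dictionary = ["the", "dog", "cat", "fox"]
counts = count_dictionary_words(corpus, dictionary)
```

A naive alternative that rescans the corpus once per dictionary word would do work proportional to the corpus size times the dictionary size, which becomes costly for a reference dictionary of hundreds of thousands of words.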
May 19, 2015
ECSS experience with non-traditional HPC users
Presenter(s): Junqi Yin (NICS)
Principal Investigator(s): Annette Engel (U. Tenn) Yong Zeng (UMKC)
Mothur is an open-source bioinformatics pipeline used for biological sequence analysis that has gained increasing attention in the microbial ecology community. Because a large set of functionalities in Mothur are memory bound, it is well suited for shared memory architectures. I will discuss performance results for several Mothur commands that are popular in operational taxonomic unit analysis, and show that pipeline processes can be accelerated by orders of magnitude.
Real-time Bayesian estimation for financial ultra-high frequency data is plagued with the curse of high dimensionality. Methods have been developed to manage this problem through the use of MPI. By porting to CUDA, I'll show that an adequately equipped GPU workstation can rise to the task, producing reasonably real-time results with actual data from financial markets.
P3DFFT: a scalable open-source solution for Fourier Transforms and other algorithms in three dimensions
Presenter(s): Dmitry Pekurovsky (SDSC)
P3DFFT is an open-source package developed at SDSC. It implements three-dimensional Fourier Transforms and other algorithms in a highly scalable and efficient way. P3DFFT achieves good scaling on hundreds of thousands of compute cores. It has received much interest and use from scientists in diverse fields such as DNS turbulence simulations, astrophysics, oceanography and material science. Recently it has been the subject of an internal ECSS project aimed at making it XSEDE community software. It has been ported, tested and documented on the largest computational systems at XSEDE. Additional features have been added to help widen its impact in the community. In this presentation I will go over the main features of P3DFFT, including those recently added, and review how XSEDE users can access it on XSEDE platforms.
April 21, 2015
reproducibility@XSEDE: Reporting Back to our Colleagues
Presenter(s): Doug James (TACC) Carlos Rosales (TACC) Nancy Wilkins-Diehr (SDSC)
The reproducibility@XSEDE workshop (www.xsede.org/reproducibility) was a full-day event held in conjunction with XSEDE14. The workshop featured an interactive, open-ended, discussion-oriented agenda focused on reproducibility in large-scale computational science. This presentation includes (1) independent reactions to the event by three of the workshop principals; and (2) an open discussion on the topic of reproducibility in general.
March 17, 2015
Gateway Building for the Non-Linear Adjoint Coefficient Estimation (NLACE) project
Presenter(s): Lan Zhao (Purdue) Chris Thompson (Purdue)
Principal Investigator(s): Paul Barbone (Boston University)
Presenters will discuss work that provided the NLACE (Non-Linear Adjoint Coefficient Estimation) research group with a way to make its biomechanical imaging analysis model available to the community using XSEDE resources. The research has a wide variety of medical applications including brain scanning, bone structure analysis, and cancer detection. The Barbone group created and maintains the NLACE model and needed help with science gateway development. They have an allocation on Gordon, and the ECSS team was able to help them get their model installed there and quickly create an application for using it on DiaGrid, a HubZero-based gateway for hosting scientific applications.
Real-Time Next Generation Sequencing (NGS) in the Classroom using Galaxy
Presenter(s): Josephine Palencia (PSC) Alex Ropelewski (PSC)
We present an interesting real-user case scenario supporting 30 Carnegie Mellon University (CMU) Bioinformatics students from three classes performing real-time next generation sequencing (NGS). We describe the system setup, the scaling preparations, the tools and the full workflow, the data and reference files and the lessons learned from the classroom experience.