ECSS Symposium Archive
ECSS staff share technical solutions to scientific computing challenges monthly in this open forum.
December 16, 2014
New XSEDE Resources
Presenter(s): Mike Norman (UCSD), Niall Gaffney (TACC), Jim Lupo (LSU)
Presentation Slides - Mike Norman
Presentation Slides - Niall Gaffney
Presentation Slides - Jim Lupo
The advanced digital resources ecosystem is constantly evolving as resources are decommissioned when they reach the end of their life and new resources are added. On this call you'll hear about the newest XSEDE resources, one of which is available now and two that will be available in early 2015.
- Michael Norman, San Diego Supercomputer Center
- Comet, a new 2 Pflops supercomputer designed to transform advanced scientific computing by expanding access and capacity among traditional as well as non-traditional research domains.
- http://comet.sdsc.edu
- Niall Gaffney, Texas Advanced Computing Center
- Wrangler, a groundbreaking data analysis and management system for the national open science community.
- https://www.tacc.utexas.edu/systems/wrangler
- Jim Lupo, Louisiana State University
- SuperMIC, a ~1 Pflops system of Intel Xeon Phi processors, 40% of which is allocated via XSEDE
- http://www.hpc.lsu.edu/resources/hpc/system.php?system=SuperMIC
October 21, 2014
XSEDE New User Tutorials and User Support: Lessons Learned
Presenters: Jay Alameda (NCSA), Marcela Madrid (PSC)
Presentation Slides - Alameda
Presentation Slides - Madrid
I will discuss our experience teaching New User Tutorials both remotely as webinars and at the XSEDE conferences. I will start with a brief description of the New User tutorial, which includes the sections "Submitting a Successful Allocation", "How to Get Started Using XSEDE Resources", "Globus On-line" and Exercises. I will then discuss the problems encountered and questions frequently asked during these tutorials and hands-on exercises. I will link this experience to the frequently asked questions that I have observed in User Support. We welcome discussion on how to improve these ESTEO activities.
September 16, 2014
Establishing TauDEM as a Science Gateway Service on XSEDE for Scalable Hydrological Terrain Analysis
Presenters: Yan Liu, Ye Fan (NCSA)
Principal Investigator: David Tarboton (Utah State)
Finer-resolution Digital Elevation Model (DEM) data have been shown to have a significant impact on hydrologically important variables and to improve the accuracy and reliability of DEM-based terrain analysis. TauDEM is a parallel computing solution to watershed delineation and the extraction of hydrological information from high-resolution DEMs. The first part of the talk will introduce a multi-institutional effort that leverages expertise in multiple disciplines (i.e., hydrology, computational science, geographic information science, and geography) through XSEDE ECSS to scale TauDEM from a local cluster to supercomputers on national cyberinfrastructure (e.g., XSEDE) through rigorous computational performance profiling and analysis. The second part of the talk will present the data and software integration and science gateway application development experience obtained in establishing TauDEM as a CyberGIS Gateway application.
August 19, 2014
ParaView Coprocessing Visualization of Differential Equations
Presenters: Mark Vanmoer (NCSA)
Principal Investigator: Benson Muite (at U of Michigan during project, now at U of Tartu, Estonia)
This ESRT project investigated two in-situ visualization approaches for highly scalable differential equation codes. These codes simulate Rayleigh-Benard convection, an idealized problem that is of physical interest and can serve as a model for larger high-resolution studies in computational fluid dynamics using spectral methods. A ParaView coprocessing adaptor was integrated with the code, allowing direct output of images from a running simulation.
The Data Exacell
Presenters: Nick Nystrom (PSC)
The Data Exacell (DXC) is an accelerated development pilot project to create, deploy, and test software building blocks and hardware implementing functionalities specifically designed to support data-analytic capabilities for data-intensive scientific research. Supported by NSF through its Data Infrastructure Building Blocks (DIBBs) program, the DXC focuses on data storage mechanisms, their coupling to specialized, powerful engines for data analytics, and enabling transformational application architectures powered by cutting-edge database technologies.
June 17, 2014
Perspectives on Data Sharing: Two data-centric gateways at NCAR
Presenters: Don Middleton, Eric Nienhouse, Nathan Wilhelmi (NCAR)
The U.S. government and the NSF have substantially elevated the importance of scientific data management, sharing, and openness as a national priority. In addition to providing access to computational resources for scientific communities, science gateways can also be an ideal place for communities to share data using common infrastructure that's been tuned to their specific needs. In this presentation, we will briefly review some of the salient NSF policy shifts regarding data, touch on related emerging trends including Big Data and EarthCube, demonstrate two data-centric Science Gateways (climate modeling and Arctic science), and finish up by providing an overview of our architecture and software engineering process.
Bio of Speakers:
Don Middleton leads the Visualization and Enabling Technologies (VETS) program in NCAR's Computational and Information Systems Laboratory (CISL). This program includes the development and delivery of data collections and cyberinfrastructure to a broad, national and global community. The project portfolio includes the NCAR Command Language (NCL) and the PyNGL/PyNIO toolkit, the Community Data Portal (CDP), the NSF-sponsored Advanced Cooperative Arctic Data and Information Service (ACADIS), the Earth System Grid (ESG) data system, NSF's XSEDE project, the DOE-sponsored Parvis effort, the UCSD-led Chronopolis digital preservation project, and the multi-agency sponsored National Multimodel Ensemble (NMME) project. Middleton is active in NSF's EarthCube activity and also contributes to an expert team on federated data management systems for the World Meteorological Organization Information System (UN/WMO-WIS).
Eric Nienhouse is a software engineer and Agile Scrum Product Owner for the Science Gateway Framework (SGF) software, which supports the ESG-NCAR Science Gateway and the ACADIS Arctic science data management system. Eric is passionate about building products that enable the scientific user community to focus on its science. As product owner, Eric identifies and prioritizes project requirements to ensure the SGF software and services meet the needs of stakeholders.
Nathan Wilhelmi is a software engineer and the Scrum Master for the Science Gateway Framework (SGF) software, which supports the ESG-NCAR Science Gateway and the ACADIS Arctic science data management system. As the Scrum Master, Nathan is responsible for facilitating and improving the Scrum process, ensuring the improvement of code quality, and researching and adopting new technologies.
May 20, 2014
CoSSci High Performance Computing for Anthropology and the Social Sciences
Principal Investigator and Presenter: Douglas White (UC Irvine)
Presenter: Lukasz Lacinski (U Chicago)
Douglas White and his co-authors, mathematical anthropologist Malcolm Dow and sociocultural econometrician Anthon Eff, editing the Wiley Companion to Cross-Cultural Research, designed R software functions (the Dow-Eff functions) that solved the crucial problems of controls for autocorrelation needed for vastly more accurate research in the social sciences as well as any of the other observational sciences. They extended on-line access to the four large anthropological datasets that now cover 3-5,000 coded variables for nearly all of the ethnographic literatures that apply to specific times and locations. They also implemented the most powerful statistical tools for imputation of missing data.
Under an ECSS award, XSEDE science gateway developers at Argonne National Lab (Tom Uram) and then the University of Chicago (Lacinski and Rachana Ananthakrishnan) designed the Complex Social Science Gateway (CoSSci).
April 15, 2014
MPI_IO Optimization for Compressible Turbulence Simulations
Presenters: Dr. Vincent Betro (NICS)
Principal Investigator: Dr. Diego Donzis (Texas A&M)
This ECSS project was undertaken by Dr. Vincent Betro with Dr. Diego Donzis of Texas A&M in order to improve the I/O of his CFD code for cyber-enabled investigations of compressible turbulence and mixing and to study the effect of thermal non-equilibrium on turbulent processes. In previous work on XSEDE resources, Dr. Donzis developed a new, highly scalable code to perform direct numerical simulations of compressible turbulence and had started obtaining results at resolutions up to 512^3 with a newly developed forcing scheme to maintain a stationary state. Further analysis and new simulations at 1024^3 provided definite answers to pressing issues about the scaling of the different components into which compressible fields can be decomposed, namely the solenoidal and dilatational components. The new simulations will be unprecedented in detail and, along with the accumulated database, will allow important aspects of small-scale intermittency and mixing in compressible turbulence to be investigated for the first time. However, this higher resolution also requires more I/O and, as a consequence, faster parallel I/O.
In this presentation, Dr. Betro will discuss how MPI_IO environment variables in combination with file striping successfully increased performance. For instance, 32,000 and 64,000 core jobs which could previously not run within a 24 hour walltime can now run successfully with MPI_IO. Also, as one grows the core count, the wall time does not grow as rapidly while still retaining the use of the subarray data types, thus allowing memory use per core to scale better than it could have with a root process having to control all I/O.
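To make the role of subarray data types concrete, the following is a minimal sketch in plain Python (no MPI calls) of what such a type describes: the set of contiguous byte runs in a shared file that belong to one rank's block of a global 3D array. It is not the project's code. The function name `subarray_runs`, the row-major file layout, the 8-byte element size, and the requirement that the grid divide evenly are all assumptions made for illustration.

```python
def subarray_runs(global_shape, proc_grid, coords, itemsize=8):
    """Return (byte_offset, byte_length) for each contiguous run that one
    rank's block occupies in a row-major file holding the global array.

    Hypothetical helper: it mimics the information a subarray datatype
    carries, so each rank can address its own data without a root process.
    """
    nx, ny, nz = global_shape
    px, py, pz = proc_grid
    if nx % px or ny % py or nz % pz:
        raise ValueError("this sketch assumes an evenly divisible grid")

    # Local block extents and the global index where this rank's block starts.
    lx, ly, lz = nx // px, ny // py, nz // pz
    sx, sy, sz = coords[0] * lx, coords[1] * ly, coords[2] * lz

    runs = []
    for i in range(sx, sx + lx):
        for j in range(sy, sy + ly):
            # The last axis is contiguous in a row-major file, so each
            # (i, j) pair contributes one run of lz elements.
            offset = ((i * ny + j) * nz + sz) * itemsize
            runs.append((offset, lz * itemsize))
    return runs


if __name__ == "__main__":
    # A 4x4x4 array of 8-byte values split over a 2x2x1 process grid.
    for offset, length in subarray_runs((4, 4, 4), (2, 2, 1), (1, 0, 0)):
        print(offset, length)
```

Because every rank can compute its own runs independently, no single process has to gather the whole field before writing, which is the memory-scaling benefit the abstract attributes to keeping the subarray types.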
Parallelizing a Conditional Random Fields Code in Java
Presenters: Joel Welling (PSC)
Principal Investigators: Jana Diesner, Brent Fegley (UIUC)
Jana Diesner and Brent Fegley of UIUC are machine learning researchers using a method called "conditional random fields" (CRF) to identify sentence components. Their code is based on a very elegant Java implementation of the problem by Sunita Sarawagi of IIT Bombay. Sarawagi's code is extremely well structured and flexible, but very much designed without concern for parallelism. We undertook an ECSS project to develop a version of the code which would be thread-parallel over the training examples. I will give a brief overview of CRF, describe Java's tools for parallelism and Sarawagi's CRF implementation, and report on the performance improvements we achieved.
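The general pattern of being thread-parallel over the training examples can be sketched as follows. This is an illustration in Python rather than Java, and it is not Sarawagi's code or the project's implementation: a toy logistic model stands in for the CRF's per-sequence gradient, and the function names are hypothetical. Each thread sums the gradient contributions of its own share of the examples, and the partial sums are combined at the end.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def example_gradient(weights, features, label):
    """Gradient of the logistic log-likelihood for one example.

    Toy stand-in for the per-sequence gradient a CRF trainer would compute.
    """
    score = sum(w * x for w, x in zip(weights, features))
    prob = 1.0 / (1.0 + math.exp(-score))
    return [(label - prob) * x for x in features]


def chunk_gradient(weights, chunk):
    """Sum the per-example gradients over one chunk of training examples."""
    total = [0.0] * len(weights)
    for features, label in chunk:
        for k, g in enumerate(example_gradient(weights, features, label)):
            total[k] += g
    return total


def parallel_gradient(weights, examples, n_threads=4):
    """Split the examples across threads and add up the partial gradients."""
    # Strided slices give each worker a roughly equal share of the examples.
    chunks = [examples[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(lambda c: chunk_gradient(weights, c), chunks))
    return [sum(p[k] for p in partials) for k in range(len(weights))]


if __name__ == "__main__":
    data = [([1.0, 2.0], 1), ([1.0, 0.0], 0)]
    print(parallel_gradient([0.0, 0.0], data, n_threads=2))
```

The weights are only read inside the workers and each worker writes to its own accumulator, so no locking is needed. In CPython the global interpreter lock limits the speedup for pure-Python arithmetic like this, so the sketch shows the structure of the decomposition rather than the performance a Java thread pool would deliver.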
March 18, 2014
Science Gateway Support and Software Spinoffs
Presenters: Marie Ma (Yu Ma) and Lahiru Gunathilake (Indiana University)
Science gateways enable broad communities of scientists to use XSEDE resources through Web browser and similar user interfaces. XSEDE's Extended Collaborative Support Services (ECSS) has staff available to work with science gateway developers to help them integrate their gateways with XSEDE. Frequently, a solution for one gateway's problems can be reused by other gateways. In this two-part presentation, we describe a range of gateway support activities and some reusable software nuggets that we have derived. XSEDE-compatible web-based authentication for gateway users is a common problem, especially given the wide range of programming languages and frameworks used to build gateways. We summarize support activities for the General Automated Atomic Model Parameterization (GAAMP) computational chemistry gateway, the NCGAS Galaxy-based bioinformatics gateway, the ParamChem computational chemistry gateway, and the UltraScan biophysics gateway, and describe three common requirements: the need to perform XSEDE-compatible Web authentication, the need to manage job executions securely, and the need to monitor jobs through the gateway. This has led our group to develop small, open source gateway code nuggets that can be easily used in other projects. As open source software, these are open for anyone to use but also, just as importantly, open for code contributions. We conclude with information on how to obtain, use, and contribute to the software.
February 18, 2014
Postponed to a later date (TBD)
Perspectives on Data Sharing: Two data-centric gateways at NCAR
Presenters: Don Middleton, Eric Nienhouse, Nathan Wilhelmi (NCAR)
The U.S. government and the NSF have substantially elevated the importance of scientific data management, sharing, and openness as a national priority. In addition to providing access to computational resources for scientific communities, science gateways can also be an ideal place for communities to share data using common infrastructure that's been tuned to their specific needs. In this presentation, we will briefly review some of the salient NSF policy shifts regarding data, touch on related emerging trends including Big Data and EarthCube, demonstrate two data-centric Science Gateways (climate modeling and Arctic science), and finish up by providing an overview of our architecture and software engineering process.
Bio of Speakers:
Don Middleton leads the Visualization and Enabling Technologies (VETS) program in NCAR's Computational and Information Systems Laboratory (CISL). This program includes the development and delivery of data collections and cyberinfrastructure to a broad, national and global community. The project portfolio includes the NCAR Command Language (NCL) and the PyNGL/PyNIO toolkit, the Community Data Portal (CDP), the NSF-sponsored Advanced Cooperative Arctic Data and Information Service (ACADIS), the Earth System Grid (ESG) data system, NSF's XSEDE project, the DOE-sponsored Parvis effort, the UCSD-led Chronopolis digital preservation project, and the multi-agency sponsored National Multimodel Ensemble (NMME) project. Middleton is active in NSF's EarthCube activity and also contributes to an expert team on federated data management systems for the World Meteorological Organization Information System (UN/WMO-WIS).
Eric Nienhouse is a software engineer and Agile Scrum Product Owner for the Science Gateway Framework (SGF) software, which supports the ESG-NCAR Science Gateway and the ACADIS Arctic science data management system. Eric is passionate about building products that enable the scientific user community to focus on its science. As product owner, Eric identifies and prioritizes project requirements to ensure the SGF software and services meet the needs of stakeholders.
Nathan Wilhelmi is a software engineer and the Scrum Master for the Science Gateway Framework (SGF) software, which supports the ESG-NCAR Science Gateway and the ACADIS Arctic science data management system. As the Scrum Master, Nathan is responsible for facilitating and improving the Scrum process, ensuring the improvement of code quality, and researching and adopting new technologies.
January 21, 2014
Pushing the Integration Envelope of Cyberinfrastructure to Realize the CyberGIS Vision
Presenter: Shaowen Wang (NCSA)
CyberGIS, geographic information science and systems (GIS) based on advanced cyberinfrastructure, has emerged during the past several years as a vibrant interdisciplinary field. It has played essential roles in enabling computing- and data-intensive research and education across a broad swath of academic disciplines with significant societal impact. However, fulfilling such roles is increasingly dependent on the ability to simultaneously process and visualize complex and very large geospatial data sets and conduct associated analyses and simulations, which often require tight integration of collaboration, computing, data, and visualization capabilities. This presentation addresses this requirement as a set of challenges and opportunities for advancing cyberinfrastructure and related sciences while discussing the state of the art of CyberGIS.