ECSS Symposium

ECSS staff share technical solutions to scientific computing challenges monthly in this open forum.

The ECSS Symposium gives the more than 70 ECSS staff members a monthly opportunity to exchange information about successful techniques used to address challenging science problems. Tutorials on new technologies may also be featured. Two 30-minute, technically focused talks are presented each month, each including a brief question-and-answer period. The series is open to everyone.

Symposium coordinates

Day and Time: Third Tuesdays @ 1 pm Eastern / 12 pm Central / 10 am Pacific

Webinar (PC, Mac, Linux, iOS, Android): Launch Zoom webinar

iPhone one-tap (US Toll): +16468769923,,114343187# (or) +16699006833,,114343187#

Telephone (US Toll): for higher quality, dial a number based on your current location:

US: +1 646 876 9923 (or) +1 669 900 6833 (or) +1 408 638 0968

Meeting ID: 114 343 187

Upcoming events are also posted to the Training category of XSEDE News.

Due to the large number of attendees, only the presenters and host broadcast audio. Attendees may submit chat questions to the presenters through a moderator.

Key Points
Monthly technical exchange
Presented by the ECSS community
Open to everyone
Tutorials and talks with Q & A

Previous years' ECSS seminars may be accessed through these links:

2017

2016

2015

2014

September 16, 2014

Establishing TauDEM as a Science Gateway Service on XSEDE for Scalable Hydrological Terrain Analysis
Presenters: Yan Liu, Ye Fan (NCSA)
Principal Investigator: David Tarboton (Utah State)

Presentation Slides

Finer-resolution Digital Elevation Model (DEM) data have been shown to have a significant impact on hydrologically important variables and to improve the accuracy and reliability of DEM-based terrain analysis. TauDEM is a parallel computing solution for watershed delineation and the extraction of hydrological information from high-resolution DEMs. The first part of the talk introduces a multi-institutional effort that leverages expertise in multiple disciplines (hydrology, computational science, geographic information science, and geography) through XSEDE ECSS to scale TauDEM from local clusters to supercomputers on national cyberinfrastructure (e.g., XSEDE) through rigorous computational performance profiling and analysis. The second part presents the data and software integration and science gateway application development experience gained in establishing TauDEM as a CyberGIS Gateway application.


August 19, 2014

ParaView Coprocessing Visualization of Differential Equations
Presenters: Mark Vanmoer (NCSA)
Principal Investigator: Benson Muite (at U of Michigan during project, now at U of Tartu, Estonia)

Presentation Slides

This ESRT project investigated two in-situ visualization approaches for highly scalable differential equation codes. The codes simulate Rayleigh-Benard convection, an idealized problem that is of physical interest and can serve as a model for larger high-resolution studies in computational fluid dynamics using spectral methods. A ParaView coprocessing adaptor was integrated with the code, allowing images to be written directly from a running simulation.
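
The slides describe the project's adaptor in detail. Purely as a hedged illustration of the coprocessing pattern, and not the project's actual code, a minimal adaptor using the legacy ParaView Catalyst C++ API might look like the following; the pipeline script name and the grid type are placeholder assumptions:

```cpp
// Minimal sketch of a legacy ParaView Catalyst coprocessing adaptor.
// Illustrative only: the grid construction and the Python pipeline
// script are placeholders, not the project's code.
#include <vtkCPDataDescription.h>
#include <vtkCPInputDataDescription.h>
#include <vtkCPProcessor.h>
#include <vtkCPPythonScriptPipeline.h>
#include <vtkImageData.h>
#include <vtkNew.h>

static vtkCPProcessor* Processor = nullptr;

void CatalystInitialize(const char* script)
{
  Processor = vtkCPProcessor::New();
  Processor->Initialize();
  vtkNew<vtkCPPythonScriptPipeline> pipeline;
  pipeline->Initialize(script);                  // e.g. a script exported from ParaView
  Processor->AddPipeline(pipeline.GetPointer());
}

void CatalystCoProcess(double time, vtkIdType timeStep, vtkImageData* grid)
{
  vtkNew<vtkCPDataDescription> dataDesc;
  dataDesc->AddInput("input");
  dataDesc->SetTimeData(time, timeStep);
  // Ask Catalyst whether any pipeline wants data at this step.
  if (Processor->RequestDataDescription(dataDesc.GetPointer()) != 0)
  {
    // Hand over the solution, already copied onto a VTK grid as point data.
    dataDesc->GetInputDescriptionByName("input")->SetGrid(grid);
    Processor->CoProcess(dataDesc.GetPointer()); // may render and write images
  }
}

void CatalystFinalize()
{
  if (Processor)
  {
    Processor->Finalize();
    Processor->Delete();
    Processor = nullptr;
  }
}
```

The simulation calls CatalystInitialize() once, CatalystCoProcess() at the end of each time step, and CatalystFinalize() at shutdown; the pipeline script, typically exported from the ParaView GUI, decides when images are actually written.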


The Data Exacell
Presenter: Nick Nystrom (PSC)

Presentation Slides

The Data Exacell (DXC) is an accelerated-development pilot project to create, deploy, and test software and hardware building blocks specifically designed to support data-analytic capabilities for data-intensive scientific research. Supported by NSF through its Data Infrastructure Building Blocks (DIBBs) program, the DXC focuses on data storage mechanisms, their coupling to specialized, powerful engines for data analytics, and enabling transformational application architectures powered by cutting-edge database technologies.


June 17, 2014

Perspectives on Data Sharing: Two data-centric gateways at NCAR
Presenters: Don Middleton, Eric Nienhouse, Nathan Wilhelmi (NCAR)

Presentation Slides

The U.S. government and the NSF have substantially elevated the importance of scientific data management, sharing, and openness as a national priority. In addition to providing access to computational resources for scientific communities, science gateways can also be an ideal place for communities to share data using common infrastructure that's been tuned to their specific needs. In this presentation, we will briefly review some of the salient NSF policy shifts regarding data, touch on related emerging trends including Big Data and EarthCube, demonstrate two data-centric Science Gateways (climate modeling and Arctic science), and finish up by providing an overview of our architecture and software engineering process.

Speaker Bios:

Don Middleton leads the Visualization and Enabling Technologies (VETS) program in NCAR's Computational and Information Systems Laboratory (CISL). This program includes the development and delivery of data collections and cyberinfrastructure to a broad, national and global community. The project portfolio includes the NCAR Command Language (NCL) and the PyNGL/PyNIO toolkit, the Community Data Portal (CDP), the NSF-sponsored Advanced Cooperative Arctic Data and Information Service (ACADIS), the Earth System Grid (ESG) data system, NSF's XSEDE project, the DOE-sponsored Parvis effort, the UCSD-led Chronopolis digital preservation project, and the multi-agency sponsored National Multimodel Ensemble (NMME) project. Middleton is active in NSF's EarthCube activity and also contributes to an expert team on federated data management systems for the World Meteorological Organization Information System (UN/WMO-WIS).

Eric Nienhouse is a software engineer and Agile Scrum Product Owner for the Science Gateway Framework (SGF) software, which supports the ESG-NCAR Science Gateway and the ACADIS Arctic science data management system. Eric is passionate about building products that enable the scientific user community to focus on its science. As product owner, Eric identifies and prioritizes project requirements to ensure the SGF software and services meet the needs of stakeholders.

Nathan Wilhelmi is a software engineer and the Scrum Master for the Science Gateway Framework (SGF) software, which supports the ESG-NCAR Science Gateway and the ACADIS Arctic science data management system. As the Scrum Master, Nathan is responsible for facilitating and improving the Scrum process, ensuring the improvement of code quality, and researching and adopting new technologies.


May 20, 2014

CoSSci High Performance Computing for Anthropology and the Social Sciences
Principal Investigator and Presenter: Douglas White (UC Irvine)
Presenter: Lucasz Lacinski (U Chicago)

Presentation Slides

Douglas White and his co-authors, mathematical anthropologist Malcolm Dow and sociocultural econometrician Anthon Eff, who edited the Wiley Companion to Cross-Cultural Research, designed R software functions (the Dow-Eff functions) that solve the crucial problem of controlling for autocorrelation, a prerequisite for far more accurate research in the social sciences and in the other observational sciences. They extended online access to the four large anthropological datasets, which now cover 3,000-5,000 coded variables for nearly all of the ethnographic literature tied to specific times and locations. They also implemented the most powerful statistical tools for imputation of missing data.

Under an ECSS award, XSEDE science gateway developers at Argonne National Lab (Tom Uram) and then the University of Chicago (Lacinski and Rachana Ananthakrishnan) designed the Complex Social Science Gateway (CoSSci). It was built using the Galaxy framework, is hosted at UC Irvine, and will be replicated at the Santa Fe Institute and elsewhere for classroom use.

Currently, the Dow-Eff functions (DEf) are aimed at best practices for uncovering the hows and whys of variation in human culture and behavior, in terms both general and highly specific, including environments internal or external to human communities and processes that manifest in disease or other biological or biosocial dynamics. Such findings may be of immense value worldwide: autocorrelation controls make clustered samples behave like the equivalents of randomly chosen ones, whereas uncorrected clustered samples yield biased significance tests that distort research findings. The samples range from foragers to a full range of human societies and, with contributions of other datasets, may be complemented by cross-national, regional, or other types of units of study.
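
To make the autocorrelation point concrete, models in this literature control for non-independence among societies with a network-lag term; as a rough sketch (the notation below is generic, not the talk's own):

```latex
y = \rho W y + X\beta + \varepsilon
```

Here W is a row-normalized proximity matrix (e.g., linguistic or spatial closeness between societies), so the \rho W y term absorbs the similarity that neighboring or historically related societies share; testing \beta without that term treats clustered observations as if they were independent, which is exactly Galton's problem.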

With recent awards to other of White's research colleagues engaged in coding databases for longitudinal research on historical economics (Culture and Economic Growth; Evolution of World Religions), working out of the Evolution Institute, later phases of the XSEDE CoSSci Galaxy project will help explore the temporal dynamics of behavioral, cultural, economic, and political aspects of human societies across time, including historical as well as archaeological studies.

The next two years of this project will use large sets of variables that are imputed during DEf modeling and can be analyzed as networks of variables, an analysis that benefits from HPC methods. These complex models may use the corrected Akaike Information Criterion (AICc) in model selection and, especially, Marco Scutari's (2014) new library bnlearn, applied to networks of related variables, which can reveal precisely which complex, interrelated subsets of theoretically defined observed variables form potentially causal networks. This is illustrated for world-scale religions in today's discussion.

Simpler models can be computed as part of students' work in the courseware the project facilitates, which stores modeling histories that an instructor can review (and compare to earlier attempts at similar models in the literature, which often fail for lack of autocorrelation controls). High-end HPC analysis of observed Bayesian networks of variables, however, allows experts to study more complex relationships among multiple observations.

In learning how to test theories against massive amounts of data, the "Galton's problem" that has plagued the analysis of samples based on naturalistic observations is of paramount importance, at both the simpler and the more complex levels of analysis. By the end of the next two years it should be possible to see a new florescence of coursework and research publications (including the Wiley Companion) likely to transform many of the subfields of cross-cultural studies by providing new discoveries.



April 15, 2014

MPI_IO Optimization for Compressible Turbulence Simulations
Presenter: Dr. Vincent Betro (NICS)
Principal Investigator: Dr. Diego Donzis (Texas A&M)

Presentation Slides

This ECSS project was undertaken by Dr. Vincent Betro with Dr. Diego Donzis of Texas A&M to improve the I/O of Dr. Donzis's CFD code for cyber-enabled investigations of compressible turbulence and mixing, and to study the effect of thermal non-equilibrium on turbulent processes. In previous work on XSEDE resources, Dr. Donzis developed a new, highly scalable code to perform direct numerical simulations of compressible turbulence and had begun obtaining results at resolutions up to 512^3 with a newly developed forcing scheme to maintain a stationary state. Further analysis and new simulations at 1024^3 provided definitive answers to pressing questions about the scaling of the components into which compressible fields can be decomposed, namely the solenoidal and dilatational components. The new simulations will be unprecedented in detail, and together with the accumulated database they will allow important aspects of small-scale intermittency and mixing in compressible turbulence to be investigated for the first time. However, this higher resolution also requires more I/O and, as a consequence, faster parallel I/O.

In this presentation, Dr. Betro discusses how MPI_IO environment variables, in combination with file striping, successfully increased performance. For instance, 32,000- and 64,000-core jobs that previously could not complete within a 24-hour walltime now run successfully with MPI_IO. Moreover, as the core count grows, walltime grows far more slowly while the subarray datatypes are retained, allowing memory use per core to scale better than it would with a root process controlling all I/O.
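
The slides give the measured numbers. As a hedged sketch of the general technique described, striping hints passed via an MPI_Info object plus a subarray datatype for collective writes, and not the project's code, the pattern looks roughly like this; the hint values, file name, and array layout are illustrative assumptions:

```cpp
// Sketch of the MPI-IO pattern discussed above: Lustre striping hints on an
// MPI_Info object plus a subarray datatype, so every rank writes its block
// of a 3D field in one collective call.  Values are illustrative only.
#include <mpi.h>

void write_field(MPI_Comm comm, const double* local,
                 int gsizes[3], int lsizes[3], int starts[3])
{
  MPI_Info info;
  MPI_Info_create(&info);
  // ROMIO hints for Lustre: stripe the file across 64 OSTs in 4 MiB units.
  MPI_Info_set(info, "striping_factor", "64");
  MPI_Info_set(info, "striping_unit", "4194304");

  // Describe this rank's block of the global array once, as a datatype.
  MPI_Datatype filetype;
  MPI_Type_create_subarray(3, gsizes, lsizes, starts,
                           MPI_ORDER_C, MPI_DOUBLE, &filetype);
  MPI_Type_commit(&filetype);

  MPI_File fh;
  MPI_File_open(comm, "field.dat", MPI_MODE_CREATE | MPI_MODE_WRONLY,
                info, &fh);
  MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", info);

  // All ranks write collectively; no root process funnels the data.
  int count = lsizes[0] * lsizes[1] * lsizes[2];
  MPI_File_write_all(fh, local, count, MPI_DOUBLE, MPI_STATUS_IGNORE);

  MPI_File_close(&fh);
  MPI_Type_free(&filetype);
  MPI_Info_free(&info);
}
```

Because MPI_File_write_all is collective, the MPI-IO layer can aggregate the per-rank blocks into large, stripe-aligned requests, which is where the walltime and per-core memory benefits over root-gathered I/O come from.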


Parallelizing a Conditional Random Fields Code in Java
Presenter: Joel Welling (PSC)
Principal Investigators: Jana Diesner, Brent Fegley (UIUC)

Presentation Slides

Jana Diesner and Brent Fegley of UIUC are machine learning researchers using a method called "conditional random fields" (CRF) to identify sentence components. Their code is based on a very elegant Java implementation of the problem by Sunita Sarawagi of IIT Bombay. Sarawagi's code is extremely well structured and flexible, but was designed with no concern for parallelism. We undertook an ECSS project to develop a version of the code that is thread-parallel over the training examples. I will give a brief overview of CRF, describe Java's tools for parallelism and Sarawagi's CRF implementation, and report on the performance improvements we achieved.
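
The project itself modified Sarawagi's Java code using Java's threading utilities. Purely to illustrate the pattern of being thread-parallel over training examples (sketched here in C++, since the original Java is not reproduced), the idea might look like this; exampleGradient is a hypothetical stand-in for the per-sequence forward-backward computation:

```cpp
// Illustrative sketch only (the project's code is Java, not reproduced here):
// CRF training gradients are sums of independent per-example terms, so the
// training examples can be partitioned across threads, each accumulating a
// private partial gradient that is reduced after all threads join.
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-sequence computation; a real CRF would
// run forward-backward over training sequence `example` at `params`.
std::vector<double> exampleGradient(int example, const std::vector<double>& params)
{
  std::vector<double> g(params.size(), 0.0);
  g[example % params.size()] = 1.0;  // dummy contribution so the sketch links
  return g;
}

std::vector<double> batchGradient(int nExamples, const std::vector<double>& params,
                                  unsigned nThreads)
{
  std::vector<std::vector<double>> partial(
      nThreads, std::vector<double>(params.size(), 0.0));
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nThreads; ++t)
  {
    workers.emplace_back([&, t] {
      // Static round-robin partition; each thread writes only partial[t].
      for (int i = static_cast<int>(t); i < nExamples;
           i += static_cast<int>(nThreads))
      {
        std::vector<double> g = exampleGradient(i, params);
        for (std::size_t k = 0; k < g.size(); ++k)
          partial[t][k] += g[k];
      }
    });
  }
  for (std::thread& w : workers)
    w.join();

  // Serial reduction of the per-thread partials into the full gradient.
  std::vector<double> grad(params.size(), 0.0);
  for (const std::vector<double>& p : partial)
    for (std::size_t k = 0; k < p.size(); ++k)
      grad[k] += p[k];
  return grad;
}
```

The per-thread private accumulators are the key design point: they avoid lock contention on a shared gradient vector, leaving only a cheap serial reduction per optimizer iteration.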


March 18, 2014

Science Gateway Support and Software Spinoffs
Presenters: Marie Ma (Yu Ma) and Lahiru Gunathilake (Indiana University)

Presentation Slides

Science gateways enable broad communities of scientists to use XSEDE resources through Web browsers and similar user interfaces. XSEDE's Extended Collaborative Support Services (ECSS) has staff available to work with science gateway developers to help them integrate their gateways with XSEDE. Frequently, a solution to one gateway's problems can be reused by other gateways. In this two-part presentation, we describe a range of gateway support activities and some reusable software nuggets we have derived from them. XSEDE-compatible, Web-based authentication for gateway users is a common problem, especially given the wide range of programming languages and frameworks used to build gateways.

We summarize support activities for the General Automated Atomic Model Parameterization (GAAMP) computational chemistry gateway, the NCGAS Galaxy-based bioinformatics gateway, the ParamChem computational chemistry gateway, and the UltraScan biophysics gateway, and describe three common requirements: performing XSEDE-compatible Web authentication, managing job executions securely, and monitoring jobs through the gateway. This has led our group to develop small, open-source gateway code nuggets that can easily be used in other projects. As open-source software, these are open for anyone to use and, just as importantly, open for code contributions. We conclude with information on how to obtain, use, and contribute to the software.


February 18, 2014

Postponed to a later date (TBD)

Perspectives on Data Sharing: Two data-centric gateways at NCAR
Presenters: Don Middleton, Eric Nienhouse, Nathan Wilhelmi (NCAR)

This talk was subsequently presented at the June 17, 2014 symposium; see that entry above for the abstract and speaker bios.


January 21, 2014

Pushing the Integration Envelope of Cyberinfrastructure to Realize the CyberGIS Vision
Presenter: Shaowen Wang (NCSA)

Presentation Slides

CyberGIS, geographic information science and systems (GIS) based on advanced cyberinfrastructure, has emerged during the past several years as a vibrant interdisciplinary field. It has played an essential role in enabling computing- and data-intensive research and education across a broad swath of academic disciplines, with significant societal impact. However, fulfilling such roles increasingly depends on the ability to simultaneously process and visualize complex and very large geospatial datasets and to conduct associated analyses and simulations, which often require tight integration of collaboration, computing, data, and visualization capabilities. This presentation addresses this requirement as a set of challenges and opportunities for advancing cyberinfrastructure and related sciences, while discussing the state of the art of CyberGIS.


December 17, 2013

Presenter(s):
Nate Coraor (Penn State)
Philip Blood (Pittsburgh Supercomputing Center)
Rich LeDuc (National Center for Genome Analysis Support)
Yu Ma (Indiana University)
Ravi Madduri (Argonne National Laboratory)

Presentation Slides

This symposium will describe current and planned efforts to enable more scientists to easily, transparently, and reproducibly analyze and share large-scale next-generation sequencing data with the Galaxy framework.

Topics and speakers are as follows:

  • James Taylor (Galaxy Team): The future of Galaxy.
  • Philip Blood (Pittsburgh Supercomputing Center): Integrating Galaxy Main with XSEDE and establishing an XSEDE Galaxy Gateway.
  • Rich LeDuc and Yu Ma (National Center for Genome Analysis Support (NCGAS), Indiana U): Utilizing Galaxy at NCGAS with integrated InCommon authentication.
  • Ravi Madduri (Argonne National Laboratory): Experiences in building a next-generation sequencing analysis service using Galaxy, Globus Online, and Amazon Web Services.


October 15, 2013

Research Data Management-as-a-Service with Globus Online
Presenter: Rachana Ananthakrishnan (University of Chicago)

Presentation Slides

As science becomes more computation- and data-intensive, there is an increasing need for researchers to move and share data across institutional boundaries. Managing massive volumes of data throughout their lifecycle is rapidly becoming an inhibitor to research progress, due in part to the complex and costly IT infrastructure required, infrastructure that is typically out of reach for the hundreds of thousands of small and medium labs that conduct the bulk of scientific research.

Globus Online aims to provide easy-to-use services and tools for research data management, as simple as the cloud-hosted Netflix for streaming movies or Gmail for e-mail, and to make advanced IT capabilities available to any researcher with access to a Web browser. It provides software-as-a-service (SaaS) for research data management, including data movement, storage, sharing, and publication. Globus Online makes large-scale data transfer and synchronization easy by providing a reliable, secure, and highly monitored environment with powerful and intuitive interfaces. Globus also provides federated identity and group management capabilities for integrating Globus services into campus systems, research portals, and scientific workflows. New functionality includes data sharing, simplifying collaborations within labs or around the world. Tools built specifically for IT administrators on campuses and at computing facilities provide additional features, controls, and visibility into users' needs and usage patterns.

We will present use cases that illustrate how Globus Online is used by campuses (e.g., University of Michigan), supercomputing centers (e.g., Blue Waters, NERSC), and national cyberinfrastructure providers (e.g., XSEDE) to facilitate secure, high-performance data movement among local computers and HPC resources. We will also outline the simple steps required to create a Globus Online endpoint and make the service available to all facility users without specialized hardware, software, or IT expertise. There will be a live demonstration of how to use Globus Online.

