ECSS Symposium Archive

ECSS staff share technical solutions to scientific computing challenges monthly in this open forum.

Previous years' ECSS seminars may be accessed through these links:

October 20, 2020

Introducing Neocortex

Presenter(s): Sergiu Sanielevici (PSC)

Presentation Slides

Neocortex will be a highly innovative resource at PSC that will accelerate AI-powered scientific discovery by vastly shortening the time required for deep learning training, foster greater integration of artificial deep learning with scientific workflows, and provide revolutionary new hardware for the development of more efficient algorithms for artificial intelligence and graph analytics.

Introducing Bridges-2

Presenter(s): Shawn Brown (PSC)

Presentation Slides

Bridges-2, PSC's newest supercomputer, will provide transformative capability for rapidly evolving, computation-intensive and data-intensive research, creating opportunities for collaboration and convergent research. It will support both traditional and non-traditional research communities and applications. Bridges-2 will integrate new technologies for converged, scalable HPC, machine learning and data; prioritize researcher productivity and ease of use; and provide an extensible architecture for interoperation with complementary data-intensive projects, campus resources, and clouds.

September 15, 2020

High Resolution Spatial Temporal Analysis of Whole-Head 306-Channel Magnetoencephalography & 66-Channel Electroencephalography Brain Imaging in Humans During Sleep

Presenter(s): David Shannahoff-Khalsa (UCSD), Mona Wong (SDSC), Jeff Sale (SDSC)

Presentation Slides

In chronobiology, the circadian rhythm is known as the 24-hr sleep-wake cycle. The ultradian rhythm has a shorter cycle with approximately a 1-3 hour periodicity, with considerable variability. This project's goal is to follow up on our earlier EEG work during sleep, and that of others, that has identified a rhythm of how the two cerebral hemispheres alternate in dominance with coupling to the ultradian rhythm of the rapid eye movement (REM) and non-rapid eye movement (NREM) sleep cycle. Here we are also comparing whole-head and regional variations in cerebral dominance to gain better insight into this novel rhythm during sleep. This rhythm of alternating cerebral hemispheric dominance also manifests during the waking state, and it is apparently coupled to every major bodily system and now presents as a novel rhythm regulated by the central and autonomic nervous systems via the hypothalamus. With the support of XSEDE ECSS, this project has processed 306-channel magnetoencephalography that includes 3 signal types (1 magnetometer, 2 opposing gradiometers) and 66-channel EEG recordings from 4 normal healthy sleep subjects. We are analyzing the data to compare the 4 signal types filtered into 6 frequency bands, over the whole head and 6 discrete regions of the head, to see how they vary with the REM and NREM sleep stages. Our analysis includes a relatively new algorithm called Fast Orthogonal Search that is well suited for analyzing the periodicity in nonlinear dynamical systems. Our analysis also includes unique visualization methods for observing how these patterns of left-minus-right hemisphere power present during sleep stages.
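The "left minus right hemisphere power" quantity mentioned above can be illustrated with a minimal sketch. This is not the presenters' pipeline and it does not implement Fast Orthogonal Search. It assumes each epoch is a short list of samples already filtered into one frequency band, uses mean squared amplitude as a stand-in for power, and all function names and sample values are hypothetical.

```python
def mean_power(samples):
    """Mean squared amplitude of one epoch of an already band-filtered signal."""
    return sum(x * x for x in samples) / len(samples)

def left_minus_right(left_epochs, right_epochs):
    """Per-epoch power difference; positive means the left side has more power."""
    return [mean_power(l) - mean_power(r)
            for l, r in zip(left_epochs, right_epochs)]

def dominance_switches(differences):
    """Count sign changes in the difference series, ignoring exact zeros."""
    signs = [d > 0 for d in differences if d != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# Hypothetical data: four epochs per hemisphere, two samples each.
left_epochs = [[1, -1], [2, -2], [1, -1], [0.5, -0.5]]
right_epochs = [[2, -2], [1, -1], [0.5, -0.5], [1, -1]]

diffs = left_minus_right(left_epochs, right_epochs)
switches = dominance_switches(diffs)
print(diffs, switches)
```

A series like this, computed per frequency band and per region, is the sort of signal whose periodicity could then be examined against the REM and NREM stages.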

August 18, 2020

RDA Recommendations and Outputs

Presenter(s): Anthony Juehne (RDA Foundation)

Presentation Slides

The RDA was launched in 2013 to fill the identified need for a neutral, collaborative space gathering the diverse data communities and, through informed consensus, building the social and technical bridges to enable open data sharing. Since its founding, the RDA principles - Open, Consensus, Balance, Harmonization, Community-driven, Non-profit, and Technology-Neutral - have resonated across research communities. RDA membership currently includes over 11,000 participants representing 144 countries from all populated continents collaborating in 97 working or interest groups. The RDA is focused on actively building outcomes to accelerate the work to support open data interoperability, sharing, and use. This happens through the development and deployment of two primary output categories: i) technical infrastructure (e.g., tools, models, registries); and ii) social infrastructure (e.g., common standards, best practices, policies). This presentation will discuss an approach to implementing RDA-developed outputs and recommendations across multiple areas of organizational operation, including human development and education, data laws and policies, research practices, data and metadata formats and standards, data sharing workflows, and infrastructure management for enhanced interoperability.

FAIR Data and SEAGrid Gateway: a Research Data Alliance Adoption Project

Presenter(s): Rob Quick (Indiana University)

The Science and Engineering Grid (SEAGrid) Gateway has been an active resource for the computational community since 2016. During this time the utility of persistent identifiers for research data products has become prevalent in research communities as defined in the FAIR principles for open data. At the beginning of 2020 the Research Data Alliance funded an adoption project to integrate RDA outputs and recommendations focused on PID issuance to data and software components that make up a science workflow within the SEAGrid environment. This presentation will summarize this project and describe the gateway and data infrastructure components required for integration, along with the details of the integration process. The work done in this adoption project can be used to inform future gateway projects that adopt the technical components of FAIR which are reliant on a persistent identifier resolution infrastructure.

June 16, 2020

Scalable Research Automation using Globus

Presenter(s): Rachana Ananthakrishnan (Globus)

Presentation Slides

REST APIs exposed by the Globus service, combined with high-speed networks and Science DMZs, create a data management platform that can be leveraged to increase efficiency in research workflows. In many cases, current ad hoc or human centered processes fall short of addressing the needs of researchers as their work becomes more data intensive. As data volumes grow, the overhead introduced by such non-scalable processes hampers core research activities, sometimes to the point where research takes a back seat to wrangling with IT infrastructure. However, technologies exist for reducing this burden and reengineering processes such that they can easily cope with growing data velocity and volume. One such technology is the Globus platform-as-a-service that facilitates access to advanced data management capabilities, and enables integration of these capabilities into existing and new scientific workflows to automate repetitive tasks: data replication, ingest from instruments, backup, archival, data distribution, etc. We will present real-world examples that illustrate how Globus can be used to perform data management tasks at scale, with no or minimal effort on the part of the researcher. Examples include streamlined data flows at the Advanced Photon Source data sharing system, used to distribute data from light source experiments. We will describe how the Globus platform provides intuitive access to authentication, authorization, sharing, transfer, and synchronization capabilities that can be included in simple scripts or integrated into more full-featured applications.

Building Source-to-Source Tools for High-Performance Computing

Presenter(s): Chunhua "Leo" Liao (LLNL)

Presentation Slides

Computational scientists face numerous challenges when trying to exploit powerful and complex high-performance computing (HPC) platforms. These challenges arise in multiple aspects including productivity, performance, correctness and so on. In this talk, I will introduce a source-to-source approach to addressing HPC challenges. Our work is based on a unique compiler framework named ROSE. Developed at Lawrence Livermore National Laboratory, ROSE encapsulates advanced compiler analysis and optimization technologies into easy-to-use library APIs so developers can quickly build customized program analysis and transformation tools for C/C++/Fortran and OpenMP programs. Several example tools will be introduced, including the AST inliner, outliner, and a variable move tool. I will also briefly mention ongoing work related to benchmarks, composable tools, and training for compiler/tool developers. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-ABS-810981).

May 19, 2020

Gateway Production Monitoring

Presenter(s): Kenneth Yoshimoto (SDSC)

Presentation Slides

In order to monitor the function of a production gateway, Neuroscience Gateway (NSG), two programs were developed to test gateway functions: data upload, job submission, and output retrieval. NSG uses the Workbench Framework (WF) code base. Other gateways using WF are COSMIC2 and CIPRES. The WF gateway can provide both a non-API web interface and a RESTful API. NSG makes both these interfaces available to users. For routine monitoring of production status, programs were written to do a daily test of both interfaces. The programs and testing process will be presented.
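The three functions tested above (data upload, job submission, output retrieval) suggest a simple chained check. The following is a minimal offline sketch, not the NSG programs themselves: `FakeGateway` and all of its methods are hypothetical stand-ins for calls to a gateway's web interface or REST API, and scheduling the check daily (for example with cron) is left out.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    name: str
    ok: bool
    detail: str = ""

class FakeGateway:
    """Hypothetical offline stand-in for a gateway client.

    A real check would talk to the gateway here; `fail_on` lets the
    sketch simulate a failure at a chosen step.
    """
    def __init__(self, fail_on=None):
        self.fail_on = fail_on

    def upload(self, data):
        if self.fail_on == "upload":
            raise RuntimeError("upload rejected")
        return "data-1"

    def submit(self, data_id):
        if self.fail_on == "submit":
            raise RuntimeError("scheduler unavailable")
        return "job-1"

    def retrieve(self, job_id):
        if self.fail_on == "retrieve":
            raise RuntimeError("output missing")
        return b"result"

def run_check(gateway):
    """Run upload, submit, retrieve in order, stopping at the first failure."""
    results = []
    state = b"test input"
    steps = (("upload", gateway.upload),
             ("submit", gateway.submit),
             ("retrieve", gateway.retrieve))
    for name, step in steps:
        try:
            state = step(state)  # each step feeds its result to the next
            results.append(StepResult(name, True))
        except Exception as exc:
            results.append(StepResult(name, False, str(exc)))
            break
    return results

def is_healthy(results):
    """Healthy only if all three steps ran and succeeded."""
    return len(results) == 3 and all(r.ok for r in results)

print(is_healthy(run_check(FakeGateway())))
print(run_check(FakeGateway(fail_on="submit")))
```

Stopping at the first failed step keeps the report focused on the function that actually broke, since later steps depend on the earlier ones.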

Essentials for a Successful XRAC Proposal: Code Performance and Scaling

Presenter(s): Lars Koesterke (TACC)

Presentation Slides

Many PIs struggle with putting together a sound computational plan based on code performance and scaling information. In fact, for first-time PIs the most common reason for rejection is an insufficient computational plan. With this new training module we are trying to address this problem. The training module attempts to answer two questions: Why are scaling and performance data important and how are they used by reviewers, and how can these data be used to put together a computational plan? Currently the module is geared towards traditional HPC communities and we are working on extending the content towards new communities. The purpose of my talk at the ECSS symposium is to get staff members on the same page and to raise awareness that there is a new resource available that may help educate users struggling with writing a successful XRAC proposal.
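As an illustration of how scaling data can feed a computational plan, the sketch below turns strong-scaling timings into speedup, parallel efficiency, and a core-hour estimate. It is not taken from the training module: the timings, the 70% efficiency threshold, the number of runs, and the assumption that each production run costs the same as the benchmark are all hypothetical.

```python
# Hypothetical strong-scaling measurements: cores -> wall-clock seconds
# for one fixed-size benchmark run. These numbers are illustrative only.
timings = {64: 1000.0, 128: 520.0, 256: 290.0, 512: 200.0}

def scaling_table(timings):
    """Return (cores, speedup, efficiency) rows relative to the smallest run."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows

def largest_efficient_run(timings, min_efficiency=0.7):
    """Largest core count whose efficiency meets the (assumed) threshold."""
    ok = [cores for cores, _, eff in scaling_table(timings)
          if eff >= min_efficiency]
    return max(ok)

def core_hours(timings, cores, runs):
    """Core-hours for `runs` runs, each assumed to cost one benchmark run."""
    return cores * timings[cores] / 3600.0 * runs

chosen = largest_efficient_run(timings)
request = core_hours(timings, chosen, runs=100)

for cores, speedup, eff in scaling_table(timings):
    print(f"{cores:4d} cores  speedup {speedup:5.2f}  efficiency {eff:5.2f}")
print(f"chosen: {chosen} cores, request: {request:.0f} core-hours")
```

With these made-up numbers, efficiency drops below the threshold at 512 cores, so the plan would justify running at 256 cores and size the request from the measured time at that core count.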

April 21, 2020

Beginners tutorial on cloud devops on Jetstream focused on Kubernetes and JupyterHub

Presenter(s): Andrea Zonca (SDSC)

Presentation Slides

This symposium assumes no previous knowledge of cloud technologies and will cover the following topics:

* Example virtual machine setup with OpenStack command line tools
* Deploying a Kubernetes cluster on Jetstream
* How Kubernetes works: architecture, and differences between containers and virtual machines
* Deploying JupyterHub on Jetstream for a workshop

March 17, 2020

AMP Gateway: A portal for atomic, molecular and optical physics simulations.

Presenter(s): Sudhakar Pamidighantam (Indiana University)

Presentation Slides

We describe the creation of a new Atomic and Molecular Physics science gateway (AMPGateway). The gateway is designed to bring together a subset of the AMP community to work collectively to make their software suites available and easier to use by the partners as well as others. By necessity, a project such as this requires the developers to work on issues of portability, documentation, and ease of input, as well as making sure the codes can run on a variety of architectures. The gateway was built using the Apache Airavata gateway middleware framework. Initially it was deployed using the Airavata PHP client on the web but has since been redeployed under a Django web framework. Here we outline the organization and facility of the Django deployment and how it has been used, and discuss future directions for the AMP gateway.

Bursting into the public Cloud – Sharing my experience doing it at large scale for IceCube

Presenter(s): Igor Sfiligoi (SDSC)

Presentation Slides

When compute workflow needs spike well in excess of the capacity of a local compute resource, capacity should be temporarily provisioned from somewhere else to both meet deadlines and to increase scientific output. Public Clouds have become an attractive option due to their ability to be provisioned with minimal advance notice. I have recently helped IceCube expand their resource pool by a few orders of magnitude, first to 380 PFLOP32s for a few hours and later to 170 PFLOP32s for a whole workday. In the process we moved O(50 TB) of data to and from the clouds, showing that networking is not a limiting factor, either. While there was a non-negligible dollar cost involved with each, the effort involved was quite modest. In this session I will explain what was done and how, alongside an overview of why IceCube needs so much compute.

January 21, 2020

CUDA-Python and RAPIDS for blazing fast scientific computing

Presenter(s): Abe Stern (NVIDIA)

Presentation Slides

We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming.