ECSS staff share technical solutions to scientific computing challenges monthly in this open forum.
The ECSS Symposium allows the more than 70 ECSS staff members to exchange information each month about successful techniques used to address challenging science problems. Tutorials on new technologies may be featured. Two 30-minute, technically focused talks are presented each month and include a brief question-and-answer period. This series is open to everyone.
Day and Time: Third Tuesdays @ 1 pm Eastern / 12 pm Central / 10 am Pacific
Add this event to your calendar.
Note – Symposium not held in July and November due to conflicts with PEARC and SC conferences.
Webinar (PC, Mac, Linux, iOS, Android): Launch Zoom webinar
iPhone one-tap (US Toll): +16468769923,,114343187# (or) +16699006833,,114343187#
Telephone (US Toll): Dial (for higher quality, dial a number based on your current location):
US: +1 646 876 9923 (or) +1 669 900 6833 (or) +1 408 638 0968
Meeting ID: 114 343 187
Upcoming events are also posted to the Training category of XSEDE News.
Due to the large number of attendees, only the presenters and host broadcast audio. Attendees may submit chat questions to the presenters through a moderator.
June 16, 2020
Scalable Research Automation using Globus
Presenter(s): Rachana Ananthakrishnan (Globus)
REST APIs exposed by the Globus service, combined with high-speed networks and Science DMZs, create a data management platform that can be leveraged to increase efficiency in research workflows. In many cases, current ad hoc or human-centered processes fall short of addressing the needs of researchers as their work becomes more data intensive. As data volumes grow, the overhead introduced by such non-scalable processes hampers core research activities, sometimes to the point where research takes a back seat to wrangling with IT infrastructure. However, technologies exist for reducing this burden and reengineering processes such that they can easily cope with growing data velocity and volume. One such technology is the Globus platform-as-a-service, which facilitates access to advanced data management capabilities and enables integration of these capabilities into existing and new scientific workflows to automate repetitive tasks: data replication, ingest from instruments, backup, archival, data distribution, etc. We will present real-world examples that illustrate how Globus can be used to perform data management tasks at scale, with no or minimal effort on the part of the researcher. Examples include streamlined data flows at the Advanced Photon Source data sharing system, used to distribute data from light source experiments. We will describe how the Globus platform provides intuitive access to authentication, authorization, sharing, transfer, and synchronization capabilities that can be included in simple scripts or integrated into more full-featured applications.
Building Source-to-Source Tools for High-Performance Computing
Presenter(s): Chunhua "Leo" Liao (LLNL)
Computational scientists face numerous challenges when trying to exploit powerful and complex high-performance computing (HPC) platforms. These challenges arise in multiple aspects including productivity, performance, correctness and so on. In this talk, I will introduce a source-to-source approach to addressing HPC challenges. Our work is based on a unique compiler framework named ROSE. Developed at Lawrence Livermore National Laboratory, ROSE encapsulates advanced compiler analysis and optimization technologies into easy-to-use library APIs so developers can quickly build customized program analysis and transformation tools for C/C++/Fortran and OpenMP programs. Several example tools will be introduced, including the AST inliner, outliner, and a variable move tool. I will also briefly mention ongoing work related to benchmarks, composable tools, and training for compiler/tool developers. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-ABS-810981).
May 19, 2020
Gateway Production Monitoring
Presenter(s): Kenneth Yoshimoto (SDSC)
In order to monitor the function of a production gateway, Neuroscience Gateway (NSG), two programs were developed to test gateway functions: data upload, job submission, and output retrieval. NSG uses the Workbench Framework (WF) code base. Other gateways using WF are COSMIC2 and CIPRES. The WF gateway can provide both a non-API web interface and a RESTful API. NSG makes both these interfaces available to users. For routine monitoring of production status, programs were written to do a daily test of both interfaces. The programs and testing process will be presented.
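A daily check of this kind could be structured roughly as follows. This is only a sketch: `FakeGateway` is an in-memory stand-in invented for illustration, and its method names are assumptions, not the NSG or Workbench Framework API. A real monitor would make the equivalent calls against the web interface or the RESTful API.

```python
import time


class FakeGateway:
    """In-memory stand-in for a gateway (hypothetical; not the NSG API)."""

    def __init__(self):
        self._data = {}
        self._jobs = {}

    def upload(self, name, payload):
        self._data[name] = payload
        return name

    def submit(self, data_id):
        job_id = f"job-{len(self._jobs) + 1}"
        # The fake job "runs" instantly and reports the input size.
        self._jobs[job_id] = f"size={len(self._data[data_id])}"
        return job_id

    def fetch_output(self, job_id):
        return self._jobs[job_id]


def run_daily_check(gateway):
    """Exercise upload, submission, and retrieval; return per-step results."""
    results = []
    state = {}

    def step(name, action):
        start = time.monotonic()
        try:
            state[name] = action()
            ok, detail = True, ""
        except Exception as exc:  # record the failure and keep going
            ok, detail = False, repr(exc)
        results.append((name, ok, time.monotonic() - start, detail))

    step("upload", lambda: gateway.upload("probe.txt", b"hello"))
    step("submit", lambda: gateway.submit(state["upload"]))
    step("retrieve", lambda: gateway.fetch_output(state["submit"]))
    return results, state


results, state = run_daily_check(FakeGateway())
for name, ok, seconds, detail in results:
    print(f"{name:9s} {'OK' if ok else 'FAIL'} {seconds:.3f}s {detail}")
```

Recording each step separately means a failed upload still produces a report line for submission and retrieval, which makes it easier to see where in the pipeline a production problem starts.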
Essentials for a Successful XRAC Proposal: Code Performance and Scaling
Presenter(s): Lars Koesterke (TACC)
Many PIs struggle with putting together a sound computational plan based on code performance and scaling information. In fact, for first-time PIs the most common reason for rejection is an insufficient computational plan. With this new training module we are trying to address this problem. The training module attempts to answer two questions: why are scaling and performance data important and how are they used by reviewers, and how can these data be used to put together a computational plan? Currently the module is geared towards traditional HPC communities, and we are working on extending the content towards new communities. The purpose of my talk at the ECSS Symposium is to get staff members on the same page and to raise awareness that there is a new resource available that may help educate users struggling with writing a successful XRAC proposal.
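As a rough illustration of the kind of scaling data such a computational plan rests on, the sketch below computes strong-scaling speedup and parallel efficiency from wall-clock timings and then estimates a node-hour request. The timings, the 70% efficiency threshold, and the run count are made-up values for illustration, not figures from the training module.

```python
# Hypothetical strong-scaling timings: node count -> wall-clock seconds
# for the same fixed-size problem (illustrative numbers only).
timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 170.0, 16: 125.0}


def scaling_table(timings):
    """Return (nodes, speedup, efficiency) rows relative to the smallest run."""
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    rows = []
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows


def largest_efficient_run(rows, threshold=0.70):
    """Pick the largest node count whose efficiency meets the threshold."""
    good = [nodes for nodes, _, eff in rows if eff >= threshold]
    return max(good)


def node_hours(timings, nodes, runs):
    """Node-hours needed for `runs` production runs at the chosen size."""
    return nodes * timings[nodes] / 3600.0 * runs


rows = scaling_table(timings)
best = largest_efficient_run(rows)
request = node_hours(timings, best, runs=200)

for nodes, speedup, efficiency in rows:
    print(f"{nodes:3d} nodes  speedup {speedup:5.2f}  efficiency {efficiency:4.0%}")
print(f"Chosen size: {best} nodes; estimated request: {request:.1f} node-hours")
```

With these example numbers, efficiency falls to 50% at 16 nodes, so the plan settles on 8 nodes. A table like this lets a reviewer see at a glance why the requested job size was chosen.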
April 21, 2020
Beginners tutorial on cloud devops on Jetstream focused on Kubernetes and JupyterHub
Presenter(s): Andrea Zonca (SDSC)
This symposium assumes no previous knowledge of cloud technologies and will cover the following topics:
* Example virtual machine setup with OpenStack command line tools
* Deploying a Kubernetes Cluster on Jetstream
* How Kubernetes works, architecture, differences between containers and Virtual Machines
* Deploying JupyterHub on Jetstream for a workshop
March 17, 2020
AMP Gateway: A portal for atomic, molecular and optical physics simulations.
Presenter(s): Sudhakar Pamidighantam (Indiana University)
We describe the creation of a new Atomic and Molecular Physics science gateway (AMPGateway). The gateway is designed to bring together a subset of the AMP community to work collectively to make their software suites available and easier to use by the partners as well as others. By necessity, a project such as this requires the developers to work on issues of portability, documentation, ease of input, as well as making sure the codes can run on a variety of architectures. The gateway was built using the Apache Airavata gateway middleware framework. Initially it was deployed using the Airavata PHP client on the web but has since been redeployed under a Django web framework. Here we outline the organization and facility of the Django deployment and how it has been used, and discuss future directions for the AMP gateway.
Bursting into the public Cloud – Sharing my experience doing it at large scale for IceCube
Presenter(s): Igor Sfiligoi (SDSC)
When compute workflow needs spike well in excess of the capacity of a local compute resource, capacity should be temporarily provisioned from somewhere else to both meet deadlines and to increase scientific output. Public Clouds have become an attractive option due to their ability to be provisioned with minimal advance notice. I have recently helped IceCube expand their resource pool by a few orders of magnitude, first to 380 PFLOP32s for a few hours and later to 170 PFLOP32s for a whole workday. In the process we moved O(50 TB) of data to and from the clouds, showing that networking is not a limiting factor, either. While there was a non-negligible dollar cost involved with each, the effort involved was quite modest. In this session I will explain what was done and how, alongside an overview of why IceCube needs so much compute.
January 21, 2020
CUDA-Python and RAPIDS for blazing fast scientific computing
Presenter(s): Abe Stern (NVIDIA)
We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming.
December 17, 2019
Extracting Domain Information using Deep Learning
Presenter(s): Amit Gupta (TACC)
In this session we will present an overview of our exploration of using Deep Learning to extract entities of interest from journal article text. Across various scientific domains, extracting and curating new knowledge from large bodies of text remains a challenging task. To this end, we have developed a computational tool, named DIVE (Domain Informational Vocabulary Extraction), to provide entity extraction and expert curation functionality. The tool has been integrated with the publication pipeline used by the American Society of Plant Biologists. Using the author feedback mechanism in our deployed tool, we were able to create an expert-annotated dataset based on articles submitted over an entire year. This new gold standard dataset for supervised training now enables us to contrast several methods for the entity extraction task. We use the NeuroNER tool to investigate the effectiveness of deep neural networks in this task and also contrast it with other tools using a variety of different methods, such as ABNER (using CRF) and DIVE (using an ensemble of regular expression rules, keyword dictionaries and ontology files). Our early results from NeuroNER training with author annotations show very promising improvement in predicting the important words from the documents. This makes it an excellent candidate for future development and integration into the DIVE tool.
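To make the rule-based side of that comparison concrete, here is a minimal sketch of entity extraction that combines a regular-expression rule with a keyword dictionary. The locus-identifier pattern, the keyword list, and the entity labels are hypothetical illustrations, not DIVE's actual rules.

```python
import re

# Hypothetical rule: Arabidopsis-style locus identifiers such as AT1G01010.
LOCUS_RULE = re.compile(r"\bAT[1-5MC]G\d{5}\b")

# Hypothetical keyword dictionary mapping lowercase terms to entity types.
KEYWORDS = {
    "arabidopsis thaliana": "species",
    "zea mays": "species",
    "auxin": "hormone",
}


def extract_entities(text):
    """Return sorted (start, end, surface_text, entity_type) tuples."""
    found = set()
    # Rule-based pass: pattern matches are labeled as genes.
    for match in LOCUS_RULE.finditer(text):
        found.add((match.start(), match.end(), match.group(), "gene"))
    # Dictionary pass: case-insensitive whole-word lookup.
    # (Assumes ASCII text, so lowercasing preserves character offsets.)
    lowered = text.lower()
    for term, label in KEYWORDS.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            start, end = match.span()
            found.add((start, end, text[start:end], label))
    return sorted(found)


sample = "Auxin response in Arabidopsis thaliana requires AT1G01010."
entities = extract_entities(sample)
for start, end, surface, label in entities:
    print(f"{start:3d}-{end:<3d} {label:8s} {surface}")
```

Rules like these are transparent and easy for curators to adjust, but they only find what someone has already written a pattern or dictionary entry for, which is the gap a trained model is meant to close.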
The Distant Reader: Reading at scale
Presenter(s): Eric Lease Morgan (Notre Dame)
The Distant Reader is a tool for reading. It takes an arbitrary amount of unstructured data (text) as input, and it outputs sets of structured data for analysis -- reading. Given a corpus of just about any size (hundreds of books or thousands of journal articles), the Distant Reader analyzes the corpus, and outputs a myriad of reports enabling the researcher to use & understand the corpus. Designed with college students, graduate students, scientists, or humanists in mind, the Distant Reader is intended to supplement the traditional reading process. This presentation outlines the problems the Reader is intended to address as well as the way it is implemented on the Jetstream platform with the help of both software and personnel resources from XSEDE. The Distant Reader is freely available for anybody to use at https://distantreader.org
October 15, 2019
On Developing Reusable Software Components for the Advanced Cyberinfrastructure
Presenter(s): Ritu Arora (TACC)
Developing reusable software components that can be integrated in unforeseen software projects has the potential of enhancing the productivity of the programmers who are reusing the software. However, the initial cost of developing such components can be higher than developing components for a single use-case. In this talk, we will discuss a couple of reusable software components that were developed for the BOINC@TACC and Gateway-In-a-Box (GIB) projects. One software component is named Greyfish, and it is a portable, cloud-based filesystem. Another software component is named Midas, which is a tool for automating the generation of Docker images from source code. Both these software components were initially prototyped for predefined needs and were tightly coupled with other components they interoperated with. However, after determining that the amount of effort involved in teasing out these components and making them available as stand-alone software is insignificant and can help with the sustainability goals of the aforementioned projects, we refactored these software components and wrote clear documentation for installing and using them. Doing this helped us improve the software quality: people in the community started using the software and helped us fix some bugs and improve the documentation. In summary, there is often a direct or indirect cost involved in making software reusable, and this cost may vary from project to project. However, the long-term sustainability and maintenance needs of the project may far outweigh the cost associated with software reusability.
Exploring the Dynamics of a Quantum-Mechanical Compton Generator
Presenter(s): Marty Kandes (SDSC)
In 1913, while he was still an undergraduate, American physicist Arthur Compton invented a simple way to measure the rotation rate of the Earth with a tabletop-sized experiment, independent of any astronomical observation. The experiment consisted of a large diameter circular ring of thin glass tubing filled with water and oil droplets. After placing the ring in a plane perpendicular to the surface of the Earth and allowing the fluid mixture of oil and water to come to rest, Compton then abruptly rotated the ring, flipping it 180 degrees about an axis passing through its own plane. The result of the experiment was that the water acquired a measurable drift velocity due to the Coriolis effect arising from the daily rotation of the Earth about its own axis. Compton measured this induced drift velocity by observing the motion of the oil droplets in the water with a microscope. This device, now named after him, is known as a Compton generator. The fundamental research objective of this XSEDE project is to explore the dynamics of a quantum-mechanical analogue to the classical Compton generator experiment through the use of numerical simulations. In this presentation, I describe how the physics of the problem itself drives many of the computational challenges in the simulations; what numerical methods and computational techniques were implemented in the custom simulation code written to explore the problem (and other quantum systems in rotating frames of reference); the performance characteristics and limitations of this code; some challenges in creating a post-simulation visualization pipeline; as well as the latest results and future directions of the project.
September 17, 2019
The "Morelli Machine": A Proposal Testing a Critical, Algorithmic Approach to Art History
Presenter(s): Paul Rodriguez (SDSC)
The Morelli Machine refers to an algorithmic approach to characterizing authorship from the late 19th century which proposed that fine details of minor items in a painting would reveal particular styles. The PIs set out to test the hypothesis that contemporary computer vision techniques could perform this sort of "stylistic" matching. In order to do this, they sought to mechanize a method that is indigenous to art history and that uses details as a proxy for style. This project approached the question of "style" as one of extracting features that have some discriminatory power for distinguishing paintings or groups of paintings. We used feature discovery from a pretrained convolutional network (VGG19) for object recognition. We processed both whole images and some classes of image parts (i.e., mouths), and performed clustering. In this presentation I will review the image preparation steps, extraction steps, clustering results, and cluster evaluation. The upshot is that all convolutional layers indeed have discriminatory features, and different layers might have different kinds of features, with different interpretability that may be hard to define.
Improving Science Gateways usage reporting for XSEDE
Presenter(s): Amit Chourasia (SDSC)
Science domain-specific gateways have gained wide use by providing easy web-based access to complex cyberinfrastructure. Science Gateways are consuming an increasing proportion of computational capacity provided by XSEDE. A typical approach used by Science Gateways is to use a single community account with a compute allocation to process compute jobs on behalf of their end users. The computation usage for Science Gateways is compiled from batch job submission systems and reported by the XSEDE service providers. However, this reporting does not capture and provide information about the user who actually initiated the computation, as the batch systems do not have this information. To overcome this reporting limitation, Science Gateways utilize a separate pipeline to submit job-specific attributes to XSEDE, which are later joined with batch system information submitted by the Service Providers to create detailed usage reports. In this presentation I will describe improvements to the Gateway attribute reporting system, which better serves the needs of the growing Science Gateway community and provides them with a simpler and streamlined way to report usage and ultimately publish this information via XDMoD.
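The join step described above can be pictured with a small sketch. The field names (`job_id`, `resource`, `gateway_user`, `su_charged`) and the records are invented for illustration; they are not the actual XSEDE reporting schema.

```python
from collections import defaultdict

# Hypothetical batch-system records reported by a service provider.
# Every job runs under the same community account.
batch_records = [
    {"job_id": "1001", "resource": "clusterA", "account": "community", "su_charged": 12.0},
    {"job_id": "1002", "resource": "clusterA", "account": "community", "su_charged": 30.0},
    {"job_id": "1003", "resource": "clusterA", "account": "community", "su_charged": 8.0},
]

# Hypothetical attributes submitted separately by the gateway.
gateway_attributes = [
    {"job_id": "1001", "resource": "clusterA", "gateway_user": "alice"},
    {"job_id": "1002", "resource": "clusterA", "gateway_user": "bob"},
]


def usage_by_user(batch_records, gateway_attributes):
    """Join on (resource, job_id) and total the charges per gateway user."""
    user_for_job = {
        (a["resource"], a["job_id"]): a["gateway_user"] for a in gateway_attributes
    }
    totals = defaultdict(float)
    for rec in batch_records:
        key = (rec["resource"], rec["job_id"])
        # Jobs with no matching attribute stay visible as "unattributed".
        totals[user_for_job.get(key, "unattributed")] += rec["su_charged"]
    return dict(totals)


report = usage_by_user(batch_records, gateway_attributes)
for user, total in sorted(report.items()):
    print(f"{user:13s} {total:6.1f}")
```

Keeping unmatched jobs in an explicit "unattributed" bucket, rather than dropping them, makes gaps in attribute submission show up in the report itself.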
August 20, 2019
Hadoop and Spark on a Shared Resource
Presenter(s): Byron Gill (PSC)
Hadoop, Spark, and the ecosystem of other software that interacts with them are in demand, but many of the assumptions about the typical use case for these programs don't apply to the typical user on a shared HPC cluster. This talk will explore some of the challenges in creating a workable environment within the confines of a shared cluster and describe some of the approaches we've used at PSC to accommodate the needs of our users.
Lessons learned in Developing a coupling interface between Kinetic PUI code (Fortran) and a Global MHD code (C++)
Presenter(s): Laura Carrington (SDSC)
The objective of the PI's team was to obtain a quantitative understanding of the dynamical heliosphere, from its solar origin to its interaction with the LISM, by creating a data-driven suite of models of the Sun-to-LISM connection. To accomplish this, I worked to develop a coupling interface between a Kinetic PUI code (Fortran) and a Global MHD code (C++). The kinetic PUI code models the nonthermal (pickup) ions (PUIs) created as new populations of neutral atoms are born in the SW and LISM. The PUIs generate turbulence that heats up the thermal ions. PUIs are further accelerated to create anomalous cosmic rays (ACRs). This code was originally serial and designed to compute a single trajectory of a particle. The coupling allows the PUI code to get magnetic field data from a large Global MHD parallel simulation code and compute ~5000 trajectories in a single run. The challenges of parallelizing the PUI code and coupling its Fortran77 and Fortran90 code with the C++ Global MHD code are presented, along with lessons learned in working with mixed mode codes and on TACC Stampede2.
June 18, 2019
HPC+Jupyter for Computational Chemistry
Presenter(s): Albert Lu (TACC)
Methods of computational chemistry have demonstrated remarkable power in predicting materials properties, and therefore are widely utilized in academic research and industrial applications. In 2018, at TACC for example, over 30% of the computational time used on the supercomputer Stampede2 went to chemistry/materials science related applications. Providing a more intuitive way of performing simulations can not only help lower the learning curve for new users, but also create a different user experience and value. In this presentation, Albert Lu (TACC) will give an overview of interactive computing with Jupyter notebook, and demonstrate how to set up and run interactive simulation jobs (of LAMMPS) on Stampede2. Related tools for parallel computing (IPython Parallel) and workflow management (Parsl) will also be discussed in this talk.
The Development of a Mobile Augmented Reality Application for Visualizing the Protein Data Bank
Presenter(s): Max Collins (UC Irvine)
Principal Investigator(s): Alan Craig (U. Illinois and Shodor)
In 2015-2016, then-undergraduate student Max Collins was in the Blue Waters Student Internship Program. In that internship, he received training in high performance computing and developed a project in conjunction with his mentor, Alan Craig. His project was to create a mobile augmented reality application to visualize the Protein Data Bank. This presentation will discuss the technical details and development process of that application. In addition, Max will address how the internship and this application have affected his schooling and career choices. An early version of the application can be seen in the video on this page: http://www.ncsa.illinois.edu/news/story/blue_waters_intern_visualizes_a_career_in_app_development