Science Gateways Symposium

Many recorded presentations from the XSEDE Science Gateways and Scientific Workflows Symposium series are available for viewing. They will be available for download in the near future. Please check back soon.

If you would like to participate in the series, please subscribe to the Science Gateways and ECSS Workflow mailing lists.

To subscribe to the Gateways mailing list, email majordomo@xsede.org with "subscribe gateways" in the body of the message.

To subscribe to the Workflows mailing list, email majordomo@xsede.org with "subscribe workflows" in the body of the message.

More information is available on the Science Gateways and ECSS Workflows pages.

Key Points
Available for viewing online
Subscribe for more info
Contact Information
XSEDE Science Gateways Expert
Science Gateways Community Institute

August 19, 2016

Data Sharing and Streaming with Airavata

Presenter(s): Jeff Kinnison (University of Notre Dame)

Presentation Slides: slides for both presentations

Video of talk can be seen here

Abstract: Despite progress in making scientific computing accessible, science gateways still face the challenges of providing feedback to and sharing research among users. To address these challenges, Apache Airavata has recently added the capability to stream data from remote computing nodes and share projects and experiments with users.

Data streaming allows for application-level remote monitoring using secure communications protocols. The data to be streamed is defined at the application level and may be incorporated into gateways using a WebSockets server deployed next to Airavata and JavaScript client-side code. Project and experiment sharing allows multiple users to access experiment inputs and outputs and lets users clone shared projects. User permissions are set coarsely at the project level and can be fine-tuned on a per-experiment basis to allow easy, secure collaboration.
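The permission model described above (a coarse project-level grant with optional per-experiment refinements) can be illustrated with a minimal Python sketch. This is not Airavata's API: the `Access` and `Project` names, the `access_for` method, and the three access levels are all hypothetical, chosen only to show how an override lookup might fall back to a project default.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    """Hypothetical access levels; the talk does not list the real ones."""
    NONE = 0
    READ = 1
    WRITE = 2


@dataclass
class Project:
    # user -> access granted for every experiment in the project (coarse level)
    shared_with: dict = field(default_factory=dict)
    # experiment id -> {user -> access} overrides (fine-tuned level)
    overrides: dict = field(default_factory=dict)

    def access_for(self, user, experiment_id):
        """Resolve a user's access: per-experiment override first, then project default."""
        per_experiment = self.overrides.get(experiment_id, {})
        if user in per_experiment:
            return per_experiment[user]
        return self.shared_with.get(user, Access.NONE)


# Usage: alice can read the whole project but may also write to one experiment.
project = Project()
project.shared_with["alice"] = Access.READ
project.overrides["exp-1"] = {"alice": Access.WRITE}
print(project.access_for("alice", "exp-1"))
print(project.access_for("alice", "exp-2"))
```

Checking the override before the project default keeps the common case (one grant per project) simple while still allowing exceptions for individual experiments.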

Biography: Jeff Kinnison is a PhD student at the University of Notre Dame studying Computer Vision and Computational Neuroscience under Dr. Walter Scheirer. He is currently completing a Google Summer of Code project for the Science Gateways Group at Indiana University.

Interactive scripting of Airavata API using Interactive Jupyter Notebooks

Presenter(s): Pradyut Madhavaram (CUNY)

Abstract: Apache Airavata is science gateway middleware with a well-defined API that can be integrated with both Web and desktop clients. These user interfaces are static: the client code must be changed in order to accommodate a new requested feature. Online notebooks such as Jupyter enable their users to, in effect, script the Web interface, providing much greater user-level flexibility. Here we demonstrate how to build a notebook-style gateway, with both regular user and administrator features, through Jupyter notebooks by scripting with the Airavata API.

Such approaches allow greater flexibility. From a user perspective, these notebooks help launch and monitor experiments that require many input parameters, as is common in atmospheric sciences, molecular dynamics, and similar fields. These notebooks could also support interactive distributed computing on real scientific data and scientific outputs. For gateway administrators, a Jupyter-based frontend enables a thorough analysis of the gateway's functionality.

Bio: Pradyut Madhavaram is currently pursuing a Master of Business Administration at Baruch College, City University of New York, with an intended major in Computer Information Systems and a minor in Statistics. He is working with the Science Gateways Group at Indiana University, Bloomington, as an intern for the summer.


July 29, 2016

Generating Reliable Predictions of Batch Queue Wait Time

Presenter(s): Rich Wolski (UCSB)

Presentation Slides

Link for Video of Talk

In this talk we will discuss the process of generating batch queue wait time predictions that take the form of a "guaranteed" maximum delay before an individual user's job begins executing. The methodology uses QBETS (Quantile Bounds Estimation from Time Series) to make predictions for each job submitted to a batch queue in real time (i.e., at the time the job is submitted). To further improve prediction accuracy, QBETS can be applied to different queues (normal, debug, etc.) for the same machine. It also attempts to correct for hidden scheduler parameters (e.g., backfilling thresholds) using fast, on-line clustering.

We show the effectiveness of QBETS using historical trace data gathered from TeraGrid during its live operation, and XSEDE traces gathered by XDMoD.
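To make the idea of a "guaranteed" maximum delay more concrete, the following Python sketch computes an upper confidence bound on a wait-time quantile from past observations using the standard binomial order-statistic argument. It is a simplified illustration under stated assumptions, not the QBETS implementation: it assumes the observed wait times are independent draws from one unchanging distribution, and it omits the per-queue handling and on-line clustering mentioned above. The function name `quantile_upper_bound` and its parameters are hypothetical.

```python
from math import comb


def quantile_upper_bound(wait_times, quantile=0.95, confidence=0.95):
    """Return an observed wait time that is at least the true `quantile`
    with probability >= `confidence`, or None if there is too little data.

    The k-th smallest of n observations is >= the true quantile exactly when
    at most k-1 observations fall below it, which has probability
    P(Binomial(n, quantile) <= k-1). We return the smallest such k.
    Suitable for up to a few hundred observations (float range limits).
    """
    ordered = sorted(wait_times)
    n = len(ordered)
    cdf = 0.0
    for k in range(1, n + 1):
        # Add P(exactly k-1 of the n observations fall below the true quantile).
        cdf += comb(n, k - 1) * quantile ** (k - 1) * (1 - quantile) ** (n - k + 1)
        if cdf >= confidence:
            return ordered[k - 1]
    return None


# Usage with made-up wait times in seconds (illustrative data only).
history = [30, 45, 60, 90, 120, 150, 300, 600, 900, 1800] * 6
bound = quantile_upper_bound(history, quantile=0.95, confidence=0.95)
print("predicted upper bound on wait time:", bound)
```

With the default settings, at least 59 observations are needed before any bound can be returned, because fewer samples cannot support a 95% confidence statement about the 95th percentile.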


July 8, 2016

Science Gateways Community Institute

Presenter(s): Nancy Wilkins-Diehr (SDSC)

Presentation Slides

Video link

Abstract: Science gateways, also known as web portals, virtual research environments, or virtual laboratories, are a fundamental part of today's research landscape. But they can be difficult to develop in a sustainable fashion. This talk will provide an overview of the Science Gateways Community Institute, which aims to address these challenges by offering services to and building community among the research communities developing gateways. The institute comprises five areas to support gateways throughout their lifecycle:
  • Incubator will provide shared expertise in business and sustainability planning, cybersecurity, user interface design, and software engineering practices.
  • Extended Developer Support will provide expert developers for up to one year to projects that request assistance and demonstrate the potential to achieve the most significant impacts on their research communities.
  • Scientific Software Collaborative will offer a component-based, open-source, extensible framework for gateway design, integration, and services, including gateway hosting and capabilities for external developers to integrate their software into Institute offerings.
  • Community Engagement and Exchange will provide a forum for communication and shared experiences among gateway developers, user communities, within NSF, across federal agencies, and internationally.
  • Workforce Development will increase the pipeline of gateway developers with training programs, including special emphasis on recruiting underrepresented minorities, and by helping universities form gateway support groups.
We envision this work as an extension of the XSEDE ECSS gateway program, which focuses on connecting existing science gateways with XSEDE resources.


June 24, 2016

Improving Karnak's Wait-time Predictions

Presenter(s): Jungha Woo (Purdue University)

Presentation Slides

Abstract: Karnak (http://karnak.xsede.org/karnak/index.html) is the job queue wait time prediction service for XSEDE resources, including Comet, Darter, Gordon, Maverick, and Stampede. Karnak users include individual researchers and science gateways that consult wait time predictions to decide where to submit their computation within XSEDE. Based on feedback from the community, this XSEDE Software Development and Integration (SD&I) project aims to improve the Karnak service by increasing the accuracy of its predictions. This talk will describe Karnak's design, the machine learning technique used, and the accuracy improvement made through this SD&I project.

Bio: Jungha Woo is a Software Engineer in Research Computing at Purdue University. His Ph.D. work included analyzing investors' behavioral biases in the U.S. stock markets and implementing profitable strategies utilizing irrational behaviors. His experience and interests lie in the statistical analysis of scientific data and in software development. Jungha develops scientific software to help high-performance computational communities run modeling and prediction jobs. Jungha holds a Ph.D. in Electrical and Computer Engineering, and an M.Sc. and a B.Sc. in Computer Science.


May 20, 2016

Intelligent Sensors for the Internet of Things: Parallel Computing on Chicago Street Poles

Presenter(s): Dr. Pete Beckman (ANL)

Presentation Slides

Video Link

Abstract: Sensors and embedded computing devices are being woven into buildings, roads, household appliances, and light bulbs. And while the Internet of Things (IoT) is at the peak of its hype curve, there are challenging science questions and multidisciplinary research problems as the technology pushes into society. Waggle (www.wa8.gl) -- an open source, open hardware research project at Argonne National Laboratory -- is developing a novel wireless sensor system to enable a new breed of smart city research and sensor-driven environmental science. Our new IoT sensor platform is focused on sensing and actuation that requires in-situ computation, such as is needed for image recognition via packages such as OpenCV, audio classifiers, and autonomous control — essentially a parallel, distributed computing environment in a small box. Waggle is the core technology for the Chicago ArrayOfThings (AoT) project (https://arrayofthings.github.io). The AoT will deploy 500 Waggle-based nodes on the streets of Chicago beginning in 2016. Prototype versions are already deployed on a couple campuses. Sensor boards are being tested for deployment in solar-powered trash cans (http://bigbelly.com), and we are currently exploring a test deployment in street kiosks in New York City. The presentation will outline the current progress of designing and deploying the current platform, and our progress on research topics in computer science, including parallel computing, operating system resilience, and data aggregation.

Bio: Pete Beckman is the co-director of the Northwestern-Argonne Institute for Science and Engineering. From 2008 to 2010 he was the director of the Argonne Leadership Computing Facility, where he led the Argonne team working with IBM on the design of Mira, a 10-petaflop Blue Gene/Q. Pete joined Argonne in 2002, serving first as director of engineering and later as chief architect for the TeraGrid, where he led the design and deployment team that created the world's most powerful Grid computing system for linking production HPC computing centers for the National Science Foundation. After the TeraGrid became fully operational, Pete started a research team focusing on petascale high-performance system software, wireless sensors, and operating systems. Pete also coordinates the collaborative research activities in extreme-scale computing between the US Department of Energy and Japan's ministry of education, science, and technology. He is the founder and leader of the Waggle project to build intelligent attentive sensors. The Waggle technology and software framework is being used by the Chicago Array of Things project to deploy 500 sensors on the streets of Chicago beginning in 2016. Pete also has experience in industry. After working at Los Alamos National Laboratory on extreme-scale software for several years, he founded a Turbolinux-sponsored research laboratory in 2000 that developed the world's first dynamic provisioning system for cloud computing and HPC clusters. The following year, Pete became vice president of Turbolinux's worldwide engineering efforts, managing development offices in the US, Japan, China, Korea, and Slovenia. Dr. Beckman has a Ph.D. in computer science from Indiana University (1993) and a BA in Computer Science, Physics, and Math from Anderson University (1985).


May 13, 2016

Stream Data Processing at Scale

Presenter(s): Roger Barga (Amazon)

Presentation Slides

Streaming Video

Abstract

Streaming analytics is about identifying and responding to events happening in your business, in your service or application, or in your environment in near real time. Sensors, mobile and IoT devices, social networks, and online transactions are all generating data that can be monitored constantly to enable one to detect and then act on events and insights before they lose their value. The need for large-scale, real-time stream processing of big data in motion is more evident than ever before, but the potential remains largely untapped. It's not the size but rather the speed at which this data must be processed that presents the greatest technical challenges. Streaming analytics systems can enable businesses to inspect, correlate, and analyze data in real time to extract insights in the same manner that traditional analytics tools have allowed them to do with data at rest. In this talk I will draw upon our experience with Amazon Kinesis data streaming services to highlight use cases, discuss technical challenges and approaches, and look ahead to the future of stream data processing and the role of cloud computing.

Biography

Roger Barga is General Manager and Director of Development at Amazon Web Services, responsible for Kinesis data streaming services including Kinesis Streams, Kinesis Firehose, and Kinesis Analytics. Before joining Amazon, Roger was in the Cloud Machine Learning group at Microsoft, responsible for product management of the Azure Machine Learning service. His experience and research interests include data storage and management, data analytics and machine learning, distributed systems and building scalable cloud services, with emphasis on stream data processing and predictive analytics. Roger is also an Affiliate Professor at the University of Washington, where he is a lecturer in the Data Science and Machine Learning programs. Roger holds a PhD in Computer Science, an M.Sc. in Computer Science with an emphasis on Machine Learning, and a B.Sc. in Mathematics and Computer Science. Roger holds over 30 patents, has published over 100 peer-reviewed technical papers and book chapters, and has authored a book on predictive analytics.


March 25, 2016

Jetstream Overview

Presenter(s): Jeremy Fischer (IU)

Presentation Slides

The presentation describes the motivation behind Jetstream, its functions, hardware configuration, software environment, user interface, design, and use cases. It is a high-level look at Jetstream's capabilities, with the intent of fostering discussions about how Jetstream can fit your research needs.

Jeremy Fischer's Bio: Senior Technical Advisor. In this role, I act as the liaison between the technical staff and researchers, providing technical outreach. In addition, I act as the Swiss army knife, managing allocations and reviews, helping with technical work during system deployment, as well as helping develop the featured image sets for the Jetstream environment.


February 26, 2016

Next Generation Hydroinformatics System for Big Data Management, Analysis, and Visualization

Presenter(s): Ibrahim Demir (IIHR)

Presentation Slides

As geoscientists are confronted with increasingly massive datasets, from environmental observations to simulations, one of the biggest challenges is having the right tools to gain scientific insight from the data and communicate the understanding to stakeholders. Recent developments in web technologies make it easy to manage, analyze, visualize, and share large data sets with the public. Novel visualization techniques and dynamic user interfaces allow users to interact with data and change the parameters to create custom views of the data to gain insight from simulations and environmental observations. This requires developing new data models and intelligent knowledge discovery techniques to explore and extract information from complex computational simulations and large data repositories. Scientific visualization will be an increasingly important part of building comprehensive environmental information platforms. The presentation includes information on sensor networks, scientific computing and visualization techniques on the web, and sample applications from hydrological and atmospheric sciences.

Speaker Bio: Ibrahim Demir is an Assistant Research Professor at IIHR – Hydroscience and Engineering, and he also has secondary appointments in the Department of Electrical and Computer Engineering and the Department of Civil and Environmental Engineering at the University of Iowa. His research focuses on hydroinformatics, environmental information systems, scientific visualization, big data analytics, and information communication. He currently serves on various national and international informatics and cyberinfrastructure committees, including the CUAHSI Informatics Committee, NSF EarthCube Technology and Architecture Committee, Unidata User Committee, and Joint Committee on Hydroinformatics (IWA/IAHR/IAHS). Dr. Demir is also the main developer and architect of many popular information systems, including the Iowa Flood Information System, Iowa Water Quality Information System, and NASA IFloodS Information System.


February 19, 2016

MPContribs and MPComplete - new research infrastructure enabling user contributions to Materials Project

Presenter(s): Patrick Huck (Berkeley Lab)

Presentation Slides

In this talk, we give an overview of two new components of research infrastructure employed in Materials Project (https://materialsproject.org): (i) MPComplete is a service underpinning MP's "Crystal ToolKit" which enables its users to suggest new materials for calculation and subsequent addition to MP's core database. As opposed to MP's other production jobs at NERSC, user-submitted calculations are re-routed to XSEDE. (ii) MPContribs allows users to annotate existing materials with complementary experimental and theoretical data, hence further expanding MP's role as a user-maintained community database. The contributed data is disseminated through our portal using a generic user interface, the functionality of which can be extended by the user via customized web apps based on MPContribs software modules and driven by MP's infrastructure.

Bio:
Patrick Huck started his scientific career as a high-energy nuclear physicist in the international STAR collaboration at the Relativistic Heavy Ion Collider hosted by Brookhaven National Laboratory. Since 2014, he has been a Software Engineer on staff in Materials Project at Lawrence Berkeley National Laboratory. In his role as part of MP's core team, he develops scientific software to help MP's users explore new domains, and he also maintains, improves, and expands its research infrastructure.


October 30, 2015

Evolution of the CIPRES Science Gateway, Lessons Learned and Next Steps

Presenter(s): Mark Miller (SDSC)

Presentation Slides

The CIPRES Science Gateway is a public resource that enables browser- and RESTful access to phylogenetics codes run on high performance compute resources available through the XSEDE program. Over the past 5 years, CIPRES has run jobs for more than 12,000 scientists around the world, and enabled more than 1800 peer reviewed publications. This talk will provide an overview of the evolution of the CIPRES Science Gateway: challenges faced and lessons learned in launching new services in a production resource with a large user base. The talk will also describe plans for implementing the CIPRES Notebook Environment, which will provide access to HPC resources via an interface based on the Jupyter notebook project. The Jupyter notebook represents an exciting new paradigm for interactive scientific computing, and incorporating it into CIPRES will enable many capabilities that are not readily available to many phylogenetics researchers.


October 23, 2015

XSEDE Software Development and Integration as XSEDE transitions to XSEDE2.

Presenter(s): J.P. Navarro (ANL), Shava Smallen (SDSC)

Presentation Slides

JP Navarro and Shava Smallen will join the call to brief us on current services, provide updates on what's in the XSEDE Software Development and Integration pipeline and also give a preview of how these processes will change as XSEDE transitions to XSEDE2. I scheduled this because I would like to see science gateways and XSEDE SD&I work more closely together.


October 16, 2015

Integrating Globus Transfer in the GridChem gateway

Presenter(s): Stu Martin (University of Chicago), Eric Blau (Argonne National Laboratory)

This presentation and demo will detail the design and implementation changes made to the GridChem gateway to integrate Globus transfer and sharing on SDSC's Comet compute resource. This new capability, available to all gateways, enables a gateway's users to transfer files between the gateway's community account on participating XSEDE compute resources and other Globus endpoints, like a user's laptop.

First, a Globus shared endpoint is created per GridChem user to set the root directory accessible to them. Globus ACLs can be set to further limit read and/or write access to this directory.

Next, the GridChem desktop client was enhanced to call the Globus transfer (REST) API to seamlessly list files and perform transfers directly from the GridChem community account on Comet to the Gateway user's laptop.

A key use case is eliminating unnecessary (two-hop) file transfers involving the GridChem middleware file store. For example, a GridChem user's job output files are on Comet and the user needs them on their laptop (or other Globus endpoint). Eliminating this extra hop is especially important when transferring large files.
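As a rough illustration of the direct transfer described above, the Python sketch below builds the JSON body for a single-hop transfer request from the community account's shared endpoint to a user's endpoint. The field names follow the transfer document format of the Globus Transfer REST API as best understood here; treat them as an assumption and check the current Globus documentation before relying on them. The endpoint IDs, paths, submission ID, and the `build_transfer_document` helper are hypothetical placeholders, and no network request is made.

```python
import json


def build_transfer_document(submission_id, source_endpoint, destination_endpoint, paths):
    """Build a transfer request body for one direct (single-hop) transfer.

    `paths` is a list of (source_path, destination_path) pairs.
    Field names are assumed from the Globus Transfer API transfer document.
    """
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": source_path,
                "destination_path": destination_path,
            }
            for source_path, destination_path in paths
        ],
    }


# Usage with placeholder values: job output goes straight from the shared
# endpoint on the compute resource to the user's laptop endpoint.
document = build_transfer_document(
    submission_id="PLACEHOLDER-SUBMISSION-ID",
    source_endpoint="PLACEHOLDER-SHARED-ENDPOINT-ID",
    destination_endpoint="PLACEHOLDER-LAPTOP-ENDPOINT-ID",
    paths=[("/job-123/output.log", "/~/gridchem/job-123/output.log")],
)
print(json.dumps(document, indent=2))
```

Because the source is the per-user shared endpoint, the request never touches the middleware file store, which is the hop the integration is meant to remove.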

Stu and Eric will be looking for other gateways that would like hands-on support to add similar functionality.


September 25, 2015

WSO2 Identity Server

Presenter(s): Prabath Siriwardena (WSO2)

Presentation Slides

WSO2 Identity Server addresses sophisticated security and identity management needs of enterprise web applications, services, and APIs, and makes life easier for developers and architects with its hassle-free, minimal monitoring and maintenance requirements. Further, Identity Server can act as an Enterprise Identity Bus (EIB) — a central backbone to connect and manage multiple identities regardless of the standards on which they are based.

In this talk, Prabath Siriwardena, the Director of Security Architecture at WSO2, will provide an overview of identity management concepts and how those are implemented in WSO2 Identity Server.


September 18, 2015

XD Metrics Service: Comprehensive Management for HPC Resources using XDMoD

Presenter(s): Thomas Furlani

Presentation Slides

In this presentation we will discuss the utility of XDMoD for providing comprehensive management of HPC resources, including metrics for utilization, quality of service, and job-level performance. Through XDMoD's graphical user interface, users are able to readily generate plots for a wide range of utilization and performance metrics. In terms of performance, the XDMoD system runs a series of computationally lightweight benchmarks (application kernels) to measure quality of service of a given resource. We will show how this has been useful to proactively identify underperforming hardware and software. In addition, XDMoD, through integration with TACC_Stats or Performance CoPilot, provides system support personnel with detailed job-level performance data for every job running on the cluster without the need to recompile the application code. This information can be used to automatically identify poorly performing user codes as well as provide insight into how best to improve their performance.


July 24, 2015

Transition from Blacklight to Bridges via Greenfield

Presenter(s): Sergiu Sanielevici (PSC), Nick Nystrom (PSC), J Ray Scott (PSC)

Presentation Slides: Transitioning from Blacklight to Bridges

Presentation Slides: Bridges

Sergiu Sanielevici, Nick Nystrom, and J Ray Scott from PSC will be on the call to brief everyone on the new Bridges machine coming in early 2016, Blacklight decommissioning plans for August 15th, and Greenfield, the bridge to Bridges. Bridges will have many capabilities that will be of interest to the gateway and workflow communities.