Gateways & Workflows Symposium Series

The following are recorded presentations from the XSEDE Science Gateways and Scientific Workflows Symposium series. If you are interested, please subscribe to the Science Gateways and Scientific Workflows mailing lists.

To subscribe to the Workflows mailing list, email majordomo@xsede.org with "subscribe workflows" in the body of the message.

To subscribe to the Gateways mailing list, email majordomo@xsede.org with "subscribe gateways" in the body of the message.


August 19, 2016

Data Sharing and Streaming with Airavata

Presenter(s): Jeff Kinnison (University of Notre Dame)

Presentation Slides (for both presentations)

Video of talk can be seen here

Abstract: Despite progress in making scientific computing accessible, science gateways still face the challenges of providing feedback to and sharing research among users. To address these challenges, Apache Airavata has recently added the capability to stream data from remote computing nodes and share projects and experiments with users.

Data streaming allows for application-level remote monitoring using secure communications protocols. Data to be streamed is defined at the application level and may be incorporated into gateways using a WebSockets server deployed next to Airavata and JavaScript client-side code. Project and experiment sharing allows multiple users to access experiment inputs and outputs, in addition to allowing users to clone shared projects. User permissions are set coarsely at the project level and can be fine-tuned on a per-experiment basis to allow easy, secure collaboration.
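As an illustration of the streaming half, a minimal consumer might look like the following sketch. The endpoint URL is hypothetical, and real gateways would use JavaScript client-side code as described above, but the WebSocket exchange is the same.

```python
# Minimal sketch of a streaming consumer, assuming a WebSockets server
# deployed next to Airavata at a hypothetical URL. Requires the
# third-party "websockets" package (pip install websockets).
import asyncio
import websockets

# Hypothetical endpoint; a real deployment would use its own host, port,
# and authentication (e.g., wss:// with a gateway-issued token).
STREAM_URL = "wss://gateway.example.org/airavata-stream/experiment/exp123"

async def follow_experiment():
    async with websockets.connect(STREAM_URL) as ws:
        async for message in ws:
            # Each message carries application-defined output, e.g. a line
            # of a log file or a partial result from the compute node.
            print("update:", message)

asyncio.run(follow_experiment())
```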

Biography: Jeff Kinnison is a PhD student at the University of Notre Dame studying Computer Vision and Computational Neuroscience under Dr. Walter Scheirer. He is currently completing a Google Summer of Code project for the Science Gateways Group at Indiana University.

Interactive scripting of Airavata API using Interactive Jupyter Notebooks

Presenter(s): Pradyut Madhavaram (CUNY)

Abstract: Apache Airavata is science gateway middleware with a well-defined API that can be integrated with both Web and desktop clients. These user interfaces are static: the client code must be changed in order to accommodate a new requested feature. Online notebooks such as Jupyter enable their users to in effect script the Web interface, providing much greater user-level flexibility. Here we demonstrate how to build a notebook-style gateway, with both regular user and administrator features, through Jupyter notebooks by scripting with the Airavata API.

Such approaches allow greater flexibility. From a user perspective, these notebooks would help users launch and monitor experiments in which many parameters must be provided as inputs, as is common in the atmospheric sciences, molecular dynamics, and similar fields. These notebooks could also help with interactive distributed computing on real scientific data and scientific outputs. For gateway administrators, a Jupyter-based frontend enables a thorough analysis of the gateway's functionality.
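As a flavor of what such notebook scripting looks like, here is a minimal sketch. The client wrapper and method names below are hypothetical stand-ins for the Thrift-generated Airavata client, not the real API; consult the Airavata documentation for the actual client construction and authorization-token handling.

```python
# Sketch of notebook-style scripting against a gateway API. All names
# here (my_gateway_sdk, connect, create_experiment, ...) are hypothetical
# placeholders for the Thrift-generated Airavata client.
from my_gateway_sdk import connect  # hypothetical convenience wrapper

api = connect(host="gateway.example.org", port=9930, token="...")

# Launch a sweep of experiments, varying one input parameter per run --
# the kind of many-parameter study common in atmospheric science or MD.
experiment_ids = []
for temperature in [280, 290, 300, 310]:
    exp_id = api.create_experiment(
        project="demo-project",
        application="NAMD",
        inputs={"temperature_K": temperature},
    )
    api.launch_experiment(exp_id)
    experiment_ids.append(exp_id)

# Poll status interactively from the notebook.
for exp_id in experiment_ids:
    print(exp_id, api.get_experiment_status(exp_id))
```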

Bio: Pradyut Madhavaram is currently pursuing a Master of Business Administration at Baruch College, City University of New York, with an intended major in Computer Information Systems and a minor in Statistics. He is working with the Science Gateways Group at Indiana University, Bloomington as an intern for the summer.

July 29, 2016

Generating Reliable Predictions of Batch Queue Wait Time

Presenter(s): Rich Wolski (UCSB)

Presentation Slides

Link for Video of Talk

In this talk we will discuss the process of generating batch queue wait-time predictions that take the form of a "guaranteed" maximum delay before an individual user's job begins executing. The methodology uses QBETS (Quantile Bounds Estimation from Time Series) to make predictions for each job submitted to a batch queue in real time (i.e., at the time the job is submitted). To further improve prediction accuracy, QBETS can be applied separately to the different queues (normal, debug, etc.) on the same machine. It also attempts to correct for hidden scheduler parameters (e.g., backfilling thresholds) using fast, on-line clustering.

We show the effectiveness of QBETS using historical trace data gathered from TeraGrid during its live operation, and XSEDE traces gathered by XDMoD.
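For readers curious about the mechanics, here is a minimal sketch of the binomial quantile-bound idea underlying QBETS, as we read the published method; it is a simplified illustration, not the production QBETS code.

```python
# A minimal sketch of the binomial quantile-bound idea behind QBETS:
# given n historical wait times, find the smallest order statistic that
# upper-bounds the q-th quantile with confidence c.
from math import comb

def quantile_upper_bound(history, q=0.95, c=0.95):
    """Return a value that exceeds the q-quantile of the wait-time
    distribution with probability >= c, assuming i.i.d. samples."""
    xs = sorted(history)
    n = len(xs)
    cdf = 0.0
    for k in range(1, n + 1):
        # P(fewer than k of n samples fall at or below the q-quantile)
        cdf += comb(n, k - 1) * q ** (k - 1) * (1 - q) ** (n - k + 1)
        if cdf >= c:
            return xs[k - 1]   # the k-th order statistic is the bound
    return xs[-1]              # history too short for the requested bound

waits = [30, 45, 60, 90, 120, 150, 240, 300, 600, 1200] * 20  # seconds
print(quantile_upper_bound(waits, q=0.95, c=0.95))
```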

July 8, 2016

Science Gateways Community Institute

Presenter(s): Nancy Wilkins-Diehr (SDSC)

Presentation Slides

Video link

Abstract: Science gateways, also known as web portals, virtual research environments, or virtual laboratories, are a fundamental part of today's research landscape. But they can be difficult to develop in a sustainable fashion. This talk will provide an overview of the Science Gateways Community Institute, which aims to address these challenges by offering services to, and building community among, the research communities developing gateways. The institute comprises five areas to support gateways throughout their lifecycle:
  • Incubator will provide shared expertise in business and sustainability planning, cybersecurity, user interface design, and software engineering practices.
  • Extended Developer Support will provide expert developers for up to one year to projects that request assistance and demonstrate the potential to achieve the most significant impacts on their research communities.
  • Scientific Software Collaborative will offer a component-based, open-source, extensible framework for gateway design, integration, and services, including gateway hosting and capabilities for external developers to integrate their software into Institute offerings.
  • Community Engagement and Exchange will provide a forum for communication and shared experiences among gateway developers, user communities, within NSF, across federal agencies, and internationally.
  • Workforce Development will increase the pipeline of gateway developers with training programs, including special emphasis on recruiting underrepresented minorities, and by helping universities form gateway support groups.
We envision this work as an extension of the XSEDE ECSS gateway program, which focuses on connecting existing science gateways with XSEDE resources.

June 24, 2016

Improving Karnak's Wait-time Predictions

Presenter(s): Jungha Woo (Purdue University)

Presentation Slides

Abstract: Karnak (http://karnak.xsede.org/karnak/index.html) is a service that predicts job queue wait times for XSEDE resources including Comet, Darter, Gordon, Maverick, and Stampede. Karnak users include individual researchers and science gateways that consult wait-time predictions to decide where to submit their computations within XSEDE. Based on feedback from the community, this XSEDE Software Development and Integration (SD&I) project aims at improving the Karnak service to increase the accuracy of its predictions. This talk will describe Karnak's design, the machine learning technique used, and the accuracy improvements made through this SD&I project.

Bio – Jungha Woo is a Software Engineer in Research Computing at Purdue University. His Ph.D. work included analyzing investors' behavioral biases in the U.S. stock markets and implementing profitable strategies that exploit irrational behaviors. His experience and interests lie in the statistical analysis of scientific data and software development. Jungha develops scientific software to help high-performance computing communities run modeling and prediction jobs. Jungha holds a Ph.D. in Electrical and Computer Engineering, and an M.Sc. and B.Sc. in Computer Science.

May 20, 2016

Intelligent Sensors for the Internet of Things: Parallel Computing on Chicago Street Poles

Presenter(s): Dr. Pete Beckman (ANL)

Presentation Slides

Video Link

Abstract: Sensors and embedded computing devices are being woven into buildings, roads, household appliances, and light bulbs. And while the Internet of Things (IoT) is at the peak of its hype curve, there are challenging science questions and multidisciplinary research problems as the technology pushes into society. Waggle (www.wa8.gl) -- an open source, open hardware research project at Argonne National Laboratory -- is developing a novel wireless sensor system to enable a new breed of smart city research and sensor-driven environmental science. Our new IoT sensor platform is focused on sensing and actuation that requires in-situ computation, such as is needed for image recognition via packages such as OpenCV, audio classifiers, and autonomous control -- essentially a parallel, distributed computing environment in a small box. Waggle is the core technology for the Chicago Array of Things (AoT) project (https://arrayofthings.github.io). The AoT will deploy 500 Waggle-based nodes on the streets of Chicago beginning in 2016. Prototype versions are already deployed on a couple of campuses. Sensor boards are being tested for deployment in solar-powered trash cans (http://bigbelly.com), and we are currently exploring a test deployment in street kiosks in New York City. The presentation will outline progress on designing and deploying the platform, and our progress on research topics in computer science, including parallel computing, operating system resilience, and data aggregation.

Bio: Pete Beckman is the co-director of the Northwestern-Argonne Institute for Science and Engineering. From 2008 to 2010 he was the director of the Argonne Leadership Computing Facility, where he led the Argonne team working with IBM on the design of Mira, a 10-petaflop Blue Gene/Q. Pete joined Argonne in 2002, serving first as director of engineering and later as chief architect for the TeraGrid, where he led the design and deployment team that created the world's most powerful Grid computing system for linking production HPC computing centers for the National Science Foundation. After the TeraGrid became fully operational, Pete started a research team focusing on petascale high-performance system software, wireless sensors, and operating systems. Pete also coordinates the collaborative research activities in extreme-scale computing between the US Department of Energy and Japan's ministry of education, science, and technology. He is the founder and leader of the Waggle project to build intelligent attentive sensors. The Waggle technology and software framework is being used by the Chicago Array of Things project to deploy 500 sensors on the streets of Chicago beginning in 2016. Pete also has experience in industry. After working at Los Alamos National Laboratory on extreme-scale software for several years, he founded a Turbolinux-sponsored research laboratory in 2000 that developed the world's first dynamic provisioning system for cloud computing and HPC clusters. The following year, Pete became vice president of Turbolinux's worldwide engineering efforts, managing development offices in the US, Japan, China, Korea, and Slovenia. Dr. Beckman has a Ph.D. in computer science from Indiana University (1993) and a BA in Computer Science, Physics, and Math from Anderson University (1985).

May 13, 2016

Stream Data Processing at Scale

Presenter(s): Roger Barga (Amazon)

Presentation Slides

Streaming Video

Abstract

Streaming analytics is about identifying and responding to events happening in your business, in your service or application, or in your environment in near real time. Sensors, mobile and IoT devices, social networks, and online transactions are all generating data that can be monitored constantly to enable one to detect and then act on events and insights before they lose their value. The need for large-scale, real-time stream processing of big data in motion is more evident than ever before, but the potential remains largely untapped. It's not the size but rather the speed at which this data must be processed that presents the greatest technical challenges. Streaming analytics systems can enable businesses to inspect, correlate, and analyze data in real time to extract insights in the same manner that traditional analytics tools have allowed them to do with data at rest. In this talk I will draw upon our experience with Amazon Kinesis data streaming services to highlight use cases, discuss technical challenges and approaches, and look ahead to the future of stream data processing and the role of cloud computing.
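As a small illustration of the producer side of such a stream, here is a hedged sketch using boto3's Kinesis client; the stream name, region, and record format are assumptions for the example.

```python
# A minimal producer sketch for the kind of stream described above,
# using boto3's Kinesis client.
import json
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

def emit_reading(sensor_id, value):
    # Records with the same partition key land on the same shard,
    # preserving per-sensor ordering.
    kinesis.put_record(
        StreamName="sensor-events",          # hypothetical stream name
        Data=json.dumps({"sensor": sensor_id,
                         "value": value,
                         "ts": time.time()}).encode(),
        PartitionKey=sensor_id,
    )

emit_reading("thermostat-42", 21.7)
```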

Biography

Roger Barga is General Manager and Director of Development at Amazon Web Services, responsible for Kinesis data streaming services including Kinesis Streams, Kinesis Firehose, and Kinesis Analytics. Before joining Amazon, Roger was in the Cloud Machine Learning group at Microsoft, responsible for product management of the Azure Machine Learning service. His experience and research interests include data storage and management, data analytics and machine learning, distributed systems and building scalable cloud services, with emphasis on stream data processing and predictive analytics. Roger is also an Affiliate Professor at the University of Washington, where he is a lecturer in the Data Science and Machine Learning programs. Roger holds a PhD in Computer Science, an M.Sc. in Computer Science with an emphasis on Machine Learning, and a B.Sc. in Mathematics and Computer Science. He holds over 30 patents, has published over 100 peer-reviewed technical papers and book chapters, and has authored a book on predictive analytics.

March 25, 2016

Jetstream Overview

Presenter(s): Jeremy Fischer (IU)

Presentation Slides

The presentation describes the motivation behind Jetstream, its functions, hardware configuration, software environment, user interface, design, and use cases. It is a high-level look at Jetstream's capabilities, with the intent of fostering discussion about how Jetstream can fit your research needs.

Jeremy Fischer's Bio: Senior Technical Advisor. In this role, I act as the liaison between the technical staff and researchers, providing technical outreach. In addition, I act as the Swiss army knife, managing allocations and reviews, helping with technical work during system deployment, as well as helping develop the featured image sets for the Jetstream environment.

February 26, 2016

Next Generation Hydroinformatics System for Big Data Management, Analysis, and Visualization

Presenter(s): Ibrahim Demir (IIHR)

Presentation Slides

As geoscientists are confronted with increasingly massive datasets from environmental observations to simulations, one of the biggest challenges is having the right tools to gain scientific insight from the data and communicate the understanding to stakeholders. Recent developments in web technologies make it easy to manage, analyze, visualize, and share large data sets with the public. Novel visualization techniques and dynamic user interfaces allow users to interact with data and change parameters to create custom views, gaining insight from simulations and environmental observations. This requires developing new data models and intelligent knowledge discovery techniques to explore and extract information from complex computational simulations and large data repositories. Scientific visualization will be an increasingly important part of building comprehensive environmental information platforms. The presentation includes information on sensor networks, scientific computing and visualization techniques on the web, and sample applications from the hydrological and atmospheric sciences.

Speaker Bio: Ibrahim Demir is an Assistant Research Professor at IIHR – Hydroscience and Engineering, and he also has secondary appointments in the Departments of Electrical and Computer Engineering, and Civil and Environmental Engineering at the University of Iowa. His research focuses on hydroinformatics, environmental information systems, scientific visualization, big data analytics, and information communication. He currently serves on various national and international informatics and cyberinfrastructure committees, including the CUAHSI Informatics Committee, the NSF EarthCube Technology and Architecture Committee, the Unidata User Committee, and the Joint Committee on Hydroinformatics (IWA/IAHR/IAHS). Dr. Demir is also the main developer and architect of many popular information systems, including the Iowa Flood Information System, the Iowa Water Quality Information System, and the NASA IFloodS Information System.

February 19, 2016

MPContribs and MPComplete - new research infrastructure enabling user contributions to Materials Project

Presenter(s): Patrick Huck (Berkeley Lab)

Presentation Slides

In this talk, we give an overview of two new components of research infrastructure employed in Materials Project (https://materialsproject.org): (i) MPComplete is a service underpinning MP's "Crystal ToolKit" which enables its users to suggest new materials for calculation and subsequent addition to MP's core database. As opposed to MP's other production jobs at NERSC, user-submitted calculations are re-routed to XSEDE. (ii) MPContribs allows users to annotate existing materials with complementary experimental and theoretical data, hence further expanding MP's role as a user-maintained community database. The contributed data is disseminated through our portal using a generic user interface, the functionality of which can be extended by the user via customized web apps based on MPContribs software modules and driven by MP's infrastructure.

Bio:
Patrick Huck started his scientific career as a high-energy nuclear physicist in the international STAR collaboration at the Relativistic Heavy Ion Collider hosted by Brookhaven National Laboratory. Since 2014, he has been a Software Engineer on the Materials Project staff at Lawrence Berkeley National Laboratory. As part of MP's core team, he develops scientific software to help MP's users explore new domains, and also maintains, improves, and expands its research infrastructure.

February 12, 2016

Globus Auth

Presenter(s): Steve Tuecke (CI)

Presentation Slides

Globus Auth is a foundational identity and access management (IAM) platform service, used for brokering authentication and authorization interactions between end-users, identity providers, resource servers (services), and clients (including web, mobile, and desktop applications, and other services). The goal of Globus Auth is to enable an extensible, integrated ecosystem of services and clients for the research and education community. In this talk I will introduce and demonstrate Globus Auth, discuss its rollout as a new XSEDE service, and examine how it can be used to enhance clients and services such as science gateways with advanced IAM functionality.
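To make the brokering concrete, here is a minimal sketch of the OAuth2 native-app flow against Globus Auth, written with the globus-sdk Python package (released after this talk); the client ID is a placeholder that would come from registering an application with Globus.

```python
# A minimal sketch of the OAuth2 native-app flow that Globus Auth
# brokers, using the globus-sdk for Python.
import globus_sdk

CLIENT_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
client.oauth2_start_flow()

print("Log in at:", client.oauth2_get_authorize_url())
auth_code = input("Paste the authorization code: ").strip()

# Exchange the code for tokens -- one set per resource server (service).
tokens = client.oauth2_exchange_code_for_tokens(auth_code)
transfer_tokens = tokens.by_resource_server["transfer.api.globus.org"]
print("Transfer access token:", transfer_tokens["access_token"][:10], "...")
```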

Bio:
Steven Tuecke co-leads the Globus project (www.globus.org) with Dr. Ian Foster, and is Deputy Director of the Computation Institute at The University of Chicago (UC) and Argonne National Laboratory. His focus is on the development of sustainable, cloud-based, software-as-a-service data management solutions to accelerate research. Prior to UC, Steven was co-founder, CEO, and CTO of Univa Corporation from 2004 to 2008, providing open source and proprietary software for the high-performance computing and cloud computing markets. Before that, he spent 14 years at Argonne as research staff. Tuecke graduated summa cum laude with a B.A. in mathematics and computer science from St. Olaf College.

October 30, 2015

Evolution of the CIPRES Science Gateway, Lessons Learned and Next Steps

Presenter(s): Mark Miller (SDSC)

Presentation Slides

The CIPRES Science Gateway is a public resource that enables browser-based and RESTful access to phylogenetics codes run on high-performance compute resources available through the XSEDE program. Over the past 5 years, CIPRES has run jobs for more than 12,000 scientists around the world and enabled more than 1800 peer-reviewed publications. This talk will provide an overview of the evolution of the CIPRES Science Gateway: challenges faced and lessons learned in launching new services in a production resource with a large user base. The talk will also describe plans for implementing the CIPRES Notebook Environment, which will provide access to HPC resources via an interface based on the Jupyter notebook project. The Jupyter notebook represents an exciting new paradigm for interactive scientific computing, and incorporating it into CIPRES will enable capabilities that are not currently available to many phylogenetics researchers.
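As a taste of the RESTful access mentioned above, the sketch below submits a job with Python's requests library. The base URL, tool identifier, and form-field names follow our reading of the CIPRES REST documentation and should be treated as assumptions to verify against the current docs.

```python
# A hedged sketch of submitting a job through the CIPRES REST interface.
# URL, form fields, and headers are assumptions to verify against the docs.
import requests

BASE = "https://cipresrest.sdsc.edu/cipresrest/v1"   # assumed base URL
AUTH = ("my_username", "my_password")                 # CIPRES account
HEADERS = {"cipres-appkey": "my-registered-app-key"}  # per-application key

with open("alignment.phy", "rb") as f:
    resp = requests.post(
        f"{BASE}/job/my_username",                # assumed endpoint path
        auth=AUTH,
        headers=HEADERS,
        data={"tool": "RAXMLHPC8_REST_XSEDE"},    # assumed tool identifier
        files={"input.infile_": f},
    )
resp.raise_for_status()
print(resp.text)  # XML job-status document with a self-URI to poll
```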

October 23, 2015

XSEDE Software Development and Integration as XSEDE transitions to XSEDE2

Presenter(s): J.P. Navarro (ANL) Shava Smallen (SDSC)

Presentation Slides

JP Navarro and Shava Smallen will join the call to brief us on current services, provide updates on what's in the XSEDE Software Development and Integration pipeline and also give a preview of how these processes will change as XSEDE transitions to XSEDE2. I scheduled this because I would like to see science gateways and XSEDE SD&I work more closely together.

October 16, 2015

Integrating Globus Transfer in the GridChem gateway

Presenter(s): Stu Martin (University of Chicago) Eric Blau (Argonne National Laboratory)

This presentation and demo will detail the design and implementation changes made to the GridChem gateway to integrate Globus transfer and sharing on SDSC's Comet compute resource. This new capability, available to all gateways, enables a gateway's users to transfer files between the gateway's community account on participating XSEDE compute resources and other Globus endpoints, like a user's laptop.

First, a Globus shared endpoint is created per GridChem user to set the root directory accessible to them. Globus ACLs can be set to further limit read and/or write access to this directory.

Next, the GridChem desktop client was enhanced to call the Globus transfer (REST) API to seamlessly list files and perform transfers directly from the GridChem community account on Comet to the Gateway user's laptop.

A key use case is eliminating unnecessary two-hop file transfers involving the GridChem middleware file store. For example, a GridChem user's job output files are on Comet and the user needs them on their laptop (or another Globus endpoint). Eliminating this extra hop is especially important when transferring large files.
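The sketch below shows such a one-hop transfer expressed with the globus-sdk Python package rather than the raw REST calls the desktop client makes; the endpoint IDs, paths, and token are placeholders.

```python
# A sketch of the transfer described above, using the globus-sdk for
# Python; endpoint IDs and paths are placeholders.
import globus_sdk

# An authorizer wrapping a transfer token obtained via a Globus Auth flow.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("TRANSFER_ACCESS_TOKEN")
)

SHARED_ENDPOINT = "aaaaaaaa-1111-2222-3333-bbbbbbbbbbbb"  # per-user share
LAPTOP_ENDPOINT = "cccccccc-4444-5555-6666-dddddddddddd"  # Globus Connect

# One-hop transfer: community account storage on Comet -> user's laptop.
tdata = globus_sdk.TransferData(tc, SHARED_ENDPOINT, LAPTOP_ENDPOINT,
                                label="GridChem job output")
tdata.add_item("/job1234/output.log", "/home/user/output.log")
task = tc.submit_transfer(tdata)
print("task id:", task["task_id"])
```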

Stu and Eric will be looking for other gateways that would like hands-on support to add similar functionality to their gateway.

September 25, 2015

WSO2 Identity Server

Presenter(s): Prabath Siriwardena (WSO2)

Presentation Slides

WSO2 Identity Server addresses sophisticated security and identity management needs of enterprise web applications, services, and APIs, and makes life easier for developers and architects with its hassle-free, minimal monitoring and maintenance requirements. Further, Identity Server can act as an Enterprise Identity Bus (EIB) — a central backbone to connect and manage multiple identities regardless of the standards on which they are based.

In this talk, Prabath Siriwardena, the Director of Security Architecture at WSO2, will provide an overview of identity management concepts and how those are implemented in WSO2 Identity Server.

September 18, 2015

XD Metrics Service: Comprehensive Management for HPC Resources using XDMoD

Presenter(s): Thomas Furlani

Presentation Slides

In this presentation we will discuss the utility of XDMoD for providing comprehensive management of HPC resources, including metrics for utilization, quality of service, and job level performance. Through XDMoD's graphical user interface, users are able to readily generate plots for a wide range of utilization and performance metrics. In terms of performance, the XDMoD system runs a series of computationally lightweight benchmarks (application kernels) to measure quality of service of a given resource. We will show how this has been useful to proactively identify underperforming hardware and software. In addition, XDMoD, through integration with TACC_Stats or Performance CoPilot, provides system support personnel with detailed job level performance data for every job running on the cluster without the need to recompile the application code. This information can be used to automatically identify poorly performing user codes as well as provide insight into how best to improve their performance.

August 28, 2015

The XSEDE Text Analytics Gateway - Basic Text Analysis Tools Without the Programming

Presenter(s): Drew Schmidt (UTK)

Presentation Slides

The humanities and social sciences have historically been dominated by qualitative methods, often performed without the aid of computers. But as questions become more complicated and data collections grow, researchers have increasingly found themselves turning to quantitative methods and computational resources. But there is some resistance, often driven by a "technological familiarity gap". Indeed, researchers in these fields often lack a programming background, and have historically had little access to training. Worse, there is generally no institutional reward for developing technical skills. The XSEDE Text Analytics Gateway (TAG) seeks to offer a basic set of tools for the 99% of researchers in these fields working in text analytics, powered by the big iron available only through XSEDE. In this talk, we will introduce the gateway, its history and goals, its current state and progress, as well as future work.

August 21, 2015

OntoSoft: A Software Commons for Geosciences

Presenter(s): Yolanda Gil (USC)

There is a significant amount of software developed by scientists that is never published, and as a result this software cannot be reused by others and is eventually lost. This includes software for data transformation, quality control, and other data preparation tasks. We refer to this as "dark software", by analogy with Heidorn's "dark data". This talk will argue that this dark software represents very valuable scientific products, and will describe our work on the EarthCube OntoSoft project to lower the barriers for sharing all forms of software developed by scientists. Our work to date includes the OntoSoft ontology for describing scientific software metadata, and the OntoSoft software registry. We have also developed a training program where scientists learn to describe and cite software in their papers in addition to data and provenance. This training program is part of a Geoscience Papers of the Future Initiative, where scientists learn as they are writing a journal paper that can be submitted to a Special Section of the AGU Earth and Space Science Journal. More information about OntoSoft is at http://www.ontosoft.org/.

July 24, 2015

Transition from Blacklight to Bridges via Greenfield

Presenter(s): Sergiu Sanielevici (PSC) Nick Nystrom (PSC) J Ray Scott (PSC)

Presentation Slides: Transitioning from Blacklight to Bridges

Presentation Slides: Bridges

Sergiu Sanielevici, Nick Nystrom, and J Ray Scott from PSC will be on the call to brief everyone on the new Bridges machine coming in early 2016, Blacklight decommissioning plans for August 15th, and Greenfield, the bridge to Bridges. Bridges will have many capabilities that will be of interest to the gateway and workflow communities.

June 26, 2015

Jupyter (previously IPython) Notebook in HPC

Presenter(s): Andrea Zonca (SDSC)

Presentation Slides

The Jupyter (previously IPython) Notebook is a browser-based programming front-end that integrates code in diverse programming languages, formatted text, LaTeX equations, and embedded plots in the same document. It allows scientists to explore their data and to share and reproduce their analyses.

In this talk I'll first introduce how the Jupyter Notebook works and how it can be used in HPC for interactive work, within workflow systems, and in science gateways. Then I'll introduce JupyterHub, the multi-user Notebook server, and show how a plugin I developed (RemoteSpawner) can integrate it with Torque or SLURM to provide easy access to supercomputing resources. Finally, I'll show how large batch jobs can be launched from the Notebook with IPython Parallel.
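As a small sketch of the IPython Parallel pattern, the following assumes engines have already been started under a batch scheduler and registered with a controller; the simulate function is invented for illustration.

```python
# Drive batch-allocated IPython Parallel engines interactively from a
# notebook. Assumes a running cluster with a default profile in place.
import ipyparallel as ipp

rc = ipp.Client()                  # connect to the running cluster
view = rc.load_balanced_view()     # schedule work across all engines

def simulate(seed):
    # Toy workload standing in for a real per-task computation.
    import random
    random.seed(seed)
    return sum(random.random() for _ in range(1_000_000))

# Fan a parameter sweep out to the engines and collect results.
async_result = view.map_async(simulate, range(64))
print(async_result.get()[:4])
```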

The purpose of the talk is to get feedback and brainstorm about how to best use Jupyter technologies in HPC.

Bio: Andrea Zonca has a background in cosmology; during his PhD and postdoc he worked on analyzing Cosmic Microwave Background data from the Planck satellite. In order to manage and analyze large datasets, he developed expertise in supercomputing, in particular parallel computing in Python and C++. At the San Diego Supercomputer Center he works on the Scientific Computing Applications team, where he contributes to a CUDA/C++ molecular dynamics package, works on cosmology data analysis, and makes HPC easier to access with Jupyter and JupyterHub. Andrea is also a certified Software Carpentry instructor and teaches automation with bash, version control with git, and programming with Python to scientists.

June 12, 2015

Teamwork without teams? How much should scientific software development learn from "the open source way"?

Presenter(s): James Howison (University of Texas)

Presentation Slides

What does working in a team mean? Teamwork can be exhilarating, blending skills together for greater results. Yet teams can also be frustrating with misunderstandings, waiting for others' contributions, and endless meetings. And that's all before even trying to share the spoils. Whatever teams are, they imply interdependence: relying on others to get to one's goal. And for many reasons, particularly in science, that's a difficult thing. But what if one could work together without interdependence? What might that even mean? Surely if we're working together then we're already interdependent? In this presentation I'm going to explain a way of working that I call "open superposition," and argue that it is at the core of "the open source way". I'll draw on my empirical research in open source projects, both participant observation and archival reconstruction of work. I'll argue that work with individual payoffs that ruthlessly defers the complex work that leads to motivational interdependence can nonetheless result in software that is complex, collectively built, and useful. But not always: open superposition is possible only in particular circumstances. With open superposition in mind we'll consider the sources of motivational interdependence in scientific software work and discuss the trade-offs involved in reducing motivational interdependence.

Bio: James Howison is an Assistant Professor in the Information School of the University of Texas at Austin, where he has been since August 2011 and is part of the Information Work Research Group. Prior to UT James was a post-doctoral associate at the Institute for Software Research at the Carnegie Mellon School of Computer Science, working with Jim Herbsleb. James received his PhD in 2009 and a B. Economics (social sciences) from the University of Sydney. James studies people at work building technologies, particularly free and open source software and, more recently, scientific software development and use, seeking to develop socio-technical insights to guide theory and action. He has published in venues like MIS Quarterly, Information and Organization, ACM CSCW, and JASIST. His research has been supported by the NSF, most recently with an NSF CAREER grant to study transitions from grant-funding to open source style peer production.

May 29, 2015

The Computational Anatomy Gateway and MRICloud.org

Presenter(s): Michael Miller (Johns Hopkins University)

We will discuss the use of the CA Gateway for hosting services associated with MRICloud.org. Michael Miller will present the MRICloud use case, and Daniel Tward will present recent results on optimization and GPUs, demonstrating various calculations being run.

May 15, 2015

Diffusion of Innovations & Science Gateways

Presenter(s): Kerk Kee (Chapman University) Mona Sleiman (Chapman University)

Abstract:

As dispersed individuals and groups come together to develop new tools, these innovations spread through the social system over time. Science gateways can be conceptualized as such a generation of innovations in the scientific community. Diffusion of Innovations (Rogers, 2003) is a communication theory that explains the spread of innovations within a social system. This talk provides a broad overview of diffusion theory, including the patterns of the diffusion process, innovation attributes for successful adoption, adopter categories, and social mechanisms to promote diffusion. Based on an ongoing project (NSF ACI #1322305, 2013-2016), the PI (Kerk Kee) and his graduate research assistant (Mona Sleiman) will share some preliminary findings about the attributes of computational tools, attributes of successful virtual organizations in e-science, and the macro conditions that influence the diffusion of computational tools in the general XSEDE community.

Bios:

Kerk F. Kee (Ph.D., 2010, The University of Texas at Austin) is an Assistant Professor in the Department of Communication Studies at Chapman University in Orange County, California. His research centers on the diffusion of innovations theory. He studies the spread of cyberinfrastructure and big data technologies through cross-disciplinary collaborations in scientific organizations, the flow of health information through social clusters in online communities, and more recently, the dissemination of pro-environmental behaviors through persuasive messages in modern societies. Kerk's diffusion research has been funded by the National Science Foundation and the Bill & Melinda Gates Foundation. His work has appeared in outlets such as IEEE Computer, Computer Supported Collaborative Work, Journal of Computer-Mediated Communication, Health Communication, CyberPsychology, Behavior, & Social Networking, etc.

Mona Sleiman (M.S. Candidate, 2015, Chapman University) serves as a Graduate Research Assistant on the OCT (Organizing, Communication, Technology) Research Group at Chapman University. In her research endeavors, she has immersed herself in the cyberinfrastructure community with a commitment to understanding the communication processes that bridge human organizing and emerging technologies. As a research assistant with a managerial role, she has also been able to practice what she studies in an effort to coordinate and guide the team to success. With a passion for the intersection of theoretical research and real-world applications, Mona is dedicated to providing practical strategies grounded in research that will enable communities to communicate and organize most efficiently.

May 1, 2015

Application Generation with GenApp

Presenter(s): Emre Brookes (UTHSC)

Presentation Slides

GenApp is a new open framework that generates code for a set of scientific modules and is easily extensible to new environments. For example, one can take a set of module definitions and generate both a complete HTML5/PHP science gateway and a Qt4/GUI application from the identical set of modules. If a new technology comes along, the framework can easily be extended to new "target languages" by including appropriate code fragments without affecting the underlying modules. One motivation for the development was the observed life cycle of scientific lab-generated code, which is frequently underfunded and developed by overburdened researchers. Many times useful code and routines are lost with the retirement or redirected interest of the scientists. One goal for this framework is to ensure that good scientific software is preserved in an ever-evolving software landscape without the expense of a full-time CS staff. This framework is currently being used to wrap scientific code performing small-angle scattering computations, but it is not restricted to any one discipline. A successful GSoC 2014 project integrated GenApp with Apache Airavata for execution of modules on variously managed cluster resources in the HTML5/PHP, Qt3/GUI, and Qt4/GUI "target languages". In this presentation, Emre Brookes will explain the framework, demonstrate its application, and discuss future plans.

Bio - Emre is an assistant professor in the Department of Biochemistry at the University of Texas Health Science Center at San Antonio, yet his degrees are in Computer Science and Mathematics. His Ph.D. work included developing and implementing new algorithms for analysis of experimental data with a new scalable constrained least squares fitting method and a novel regularization method using genetic algorithms. To provide the scientific community access to these methods, he created the first UltraScan Science Gateway, which has since migrated to Apache Airavata. These methods annually use millions of CPU hours of parallel resources supporting scientific research worldwide. His work concentrates on developing tools for analysis of scientific experimental data. He is the primary developer of the US-SOMO hydrodynamic modeling suite http://somo.uthscsa.edu and is actively involved with the hydrodynamic modeling, small-angle scattering, and high-performance computational communities. He has given over 30 talks at conferences in these areas and has, as of this writing, contributed to 29 peer-reviewed publications. His most recent work, GenApp, focuses on developing an open framework to ease deployment of new and legacy scientific codes.

April 17, 2015

Workflow and Gateways in the XSEDE Architecture

Presenter(s): Andrew Grimshaw (UVA)

The XSEDE project seeks to provide "a single virtual system that scientists can use to interactively share computing resources, data and experience." The underlying system implementation provides naming, authentication, data management, and job management services via a unified system architecture. The architecture consists of three layers: an access layer, a services layer, and a resources layer. Users interact with XSEDE via the access layer. Access layer tools, such as gateways and workflow engines, interact with the services layer which virtualizes the underlying logical and physical resources.

This short talk describes how the services of the XSEDE Execution Management Services (EMS) can be used by access layer and higher-level service tools such as gateways and workflow engines to provide end users with capabilities that meet their needs. The talk begins with a discussion of the XSEDE use cases that drive the architectural design, followed by the EMS architecture. We then show how higher-level services, specifically a meta-scheduling grid queue, a DAGMan-like workflow engine, Airavata-based science gateways, and SCI-BUS/WS-PGRADE/gUSE, have been layered over the XSEDE EMS architecture. The objective is to demonstrate to implementation teams of other gateways and workflow engines how they could approach layering their user-focused tools onto the standards-based XSEDE EMS architecture.

April 3, 2015

IPython and Science Gateways

Presenter(s): Ian Stokes-Rees

Abstract: From 2007 until 2011 I was working on developing computationally intensive workflows for novel protein structure techniques, leveraging the Open Science Grid. Throughout this period I enjoyed participating in the XSEDE Gateways community. Once a web portal interface was developed, presenting the gigabytes of data and tens of thousands of files became a challenge. IPython Notebooks to the rescue. By programmatically constructing an IPython Notebook it was possible to present the analysis "results" along with the opportunity to interact with the analysis process and parameters, recomputing portions or producing alternate data graphs. Jump forward to 2015: today I am working at Continuum Analytics, where we have many clients in industry, government, and academia who are desperate for a collaborative analytics platform that can leverage a shared infrastructure base (compute clusters, storage, software) and provide mechanisms for IPython Notebook sharing and publishing. My contribution to this talk will be to provide perspectives on the current and future opportunities presented by the "Notebook" model.
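A notebook can be constructed programmatically along these lines with today's nbformat package; the cell contents here are invented for illustration.

```python
# A sketch of the "programmatically constructed notebook" idea using the
# nbformat package; the cell contents are hypothetical examples.
import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell

nb = new_notebook()
nb.cells.append(new_markdown_cell("# Analysis results for run 042"))
nb.cells.append(new_code_cell(
    "import pandas as pd\n"
    "df = pd.read_csv('results/run042/scores.csv')\n"
    "df.describe()"
))
nb.cells.append(new_code_cell(
    "# Re-run with different parameters to regenerate the plots\n"
    "df.plot(x='resolution', y='r_factor')"
))

# Write the generated notebook so the user can open and interact with it.
with open("run042_report.ipynb", "w") as f:
    nbformat.write(nb, f)
```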

Project Jupyter: a language-independent architecture for data science, from interactive computing to reproducible publications

Presenter(s): Fernando Perez (Berkeley)

Presentation Slides https://github.com/fperez/talk-microsoft-python-2014

Project Jupyter is the evolution of the IPython interactive computing system into language-agnostic components to support all aspects of computational research. Jupyter abstracts the basic elements of interactive computing into openly documented file formats and protocols. Using these, "Jupyter kernels" can execute code in any language and communicate with clients that range from simple terminals to the rich web-based Jupyter Notebook that supports code, results, rich media, text and mathematics.

March 27, 2015

FACE-IT: Adopting Globus Galaxies for Crop Modeling

Presenter(s): Ravi Madduri (Argonne National Laboratory)

FACE-IT (Framework to Advance Climate, Economic, and Impact Investigations with Information Technology) is an effort to develop a new IT infrastructure to accelerate existing disciplinary research and enable information transfer among traditionally separate fields. At present, finding data and processing it into usable form can dominate research efforts. By providing ready access to not only data but also the software tools used to process it for specific uses (e.g., climate impact and economic model inputs), FACE-IT allows researchers to concentrate their efforts on analysis. Lowering barriers to data access allows researchers to stretch in new directions and to learn and respond to the needs of other fields. FACE-IT accomplishes these goals by building and integrating a number of web-based software tools that enable researchers to easily develop data manipulation and analysis applications, apply those apps to their own data and to data provided by others, link multiple apps into data analysis pipelines, and share such pipelines with their collaborators and community. In this talk, we will describe how we have adapted the Globus Galaxies Science-as-a-Service platform for FACE-IT use cases and communities.

March 13, 2015

Scientific Workflows Using Science Gateway Technology

Presenter(s): Saba Sehrish (Fermilab)

The scientific discovery process can be advanced by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators.

In this presentation we will describe two efforts using Galaxy, an "open web-based platform for data intensive biomedical research," to enable cosmological research. One is a Portal for Data Analysis Services for Cosmological Simulations (PDACS). The main purpose of the PDACS project is to maximize the science output from large simulated cosmological datasets. The main simulation data that will be available is the Hybrid/Hardware Accelerated Cosmology Code (HACC) simulation sample currently stored at the National Energy Research Scientific Computing Center (NERSC). The initial thrust of PDACS is to permit the running of complex scientist-contributed analysis tools that operate on this large dataset, without specialized knowledge of the underlying systems that are required to operate those tools on such a dataset. The second effort is the production of a demonstration analysis portal for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC). Following upon the development of several detailed use cases, we have begun to use Galaxy to implement an analysis portal that would allow scientists to run complicated workflows that involve the use of a variety of computational resources (including grid resources, supercomputing resources at NERSC, and local compute nodes) for the execution of workflows on simulations of LSST images. We will present a brief description of the Galaxy framework, and describe the kinds of extensions to the system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community.

February 27, 2015

NumFOCUS: A foundation for sustaining Open Science Software Tools

Presenter(s): Andy Terrel (Continuum Analytics)

Ever notice how many science software projects die after the graduate student leaves? At SciPy 2011, a group of scientific software developers tackled the question of how to get our tools better funding. We know they get used in academia and industry, but neither gives them the recognition that can keep a healthy, sustainable software ecosystem alive. Thus we have seen our colleagues take jobs in the tech sector, leaving science for good. NumFOCUS was formed in response to the massive brain drain from our most important software tools. We help projects raise money and educate people about open source tools.

February 13, 2015

Enabling Cloud Bursting for Life Sciences within Galaxy

Presenter(s): Enis Afgan (Johns Hopkins University)

To keep up with the growth of data analysis needs in life sciences, it is becoming necessary to utilize distributed and federated compute and storage resources. The Galaxy application can be used as a locally deployed service, in the Cloud or via any of the public sites. In this talk, we'll look at the ongoing efforts on how to unify compute resources available to Galaxy to enable higher throughput of user jobs.

January 30, 2015

CyberGIS Workflow for Collaborative, Interactive, and Scalable Knowledge Discovery

Presenter(s): Shaowen Wang (UIUC)

CyberGIS represents an interdisciplinary field combining advanced cyberinfrastructure, geographic information science and systems (GIS), spatial analysis and modeling, and a number of geospatial domains to improve research productivity and enable scientific breakthroughs. It has also emerged as a fundamentally new GIS modality in the era of geospatial big data. This presentation discusses how a cutting-edge cyberGIS software environment enables visual analytical workflows for supporting collaborative, interactive, and scalable knowledge discovery through processing and visualizing complex and massive amounts of geospatial data and performing associated analysis, simulation, and visualization.

October 10, 2014

Enterprise Continuum - Building Reusable Infrastructure

Presenter(s): George Turner (Indiana University)

Enterprise Architecture (EA) at Indiana University (IU) utilizes The Open Group Architecture Framework (TOGAF). One of the primary components of TOGAF is the concept of an Enterprise Repository (ER). The ER is composed of Reference Architectures (RA), which are built upon an organization's Architectural Building Blocks (ABB) and Solution Building Blocks (SBB). The overarching view of the Architectural Repository with its component building blocks is called the Enterprise Continuum. This talk will focus on the properties of effective building blocks and on establishing useful Reference Architectures.

September 18, 2014

XSEDE and the NIST Digital Repository of Mathematical Formulae

Presenter(s): Howard Cohl (NIST)

In this talk, Dr. Howard Cohl from NIST describes the NIST Digital Repository of Mathematical Formulae (DRMF) project. This project has been greatly facilitated through the use of network-enabled Linux instances allocated through the XSEDE project. This talk will introduce the DRMF, which is designed for a mathematically literate audience and has the following goals: (1) facilitate interaction among a community of mathematicians and scientists interested in compendia of formulae for orthogonal polynomials and special functions; (2) be expandable, allowing the input of new formulae from the literature; (3) represent the context-free full semantic information concerning individual formulas; (4) have a user-friendly, consistent, and hyperlinkable viewpoint and authoring perspective; (5) contain easily searchable mathematics; and (6) take advantage of modern MathML tools for easy-to-read, scalable rendered content-driven mathematics. We will also discuss the current state and planned future efforts of the DRMF.

April 4, 2014

Complex Social Sciences Gateway

Presenter(s): Douglas White (University of California, Irvine)

This week's XSEDE Science Gateway Community call features a talk by Dr. Douglas White from the University of California, Irvine, introducing the Complex Social Sciences Gateway.