XSEDE14 is hosted by XSEDE, the Extreme Science and Engineering Discovery Environment (xsede.org), in cooperation with ACM SIGAPP.

XSEDE is supported by the National Science Foundation.


Tutorials

 

Programming the Xeon Phi

Length: Full Day

Level: Intermediate

Over the past few years, heterogeneous architectures have become increasingly common in large-scale HPC. One exciting new technology is the Intel Xeon Phi co-processor, also known as the MIC. The Xeon Phi is x86-based, hosts its own Linux OS, and is capable of running most codes with little porting effort. However, the MIC architecture differs in significant ways from current x86 CPUs, and attaining optimal performance requires an understanding of the possible execution models and of the architecture itself.


This is an expanded version of the full-day tutorial presented at XSEDE 2013. Experienced C/C++ and Fortran programmers will be introduced to techniques essential for utilizing the MIC architecture efficiently. Lectures and hands-on exercises will acquaint attendees with the MIC platform and explore the different execution modes as well as parallelization and optimization, with worked examples and performance reports. All exercises will be executed on the Stampede system at the Texas Advanced Computing Center (TACC). Stampede delivers more than 2 PF of performance from 100,000 Intel Xeon E5 cores and an additional 7+ PF from more than 6,400 Xeon Phi co-processors.

 

Why XSEDE?

Length: Half Day, Morning

Level: Beginner

Prerequisites:

1.     Browser – preferably Firefox or Safari

2.     Tableau Desktop, which can be downloaded at http://www.tableausoftware.com. This is a fully functioning Tableau trial; downloads are available for Windows and Mac.

Are you wondering why you should be interested in XSEDE? Do you want to know how using XSEDE, High Performance Computing, Analytics and Informatics tools, and other advanced digital tools and services can benefit you, your students, and your institution? Explore the possibilities through a combination of example project presentations and hands-on activities that let you test drive some basic tools! Learn what types of projects humanities and social science researchers are doing. Interact with an XSEDE gateway and get a taste of what visualization can do for your research and teaching. Find out how you can get started. The tutorial will be presented in four sections:

  • High Performance Computing for Humanities, Arts, and Social Science
  • Easy Access: Gateways and Portals with a Hands On GIS demo
  • Hands-On with Tableau: A Simple Tool for Information Visualization
  • How Students, Faculty, and Institutions Can Engage with XSEDE

 

Introduction to Modeling in Sage

Length: Half Day, Morning

Level: Beginner

This half-day workshop introduces the open source mathematical software system Sage. The first portion of the workshop is introductory, proceeding to an intermediate level in the second portion.

Prerequisites for the workshop are some familiarity with programming and knowledge of mathematics comparable to an undergraduate calculus sequence.

The purpose of the workshop is to bring participants in the modeling competition up to speed with the system they will be using during the competition.

 

Introduction to Parallel Computing with OpenMP and MPI

Length: Half Day, Afternoon

Level: Beginner/Intermediate

Prerequisites: Attendees will need to bring their own laptops.  Windows users should install PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html).

This tutorial will introduce participants to parallel programming through two of the most common ways of doing it: OpenMP and MPI. Students will learn skills during this tutorial that can be applied to machines of all scales, from laptops to supercomputers. Instructors will engage students in a hands-on environment that will encourage guided discovery through group exercises and labs.
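As a taste of the shared-memory model the tutorial teaches, here is a rough sketch (not part of the tutorial materials; plain Python standing in for C with OpenMP): the iteration space of a reduction is split among a pool of workers, much as an `omp parallel for` with a `reduction(+:total)` clause would split a loop.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker reduces its share of the data, like one OpenMP thread.
    return sum(chunk)

data = list(range(100_000))
nworkers = 4
# Split the iteration space among workers, as "omp parallel for" would.
chunks = [data[i::nworkers] for i in range(nworkers)]
with ThreadPoolExecutor(max_workers=nworkers) as pool:
    total = sum(pool.map(partial_sum, chunks))
assert total == sum(data)
```

MPI differs in that the workers are separate processes with no shared `data`; each rank holds its own chunk and the partial sums are combined with explicit messages (e.g., `MPI_Reduce`).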

 

MPI Tuning or How to Save SUs by Optimizing your MPI Library!

Length: Half Day, Morning

Level: Intermediate/Advanced

With the diversity of platforms, it is impossible for MPI libraries to automatically provide the best performance for all existing applications. Through this tutorial, attendees will discover that MPI libraries are not black boxes and contain several options allowing the user to enhance MPI applications. From basic (Process Mapping, Collective Tuning...) to advanced features (Multicast, Unreliable Datagram, Kernel-assisted Approaches), this tutorial will cover a large spectrum of possibilities offered by MPI libraries to improve the performance of parallel applications on both XSEDE systems and other clusters.

This tutorial will introduce tuning flags and features available in different MPI libraries (mainly MVAPICH2 and Intel MPI, with references to Open MPI), with hands-on activities on Stampede. A summary sheet with the most important options for tuning MVAPICH2, Intel MPI, and Open MPI will be provided to users.

Scientific Visualization: Concept, Execution and Ubiquitous Access

Length: Half Day, Afternoon

Level: Beginner

Prerequisites: Please make sure you download and install all software and data mentioned below ahead of the tutorial (do not rely on the internet connection at the conference).

Visualization is widely understood and used as an excellent communication tool by researchers. This narrow view often keeps scientists from fully using and developing their visualization skill set. This tutorial will provide a "from the ground up" understanding of visualization and its utility in error diagnosis and in exploring data for scientific insight. When used effectively, visualization can provide a complementary and effective toolset for data analysis, which is one of the most challenging problems in computational domains. In this tutorial we plan to bridge these gaps by providing end users with fundamental visualization concepts, execution tools, customization, and usage examples. Finally, a short introduction to SeedMe.org will be provided, where users will learn how to share their visualization results ubiquitously.

 

Enhanced Campus Bridging via a Campus Data Service Using Globus and the Science DMZ

Length: Half Day, Morning

Level: Beginner (30%), Intermediate (60%), Advanced (10%)

Prerequisites: Participants should bring a laptop with a "modern" web browser (and, ideally, an SSH client) installed.

Existing campus data services are limited in their reach and utility due, in part, to unreliable tools and a wide variety of storage systems with sub-optimal user interfaces. Further, the underlying networking and security infrastructure on most campuses is complex and inflexible, resulting in many challenges to effective campus bridging. In this session, participants will learn how to deliver robust services for managing research data that span campus systems, national cyberinfrastructure, and public cloud resources.

An increasingly common solution to campus bridging comprises Globus operating within the Science DMZ, enabling reliable, secure file transfer and sharing, while optimizing use of existing high-speed network connections and campus identity infrastructures. The combined solution allows research computing resource owners and systems administrators to deliver enhanced campus data services to end users at optimal quality of service, while ensuring predictable network performance and integrity. Globus is installed at most XSEDE resource providers, and we will draw on experiences from research computing centers (Cornell University) and HPC facilities (SDSC) to highlight the challenges such facilities face in delivering scalable campus data services. Attendees will be introduced to Globus and to the components of the Science DMZ, and will have the opportunity for hands-on interaction with these systems.

Campus Bridging Technologies – using the Basic XSEDE Cluster Stack

Length: Half Day, Afternoon

Level: Intermediate

This tutorial session focuses on using the Basic XSEDE Cluster Stack, either to create a system using Rocks or to extend an existing campus cluster with XSEDE software using the XSEDE Cluster YUM repo. Users will learn how to install the Rocks rolls for XSEDE, how to add the software repository to existing RPM-based machines, and about the open-source software XSEDE offers in the Cluster Stack and YUM repo. IU will provide a small cluster to work on from their SAM iDataPlex resource.

XSEDE New User Tutorial

Length: Half Day, Afternoon

Level: Beginner

This tutorial will provide training and hands-on activities to help new users learn and become comfortable with the basic steps necessary to first obtain, and then successfully employ, an XSEDE allocation to accomplish their research or educational goals. The tutorial will consist of four sections.

The first part will explain the XSEDE allocations process and how to write and submit successful allocation proposals. The instructor will describe the contents of an outstanding proposal and the process for generating each part. Topics will include the scientific justification, the justification of the request for resources, techniques for producing meaningful performance and scaling benchmarks, and navigating the POPS system through the XSEDE Portal for electronic submission of proposals.

The second section, "Information Security Training for XSEDE Researchers," will review basic information security principles for XSEDE users, including how to protect yourself from online threats and risks, how to secure your desktop or laptop, safe practices for social networking, email, and instant messaging, how to choose a secure password, and what to do if your account or machine has been compromised.

The third part will cover the New User Training material that has been delivered remotely each quarter, but will delve deeper into those topics. New topics will include how to troubleshoot a job that has not run and how to improve job turnaround by understanding differences in batch job schedulers on different platforms.

Finally, we will discuss how to transfer files between different resources using an easy and powerful tool, Globus Online. We will demonstrate the various tasks with live, hands-on activities and personalized help: submitting a job, figuring out why it has not run, and transferring files between supercomputers. In the event of network issues we will have demos available as a backup. We anticipate significant interest from Campus Champions, so we will also explain how attendees can assist others and briefly describe projects currently being carried out in non-traditional HPC disciplines.

Parallel I/O - for Reading and Writing Large Files in Parallel

Length: Half Day, Morning

Level: Intermediate

Developing an understanding of efficient parallel I/O and adapting your application accordingly can yield orders-of-magnitude performance gains without overloading the parallel file system. This half-day tutorial will provide an overview of practices and strategies for efficiently utilizing parallel file systems through parallel I/O to achieve high performance. The target audience is analysts and application developers who have no prior experience with MPI-IO, HDF5, or T3PIO; they should, however, be familiar with C/C++/Fortran programming and basic MPI. A brief overview of the related basic concepts will be included where needed.

All concepts in the tutorial will be explained with examples, and there will be a laboratory/hands-on session in which the audience will be given four exercises over one hour. Attendees will be provided with skeleton programs written in C/Fortran, along with instructions for modifying them so that they perform parallel I/O; the programs will include comments and placeholders to guide these modifications. The hands-on session will help the audience test the knowledge gained during the tutorial. By the end of the tutorial, attendees will have learned to do parallel I/O (through MPI-IO and the high-level libraries discussed in the tutorial) and will be motivated to apply this knowledge to obtain much higher I/O performance from their applications.
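The core idea behind MPI-IO's independent writes (e.g., `MPI_File_write_at`) can be sketched without MPI: each process writes its block at a rank-determined offset into one shared file, so no process seeks past or waits on another. A minimal plain-Python stand-in (four ranks simulated in one process; the file name and sizes are invented for illustration):

```python
import os
import tempfile

nranks, block = 4, 16  # 4 simulated ranks, 16 bytes each
path = os.path.join(tempfile.mkdtemp(), "shared.dat")

fd = os.open(path, os.O_CREAT | os.O_WRONLY)
for rank in range(nranks):
    payload = bytes([rank]) * block
    # Each rank writes at its own offset: no shared file pointer, no locks.
    os.pwrite(fd, payload, rank * block)
os.close(fd)

with open(path, "rb") as f:
    data = f.read()
assert len(data) == nranks * block
assert data[16:32] == b"\x01" * 16  # rank 1's block landed at offset 16
```

Real MPI-IO adds collective variants and hints so the library can aggregate these per-rank writes into a few large, file-system-friendly requests.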

Introduction to the Latest Features in MPI-3

Length: Half Day, Afternoon

Level: Intermediate/Advanced

In an effort to position MPI strongly for multi-core and highly scalable systems, and to address compelling end-user needs, the MPI Forum released a major extension to MPI, Version 3.0, in September 2012. This latest version, referred to as MPI-3, includes several new features such as nonblocking collectives, new one-sided communication operations, and Fortran 2008 bindings. Unlike MPI-2.2, this standard is considered a major update to MPI. This half-day tutorial will include discussions and hands-on sessions on the following topics: a general overview of the main features added to MPI, and a detailed look at one-sided communication (RMA), nonblocking collectives, and version detection. The tutorial is meant for intermediate to advanced MPI programmers interested in learning about the latest additions to the widely used MPI standard in order to increase the performance of their applications and to reduce energy consumption.
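The point of nonblocking collectives such as `MPI_Iallreduce` is to overlap communication with computation: start the operation, do useful local work, and wait only when the result is needed. A hedged Python analogy of that pattern (a background thread stands in for the network, and `allreduce_stub` with its delay is invented):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def allreduce_stub(values):
    # Stand-in for the network phase of a nonblocking reduction.
    time.sleep(0.05)
    return sum(values)

pool = ThreadPoolExecutor(max_workers=1)
req = pool.submit(allreduce_stub, [1, 2, 3, 4])  # like MPI_Iallreduce: returns at once

# Useful local work proceeds while the "collective" is in flight.
local = sum(i * i for i in range(1000))

total = req.result()  # like MPI_Wait: block only when the result is needed
pool.shutdown()
assert total == 10
```

The win is exactly the overlapped interval: with a blocking `MPI_Allreduce`, the local work could only begin after the communication finished.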

ADIOS 101: How to achieve high performance I/O

Length: Half Day, Afternoon

Level: Beginner (20%), Intermediate (40%), Advanced (40%)

As scientific applications scale to more cores on HPC systems, they need to take advantage of the underlying parallel storage system without contending with the complexity of modern I/O stacks. One major roadblock is writing and reading scientific data quickly and efficiently on high-end machines; common stumbling blocks include data formats, the operations performed (e.g., files, streams), and optimizations for computer systems with different architectures. Our group has researched, developed, and created an I/O framework, ADIOS (R&D 100 winner, 2013), which abstracts the API from the implementation and allows users to write and read data efficiently at large levels of concurrency. In this tutorial we will show how users can achieve high I/O performance with ADIOS for large-scale simulations.

Hadoop Based Data Analysis Tools on SDSC Gordon Supercomputer

Length: Half Day, Morning

Level: Intermediate

The Hadoop framework is extensively used for scalable distributed processing of large datasets, and there has been considerable interest in workshops illustrating the use of Hadoop on the SDSC Gordon cluster. Gordon is ideally suited to running Hadoop, with fast SSD drives enabling HDFS performance and a high-speed InfiniBand interconnect providing scalability. All users interested in utilizing Hadoop on XSEDE resources are invited to attend. The tutorial will provide a short introduction to Hadoop, an overview of myHadoop, information on how to run Hadoop jobs via the PBS queues on Gordon, and Hadoop-based data analysis tools. Hands-on examples will include working with Hadoop streaming, Pig, and Apache Mahout.

We begin the tutorial with an introduction to Hadoop architecture and configuration, with a simple illustrative example. The HDFS architecture and configuration details are covered next. We then present the scripts, procedures, and configuration details required to run Hadoop via the PBS queue on Gordon, illustrated with hands-on examples including the TestDFSIO and TeraSort benchmarks. The map-reduce approach will be further illustrated with a hands-on anagram-finder example.

The use of Hadoop streaming to develop tools will be illustrated with hands-on examples. Finally, the various Hadoop-based tools for data-intensive computing, such as Hive, HBase, Pig, and Apache Mahout (scalable machine learning algorithms), will be detailed, with hands-on examples using Pig and Apache Mahout.
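Hadoop streaming lets any executable that reads stdin and writes stdout act as a mapper or reducer. The classic word-count pair, sketched in Python and chained in-process for illustration (in a real streaming job the two functions would be separate scripts, with Hadoop's shuffle/sort running between them):

```python
import itertools

def mapper(lines):
    # Emit a (word, 1) pair for every word seen.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reducer(pairs):
    # Hadoop delivers pairs sorted by key; sorted() stands in for the shuffle.
    for word, group in itertools.groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

counts = dict(reducer(mapper(["to be or", "not to be"])))
assert counts == {"be": 2, "not": 1, "or": 1, "to": 2}
```

Because the contract is just stdin/stdout, the same logic scales from this toy run to a cluster job without code changes.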

SDSC staff will be available at the conclusion of the tutorial to meet with individual users and further discuss the Hadoop environment on Gordon.

Secure Coding Practices

Length: Half Day, Morning

Level: Beginner (50%), Intermediate (25%), Advanced (25%)

Security is crucial to the software that we develop and use. With the growth of both Grid and Cloud services, security is becoming even more critical. This tutorial is relevant to anyone wanting to learn about minimizing security flaws in the software they develop. We share our experiences gained from performing vulnerability assessments of critical middleware. You will learn skills critical for software developers and analysts concerned with security.

This tutorial presents coding practices subject to vulnerabilities, with examples of how they commonly arise, techniques to prevent them, and exercises to reinforce them. Most examples are in Java, C, C++, Perl and Python, and come from real code belonging to Cloud and Grid systems we have assessed. This tutorial is an outgrowth of our experiences in performing vulnerability assessment of critical middleware, including Google Chrome, Wireshark, Condor, SDSC Storage Resource Broker, NCSA MyProxy, INFN VOMS Admin and Core, and many others.

Optimization and Tuning of MPI and PGAS Applications using MVAPICH2 and MVAPICH2-X

Length: Half Day, Afternoon

Level: Beginner (40%), Intermediate (35%), Advanced (25%)

The MVAPICH2 software, supporting the latest MPI 3.0 standard, delivers the best performance, scalability, and fault tolerance for high-end computing systems and servers using InfiniBand, 10GigE/iWARP, and RoCE networking technologies. The MVAPICH2-X software package provides support for hybrid MPI+PGAS (UPC and OpenSHMEM) programming models with a unified communication runtime. The MVAPICH2 and MVAPICH2-X libraries (http://mvapich.cse.ohio-state.edu) power several supercomputers in the XSEDE program, including Gordon, Keeneland, Lonestar4, Trestles, and Stampede. They are used by more than 2,150 organizations in 72 countries to extract the potential of these emerging networking technologies for modern systems; as of March 2014, more than 205,000 downloads have taken place from the project's site. These libraries also power several supercomputers in the TOP500 list, such as Stampede, Tsubame 2.5, and Pleiades.

A large number of XSEDE users run their MPI and PGAS applications with these libraries on a daily basis. However, many of these users and the corresponding system administrators are not fully aware of all the features, optimizations, and tuning techniques the libraries offer. This tutorial aims to address that gap. We will start with an overview of the MVAPICH2 and MVAPICH2-X libraries and their features. Next, we will focus in depth on installation guidelines, runtime optimizations, and tuning flexibility, followed by an overview of configuration and debugging support in MVAPICH2 and MVAPICH2-X. Advanced optimization and tuning of MPI applications using the new MPI-T feature (defined by the MPI-3 standard) in MVAPICH2 will also be discussed, and the performance impact of the various features and optimization techniques will be covered in an integrated fashion.

Network Performance Tutorial featuring perfSONAR

Length: Full Day

Level: Beginner/Intermediate

Participants will learn about network measurement through lecture and hands-on activities. Goals include:

• Use and interpret measurement tools to diagnose performance problems

• Learn the causes of performance abnormalities, and mitigation techniques

• Operate, install, configure, and maintain measurement infrastructure

A similar version of this tutorial was presented at XSEDE13 with great success. We feel that offering this material again will give another group of students the opportunity to learn about network measurement and monitoring, and will help broaden the footprint of perfSONAR deployments for XSEDE and the larger Research and Education Network (REN) community. This year we will emphasize how to use perfSONAR to solve end-to-end problems, with a hands-on section and more real-world examples.

Efficient Data Analysis with the IPython Notebook

Length: Half Day, Afternoon

Level: Beginner

Prerequisites: Laptop, Install the Anaconda Scientific Python Distribution: https://store.continuum.io/cshop/anaconda/

There are many recent additions to Python that make it an excellent programming language for data analysis. This tutorial has two goals. First, we introduce several of the recent Python modules for data analysis. We provide hands-on exercises for manipulating and analyzing data using pandas, scikit-learn, and other modules. Second, we execute examples using the IPython notebook, a web-based interactive development environment that facilitates documentation, sharing, and remote execution. Together these tools create a powerful, new way to approach scientific workflows for data analysis on HPC systems.
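As a flavor of the analysis the exercises cover: grouping records and computing per-group statistics. With pandas this is roughly `pd.read_csv(...)` followed by `df.groupby("site")["runtime"].mean()`; the plain-Python equivalent below (with invented sample data and column names) shows what that one-liner does:

```python
import csv
import io
import statistics
from collections import defaultdict

# Invented sample data: job runtimes (seconds) per site.
raw = "site,runtime\nstampede,10.5\ngordon,8.0\nstampede,11.5\n"

# Group runtimes by site, then reduce each group to its mean.
groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw)):
    groups[row["site"]].append(float(row["runtime"]))

means = {site: statistics.mean(vals) for site, vals in groups.items()}
assert means == {"stampede": 11.0, "gordon": 8.0}
```

pandas performs the same split-apply-combine over columnar data structures, which is why it stays fast on datasets far larger than this.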

SciGaP Tutorial: Developing Science Gateways using Apache Airavata

Length: Half Day, Morning

Level: Intermediate

Science gateways, or Web portals, are an important mechanism for broadening and simplifying access to computational grids, clouds, and campus resources. Gateways provide science-specific user interfaces to end users who are unfamiliar with or need more capabilities than provided by command-line interfaces. In this tutorial, we present SciGaP, which includes software from the CIPRES, UltraScan, and Neuroscience Gateways combined with the Apache Airavata server-side system for managing jobs and data. Our goal is to show participants how to build and run gateways using both software and collected experience from some of the most heavily used XSEDE science gateways.

Eclipse and the Parallel Tools Platform

Length: Full Day

Level: Intermediate

Prerequisites:

·       We would like attendees to ensure they have Java 1.7 on their machines (Mac, Linux, or Windows).

·        Go to http://eclipse.org/downloads and install "Eclipse for Parallel Application Developers" ahead of time if possible.

For many HPC developers, developing, analyzing, and tuning parallel scientific applications on a variety of target platforms involves a hodgepodge of disparate command-line tools. Based on the successful open-source Eclipse integrated development environment, the Eclipse Parallel Tools Platform (PTP) combines tools for coding, debugging, job scheduling, monitoring, error detection, tuning, revision control, and more into a single tool with a streamlined graphical user interface. PTP helps manage the complexity of HPC code development, optimization, and monitoring on diverse platforms.

This full-day tutorial provides a hands-on introduction to Eclipse and PTP. Early sessions introduce code development in Eclipse: editing, building, launching and monitoring parallel applications in C and Fortran, support for efficient development of code on remote machines, and developing and analyzing code with a variety of languages and libraries.

Sessions later in the day focus on parallel debugging and performance optimization tools. Participants will inspect and analyze a real application code, profiling its execution and performance. Using tools such as Valgrind, Perf, GProf, GCov, LTTng, and TAU in the Eclipse environment will be covered.

The XSEDE Global Federated File System (GFFS) — Breaking Down Barriers to Secure Resource Sharing

Length: Half Day, Morning

Level: Intermediate

Prerequisites: Laptop, Familiarity with Unix. Having an XSEDE/MyProxy ID is preferred.

The GFFS offers scientists a simplified means through which they can interact with and share resources. Currently, many scientists struggle to exploit distributed infrastructures because they are complex, unreliable, and require the use of unfamiliar tools. For many scientists, such obstacles interfere with their research; for others, these obstacles render their research impossible. It is therefore essential to lower the barriers to using distributed infrastructures.

The first principle of the GFFS is simplicity. Every researcher is familiar with the directory-based paradigm of interaction; the GFFS exploits this familiarity by providing a global shared namespace. The namespace appears to the user as files and directories, so the scientist can easily organize and interact with a variety of resource types. Resources can include compute clusters, running jobs, directory trees in local file systems, groups, and storage resources at geographically dispersed locations. Once mapped into the shared namespace, resources can be accessed by existing applications in a location-transparent fashion, i.e., as if they were local.

For example, users can map (i.e., export) resources (such as a directory structure, on-campus cluster, or research group machines) into the GFFS. Collaborators at other institutions can (using a GFFS-aware FUSE driver) map the GFFS into their local file system in order to securely access remote resources (such as files and directories) in their local environment. In other words, the GFFS allows authorized clients to access (create, read, write, delete) locally-exported resources from anywhere in XSEDE. The result is a globally-shared file system that spans centers, campuses, and research groups.

In another example, two collaborating research groups might map their PBS/SGE controlled clusters into the GFFS to create a shared grid queue in the global namespace. The research groups (and other selected collaborators) can then use the shared grid queue. The grid queue can schedule jobs on just their private clusters or on their private clusters as well as an XSEDE resource (provided they have an allocation). The creator of the queue controls access to this new, shared grid queue. Naturally, jobs running via the shared queue can access data and executables via the GFFS.

Tutorial details: During the hands-on training, students will deploy client-side tools on Windows, Mac OS, or Linux. Server installation will be done on cloud servers provided by the GFFS team.

The tutorial consists of five modules: 1) system model and overview; 2) client installation, GUI and shell usage, file-system access to the GFFS; 3) defining and running jobs, including parameter sweeps and workflows; 4) access control including identity and group creation/management; and 5) server installation and configuration, sharing data and compute resources.

The intended audience includes application developers, scientists, and computational support personnel who work with end users.

A Beginner's Guide to Visualization

Length: Half Day, Morning

Level: Beginner

Prerequisites: The tutorial will be easy to follow; if you prefer to bring your own laptop, the following technical setup is required:

·       You will need internet access and a web browser to download sample data files, and a computer mouse (optional but recommended).

·       You will need to download and install the following visualization software on your laptop (preferably before the workshop begins):

o   Gephi 0.7 Beta https://launchpad.net/gephi/+download

o   ParaView 4.1: http://www.paraview.org/paraview/resources/software.html

·       If you need help installing the software, please arrive a few minutes early.

This tutorial provides an introduction to visualization by exploring the underlying principles used in information and scientific visualization. Hands-on exercises using Gephi (information visualization) and ParaView (scientific visualization) are designed to give participants experience in discerning which type of visualization tool would be most effective for gaining insight into different ways of interpreting data. The visualization process is presented as a vehicle for knowledge discovery, gaining insight, and making better-informed decisions when analyzing data. The format will serve both those who wish to participate hands-on (using their own laptop) and those who wish to observe and ask questions.

 

Information will be posted in the coming weeks; please check back regularly and follow XSEDE on Twitter (@XSEDEscience) and on Facebook (Facebook.com/XSEDEscience).