XSEDE is supported by the National Science Foundation.
Scheduled for Wednesday, July 24, 3-4:30 pm. Each of the 10 talks will be 8 minutes long.
Optimizing the PCIT algorithm on Stampede's Xeon and Xeon Phi processors for faster discovery of biological networks
Lars Koesterke, Kent Milfeld, Dan Stanzione, Matt Vaughn, James Koltes, James Reecy and Nathan Weeks
The PCIT method is an important technique for detecting interactions in networks. In biology, the PCIT algorithm has been used to infer complex regulatory mechanisms and interactions in genetic networks, in genome-wide association studies, and in similar problems. In this work, the PCIT algorithm is re-implemented with exemplary parallel, vector, I/O, memory, and instruction optimizations for today's multi- and manycore architectures. The new code targets the processor architectures of the Stampede supercomputer, but the optimizations will also benefit other architectures. The Stampede system consists of an Intel Xeon E5 base system with an innovative component comprised of Intel Xeon Phi coprocessors. Optimized results and an analysis are presented for both the Xeon and the Xeon Phi.
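At its core, PCIT combines first-order partial correlations with an information-theoretic significance threshold. The partial-correlation step can be sketched as follows (an illustrative toy with made-up data, not the authors' optimized implementation):

```python
import numpy as np

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Toy expression values for three genes (rows = samples, columns = genes)
data = np.array([[1.0, 2.0, 0.5],
                 [2.0, 1.0, 1.5],
                 [3.0, 4.0, 2.5],
                 [4.0, 3.0, 1.0]])
r = np.corrcoef(data, rowvar=False)  # 3x3 pairwise correlation matrix

# Correlation of genes 0 and 1 after removing the influence of gene 2;
# PCIT evaluates such trios exhaustively, which is why it benefits so
# much from vectorization and parallelization on Xeon and Xeon Phi.
p01_2 = partial_correlation(r[0, 1], r[0, 2], r[1, 2])
```

The exhaustive loop over all gene trios is the computational hot spot that the optimizations in the paper address.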
Embedding CIPRES Science Gateway Capabilities in Phylogenetics Software Environments
Mark Miller, Terri Schwartz and Wayne Pfeiffer
The explosive growth in DNA sequence data over the past decade makes it possible to clarify evolutionary relationships among all living things at an unprecedented level of resolution. Phylogenetic inference codes are computationally intensive, so turning this wealth of DNA sequence data into new insights about evolution requires access to high performance computing (HPC) resources. The CIPRES Science Gateway (CSG) was designed to meet this need by providing browser-based access to phylogenetic codes run on XSEDE compute resources. The CSG has lowered the barrier for access to HPC resources for biologists worldwide, supporting more than 6,100 users and enabling more than 480 publications over the past three years. Here we describe plans to create a new set of public CSG web services that can be accessed by any developer through a programmatic interface. These services will allow us to embed access to XSEDE resources within well-established phylogenetics software packages, thus leveraging the investments by developers in creating these rich work environments and by users in learning to use them. The services will also allow any developer with modest scripting skills to access and use CSG capabilities outside of the current browser interface. Our goal in creating these services is to allow scientists to conduct analyses without leaving their preferred work environment, whether that is a complex desktop application, a set of ad hoc scripted workflows, or the existing CSG browser interface. This paper describes the architectural design of the CSG web services, identifies potential issues that will be addressed in exposing programmatic access to HPC resources, and describes plans to embed the CSG web services in eight popular community applications.
Comprehensive Job-Level Resource Usage Measurement and Analysis for XSEDE HPC Systems
Charng-Da Lu, Jim Browne, Robert L. Deleon, John Hammond, Bill Barth, Thomas R. Furlani, Steven M. Gallo, Matthew D. Jones and Abani K. Patra
This work presents a methodology for comprehensive job-level resource-use measurement and analysis, applications of the analyses to planning for HPC systems, and a case-study application of the methodology to the XSEDE Ranger and Lonestar4 systems at the University of Texas. The methodology proceeds in steps: system-wide collection of resource-use and performance statistics at the job and node levels, followed by mapping and storage of the resulting per-job data into a relational database, which eases later transformation of the data into the formats required by specific statistical and analytical algorithms. Analyses can then be carried out at different levels of granularity: per job, per user, or system-wide. Measurements are based on a novel lightweight job-centric measurement tool, TACC_Stats, which gathers a comprehensive set of metrics on all compute nodes. The data mapping and analysis tools will be an extension of the XDMoD project for the XSEDE community. This work also reports preliminary results from the analysis of measured data for the Texas Advanced Computing Center's Lonestar4 and Ranger supercomputers. The case studies presented indicate the level of detailed information that will be available for all resources when TACC_Stats is deployed throughout the XSEDE system. The methodology can be applied to any system that runs the TACC_Stats measurement tool.
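The node-to-job roll-up step can be sketched in a few lines (the job IDs, node names, and the single metric here are placeholders; TACC_Stats gathers a far richer set of counters per node):

```python
from collections import defaultdict

# Per-node samples as gathered on each compute node:
# (job_id, node, cpu_busy_fraction) -- hypothetical values
samples = [
    ("job_1001", "c401-101", 0.92),
    ("job_1001", "c401-102", 0.88),
    ("job_1002", "c402-201", 0.45),
]

# Roll node-level statistics up to the job level, as a query over
# the relational database of stored records might do
by_job = defaultdict(list)
for job_id, node, busy in samples:
    by_job[job_id].append(busy)

job_mean_busy = {job_id: sum(v) / len(v) for job_id, v in by_job.items()}
```

Aggregating again over a user's jobs, or over all jobs on a system, gives the coarser granularities the methodology describes.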
Getting Started With High-Performance Computing for Humanities, Arts, and Social Science
This presentation addresses the question "Why would someone in humanities, arts, or social science be interested in high performance computing?" and discusses the resources and assistance that are available to humanists, artists, and social scientists who are interested in high performance computing. XSEDE provides a network of high performance computing resources that are available to researchers. In this talk I will discuss the resources that are available, who is eligible for these resources, and the assistance that is available to help you use those resources. My role within XSEDE is to help you get started on XSEDE as well as to help you after you get resources allocated. In this talk I will walk you through the process of applying for an XSEDE startup account and let you know what to expect as you begin using the resources. I will also discuss some of the different types of projects undertaken by humanities, arts, and social science researchers, including large-scale analysis of texts, images, and videos; network analysis (including social media); map-based problems; simulations; and others. Finally, I will address some of the lessons I have learned from working with humanities, arts, and social science researchers who are using XSEDE resources. Whether you need computational power, storage, or assistance with the analysis of large datasets, or are just curious about what these types of resources can do for you, this talk will provide the answers you are looking for.
Using Lucene to Index and Search the Digitized 1940 U.S. Census
Liana Diesendruck, Rob Kooper, Luigi Marini and Kenton McHenry
An improved approach toward enabling search capabilities over large digitized document archives is described, in which Lucene indices were incorporated in a framework developed to provide automatic searchable access to the 1940 U.S. Census, a collection composed of digitized handwritten forms. As an alternative to trying to recognize the handwritten text in the images, Word Spotting feature vectors are used to describe each cell's content. Instead of querying the system using regular ASCII text, any query is rendered as an image and a ranked list of matching results is presented to the user. Among other pre-processing steps required by the framework, an index must be compiled to provide fast access to the feature vectors. The advantages and drawbacks of using Lucene to index these vectors instead of other indexing methods are discussed in light of the challenges confronted when dealing with digitized document collections of considerable size.
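The ranking step over Word Spotting feature vectors can be illustrated with a brute-force nearest-neighbor search (synthetic vectors stand in for the real descriptors, and NumPy stands in for the Lucene index, which makes this lookup scale to the full Census collection):

```python
import numpy as np

def rank_cells(query_vec, cell_vecs):
    """Rank cell images by cosine similarity to the rendered query's vector."""
    q = query_vec / np.linalg.norm(query_vec)
    c = cell_vecs / np.linalg.norm(cell_vecs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores)  # indices of the best-matching cells first

rng = np.random.default_rng(0)
cells = rng.standard_normal((1000, 64))             # stand-ins for per-cell Word Spotting vectors
query = cells[42] + 0.01 * rng.standard_normal(64)  # a query rendered to an image, then featurized
ranking = rank_cells(query, cells)
```

Because the query is itself rendered as an image and described by the same features, matching reduces to this kind of vector comparison rather than handwriting recognition.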
Providing Resource Information to Users of a National Computing Center
Matthew Hanlon, Warren Smith and Stephen Mock
The Texas Advanced Computing Center provides a variety of high-end resources to local, state, national, and international computational scientists and engineers. Many of these users obtain information about these resources via the TACC User Portal or the TACC mobile interfaces. We recently completed a redesign of the resource information gathering, distribution, and presentation components of our infrastructure. This paper describes our new design, including our new mobile interfaces, and details the improvements it makes over the previous design.
CILogon: A Federated X.509 Certification Authority for CyberInfrastructure Logon
Jim Basney, Terry Fleury and Jeff Gaynor
CILogon provides a federated X.509 certification authority for secure access to cyberinfrastructure such as the Extreme Science and Engineering Discovery Environment (XSEDE). CILogon relies on federated authentication (SAML and OpenID) for determining user identities when issuing certificates. Federated authentication enables users to conveniently obtain certificates using existing (university, Google, etc.) identities. Federated authentication also enables CILogon to serve a national-scale user community without requiring a large network of registration authorities performing manual user identification. CILogon supports multiple levels of assurance and custom interfaces for specific user communities. In this article we introduce the CILogon service and describe experiences and lessons learned from the first three years of operation.
Opposites Attract: Computational and quantitative outreach through artistic expressions
Amy Szczepanski, Christal Yost, Norman Magden, Evan Meaney and Carolyn Staples
Staff from the University of Tennessee's Joint Institute for Computational Sciences, National Institute for Computational Sciences, and Remote Data Analysis and Visualization Center have teamed up with faculty from UT's School of Art to engage with students, the public, and the research community on a number of projects that connect the arts with the science and computing disciplines. These collaborations have led to coursework for students, videos about scientific discovery, and the production of novel, computer-mediated artwork. Both the arts and the sciences have gained from these collaborations.
Supercomputer Assisted Generation of Machine Learning Agents for the Calibration of Building Energy Models
Jibonananda Sanyal, Joshua New and Richard Edwards
Building Energy Modeling (BEM) is an approach to modeling the energy usage of buildings for design and retrofit purposes. EnergyPlus is the flagship Department of Energy software that performs BEM for different types of buildings. The input to EnergyPlus can comprise several thousand parameters, which must be calibrated manually by an expert for realistic energy modeling. This makes calibration challenging and expensive, rendering building energy modeling infeasible for smaller projects. In this paper, we describe the "Autotune" research, which employs machine learning algorithms to generate agents for the different kinds of standard reference buildings in the U.S. building stock. The parametric space and the variety of building locations and types make this a challenging computational problem that necessitates the use of supercomputers. Millions of EnergyPlus simulations are run on supercomputers and subsequently used to train machine learning algorithms to generate agents. These agents, once created, can run in a fraction of the time, thereby allowing cost-effective calibration of building models.
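The agent idea amounts to fitting a fast surrogate to samples from an expensive simulator. A toy least-squares stand-in (the simulator, its parameters, and the linear model here are all invented for illustration; the actual work trains machine learning agents on millions of EnergyPlus runs):

```python
import numpy as np

def expensive_simulation(x):
    """Stand-in for an EnergyPlus run: energy use as a function of two parameters."""
    return 3.0 * x[0] - 1.5 * x[1] + 0.5

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 2))            # sampled parameter sets
y = np.array([expensive_simulation(x) for x in X])  # the costly simulation campaign

# Fit a linear surrogate ("agent") to the samples via least squares
A = np.hstack([X, np.ones((200, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# The agent now predicts in a fraction of the simulator's runtime
pred = A @ coef
```

During calibration, the cheap agent is queried thousands of times in place of the simulator, which is what makes the search over the parameter space affordable.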
A Thousand Words: Advanced Visualization for Humanities
Samuel Moore, Rob Turknett and Brad Westing
With funding from the National Endowment for the Humanities, TACC's digital media, arts, and humanities coordinator joined with TACC's visualization laboratory manager and collaborators from the Department of English, the Department of Linguistics, the School of Information Science, and the College of Education to create the next step in interactive, tiled-display, megapixel-scale imaging. The implications of the project extend across training, education, and outreach, as the programming software used for the project was originally developed to teach the fundamentals of computer programming within a visual context. Educators, scholars, and creatives will be interested in the Thousand Words project and its enabling software and libraries.