Birds of a Feather

TeraGrid'10 BoFs will engage attendees in exciting, informal, interactive sessions devoted to forward-looking topics drawn from the areas described in the submission guidelines for the Science, Technology, Gateways, and EOT tracks.


Richard Moore. New Compute Systems in the TeraGrid Pipeline
Abstract: Users of TeraGrid resources are eagerly awaiting the launch of many new research HPC and visualization/data analysis systems that are entering the TeraGrid repertoire during 2010-2011. To spotlight how each of these new systems will accelerate multi-disciplinary research, we propose two BOF sessions so that potential users and other stakeholders can easily learn about and compare these systems. As envisioned, the sessions will focus on innovative systems recently funded under NSF’s Track 2D and RVDAS (remote visualization and data analysis) grants. These talks will provide scientists, engineers, educators, and students with salient details about each of these new systems, with insights into their unique capabilities and architectures, and an overview of potential applications that might be targeted to each.

David Lifka. “MATLAB on the TeraGrid” Experimental Resource: Information, Getting Started & Success Stories
Abstract: Cornell University in partnership with Purdue University has deployed an NSF-funded experimental TeraGrid resource called "MATLAB on the TeraGrid". This initiative provides seamless parallel MATLAB computational services to remote desktop and Science Gateway users with complex analytic and computational requirements. MATLAB is an important data analysis tool for many TeraGrid users and, as a parallel resource, it has the potential to expand the high performance computing user community. In this BOF, the Cornell Center for Advanced Computing (CAC) and its project partners will provide information on this experimental resource including: configuration; how it can be used; how users can apply for access; and, how to get started using the resource once your account has been approved. Current users of the system have been encouraged to attend TeraGrid to share their stories on how they are using the system. CAC staff will also share success stories and discuss optimal use strategies.

Chris Jordan, J Ray Scott and Stephen Simms. Data Curation and Management in the TeraGrid
Abstract: Computational science techniques are increasingly reliant on large-scale data, both in terms of large files and large numbers of files. The research community is also becoming more organized in its approach to data sources and outputs, and is putting increased emphasis on managing this large and diverse ecosystem of data for effective research, education, and preservation. Development of research expertise and communication of research findings to future generations of scientists, engineers, and the lay public will depend on effectively addressing the challenges of large-scale data with hardware, software, services, and practices managed and used by a community with diverse expertise. Current approaches to handling large-scale data in the research context include the development of reference databases such as GenBank, virtual organizations and collections such as the National Virtual Observatory, and large-scale infrastructure organizations such as the TeraGrid and the Open Science Grid; all of these approaches have important contributions to make to the challenges of 21st-century, data-centric science. This session will provide an informal opportunity for experts in TeraGrid data infrastructure and data management techniques to interact with users and others in the community with an interest in or a need for data management resources. Brief presentations of current TeraGrid facilities and projects focused on data management infrastructure will be followed by an open discussion of data curation challenges faced by the community, interactions between scientists and engineers to solve significant data management and curation issues, and the future of data-centric infrastructure and practice in the computational science community.

Kay Hunt. Extending Cyberinfrastructure Beyond Its Own Boundaries -- Campus Champion Program
Abstract: TeraGrid is a national-scale high performance computing facility funded by the Office of Cyberinfrastructure at the National Science Foundation to provide resources and services in support of advancing scientific research and education. TeraGrid (TG) includes 11 Resource Providers and is a leader and major component of the emerging national and international cyberinfrastructure (CI). The TG (soon to be followed by eXtreme Digital or XD) provides free access to computing resources and services to all researchers and educators. To extend beyond the current user base to support broader national participation, the Campus Champions Program was formed just over two years ago. The Campus Champions program pro-actively engages researchers, educators, administrators, staff and students on campuses across the country to facilitate awareness of and access to TeraGrid's resources and services.
This session will share information about the program and encourage attendees to suggest improvements and upgrades to it.

Shawn Brown. Common User Environments Working Group - Progress Report of Users
Abstract: This BOF will present a progress report to the user community on the implementation of the Common User Environment (CUE) and give users a chance to comment and provide feedback on its current state.

Paul Nowoczynski, Jared Yanovich and Zhihui Zhang. SLASH2: Next Generation Filesystem for Cooperating Scientific Institutions
Abstract: Achieving global data availability and access transparency has emerged as a central challenge to cooperating scientific computing institutions. Next generation data management systems must be capable of providing computational scientists with powerful means for accessing data and controlling locality amidst specialized classes of storage resources. In addition, efficient access to a global name space will be a likely requirement. To meet this challenge, researchers at the Pittsburgh Supercomputing Center have been developing a filesystem, SLASH2, suited for storage environments which span the wide-area and encompass heterogeneous storage clients and servers. SLASH2 is built from the ground up to fill the technology gap in the WAN-fs area where no existing solution is entirely suitable for the problem at hand. SLASH2 is specifically designed with features to address wide-area data management issues occurring within an inter-agency context. The following are design points of SLASH2:

  • POSIX I/O through a mountable filesystem
  • Supports multi-resident data at the block level
  • System managed, policy-based replication which supports parallelism and load-balancing
  • Inline data checksumming, system stored checksums
  • Support for “eventually consistent” metadata replication
  • Fully usermode
  • Highly portable I/O service which exports local storage via a common object-based interface
  • Protocol design features which emphasize asynchrony and minimization of RPCs

By targeting these specific design elements, SLASH2 has positioned itself as a fully featured filesystem with characteristics which are unique amongst its peers. More specifically, in the area of data replication and multi-residency, SLASH2 has demonstrated a powerful mechanism for enabling users to instruct the system to manage complex operations with little effort or oversight by the user. The replication tools contain the ability to perform recursive operations, set policies, and allow for independent replication of files' subregions. In addition to this flexibility, SLASH2 replication activities can be processed in parallel when the resources are available. This feature should be critical to users with large scientific data sets. Further, the highly portable nature of SLASH2 is meant to encourage the inclusion of many types of storage systems, regardless of their vendor or class. These and the other features of SLASH2 will be discussed in detail during the BoF.
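As a rough illustration of what block-level multi-residency with stored checksums can mean in practice, the following Python sketch tracks which blocks of a file reside at which site and verifies each block's checksum before replicating a subregion. It is not SLASH2 code and does not reflect its actual interfaces or on-disk format: the names `FileRecord`, `ingest`, and `replicate`, the tiny block size, and the use of CRC32 are all assumptions made for the example.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Set

BLOCK_SIZE = 4  # hypothetical; deliberately tiny so the example is easy to follow


@dataclass
class FileRecord:
    """Illustrative per-file metadata: one checksum per block, plus a
    map from site name to the set of block indices resident there."""
    checksums: List[int]
    residency: Dict[str, Set[int]] = field(default_factory=dict)


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def ingest(data: bytes, site: str) -> FileRecord:
    """Record checksums for every block and mark all blocks resident at `site`."""
    blocks = split_blocks(data)
    rec = FileRecord(checksums=[zlib.crc32(b) for b in blocks])
    rec.residency[site] = set(range(len(blocks)))
    return rec


def replicate(rec: FileRecord, src_blocks: List[bytes],
              src: str, dst: str, first: int, last: int) -> Dict[int, bytes]:
    """Copy blocks [first, last) from `src` to `dst`, checking each block
    against its stored checksum before marking it resident at `dst`."""
    copied = {}
    for i in range(first, last):
        if i not in rec.residency.get(src, set()):
            raise ValueError(f"block {i} is not resident at {src}")
        if zlib.crc32(src_blocks[i]) != rec.checksums[i]:
            raise OSError(f"checksum mismatch on block {i}")
        copied[i] = src_blocks[i]
    rec.residency.setdefault(dst, set()).update(copied)
    return copied


if __name__ == "__main__":
    data = b"abcdefghij"
    rec = ingest(data, "site_a")
    # Replicate only the second and third blocks (a file subregion) to site_b.
    replicate(rec, split_blocks(data), "site_a", "site_b", 1, 3)
    print(rec.residency)
```

Because residency is tracked per block rather than per file, the ranges passed to `replicate` are independent of one another, which is what would allow a real system to process them in parallel.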
The team of SLASH2 developers would like to use TeraGrid'10 as a platform for the formal introduction of the technology to TeraGrid users and resource providers alike. Due to the interactive nature of a BoF, the team feels it would be the most appropriate forum for discussing topics such as: wide-area filesystem features; inter-site security issues, ideas, and mechanisms for providing secure filesystem access; and the prospects of a community-supported filesystem. The proposed BoF would also provide a brief demonstration of the SLASH2 filesystem.

Philip Bogden, William Michener, Sayeed Choudhury, John Cobb and Tim DiLauro. NSF DataNet Program overview and community input solicitation
Abstract: The NSF DataNet program is designed to promote sustainable digital data preservation and access. It is being instantiated with up to five 5-year awards to initiate DataNet activities. Currently, two DataNet projects have been awarded and are underway: the DataONE project (PI: Bill Michener, University of New Mexico) and the Data Conservancy project (PI: Sayeed Choudhury, Johns Hopkins University).
This BoF will be an opportunity to brief the TeraGrid (and future XD) user and provider communities and to receive their input. It will begin with a brief:

  • summary of the program by the NSF program manager, Bogden
  • summary of the existing awards by the PIs, Michener and Choudhury
  • discussion of potential future awards and competitions
  • discussion of opportunities for collaboration and complementary activities with TeraGrid (and XD in the future)

Finally, the BoF will wrap up with a summary of the discussion and a draft action plan for further interactions.

Contact BoF Chair Sergiu Sanielevici (PSC).