Partnerships

The National Science Foundation's eXtreme Digital (XD) program is making new infrastructure and next-generation digital services available to researchers and educators, who use that infrastructure to handle the huge volumes of digital information that are now part of their work: the results of supercomputing simulations, the data generated by large scientific instruments such as telescopes, and the existing data that can be mined from a host of public sources.

Many of the supercomputers and high-end visualization and data analysis resources connected by the XSEDE project are supported by NSF's eXtreme Digital program.

Other projects that are part of the NSF eXtreme Digital program include:

  • The XD Technology Database, which is publicly available, allows XSEDE users and other technology providers to submit their tools for evaluation by the XSEDE team and to suggest other technologies that would be of use to the XSEDE user community.

  • XSEDE Metrics on Demand offers tools to benchmark user satisfaction and resource use across the XSEDE project.

  • FutureSystems, a distributed, high-performance test-bed, allows scientists to collaboratively develop and test innovative approaches to parallel, grid, and cloud computing.

XSEDE also partners with organizations outside of the NSF eXtreme Digital program. These relationships improve the quality and diversity of the resources and services available to the open scientific research community through XSEDE. They also help expand the XSEDE community to new groups and research teams. These XSEDE partners include:

  • The Open Science Grid (OSG) brings together computing and storage resources from U.S. campuses and research communities into a common, shared grid infrastructure. OSG will help provide the high-throughput computing resources that many research teams need.

  • PRACE, the Partnership for Advanced Computing in Europe, supports a pan-European computing infrastructure and includes 19 member countries. By partnering, PRACE and XSEDE will provide the technical and administrative means for international scientific collaborations. The two organizations will also work together on joint user support and training activities.

NSF proposals, listed with title, start and end dates, and abstract:
Mainstreaming Volunteer Computing (October 1, 2011 – September 30, 2016)
This award funds the continued operations
and further development of Einstein@Home and its software infrastructure, the Berkeley Open Infrastructure for Network Computing (BOINC). Einstein@Home is one of the largest and most powerful computers on the planet. It searches astrophysical data for the weak signals from spinning neutron stars. Unlike a normal supercomputer, the computing power of Einstein@Home comes from ordinary home computers and laptops that have been "signed up" by about 300,000 members of the general public. When otherwise idle, these computers automatically download observational data over the Internet from Einstein@Home servers, search the data for the weak signals from spinning neutron stars, and return the results of the analysis to the servers.
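The fetch–compute–report cycle described above is the heart of any volunteer-computing client. The real Einstein@Home client is built on BOINC's C++ libraries; the sketch below is only a schematic Python illustration of that cycle, and every URL, field name, and threshold in it is invented for the example.

```python
import time
import requests

PROJECT_URL = "https://volunteer.example.org"   # placeholder, not a real project server


def search_for_signals(samples):
    """Stand-in for the real matched-filter search for pulsar-like signals."""
    return [s for s in samples if s > 5.0]      # e.g., keep samples above a threshold


def volunteer_loop():
    """Fetch a work unit, analyze it while the machine is idle, report the result."""
    while True:
        work = requests.get(f"{PROJECT_URL}/get_work").json()
        candidates = search_for_signals(work["samples"])
        requests.post(f"{PROJECT_URL}/report",
                      json={"work_id": work["id"], "candidates": candidates})
        time.sleep(60)                          # pause before asking for more work
```

In the real system the server also validates each result by comparing it against the same work unit processed by other volunteers.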

Neutron stars are exotic objects: they represent the most compact form that a star can take before it collapses into a black hole. Since they were discovered in 1967, about two thousand neutron stars have been found (including several discovered in 2010 and 2011 by Einstein@Home). Neutron star observations provide a unique view into the behavior of matter at extreme pressures and densities, and into the nature of gravitation when gravity is very strong. Under certain circumstances, neutron stars can be emitters of pulsing radio waves (pulsars). Einstein@Home exploits the unique capabilities of the Arecibo Radio Observatory, the largest and most sensitive single-dish radio telescope in the world, to search for these signals. It is possible that neutron stars can also emit gravitational waves. Gravitational waves were first predicted by Einstein in 1917 but have never been directly detected. Einstein@Home can search the data from gravitational wave detectors such as those of the Laser Interferometer Gravitational-wave Observatory (LIGO) for these signals. Einstein@Home also supports the BOINC software infrastructure to benefit dozens of computationally intensive projects in other areas of science that also exploit volunteer distributed computing. And it is a remarkable tool for scientific outreach: Einstein@Home allows hundreds of thousands of ordinary citizens from around the world to participate in and make meaningful contributions to cutting-edge scientific research.
SI2-SSI: SciDaaS -- Scientific data management as a service for small/medium labs (April 1, 2012 – March 31, 2016)
The SciDaaS project will develop and operate
a suite of innovative research data management services for the NSF community. These services, to be accessible at www.globusonline.org, will allow research laboratories to outsource a range of time-consuming research data management functions, including storage and movement, publication, and metadata management. SciDaaS research will investigate what services are most needed by NSF researchers; how best to present these services to integrate with diverse research laboratory environments; and how these services are used in practice across different research communities.

SciDaaS will greatly reduce the cost to the individual researcher of acquiring and operating sophisticated scientific data management capabilities. In so doing, it has the potential to dramatically expand use of advanced information technology in NSF research and thus accelerate discovery across many fields of science and engineering. By providing a platform for researchers to publicly share data at an incremental cost, SciDaaS will also reduce barriers to free exchange among researchers and contribute to the democratization of science.
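For a sense of what "outsourced" data movement looks like in practice, the sketch below uses the present-day Globus Python SDK (globus_sdk), a descendant of the Globus Online services named above; the client ID, endpoint UUIDs, and paths are placeholders, and this is an illustration rather than SciDaaS-specific code.

```python
import globus_sdk

# Placeholder identifiers; a real script would use a registered Globus app client ID
# and the UUIDs of two Globus endpoints (e.g., a lab server and a campus archive).
CLIENT_ID = "REPLACE-WITH-NATIVE-APP-CLIENT-ID"
SRC_ENDPOINT = "REPLACE-WITH-SOURCE-ENDPOINT-UUID"
DST_ENDPOINT = "REPLACE-WITH-DESTINATION-ENDPOINT-UUID"

# Interactive login: the user authenticates in a browser and pastes back a code.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow()
print("Please log in at:", auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input("Authorization code: "))
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Submit an asynchronous, managed transfer; the service handles retries and notification.
tc = globus_sdk.TransferClient(authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token))
task = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT, label="lab data sync")
task.add_item("/lab/raw/run42/", "/archive/run42/", recursive=True)
print("Submitted task:", tc.submit_transfer(task)["task_id"])
```

The point of the service model is visible here: the researcher describes the transfer and walks away, while the hosted service manages the movement.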
SUPREMM tool (July 1, 2012 – June 30, 2015)
Today's high-performance computing systems are a complex combination of software, processors, memory, networks, and storage systems characterized by frequent disruptive technological advances. In this environment, system managers, users, and sponsors find it difficult, if not impossible, to know whether optimal performance of the infrastructure is being realized, or even whether all subcomponents are functioning properly. Users of such systems are often engaged in science at the extreme, where system uncertainties can significantly delay or even confound the scientific investigations. Critically, for systems based on open-source software, which include a large fraction of XSEDE resources, the data and information necessary to use and manage these complex systems are not available. HPC centers and their users are to some extent flying blind, without a clear understanding of system behavior. Anomalous behavior has to be diagnosed and remedied with incomplete and sparse data. It is difficult for users to assess the effectiveness with which they are using the available resources to generate knowledge in their sciences. NSF lacks a comprehensive knowledge base to evaluate the effectiveness of its investments in HPC systems.

This award will address this problem through the creation of a comprehensive set of tools for developing the needed knowledge bases. This will be accomplished by building on the HPC systems monitoring and reporting work currently underway at the University at Buffalo under the Technology Audit Service (TAS) of the XSEDE project and at the University of Texas' Texas Advanced Computing Center (TACC) as part of the Ranger Technology Insertion effort, and by combining it with many elements of existing monitoring and analysis tools. The PIs will provide the knowledge bases required to understand the current operations of XSEDE, to enhance and increase the productivity of all of the stakeholders of XSEDE (service providers, users, and sponsors), and ultimately to provide open source tools to greatly increase the operational efficiency and productivity of HPC systems in general.
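As a rough illustration of the kind of derived metrics such a knowledge base might hold (the actual TAS/SUPREMM data model is far richer and is not shown here), the following sketch aggregates hypothetical job-accounting records into per-user node-hours and node-hour-weighted CPU efficiency:

```python
from collections import defaultdict

# Hypothetical job-accounting records; real data would come from the resource
# manager and node-level performance monitors.
jobs = [
    {"user": "alice", "nodes": 32, "wall_hours": 4.0,  "cpu_efficiency": 0.91},
    {"user": "alice", "nodes": 64, "wall_hours": 2.5,  "cpu_efficiency": 0.47},
    {"user": "bob",   "nodes": 8,  "wall_hours": 12.0, "cpu_efficiency": 0.83},
]

usage = defaultdict(lambda: {"node_hours": 0.0, "eff_weighted": 0.0})
for job in jobs:
    node_hours = job["nodes"] * job["wall_hours"]
    usage[job["user"]]["node_hours"] += node_hours
    usage[job["user"]]["eff_weighted"] += job["cpu_efficiency"] * node_hours

for user, u in usage.items():
    avg_eff = u["eff_weighted"] / u["node_hours"]   # node-hour-weighted efficiency
    print(f"{user}: {u['node_hours']:.1f} node-hours, {avg_eff:.0%} average CPU efficiency")
```

Metrics of this sort are what let a center, or a funding agency, see whether expensive resources are actually being used effectively.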
Center for Trustworthy Scientific Cyberinfrastructure (CTSC) (October 1, 2012 – June 30, 2015)
The Center for Trustworthy Scientific
Cyberinfrastructure (CTSC) will transform and improve the practice of cybersecurity and hence trustworthiness of NSF scientific cyberinfrastructure. CTSC will provide readily available cybersecurity expertise and services, as well as leadership and coordination across a broad range of NSF scientific cyberinfrastructure projects via a series of engagements with NSF cyberinfrastructure projects and a broader ongoing education, outreach and training effort.

Intellectual Merit: CTSC will advance the state of cybersecurity practice across the community by analyzing gaps in cybersecurity technology to provide guidance to researchers and developers, addressing the application of software assessment to complicated cyberinfrastructure software stacks, and broadly fostering the transition of cybersecurity research to practice.

Broader Impact: Scientific computing and confidence in its results rely on trustworthy cyberinfrastructure. The CTSC mission is to help provide the trustworthy cyberinfrastructure that science requires across the NSF ecosystem. CTSC's work will impact science through dozens of cyberinfrastructure projects over the project's lifetime. Additionally, CTSC will perform workforce development in the area of cyberinfrastructure cybersecurity through EOT activities: training, undergraduate curriculum development, and student education.
Latin America-US Institute 2013: Methods in Computational Discovery for Multidimensional Problem Solving (November 1, 2012 – October 31, 2013)
This Pan-American Advanced Studies Institutes (PASI) award, jointly supported by the NSF and the Department of Energy (DOE), will take place in July 2013 at the Universidad del Valle in Guatemala. Organized by Dr. Marshall S. Poole, Professor in the Department of Communications at the University of Illinois, Urbana-Champaign, the PASI aims to introduce junior researchers to methods in computation-based discovery (CBD). In searching for solutions to major problems (e.g., biodiversity, modeling of natural systems, water ecology, and others), researchers across the natural and social sciences as well as the humanities and arts are generating massive and/or highly complex data sets that extend well beyond humans' capacities to perceive or analyze without sophisticated technological augmentation. CBD allows researchers to gather, transform, and analyze data from a range of sources, including, for example, sensors, video archives, telescopes, and supercomputers. Thus, access to advanced computational resources, and also to sophisticated skills in data acquisition, management, transformation, visualization, analytics, and preservation, is highly valued by researchers. For example, sophisticated visualization tools and techniques enhance human understanding of extreme, complex and/or abstract data sets, making it easier to see patterns and relationships and to form or test hypotheses.

The Institute will focus on CBD technical and analytical methods and help investigators apply these to their own research. Key goals are to (1) expand participants' knowledge of high performance computing (HPC) and specialized tools and techniques that support CBD involving massive or complex data sets; (2) provide hands-on experience in exploring large and complex data sets using easily accessible desktop open source tools; (3) bring researchers from underrepresented populations into the CBD field; and (4) foster new collegial partnerships that stimulate both national and international co-operative research among the presenters and attendees. In addition, the PASI will also provide up-to-date information on the deliberations of the PASI to a wider audience through a web page to disseminate results and reports of the meeting.
EAGER proposal: Toward a Distributed Knowledge Environment for Research into Cyberinfrastructure: Data, Tools, Measures, and Models for Multidimensional Innovation Network Analysis (September 1, 2013 – August 31, 2015)
Although many virtual organizations (VOs) are quite effective, not all VO practitioners are effective in each area, and there is no organized body of knowledge or set of "best practices" among VOs to draw upon for key issues. Therefore, centers are likely not as effective as they could be. This proposal involves the creation of an online knowledge exchange. This Virtual Organization Resources and Toolkits Exchange (VORTEX) would provide leaders of virtual organizations with resources about running virtual organizations and access to relevant organizational scientists. VORTEX is intended to aid in building a community among virtual organization leaders so that they can collaborate, share, and learn with and from each other.

Specific Objectives of the work include development, evaluation, and improvement of an online Virtual Organization Resources and Toolkits Exchange (VORTEX) environment to aid scientists and engineers to more effectively lead virtual organizations. This type of environment is necessary in order to:

(1) Connect leaders of virtual organizations with appropriate organization scientists;

(2) Provide online educational and reference materials for issues associated with managing virtual organizations; and

(3) Establish a center for leaders of virtual organizations to share and collaborate with each other.
Multiscale Software for Quantum Simulations in Materials Design, Nano Science and Technology (September 1, 2013 – August 31, 2016)
The emergence of petascale computing platforms brings unprecedented opportunities for transformational research through simulation. However, future breakthroughs will depend on the availability of high-end simulation software, which will fully utilize these unparalleled resources and provide the long-sought third avenue for scientific progress in key areas of national interest. This award will deliver a set of open source petascale quantum simulation tools in the broad areas of materials design, nano science and nanotechnology. Materials prediction and design are key aspects of the recently created Materials Genome Initiative, which seeks to "deploy advanced materials at least twice as fast, at a fraction of the cost." Computational materials design is the critical aspect of that initiative, which relies on computation guiding experiments. The outcomes of the latter will in turn lead to follow-up computation in an iterative feedback loop. Nanoscience, which studies properties of materials and processes on the fundamental scale of nanometers, promises the development of materials and systems with radically new properties. However, the nanoscale properties are hard to measure and even harder to predict theoretically. Only simulations that can fully account for the complexity and variability at that fundamental scale stand a chance of predicting and utilizing the macroscopic properties that emerge. This truly requires petascale resources and efficient petascale software tools.

This award will develop software tools built on the real-space multigrid (RMG) software suite and distribute them to the national user community. The RMG code already scales to 128,000 CPU cores and 18,000 GPU nodes. The award will further enhance RMG through development of new iterative methods with improved convergence, optimization of additional modules for existing and new petascale computing platforms, and creation of easy-to-use interfaces to the main codes. Workshops in RMG usage will be conducted at XSEDE workshops and other meetings of NSF supercomputing centers. RMG will be distributed through a web portal, which will also contain user forums and video tutorials, recorded at live user sessions. A library of representative examples for the main petascale platforms will be maintained. RMG will enable quantum simulations of unprecedented size, enabling studies of the building blocks of functional nano or bio-nano structures, which often involve thousands of atoms and must be described with the requisite fidelity. The development of petascale quantum simulation software and its user community will lead to cross-fertilization of ideas both within and across fields. Students and postdocs trained in this area will have significant opportunities for advancement and for making a substantial impact of their own.
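RMG itself is a large parallel electronic-structure code; purely as an illustration of the multigrid idea it is built on, here is a minimal two-grid correction scheme for a 1D Poisson model problem. The grid size, smoother, and transfer operators below are simplifications chosen for brevity, not RMG's actual algorithms.

```python
import numpy as np

def relax(u, f, h, sweeps=3, omega=2/3):
    """Weighted-Jacobi smoothing for the 1D model problem -u'' = f, u(0)=u(1)=0."""
    for _ in range(sweeps):
        u[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def coarse_solve(rc, hc):
    """Direct solve of the coarse-grid correction equation -e'' = r."""
    m = len(rc) - 2                                  # interior coarse points
    A = (np.diag(2 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
         - np.diag(np.ones(m - 1), -1)) / (hc * hc)
    ec = np.zeros_like(rc)
    ec[1:-1] = np.linalg.solve(A, rc[1:-1])
    return ec

def two_grid_cycle(u, f, h):
    """Smooth, restrict the residual, solve on the coarse grid, prolong, smooth again."""
    u = relax(u, f, h)
    rc = residual(u, f, h)[::2]                      # restriction by injection
    ec = coarse_solve(rc, 2 * h)
    e = np.interp(np.arange(len(u)), np.arange(0, len(u), 2), ec)  # linear prolongation
    return relax(u + e, f, h)

n, h = 129, 1.0 / 128
x = np.linspace(0.0, 1.0, n)
f = np.pi ** 2 * np.sin(np.pi * x)                   # exact solution is sin(pi * x)
u = np.zeros(n)
for _ in range(10):
    u = two_grid_cycle(u, f, h)
print("max error vs. exact solution:", np.abs(u - np.sin(np.pi * x)).max())
```

The appeal of the approach, and the reason it scales to very large electronic-structure problems, is that smoothing removes high-frequency error cheaply on the fine grid while the coarse grid handles the slowly varying error.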
MRI: Acquisition of SuperMIC -- A Heterogeneous Computing Environment to Enable Transformation of Computational Research and Education in the State of Louisiana (October 1, 2013 – September 30, 2016)
This is an award to acquire a compute cluster at LSU. The computer is a heterogeneous HPC cluster named SuperMIC containing both Intel Xeon Phi and NVIDIA Kepler K20X GPU (graphics processing unit) accelerators. The intent is to conduct research on programming such clusters while advancing projects that are dependent on HPC. The efforts range from modeling conditions that threaten coastal environments and testing mitigation techniques, to simulating the motions of tumors and organs in cancer patients due to respiratory action to aid radiotherapy planning and management. The burden of learning highly complex hybrid programming models presents an enormous software development crisis and demands a better solution. SuperMIC will serve as the development platform to extend current programming frameworks, such as Cactus, by incorporating GPU and Xeon Phi methods. Such frameworks allow users to move seamlessly from serial to multi-core to distributed parallel platforms without changing their applications, and yet achieve high performance. The SuperMIC project will include training and education at all levels, from a Beowulf boot camp for high school students to more than 20 annual LSU workshops and computational sciences distance learning courses for students at LONI (Louisiana Optical Network Initiative) and LA-SiGMA (Louisiana Alliance for Simulation-Guided Materials Applications) member institutions. These include Southern University, Xavier University, and Grambling State University, all historically black colleges and universities (HBCUs) with large underrepresented minority enrollments. The SuperMIC cluster will be used in the LSU and LA-SiGMA REU and RET programs. It will impact the national HPC community through resources committed to the NSF XSEDE program and the Southeastern Universities Research Association SURAgrid. The SuperMIC project will commit 40% of the machine's usage to the XSEDE XRAC allocation committee.
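The portability that frameworks such as Cactus aim for, one source tree that runs on CPUs or accelerators without rewriting the application, can be glimpsed in miniature with array libraries that share NumPy's interface. The sketch below is a generic illustration, not SuperMIC- or Cactus-specific code, and assumes the optional CuPy package for the GPU path.

```python
import numpy as np

try:
    import cupy as cp      # optional GPU backend with a NumPy-compatible interface
    xp = cp
except ImportError:
    xp = np                # the same application code falls back to the CPU


def mean_power_spectrum(signals):
    """Average power spectrum over a batch of signals; identical code on CPU or GPU."""
    spectrum = xp.abs(xp.fft.rfft(signals, axis=-1)) ** 2
    return spectrum.mean(axis=0)


signals = xp.random.standard_normal((2048, 4096))   # synthetic batch of time series
spec = mean_power_spectrum(signals)
print("backend:", xp.__name__, "| spectrum bins:", spec.shape[0])
```

The application code never mentions the accelerator; the backend choice is made once, which is the same design goal hybrid frameworks pursue at much larger scale.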
Open Gateway Computing Environments Science Gateways Platform as a Service (OGCE SciGaP) (October 1, 2013 – September 30, 2018)
Science Gateways are virtual environments that
dramatically accelerate scientific discovery by enabling scientific communities to utilize distributed computational and data resources (that is, cyberinfrastructure). Successful Science Gateways provide access to sophisticated and powerful resources, while shielding their users from the resources' complexities. Given Science Gateways' demonstrated impact on progress in many scientific fields, it is important to remove barriers to the creation of new gateways and make it easier to sustain them. The Science Gateway Platform (SciGaP) project will create a set of hosted infrastructure services that can be easily adopted by gateway providers to build new gateways based on robust and reliable open source tools. The proposed work will transform the way Science Gateways are constructed by significantly lowering the development overhead for communities requiring access to cyberinfrastructure, and support the efficient utilization of shared resources.

SciGaP will transform access to large scale computing and data resources by reducing development time of new gateways and by accelerating scientific research for communities in need of access to large-scale resources. SciGaP's adherence to open community and open governance principles of the Apache Software Foundation will assure open source software access and open operation of its services. This will give all project stakeholders a voice in the software and will clear the proprietary fog that surrounds cyberinfrastructure services. The benefits of SciGaP services are not restricted to scientific fields, but can be used to accelerate progress in any field of endeavor that is limited by access to computational resources. SciGaP services will be usable by a community of any size, whether it is an individual, a lab group, a department, an institution, or an international community. SciGaP will help train a new generation of cyberinfrastructure developers in open source development, providing these early career developers with the ability to make publicly documented contributions to gateway software and to bridge the gap between academic and non-academic development.
Sustaining Globus Toolkit for the NSF Community (Sustain-GT) (October 1, 2013 – September 30, 2018)
Science and engineering depend increasingly on the ability to
collaborate and federate resources across distances. This observation holds whether a single investigator is accessing a remote computer, a small team is analyzing data from an engineering experiment, or an international collaboration is involved in a multi-decade project such as the Large Hadron Collider (LHC). Any distributed collaboration and resource federation system requires methods for authentication and authorization, data movement, and remote computation. Of the many solutions that have been proposed to these problems, the Globus Toolkit (GT) has proven the most persistently applicable across multiple fields, geographies, and project scales. GT resource gateway services and client libraries are used by tens of thousands of people every day to perform literally tens of millions of tasks at thousands of sites, enabling discovery across essentially every science and engineering discipline supported by the NSF. As new, innovative techniques and technologies for collaboration and scientific workflows are developed, and as new computing and instrument resources are added to the national cyberinfrastructure, these technologies and other improvements must be added and integrated into GT so that it can continue to provide an advanced and robust technology for solving scientific research problems.

The Sustain-GT project builds on past success to ensure that GT resource gateway services will continue to meet the challenges faced by NSF science and engineering communities. These challenges include: multiple-orders-of-magnitude increases in the volume of data generated, stored, and transmitted; much bigger computer systems and correspondingly larger and more complex computations; much faster networks; many more researchers, educators, and students engaged in data-intensive and computational research; and rapidly evolving commodity Web and Cloud computing environments. With the help of a new User Requirements Board, Sustain-GT will respond to community demands to evolve the GT resource gateway services with superior functionality, scalability, availability, reliability, and manageability. Sustain-GT will also provide the NSF community with high quality support and rapid-response bug fix services, as is required to sustain a heavily used, production system like GT.
CC-NIE Integration: Developing Applications with Networking Capabilities via End-to-End SDN (DANCES) (January 1, 2014 – December 31, 2015)
The DANCES project team of network engineers, application developers, and research scientists is implementing a software-defined networking (SDN)-enabled end-to-end environment to optimize support for scientific data transfer. DANCES accomplishes this optimization by integrating high-performance computing job scheduling and the network control capabilities offered by SDN with data movement applications in an end-to-end network infrastructure. This integration provides access to control mechanisms for managing network bandwidth. The control of network resources enabled by SDN enhances application stability, predictability, and performance, thereby improving overall network utilization. The motivation for the DANCES project is to apply the advantages of advanced network services to the problem of congested metropolitan and campus networks. DANCES uses XSEDENet across Internet2 in conjunction with OpenFlow-enabled network switches installed at the collaborating sites as the end-to-end hardware and software substrate.
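The network-control piece may be the least familiar part. As one concrete but simplified possibility, an OpenFlow controller application can steer an identified bulk-transfer flow into a dedicated switch queue that has a bandwidth guarantee. The sketch below uses the open-source Ryu controller framework; the match fields, queue ID, port number, and priority are invented for illustration, and DANCES's actual controller logic is not shown here.

```python
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class BulkTransferPrioritizer(app_manager.RyuApp):
    """Install a rule sending one data-transfer flow to a reserved queue on the switch."""
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        dp = ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser

        # Match the transfer's endpoints (illustrative addresses and TCP port).
        match = parser.OFPMatch(eth_type=0x0800,
                                ipv4_src="10.0.1.10", ipv4_dst="10.0.2.20",
                                ip_proto=6, tcp_dst=2811)

        # Send matching packets to queue 1 (pre-provisioned with a bandwidth
        # guarantee), then forward them out port 2.
        actions = [parser.OFPActionSetQueue(1), parser.OFPActionOutput(2)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=100,
                                      match=match, instructions=inst))
```

In an end-to-end deployment, a scheduler or data-movement application would ask the controller to install rules like this only for the duration of an approved transfer.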

Knowledge gained through DANCES is being disseminated through educational programs offered by the participating institutions and at existing community workshops, meetings, and conferences. The insights and experience obtained through DANCES will promote a better understanding of the technical requirements for supporting end-to-end SDN across wide area and campus cyberinfrastructure. The resulting SDN-enabled applications will make the request and configuration of high bandwidth connections easily accessible to end users and improve network performance and predictability for supporting a wide range of applications.
A Large-Scale, Community-Driven Experimental Environment for Cloud Research (October 1, 2014 – September 30, 2017)
A persistent problem facing academic cloud research is the
lack of infrastructure and data to perform experimental research: large-scale hardware is needed to investigate the scalability of cloud infrastructure and applications, heterogeneous hardware is needed to investigate algorithmic and implementation tradeoffs, fully-configurable software environments are needed to investigate the performance of virtualization techniques and the differences between cloud software stacks, and data about how clouds are used is needed to evaluate virtual machine scheduling and data placement algorithms.

The Chameleon project will address these needs by providing a large-scale, fully configurable experimental testbed driven by the needs of the cloud research and education communities. The testbed, and the ecosystem associated with it, will enable researchers to explore a range of cloud research challenges, from large scale to small scale, including exploring low-level problems in hardware architecture, systems research, network configuration, and software design, or, at higher levels of abstraction, looking at cloud scheduling, cloud platforms, and cloud applications.

Chameleon will significantly enhance the ability of the computing research community to understand the behavior of Internet scale cloud systems, and to develop new software, ideas and algorithms for the cloud environment. As the tremendous shift to cloud as the primary means of providing computing infrastructure continues, a large-scale testbed tailored to researchers' needs is essential to the continued relevance of a large fraction of computing research.

The project is led by the University of Chicago and includes partners from the Texas Advanced Computing Center (TACC), Northwestern University, the Ohio State University, and the University of Texas at San Antonio, comprising a highly qualified and experienced team, with research leaders from the cloud and networking world blended with providers of production quality cyberinfrastructure. The team includes members from the NSF-supported FutureGrid project and from the GENI community, both forerunners of the NSFCloud solicitation under which this project is funded.

The Chameleon testbed will be deployed at the University of Chicago (UC) and the Texas Advanced Computing Center (TACC) and will consist of 650 multi-core cloud nodes and 5 PB of total disk space, leveraging a 100 Gbps connection between the sites. While a large part of the testbed will consist of homogeneous hardware to support large-scale experiments, a portion of it will support heterogeneous units allowing experimentation with high-memory, large-disk, low-power, GPU, and co-processor units. The project will also leverage existing FutureGrid hardware at UC and TACC in its first year to provide a transition period for the existing FutureGrid community of experimental users.

To support a broad range of experiments with requirements ranging from a high degree of control to ease of use, the project will support a graduated configuration system allowing full user configurability of the stack, from provisioning of bare metal and network interconnects to delivery of fully functioning cloud environments. In addition, to facilitate experiments, Chameleon will support a set of services designed to meet researchers' needs, including support for experiment management, reproducibility, and repositories of trace and workload data from production clouds.

To facilitate the latter, the project will form a set of partnerships with commercial as well as academic clouds, such as Rackspace and the Open Science Data Cloud (OSDC). It will also partner with other testbeds, notably GENI and INRIA's Grid5000 testbed, and reach out to the user community to shape the policy and direction of the testbed.

The Chameleon project will bring a new dimension and scale of resources to the CS community who wish to educate their students about design, implementation, operation and applications of cloud computing, a critical skillset for future computing professionals. It will enhance the understanding and application of experimental methodology in computer science and generate new educational materials and resources, with the participation of, and for, Minority Serving Institution (MSI) students.
MRI: Acquisition of a National CyberGIS Facility for Computing- and Data-Intensive Geospatial Research and Education (October 1, 2014 – September 30, 2017)
Collaborative, interactive, and scalable knowledge discovery, in the form of processing and visualizing massive amounts of complex geospatial data and performing associated analysis and simulation, has become essential to fulfilling the important role of the emerging and vibrant interdisciplinary field of CyberGIS -- geographic information science and systems (GIS) based on advanced cyberinfrastructure -- in enabling computing- and data-intensive research and education across a broad swath of academic disciplines with significant societal impacts.

This project supports these activities by establishing the CyberGIS Facility as an innovative instrument equipped with capabilities that include high-performance data access with large disk storage, cutting-edge computing configured with advanced graphics processing units, and visualization supported with fast network and dynamically provisioned cloud computing resources. The CyberGIS Facility represents a groundbreaking advance in the broad context of advanced cyberinfrastructure and geospatial sciences and technologies. The Facility enables researchers to solve a diverse set of major and complex scientific problems (e.g., climate and weather predictions, emergency management, and environmental and energy sustainability) in multidisciplinary, bio, engineering, geo, and social sciences that would otherwise be impossible or difficult to tackle. Extensive advances in various education and training efforts (e.g., new courses, cross-disciplinary curricula, and online learning materials) help to produce a next-generation workforce for fostering CyberGIS-enabled discoveries and innovations. Facility users represent a wide range of disciplines and conduct leading-edge research sponsored by various agencies and organizations (e.g., NSF, Environmental Protection Agency, National Institutes of Health, National Aeronautics and Space Administration, and U.S. Geological Survey), which highlight the impact that this project has in enabling broad and significant scientific advances.
Acquisition of an Extreme GPU cluster for Interdisciplinary Research (October 1, 2014 – September 30, 2017)
Stanford University requests $3,500,000 over 36 months to acquire an extreme GPU HPC cluster, called X-GPU, comprising 54 compute nodes built using the Cray Hydra technology with FDR InfiniBand. Each node has 12 Intel Haswell cores, 8 NVIDIA Kepler cards, 128 GB of DDR4 memory, a 120 GB SSD, and two 1 TB hard drives. The result will be an energy-efficient computational facility providing almost a petaflop of computational power. It will be used by (1) at least 25 research groups representing more than 100 students and postdoctoral researchers at Stanford across 15 departments and 4 schools, (2) at least 8 collaborators from at least 7 other institutions across the nation, and (3) as many as hundreds of national researchers through the NSF-sponsored XSEDE allocation system. The PIs plan to offer 25% of X-GPU to XSEDE to offset the impact of the planned retirement of Keeneland, the current XSEDE resource providing heterogeneous parallel computing with CPUs and GPUs to the national community.

Identified scientific outcomes enabled by this instrument include, but are not limited to: astrophysics and cosmology, bioinformatics and biology, materials modeling, and climate modeling. The researchers have already invested significant effort in developing modeling and simulation codes that can demonstrate high performance on GPU-accelerated clusters. The PIs plan to develop software infrastructure and educational materials to help the national community in the transition to fine-grained parallel thinking and algorithm design, which is critical to effectively using this novel high-performance, low-cost, energy-efficient architecture.
The Centrality of Advanced Digitally-ENabled Science: CADENS (October 1, 2014 – September 30, 2017)
Computational data science is at a turning point in its history. Never before has there been such a challenge to meet the growing demands of digital computing, to fund infrastructure, and to attract diverse, trained personnel to the field. The methods and technologies that define this evolving field are central to modern science. In fact, advanced methods of computational and data-enabled discovery have become so pervasive that they are referred to as paradigm shifts in the conduct of science. A goal of this project is to increase digital science literacy and raise awareness about the Centrality of Advanced Digitally ENabled Science (CADENS) in the discovery process. Digitally enabled scientific investigations often result in a treasure trove of data used for analysis. This project leverages these valuable resources to generate insightful visualizations that provide the core of a series of science education outreach programs targeted to the broad public, educational, and professional communities. From the deep well of discoveries generated at the frontiers of advanced digitally enabled scientific investigation, this project will produce and disseminate a body of data visualizations and scalable media products that demonstrate advanced scientific methods. In the process, these outreach programs will give audiences a whole new look at the world around them.

The project calls for the production and evaluation of two principal initiatives. The first initiative, HR (high-resolution) Science, centers on the production and distribution of three ultra-high-resolution digital films to be premiered at giant-screen full-dome theaters; these programs will be scaled for wide distribution to smaller theaters and include supplemental educator guides. The second initiative, Virtual Universe, includes a series of nine high-definition (HD) documentary programs. Both initiatives will produce and feature data visualizations and the CADENS narratives to support an integrated set of digital media products. The packaged outreach programs will be promoted and made available to millions through established global distribution channels.

Expanding access to data visualization is an essential component of the project. Through a call for participation (CFP), the project provides new opportunities for researchers to work with the project team and technical staff to create and broadly distribute large-scale data visualizations in various formats and resolutions. The project will feature these compelling, informative visualizations in the outreach programs described above. A Science Advisory Committee will participate in the CFP science selections and advise the project team.

The project calls for an independent Program Evaluation and Assessment Plan (PEAP) to iteratively review visualizations and the outreach programs that will target broad, diverse audiences. The project launches an expansive outreach effort to increase digital science literacy and to convey forefront scientific research while expanding researchers' access to data visualization. The project leverages and integrates disparate visualization efforts to create a new, optimized large-scale workflow for high-resolution museum displays and broad public venues. The PEAP evaluations will measure progress toward project goals and will reveal new information about visualization's effectiveness in moving a field forward and in developing effective outreach models.
The project specifically targets broad audiences in places where they seek high-quality encounters with science: at museums, universities, K-16 schools, and the web. This distribution effort includes creating and widely disseminating the project outreach programs and supplemental educator guides. The project visualizations, program components, HD documentaries, educational and evaluation materials will be promoted, distributed and made freely available for academic, educational and promotional use. Dissemination strategies include proactively distributing to rural portable theaters, 4K television, professional associations, educators, decision-makers, and conferences. To help address the critical challenge of attracting women and underrepresented minorities to STEM fields, the Project will support a Broadening Participation in Visualization workshop and will leverage successful XSEDE/Blue Waters mechanisms to recruit under-represented faculty and students at minority-serving and majority-serving institutions and to disseminate the Project programs and materials among diverse institutions and communities.
CloudLab: Flexible Scientific Infrastructure to Support Fundamental Advances in Cloud Architectures and Applications (October 1, 2014 – September 30, 2017)
Many of the ideas that drive modern cloud computing, such as server virtualization, network slicing, and robust distributed storage, arose from the research community. But because today's clouds have particular, non-malleable implementations of these ideas "baked in," they are unsuitable as facilities in which to conduct research on future cloud architectures. This project creates CloudLab, a facility that will enable fundamental advances in cloud architecture. CloudLab will not be a cloud; CloudLab will be large-scale, distributed scientific infrastructure on top of which many different clouds can be built. It will support thousands of researchers and run hundreds of different, experimental clouds simultaneously. The Phase I CloudLab deployment will provide data centers at Clemson (with Dell equipment), Utah (HP), and Wisconsin (Cisco), with each industrial partner collaborating to explore next-generation ideas for cloud architectures.

CloudLab will be a place where researchers can try out ideas using any cloud software stack they can imagine. It will accomplish this by running at a layer below cloud infrastructure: it will provide isolated, bare-metal access to a set of resources that researchers can use to bring up their own clouds. These clouds may run instances of today's popular stacks, modest modifications to them, or something entirely new. CloudLab will not be tied to any particular cloud stack, and will support experimentation on multiple stacks in parallel.

The impact of cloud computing outside the field of computer science has been substantial: it has enabled a new generation of applications and services with direct impacts on society at large. CloudLab is positioned to have an immediate and substantial impact on the research community by providing access to the resources it needs to shape the future of clouds. Cloud architecture research, enabled by CloudLab, will empower a new generation of applications and services which will bring direct benefit to the public in areas of national priority such as medicine, smart grids, and natural disaster early warning and response.
RUI: CAREER Organizational Capacity and Capacity Building for Cyberinfrastructure Diffusion (August 1, 2015 – August 31, 2020)
The vision behind advanced cyberinfrastructure (CI) is that
its development, acquisition, and provision will transform science and engineering in the 21st century. However, CI diffusion is full of challenges, because the adoption of the material objects also requires the adoption of a set of related behavioral practices and philosophical ideologies. Most critically, CI-enabled virtual organizations (VOs) often lack the full range of organizational capacity to effectively integrate and support the complex web of objects, practices, and ideologies as a holistic innovation.

This project examines the various manifestations of CI related objects, practices, and ideologies, and the ways they support CI implementation in scientific VOs. Using grounded theory analysis of interviews and factor analysis of survey data, this project will develop and validate a robust framework/measure of organizational capacity for CI diffusion. The project's empirical focus will be the NSF-funded Extreme Science and Engineering Discovery Environment (XSEDE; https://www.xsede.org/), a nationwide network of distributed high-performance computing resources. Interviews and surveys will solicit input from domain scientists, computational technologists, and supercomputer center administrators (across e-science projects, institutions, and disciplines) who have experience with adopting and using CI tools within the XSEDE ecosystem. The project will generate a series of capacity building strategies to help VOs increase the organizational capacity necessary to fully adopt CI. Findings will help NSF and other federal agencies to improve existing and future CI investments. This project may also have implications for open-source and commercial technologies that harness big data for complex simulations, modeling, and visualization analysis.
MRI Collaborative Consortium: Acquisition of a Shared Supercomputer by the Rocky Mountain Advanced Computing Consortium (September 1, 2015 – August 31, 2018)
A cluster supercomputer is deployed by the
University of Colorado Boulder (CU-Boulder) and Colorado State University (CSU) for the Rocky Mountain Advanced Computing Consortium (RMACC). This high-performance computing (HPC) system supports multiple research groups across the Rocky Mountain region in fields including astrophysics, bioinformatics, chemistry, computational fluid dynamics, earth system science, life science, material science, physics, and social sciences with advanced computing capabilities. It also provides a platform to investigate and address the impact of many-core processors on the applications that support research in these fields.

The system integrates nodes populated with Intel's conventional multicore Xeon processors and Many-Integrated-Core (MIC) 'Knights Landing' Phi processors interconnected by Intel's new Omni-Path networking technology. Users of the new HPC system have access to existing data management services including data storage, data sharing, metadata consulting, and data publishing, leveraging the NSF-funded high-performance networking infrastructure and long term storage system, as well as additional cyberinfrastructure, at CU-Boulder and CSU. The many-core feature of this HPC system enhances graduate and undergraduate students' education and training as they develop, deploy, test, and run optimized applications for next generation many-core architectures. Training for researchers and students is provided through workshops appropriate for introducing diverse audiences to the efficient and effective use of HPC systems, the challenges of vectorization for single core performance, shared memory parallelism, and issues of data management. Additionally, advanced workshops on large-scale distributed computing, high-throughput computing, and data-intensive computing are offered during the year and at the annual RMACC student-centric HPC Symposium. The Symposium brings together hundreds of students, researchers, and professionals from universities, national laboratories and industry to exchange ideas and best practices in all areas of cyberinfrastructure. For-credit HPC classes will be delivered for online participation, educating the next generation of computational scientists in state-of-the-art computational techniques.
EarthCube RCN: Collaborative Research: Research Coordination Network for High-Performance Distributed Computing in the Polar Sciences (September 1, 2015 – August 31, 2017)
One of the major current challenges with polar cyberinfrastructure is managing and fully exploiting the volume of high-resolution commercial imagery now being collected over the polar regions. This data can be used to understand the changes in polar regions due to climate change and other processes. The potential global socio-economic costs of these impacts make it an urgent priority to better understand polar systems. Understanding the mechanisms that underlie polar climate change and the links between polar and global climate systems requires a combination of field data, high-resolution observations from satellites, airborne imagery, and computer model outputs. Computational approaches have the potential to support faster and more fine-grained integration and analysis of these and other data types, thus increasing the efficiency of analyzing and understanding the complex processes. This project will support advances in computing tools and techniques that will enable the Polar Sciences Community to address significant challenges, both in the short and the long term.

The impact of this project will lie in improvements in the ability to utilize advanced cyberinfrastructure and high-performance distributed computing to fundamentally alter the scale, sophistication, and scope of the polar science problems that can be addressed. This project will not implement those changes but will identify and lay the groundwork for such impact across the Polar Sciences. The project personnel will identify the primary barriers to the uptake of high-performance and distributed computing and will help alleviate them through a combination of community-based solutions and training. The project will also produce a roadmap detailing a credible and effective way to meet the long-term computing challenges faced by the Polar Science community and possible plans to effectively address them. This project will establish mechanisms for community engagement, which include gathering technical requirements for polar cyberinfrastructure and supporting and training early-career scientists and graduate students.
Fostering Successful Innovative Large-Scale, Distributed Science and Engineering Projects through Integrated Collaboration (September 1, 2015 – August 31, 2016)
Large-scale, innovative science and engineering requires collaboration across geographically distributed, multidisciplinary teams; however, it is very difficult for projects to maintain the intellectual cohesion, tight coordination, and integration necessary to manage scalable "virtual" work that is distributed.

The goal of this project is to help teams succeed: maximizing efficiency, effectiveness, and innovativeness. This project will develop the capacity for leaders of Centers, Institutes, Labs, and other collaborations to plan and pursue transformative research agendas in order to truly create breakthroughs in smart and connected health, cyberphysical systems, smart cities, cybersecurity, big data, environmental sustainability, and across the domains of basic research. Training in the design and management of such collaborations will be provided, and tools and techniques will be developed. This work will draw on lessons learned from organization science to develop a customized curriculum to help large-scale science and engineering teams collaborate effectively and efficiently at this scale. Using the developed materials, the first of a series of workshops targeting potential principal investigators interested in large-scale Computer and Information Science and Engineering (CISE)-related projects will be conducted in the spring of 2016.
BD Hubs: Midwest: SEEDCorn: Sustainable Enabling Environment for Data Collaboration (in response to the NSF Big Data Regional Innovation Hubs (BD Hubs): Accelerating the Big Data Innovation Ecosystem solicitation, NSF 15-562) (October 1, 2015 – September 30, 2018)
Catalyzed by the NSF Big Data Hub program, the Universities
of Illinois, Indiana, Michigan, North Dakota, and Iowa State University have created a flexible regional Midwest Big Data Hub (MBDH), with a network of diverse and committed regional supporting partners (including colleges, universities, and libraries; non-profit organizations; industry; city, state and federal government organizations who bring data projects from multiple private, public, and government sources and funding agencies). The NSF-funded SEEDCorn project will be the foundational project to energize the activities of MBDH, leveraging partner activities and resources, coordinating existing projects, initiating 20-30 new public-private partnerships, sharing best practices and data policies, starting pilots, and helping to acquire funding. The result of SEEDCorn will be a sustainable hub of Big Data activities across the region and across the nation that enable research communities to better tackle complex science, engineering, and societal challenges, that support competitiveness of US industry, and that enable decision makers to make more informed decisions on topics ranging from public policy to economic development.

The MBDH is focusing on specific strengths and themes of importance to the Midwest across three sectors: Society (including smart cities and communities, network science, business analytics), Natural & Built World (including food, energy, water, digital agriculture, transportation, advanced manufacturing), and Healthcare and Biomedical Research (which spans patient care to genomics). Integrative "rings" connect all spokes and will be organized around themes of specific MBDH strengths, including (a) Data Science, where computational and statistical approaches can be developed and integrated with domain knowledge and societal considerations that support the underlying needs of "data to knowledge," (b) services, infrastructure, and tools needed to collect, store, link, serve, and analyze complex data collections, to support pilot projects, and ultimately provide production-level data services across the hub, and (c) educational activities needed to advance the knowledge base and train a new generation of data science-enabled specialists and a more general workforce in the practice and use of data science and services.
Secure Data Architecture: Shared Intelligence Platform for Protecting our National Cyberinfrastructure (in response to the NSF Cybersecurity Innovation for Cyberinfrastructure solicitation, NSF 15-549) (December 1, 2015 – November 30, 2018)
This research is expected to significantly enhance the
security of campus and research networks. It addresses the emerging security challenge of open, unrestricted access to campus research networks, but beyond that it lays the foundation for an evolvable intelligence sharing network with the very real potential for national scale analysis of that intelligence. Further it will supply cyber security researchers with a rich real-world intelligence source upon which to test their theories, tools, and techniques. The research will produce a new kind of virtual security appliance that will significantly enhance the security posture of open science networks so that advanced high-performance network-based research can be carried out free of performance lags induced by more traditional security controls.

This research will integrate prior research results, expertise, and security products from both the National Science Foundation and the Department of Energy to advance the security infrastructure available for open science networks, also known as Science DMZs. Further, the effort will actively promote sharing of intelligence among Science DMZ participants as well as with national academic computational resources and organizations that wish to participate. Beyond meeting the security needs of campus-based DMZs, the effort will lay the foundation for an intelligence-sharing infrastructure that will provide a significant benefit to the cybersecurity research community, making possible the collection, annotation, and open distribution of national-scale security intelligence to help test and validate ongoing security research.
CILogon 2.0 (in response to the NSF Cybersecurity Innovation for Cyberinfrastructure solicitation, NSF 15-549) (January 1, 2016 – December 31, 2018)
When scientists work together, they use web sites and other
software to share their ideas and data. To ensure the integrity of their work, these systems require the scientists to log in and verify that they are part of the team working on a particular science problem. Too often, the identity and access verification process is a stumbling block for the scientists. Scientific research projects are forced to invest time and effort into developing and supporting Identity and Access Management (IdAM) services, distracting them from the core goals of their research collaboration. The "CILogon 2.0" project provides an IdAM platform that enables scientists to work together to meet their IdAM needs more effectively so they can allocate more time and effort to their core mission of scientific research. To ensure that the project makes a real contribution to scientific collaborations, the researchers have partnered with the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration, the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) Physics Frontiers Center, and the Data Observation Network for Earth (DataONE). The project also provides training and outreach to additional scientific collaborations, and the project supports integration with the Extreme Science and Engineering Discovery Environment (XSEDE), which provides a national-scale cyberinfrastructure for scientific research in the US.

Prior to the "CILogon 2.0" project, the CILogon and COmanage projects separately developed platforms for federated identity management and collaborative organization management. Federated identity management enables researchers to use their home organization identities to access cyberinfrastructure, rather than requiring yet another username and password to log on. Collaborative organization management enables research projects to define user groups for authorization to collaboration platforms (e.g., wikis, mailing lists, and domain applications). The "CILogon 2.0" project integrates and expands on the existing CILogon and COmanage software to provide an integrated Identity and Access Management (IdAM) platform for cyberinfrastructure. This IdAM platform serves the unique needs of research collaborations, namely the need to dynamically form collaboration groups across organizations and countries, sharing access to data, instruments, compute clusters, and other resources to enable scientific discovery. The project provides a software-as-a-service platform to ease integration with cyberinfrastructure, while making all software components publicly available under open source licenses to enable re-use.
DIBBs: Merging Science and Cyberinfrastructure Pathways: The Whole Tale (March 1, 2016 – February 28, 2021)
Scholarly publications today are still mostly
disconnected from the underlying data and code used to produce the published results and findings, despite an increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along with its provenance. This trend is rapidly followed by a new third layer: communicating the process of inquiry itself by sharing a complete computational narrative that links method descriptions with executable code and data, thereby introducing a new era of reproducible science and accelerated knowledge discovery. In the Whole Tale (WT) project, all of these components are linked and accessible from scholarly publications. The third layer is broad, encompassing numerous research communities through science pathways (e.g., in astronomy, life and earth sciences, materials science, social science), and deep, using interconnected cyberinfrastructure pathways and shared technologies.

The goal of this project is to strengthen the second layer of research output, and to build a robust third layer that integrates all parts of the story, conveying the holistic experience of reproducible scientific inquiry by (1) exposing existing cyberinfrastructure through popular frontends, e.g., digital notebooks (IPython, Jupyter), traditional scripting environments, and workflow systems; (2) developing the necessary 'software glue' for seamless access to different backend capabilities, including from DataNet federations and Data Infrastructure Building Blocks (DIBBs) projects; and (3) enhancing the complete data-to-publication lifecycle by empowering scientists to create computational narratives in their usual programming environments, enhanced with new capabilities from the underlying cyberinfrastructure (e.g., identity management, advanced data access and provenance APIs, and Digital Object Identifier-based data publications). The technologies and interfaces will be developed and stress-tested using a diverse set of data types, technical frameworks, and early adopters across a range of science domains.
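One small but representative piece of the "software glue" described above is resolving a dataset's Digital Object Identifier to machine-readable metadata, so that a computational narrative can cite and re-fetch its inputs. The sketch below relies only on standard DOI content negotiation at doi.org; the DOI string is a placeholder, and none of the Whole Tale project's own services are depicted.

```python
# Illustrative sketch: resolve a dataset DOI to citation metadata using
# standard DOI content negotiation. The DOI below is a placeholder, not a
# real dataset; Whole Tale's own APIs are not shown here.
import requests

def fetch_doi_metadata(doi: str) -> dict:
    """Return CSL-JSON metadata for a DOI via content negotiation at doi.org."""
    resp = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# A computational narrative could record this metadata alongside the exact
# code, environment, and provenance needed to re-run the analysis.
metadata = fetch_doi_metadata("10.1234/example-dataset")  # hypothetical DOI
print(metadata.get("title"), "-", metadata.get("publisher"))
```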
Associated Universities, Inc. (AUI) and the National Radio Astronomy Observatory (NRAO) April 1, 2016 September 30, 2026
To enable a worldwide, multi-user community to realize research and education programs of the highest caliber, Associated Universities, Inc. (AUI) presents a strategic vision for the next decade to manage, operate, and optimize the National Radio Astronomy Observatory (NRAO) and to disseminate results from its world-leading capabilities. With the successful construction of the Atacama Large Millimeter/submillimeter Array (ALMA) and the recent enhancement of the Karl G. Jansky Very Large Array (VLA), two forefront facilities are moving into routine operation with ever-increasing scientific capability. Taken together, these iconic arrays mark a major milestone in modern astronomy, representing more than an order-of-magnitude leap in observational capabilities for astronomical sources at frequencies between 1 gigahertz and 1 terahertz.

As prioritized by multiple National Research Council Decadal Surveys in Astronomy and Astrophysics, NRAO facilities are tools for the entire scientific community that will empower discoveries across all fields of astrophysics. ALMA enables transformational research into the physics of the cold Universe: regions that are optically dark but shine brightly in the millimeter/submillimeter portion of the electromagnetic spectrum. Within the broad range of science accessible with ALMA, the top-level objectives include imaging the redshifted dust continuum and molecular line emission from evolving galaxies as early as a redshift of z~10 (500 million years after the Big Bang), determining the chemical composition and dynamics of star-forming gas in normal galaxies like the Milky Way but at z~3 (75% of the way across the Universe), and measuring the gas kinematics in young disks in nearby star-forming clouds. ALMA has already demonstrated its revolutionary impact with its dramatic images of planet, star and galaxy formation. These results will accelerate as the full array becomes operational, and with its longest baselines ALMA will achieve an angular resolution of tens of milliarcseconds. ALMA provides one to two orders-of-magnitude improvement over previous facilities in all areas of millimeter- and submillimeter-wave observations, including sensitivity, angular resolution and image fidelity.
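As a rough check on the quoted resolution (an order-of-magnitude sketch only; the 1.3 mm observing wavelength, roughly 230 GHz, and the ~16 km maximum baseline are representative values assumed here, not figures from the abstract), the diffraction-limited angular resolution of an interferometer scales as the observing wavelength divided by the longest baseline:

\[
\theta \;\approx\; 1.22\,\frac{\lambda}{B_{\max}}
      \;=\; 1.22 \times \frac{1.3\times10^{-3}\,\mathrm{m}}{1.6\times10^{4}\,\mathrm{m}}
      \;\approx\; 9.9\times10^{-8}\,\mathrm{rad}
      \;\approx\; 0.02''
      \;=\; 20\ \mathrm{milliarcseconds},
\]

which is consistent with the "tens of milliarcseconds" quoted above; at shorter wavelengths the resolution is finer still.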

Likewise, at centimeter wavelengths, the broadband VLA has ushered in a new era in radio astronomy, with groundbreaking results published in areas ranging from Galactic proto-stellar clouds to images of the molecular gas in the earliest galaxies. The enhanced VLA is opening new scientific frontiers and explicitly addressing four primary science themes: measuring the strength and topology of cosmic magnetic fields; imaging young stars and massive black holes in dust-enshrouded environments; following the rapid evolution of energetic phenomena; and studying the formation and evolution of stars, galaxies and active galactic nuclei. The improvement over previous performance is up to a factor of 10 in continuum sensitivity and coarsest frequency resolution, and a factor of 1,000 or more in finest frequency resolution and in the number of frequency channels.

In collaboration with NSF's international partners in ALMA, NRAO will transition ALMA from the current phase of commissioning and early science to full science operations. Already the most capable millimeter/submillimeter facility on the planet, in the next few years ALMA will realize significant new capabilities, further increasing ALMA's scientific productivity. The ALMA Development Program, a key component in the plan for the coming decade, will solicit and support community input and expertise in upgrading ALMA's capabilities throughout its useful lifetime.

Under AUI management, NRAO will implement a staged VLA infrastructure maintenance and development plan to renew and support operation of the VLA beyond the end of the next decade, followed by community-based planning and technical development for the next-generation centimeter-wave facilities. AUI will expand the NRAO Central Development Laboratory (CDL) mission to enhance NSF's existing radio astronomy facilities, to develop technology and expertise needed to build the next generation of radio astronomy instruments, and to benefit the broader economy via technology transfer. In collaboration with the university community, the CDL will support development for both ALMA and VLA and conduct leading-edge, creative research in both core and exploratory technologies that will continue to be vital to the NRAO mission in the coming decade.

With plans for enhanced user support services and new data manipulation and visualization tools, AUI envisions expanding the NRAO user base beyond traditional radio astronomers and enabling multi-wavelength science by researchers and students. AUI will also ensure that NSF's investment in NRAO achieves the broadest possible impact in cutting-edge research and technical innovation, training the next generation of researchers, and inspiring students and the public.

Building on an existing framework of diversity activities, AUI will conduct ambitious programs to transform the participation of underrepresented groups in science and engineering. An Office of Diversity Initiatives will lead programs, including the National Astronomy Consortium and Physics Inspiring the Next Generation, to empower under-represented students to obtain graduate degrees in STEM fields. The enhanced International/National Exchange Program and Chilean Women Graduate Internships will support international student research in radio astronomy. A key AUI objective for the NRAO workforce in the coming decade will be to move toward achieving parity with the nation's demographics for women and people of color.

AUI embraces an integrative approach to education and public outreach (EPO), closely aligned with NRAO research. The EPO plan builds on a comprehensive suite of programs, targeting learners of all ages, broad geographic regions, and traditionally under-represented groups, and incorporates federal STEM education initiatives that identify evidence-based best practices. NRAO will support graduate and undergraduate research, and Jansky postdoctoral fellows will carry out investigations independently, or in collaboration with staff and/or university collaborators, thus building professional relationships between NRAO and academic research groups.

As part of AUI's management and oversight of NRAO, AUI will regularly review the technical, financial, and administrative functioning of NRAO as well as AUI's own governance and business practices. Key Performance Indicators and both qualitative and quantitative assessments will inform NRAO activities and AUI policies to achieve optimal management and operation of NRAO.