What resources are available to help determine my project's data storage needs for the preparation of a data management plan supporting an XSEDE storage allocation request?

A request for an allocation on an Extreme Science and Engineering Discovery Environment (XSEDE) dedicated storage system requires a detailed plan for managing your project's data storage needs cost-effectively. Scientific computations and simulations frequently generate immense amounts of data; for example, a de novo genome assembly job can take a week to complete and generate almost a terabyte of data per run. Depending on the type of research you perform, estimating your project's short- and long-term storage needs can therefore be challenging.

Generally, you can arrive at a fair estimate of your project's storage needs by asking:

  • How much data does a typical experiment generate (and how much of that do you need to archive long-term)?
  • How much do you expect your data sets to grow over the duration of your project?
  • How many experiments do you plan to perform?
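The three questions above combine into a simple back-of-the-envelope calculation. The following is a minimal sketch, not an XSEDE formula: the function name `estimate_storage_gb`, its parameters, the assumption that experiment size grows linearly from the first run to the last, the safety-margin ("headroom") factor, and the decimal conversion of 1 TB = 1,000 GB are all illustrative assumptions. The result is expressed in gigabytes because that is the unit the Resource Request page asks for.

```python
"""Back-of-the-envelope storage estimate for a data management plan.

Hypothetical sketch: the names, the linear-growth model, and the headroom
factor are illustrative assumptions, not an XSEDE-defined method.
"""
import math

GB_PER_TB = 1000  # assumed decimal units; check how your target resource quotes capacity


def estimate_storage_gb(
    tb_per_experiment: float,
    num_experiments: int,
    archive_fraction: float,
    growth_factor: float = 1.0,
    headroom: float = 0.25,
) -> dict:
    """Return generated, archived, and requested storage in gigabytes.

    tb_per_experiment -- data one experiment generates today, in TB
    num_experiments   -- number of experiments planned for the project
    archive_fraction  -- share of each experiment's output kept long-term (0..1)
    growth_factor     -- size of the last experiment relative to the first;
                         assumes linear growth in between (use 1.0 for no growth)
    headroom          -- extra fraction added to the archive estimate as a margin
    """
    if num_experiments < 1:
        raise ValueError("num_experiments must be at least 1")
    if not 0.0 <= archive_fraction <= 1.0:
        raise ValueError("archive_fraction must be between 0 and 1")

    # Under linear growth, the average experiment is the mean of the first and last sizes.
    avg_tb = tb_per_experiment * (1.0 + growth_factor) / 2.0
    total_tb = avg_tb * num_experiments
    archive_tb = total_tb * archive_fraction

    return {
        "generated_gb": total_tb * GB_PER_TB,
        "archive_gb": archive_tb * GB_PER_TB,
        # Round the padded archive estimate up to a whole number of gigabytes.
        "request_gb": math.ceil(archive_tb * GB_PER_TB * (1.0 + headroom)),
    }


if __name__ == "__main__":
    # Illustrative numbers only: ~1 TB per assembly run, 20 runs, runs growing
    # to 1.5x their initial size, and a quarter of each run kept long-term.
    print(estimate_storage_gb(1.0, 20, 0.25, growth_factor=1.5, headroom=0.25))
```

With these illustrative inputs, the project would generate about 25,000 GB in total, of which 6,250 GB is archived; adding a 25% margin gives a request of 7,813 GB. Substitute your own answers to the three questions, and treat the margin as a judgment call rather than a rule.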

The following online resources can help you make a comprehensive estimate of your project's storage needs; they also provide guidelines and examples that can help you prepare your data management plan:

  • The DMPTool: The Data Management Plan Tool (DMPTool), a service of the University of California Curation Center (UC3) of the California Digital Library (CDL), provides templates and step-by-step instructions for preparing data management plans that meet the requirements of specific funding agencies, including the National Science Foundation (NSF) and National Institutes of Health (NIH), and in many cases can connect you with data management resources tailored to your specific institution.

    To see a list of participating institutions, create an account, and/or log in to begin using the DMPTool, see Institution Log In. For more, see About the DMPTool and DMP Requirements.

  • DataONE Best Practices database: The Data Observation Network for Earth (DataONE) has compiled a searchable online Best Practices database to help researchers learn to effectively work with their data through every stage of the data lifecycle. An Advanced Best Practices Search Page is also available to filter search results using one or more tags. Additionally, DataONE's Best Practices Primer (in PDF format) describes fundamental data management practices, and includes tips for describing, managing, preserving, and sharing your project's data.

    DataONE also collaborated on the creation of the DMPTool (described above), and its Data Management Planning page provides several sample data management plans that conform to DataONE's best practices guidelines.

Additionally, the XSEDE User Portal provides a Best Practice on Writing a Successful Proposal page, which links to an example of a successful Research allocation request (in PDF format; XSEDE-wide login required). XSEDE also offers related online training sessions during each allocation cycle; to find an upcoming webinar, search Training Events in the XSEDE User Portal. For more on writing and submitting a successful XSEDE allocation request, see the "Preparing a Successful Request Document" section of the Allocation Policies page.

For an overview of storage resources available to XSEDE researchers, see the Storage page on the XSEDE website. To check the current availability of allocatable storage space, consult the Systems Monitor in the XSEDE User Portal (XUP). For system-specific details, including file transfer methods, quotas, and backup and purge policies, see the user guides on the XSEDE Resources User Guides page in the XUP.

To request an allocation on an XSEDE dedicated storage service, follow the process for submitting a Research allocation request through the XSEDE User Portal. On the Resource Request page, select the desired system and indicate the storage capacity (in gigabytes) your project requires. For more, see How do I apply for a new XSEDE allocation?

If you need help or have questions, contact the XSEDE Help Desk.

This document was developed with support from National Science Foundation (NSF) grants 1053575 and 1548562. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.