NICS HPSS User Guide

System Overview

The High Performance Storage System (HPSS) is a mass storage facility at NICS. It consists of tape and disk components, Linux servers, and the HPSS software, and is intended for medium- to long-term storage of data produced on NICS resources such as Kraken or Nautilus.

System Configuration

The center has five SL8500 tape libraries, each holding up to 10,000 cartridges. The libraries house a total of twenty-four T10K-A tape drives (500 GB cartridges, uncompressed) and sixty-four T10K-B tape drives (1 TB cartridges, uncompressed).

Incoming data is written to disk and later migrated to tape for long-term archival. The tape drives are housed in the robotic tape libraries, and each drive has a bandwidth of 120 MB/s.
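
As a rough illustration of what the per-drive bandwidth implies, the following sketch estimates how long one drive needs to stream a file of a given size. The 256 GB example size is an assumption chosen for illustration, and the result is only a lower bound: it ignores the disk cache, tape mounts, and competing transfers.

```shell
# Rough lower bound: seconds needed to stream a file to a single tape drive.
DRIVE_MB_PER_S=120      # per-drive bandwidth quoted above
size_gb=256             # example file size (assumption, for illustration only)

# 1 GB is taken as 1000 MB; shell arithmetic truncates to whole seconds.
seconds=$(( size_gb * 1000 / DRIVE_MB_PER_S ))
echo "$size_gb GB at $DRIVE_MB_PER_S MB/s: about $seconds s ($(( seconds / 60 )) min)"
```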

File Systems

Users can access HPSS from Kraken and Nautilus with the hsi and htar commands. HPSS has a total capacity of approximately 14 PB; as of February 2013, approximately 2 PB was free. Each project has a 50 TB quota.

System Access

System Availability

HPSS is in full production. Space on HPSS is for files that are not immediately needed. Although HPSS is a large storage system, its space is not unlimited. Users must not store files unrelated to their NICS projects on HPSS, and they must periodically review their files and remove those that are no longer needed.

Allocations

HPSS is an allocated resource. Allocations must be requested separately from compute allocation requests and will be reviewed at quarterly XRAC meetings.

Methods of Access

HPSS is accessed through the hsi or htar commands. hsi is similar to an FTP client, and htar is similar to the tar command. hsi provides an interactive shell in which most standard UNIX commands (cp, mv, rm, cd, etc.) work as expected. To start an interactive session, enter hsi at the command prompt on Kraken or Nautilus.

NOTE: Access to HPSS is via a one-time password (OTP) token.

HPSS Usage Policies

  • User quotas: Projects have a maximum quota of 50 TB or the allocated amount, whichever is lower.
  • Data retention: Data will be retained for six (6) months after a project has ended. After this time, the data will be deleted.
  • Purge policies: There are no purge policies on HPSS. However, NICS reserves the right to ask users to remove unneeded data to ensure adequate storage space for all allocations.
  • Passwords: HPSS access is via a one-time password (OTP) token. Passcode-less access can be set up upon request.

Transferring Data to/from HPSS

Data Transfer Methods: hsi

The hsi utility allows automatic authentication and provides a user-friendly command line and interactive interface to HPSS.

If you are archiving a large directory, or collection of files, please tar the files before using hsi, or use htar which combines tar and hsi functionalities into a single command. hsi and htar are the preferred methods for accessing HPSS.

Please note: Large long-running data transfers should be submitted to the hpss queue rather than running hsi or htar interactively.

Note: hsi can only be used with the OTP token. hsi can be used in a batch job only if it is submitted to the HPSS queue from the OTP login node.

Using hsi

Issuing the command

$ hsi

will start hsi in interactive mode. Alternatively, you can use

$ hsi [options] command(s)

to execute a set of hsi commands and then return. Note that you may need to add /opt/public/bin to your $PATH to find the hsi executable.
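
A minimal sketch of the PATH adjustment mentioned above, assuming hsi lives in /opt/public/bin as stated. The variable name HSI_DIR is introduced here for illustration only.

```shell
# Assumption: hsi is installed under /opt/public/bin, as noted above.
HSI_DIR=/opt/public/bin

# Append the directory only if it is not already on PATH.
case ":$PATH:" in
  *":$HSI_DIR:"*) ;;
  *) PATH="$PATH:$HSI_DIR"; export PATH ;;
esac

# Report whether the shell can now find the hsi executable.
if command -v hsi >/dev/null 2>&1; then
  echo "hsi found at $(command -v hsi)"
else
  echo "hsi not found on PATH"
fi
```

Placing the first part in a shell startup file would make the change persistent across logins.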

The two most common hsi commands are put and get. put transfers a file from the local file system to HPSS, and get transfers a file from HPSS to the local file system. Unlike ftp, the order of the arguments is the same for both commands: the local file always comes first. The syntax is:

$ hsi {put | get} local_file : hpss_file

where "local_file" is the location and name of the local file and "hpss_file" is the location and name of the file on HPSS.
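
The syntax above can be sketched with concrete arguments as follows. The helper hsi_cmd and the file names results.dat and archive/results.dat are hypothetical. By default the helper only prints the command it would run, so the example can be tried without an OTP passcode.

```shell
# hsi_cmd: hypothetical helper. Prints the hsi command line by default;
# set RUN=1 in the environment to actually invoke hsi.
hsi_cmd() {
  if [ "${RUN:-0}" = "1" ]; then
    hsi "$@"
  else
    echo "hsi $*"
  fi
}

# Store a local file on HPSS (local name first, HPSS name second).
hsi_cmd put results.dat : archive/results.dat

# Retrieve it again; the argument order is the same as for put.
hsi_cmd get results.dat : archive/results.dat
```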

Data written to HPSS is automatically stored using a method appropriate to the file size. However, HPSS is not intended for storing many small files. If you want to upload several small files (less than 1 GB per file), you should tar the files before storing them in HPSS, or use htar. In general, htar should take about as long as hsi to transfer a given set of files.

The following example stores a group of small files using tar and hsi, and then retrieves them. It relies on the fact that hsi can read from standard input and write to standard output.

$ tar cvf - . | hsi put - : <filename.tar>
$ hsi get - : <filename.tar> | tar xvf -

Several hsi commands can also be passed in one quoted string, separated by semicolons:

$ hsi "cd /home/djohn/PATHTEST ; put -R *.tar.gz"

Users will be prompted for their PASSCODE when running hsi.

To exit from the hsi interactive environment of HPSS, type exit or quit. See the reference manual for command line options and other startup information.

Please note:

  1. The ideal archive file size is between 8 GB and 256 GB. Users should refrain from archiving any single file larger than 0.5 to 1.0 TB. Most tapes on HPSS are 6 TB and may already hold data, so a very large file can overflow onto a second tape. Data integrity is compromised by this sort of carry-over.
  2. Please do not run more than one hsi process at a time. Files are first uploaded to a disk cache before being migrated to tape, and too many simultaneous transfers can fill this cache. Because transfers also compete for bandwidth, additional hsi transfers may not be beneficial.
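
The size guidance in item 1 can be turned into a simple pre-upload check. The sketch below is one possible reading of that guidance: the helper name classify_archive_gb is hypothetical, sizes are given in whole GB, and the 500 GB cutoff is an interpretation of the lower end of the "0.5 to 1.0 TB" range.

```shell
# classify_archive_gb: hypothetical helper. Takes an archive size in whole GB
# and prints advice based on the size guidance in item 1 above.
classify_archive_gb() {
  size_gb=$1
  if [ "$size_gb" -lt 8 ]; then
    echo "small: consider bundling more files into the archive"
  elif [ "$size_gb" -le 256 ]; then
    echo "ideal"
  elif [ "$size_gb" -le 500 ]; then
    echo "large: acceptable, but smaller archives are preferred"
  else
    echo "too large: split into several archives"
  fi
}

classify_archive_gb 4
classify_archive_gb 100
classify_archive_gb 750
```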

More information on hsi is available on the NICS systems through the command

$ hsi help

Direct Transfers between HPSS and Remote Systems (e.g., User Workstations)

Because hsi is a third-party package, clients may be available for your own system. However, NICS currently supports access to HPSS only through the hsi clients on its HPC systems. To move data between HPSS and a remote system, you will need to use a NICS resource as a staging system. For example, to transfer data from your directory on HPSS to a system outside NICS, first copy the data in reasonable chunks to a NICS system using the hsi utility. Once a portion of the data is on a NICS system, use a utility such as BBCP or SFTP/SCP to move it to the outside system.
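
The staging workflow described above might look like the following sketch, run on a NICS system. Every path, file name, and host name is a hypothetical placeholder, and scp stands in for whichever transfer utility you prefer. The helper run prints each command instead of executing it unless RUN=1 is set.

```shell
# All names below are hypothetical placeholders.
HPSS_FILE=project/run01.tar                    # file on HPSS
STAGE_DIR=${TMPDIR:-/tmp}/hpss_stage           # staging directory on the NICS system
REMOTE=user@workstation.example.org:/data/     # destination outside NICS

# run: prints each command; set RUN=1 to execute it instead.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

run mkdir -p "$STAGE_DIR"
# Step 1: copy one chunk of data from HPSS to local disk on the NICS system.
run hsi get "$STAGE_DIR/run01.tar" : "$HPSS_FILE"
# Step 2: push the staged file to the remote system, then free the staging space.
run scp "$STAGE_DIR/run01.tar" "$REMOTE"
run rm "$STAGE_DIR/run01.tar"
```

Repeating the two steps for each chunk keeps the amount of staging space needed at any one time small.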

Data Transfer Methods: htar

The htar command combines tar and hsi functionality into a single command. Using htar to create a tar file in HPSS is faster than creating a local tar file first and then copying it to HPSS with "hsi put filename.tar". htar also creates a separate index file, which contains the names and locations of all of the member files in the archive (tar) file.

The syntax is:

$ htar -cvf filename.tar file1 [file2 ...]

The reverse operation, which is equivalent to an "hsi get" followed by "tar -xvf", is also supported:

$ htar -xvf filename.tar

Here, filename.tar is the name of the tar file on HPSS. Any further arguments name the files to be extracted from the archive. For example, "htar -xvf filename.tar ." will probably not extract anything, because there will be no file named "." in the archive. Also, htar can only extract from archives that were created with htar and have an idx file. The idx file is an index to the tar file, so even if the archive itself resides on tape, it is still possible to list its contents and extract specific files without migrating the whole archive to disk.
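
A short sketch of a typical htar round trip: create an archive, list it, and extract one member. The helper htar_cmd and the names run01.tar, run01, and run01/output.dat are hypothetical. The helper prints the command line rather than running htar unless RUN=1 is set.

```shell
# htar_cmd: hypothetical helper. Prints the htar command unless RUN=1 is set.
htar_cmd() {
  if [ "${RUN:-0}" = "1" ]; then htar "$@"; else echo "htar $*"; fi
}

# Create run01.tar on HPSS from the local directory run01.
htar_cmd -cvf run01.tar run01

# List the archive members, which htar reads from the index file.
htar_cmd -tvf run01.tar

# Extract a single member, named by the path stored in the archive.
htar_cmd -xvf run01.tar run01/output.dat
```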

htar limits:

  • Maximum number of files by default: 1,000,000
  • Maximum number of files with special flags (-M 5000000): 5,000,000
  • Maximum file size for a component file: 68 GB
  • Maximum total file size for archive (limited by HPSS, not htar): 2^64 bytes
  • The htar command does not provide the ability to append, update or remove files.
  • The htar command does not support the wildcard expansion of pathnames.
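
Before archiving a large directory, it can help to check it against the default limits listed above. The following is a minimal sketch using find. The helper name htar_precheck is hypothetical, and the size test is approximate because find's G suffix counts in units of 2^30 bytes.

```shell
# htar_precheck: hypothetical helper that checks a directory against the
# default htar limits listed above before it is archived.
htar_precheck() {
  dir=$1
  max_files=1000000
  # Count regular files under the directory.
  count=$(find "$dir" -type f | wc -l | tr -d ' ')
  # Find member files larger than roughly 68 GB (approximate, see lead-in).
  big=$(find "$dir" -type f -size +68G)
  echo "files: $count"
  if [ "$count" -gt "$max_files" ]; then
    echo "too many files for default htar settings"
  fi
  if [ -n "$big" ]; then
    echo "members over the size limit:"
    echo "$big"
  fi
}

# Demonstration on a throwaway directory containing three empty files.
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"
htar_precheck "$demo"
rm -r "$demo"
```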

Help

Please email requests for assistance with HPSS to help@xsede.org or submit a Help Desk ticket.

References

http://www.nics.tennessee.edu/computing-resources/kraken

Policies

All HPSS and storage policies can be found here: http://www.nics.tennessee.edu/policies.

Last update: March 6, 2013