NC State High Performance Computing

Getting Started with HPC storage systems

    HPC users have a number of file systems available to them. Effective use of HPC resources requires some understanding of these file systems and their intended uses.

    The general types of storage available are:

    • home directory
    • local scratch space
    • shared scratch space
    • mass storage space

    The following sections describe each of these in some detail, including the intended use of each storage resource.

  • Home Directory

    Each user has a home directory. The home directory is shared by all cluster nodes. Individual user file quotas are enforced on home directories. Total available space in the home file system is relatively small by design, and these quotas are used to manage the available space. Home directories are intended to hold commonly used scripts, environment configuration files, and modest-size source trees. Home directories are backed up daily. Only one copy of each file is retained in the backup. Files that have been deleted for more than five days are subject to removal from the backup.

  • Scratch Space

    Scratch space is intended to be used for the storage requirements for running jobs. In particular, large input or output files should use scratch space during job execution. Scratch file systems are world writable. Users should create a directory for their use to avoid potential file name conflicts with other users.

    Scratch space is not backed up.
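    The directory convention above can be sketched as follows. The path /share and the use of the login name come from this document; the fallback to a temporary directory is only so the sketch runs anywhere, and is not how the cluster is configured.

```shell
# Create a personal working directory on a shared scratch file system so
# your files do not collide with other users' files at the world-writable
# top level. On the cluster SCRATCH_ROOT would be /share, /share2, or
# /share3; the mktemp fallback is just so this sketch is runnable anywhere.
SCRATCH_ROOT="${SCRATCH_ROOT:-$(mktemp -d)}"
WORKDIR="$SCRATCH_ROOT/$(id -un)"
mkdir -p "$WORKDIR"
echo "scratch working directory: $WORKDIR"
```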

    • Local Scratch Space

      Local scratch space is directly connected to the compute node. On the Linux cluster the local scratch file system available to users is /scratch. The contents of a local scratch file system are available only on the node to which it is directly connected. Because there is no way to know ahead of time which nodes a job will be assigned, use of local scratch space must be managed from the user's LSF script: the script must both move files into the space and remove them after execution completes. Local scratch space on the cluster is subject to immediate removal of files at the completion of the LSF job.

      Local scratch space is relatively small (a few GB to a few dozen GB, depending on the node) and must be carefully managed by the user. Except in a few very special cases, use of local scratch space should be avoided.
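      The stage-in / compute / stage-out discipline described above can be sketched as follows. On the cluster these commands would appear inside the user's LSF job script; LSB_JOBID is the job ID variable LSF sets at run time. The temporary-directory fallback and the fabricated input file are only so the sketch is self-contained.

```shell
# Stage-in / compute / stage-out pattern for local scratch space. These
# commands would run inside an LSF job script on the assigned node.
LOCAL_SCRATCH="${LOCAL_SCRATCH:-$(mktemp -d)}"   # /scratch on a compute node
JOBDIR="$LOCAL_SCRATCH/job-${LSB_JOBID:-demo}"   # LSB_JOBID is set by LSF
mkdir -p "$JOBDIR"

# 1. Move input files into local scratch (the cp line is illustrative;
#    here we fabricate a small input so the sketch is runnable).
# cp "/share/$USER/input.dat" "$JOBDIR/"
printf 'line one\nline two\n' > "$JOBDIR/input.dat"

# 2. Run the computation against the fast local copy.
wc -l < "$JOBDIR/input.dat" > "$JOBDIR/output.dat"

# 3. Copy results back to shared storage, then remove the job's files:
#    local scratch is subject to immediate removal when the LSF job ends.
# cp "$JOBDIR/output.dat" "/share/$USER/"
rm -rf "$JOBDIR"
```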

      Like all Linux systems, compute nodes have a world writable /tmp file system. This space is essential for the proper operation of the operating system and many applications. /tmp on the compute nodes is very small (~2GB) and should never be used for user file storage.

    • Shared Scratch Space

      The Linux cluster also has shared scratch space. These file systems are network attached to the login nodes and to all of the compute nodes. /share, /share2, and /share3 are currently available to any user needing scratch space.

      Shared scratch file systems are subject to periodic purge and are not backed up. A per project quota is enforced on each shared scratch file system.

      Any file in shared scratch space is subject to removal at any time. A purge is used to maintain free space in the file system. While the purge generally allows files to remain on the shared scratch file systems for a week or more, during periods of high disk use this may not be true and files that are only a day or two old may also be removed by the purge.

      As with local scratch space this storage is intended to provide large storage space required by jobs during execution.

      A GPFS file system is also available on the Linux cluster (henry2). This file system, /gpfs_share, has a per-project quota and is also not backed up. Codes that spend significant time doing parallel I/O, and any code using MPI-IO, should use /gpfs_share.

      As of spring of 2014 the shared scratch file systems (actually now file sets) /share, /share2, and /share3 are also provided via a GPFS file system, /gpfs_common.

  • Mass Storage System

    Mass storage space is intended to hold important files that are too large to be stored in users' home directories. Users requiring mass storage space should request that a mass storage directory be created for their use.

    It is anticipated that research groups will have up to a 1TB group quota for mass storage space with options to purchase additional quota if required.

    Mass storage space is available from all login nodes. It is not available from compute nodes and cannot be used as an alternative to scratch space for running jobs.
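    The intended workflow, sketched below, is to archive results from a login node (since compute nodes cannot see mass storage). The specific paths are illustrative stand-ins, and the temporary directories exist only so the sketch is runnable outside the cluster.

```shell
# From a login node, archive results out of scratch space into mass
# storage. SCRATCH_DIR stands in for a directory under /share, and
# MASS_DIR for the user's directory under /ncsu/volume1 or /ncsu/volume2.
SCRATCH_DIR="${SCRATCH_DIR:-$(mktemp -d)}"
MASS_DIR="${MASS_DIR:-$(mktemp -d)}"
printf 'final results\n' > "$SCRATCH_DIR/results.dat"
cp "$SCRATCH_DIR/results.dat" "$MASS_DIR/"
echo "archived to: $MASS_DIR/results.dat"
```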

    • Configuration

      There are currently two mass storage file systems, /ncsu/volume1 and /ncsu/volume2. Users will only be provided a directory on one of these file systems.

      Separate file servers are used for /ncsu/volume1 and /ncsu/volume2. Both file systems are available from login nodes via NFS.

    • Backups

      The /home, /ncsu/volume1, and /ncsu/volume2 file systems are backed up daily to a tape library. One copy of each file is maintained in the tape library. When a file is modified on disk, the new version replaces any previous backup of that file.

      Files removed from /home, /ncsu/volume1, or /ncsu/volume2 file system will remain in the backup for at least five days.

      A consequence of the backup policy is that when a file is updated under the same name, the new version overwrites the backup copy during the daily update. If previous versions of a file may be needed, use a file naming scheme that gives each version a unique name.
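      A minimal sketch of such a naming scheme is shown below. The file name is illustrative, and the temporary directory stands in for a location on /home or a mass storage volume.

```shell
# The backup keeps exactly one copy per file name, so overwriting
# results.dat each day leaves only the newest version recoverable.
# Date-stamping each saved version keeps every copy under a unique name.
DEST="${DEST:-$(mktemp -d)}"        # stands in for a /home or /ncsu path
printf 'run output\n' > "$DEST/results.dat"
STAMP="$(date +%Y%m%d-%H%M%S)"
cp "$DEST/results.dat" "$DEST/results-$STAMP.dat"
echo "retained version: $DEST/results-$STAMP.dat"
```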

    • HSM

      An additional level of management is utilized on /ncsu/volume1. Tivoli Space Manager is used to migrate older, larger files from the file system disk to tape. Migrated files are retrieved automatically if they are accessed.

      Space manager seeks to maintain the disk usage level for /ncsu/volume1 between 85% and 90%.

Last modified: August 01 2014 13:59:04.