High Performance Computing

NC State HPC Partner Program

    Researchers purchase compatible HPC hardware and any specialized or discipline-specific software licenses. NC State Office of Information Technology (OIT) provides space in an appropriate and secure operating environment, all necessary infrastructure (rack, chassis, power, cooling, networking), and the system administration and server support.

    In return for the infrastructure and services provided by OIT, partner compute resources are made available to the general NC State HPC user community whenever the partner is not using them.

    Partner Program Computational Hardware

    Partner program hardware options are compatible with the general HPC hardware operated by OIT. Compatible hardware allows a limited systems staff to support a large number of systems effectively, because the effort required to manage the cluster increases very little as compatible hardware is added.

    The hardware environment currently available for partners is:

    • a distributed memory Linux cluster environment based on IBM BladeCenter and Flex System hardware. Large memory nodes can be obtained for use as shared memory compute resources integrated into the henry2 cluster.

    Distributed Memory Linux Cluster

    Current partner Linux cluster compute nodes are based on eight-core Intel Xeon processors. Each node has two processors (16 cores in total) and is typically configured with 64GB of memory (4GB per processing core) and a 300GB SAS disk. The local drive is used only for the operating system, swap, and tmp space.
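
    The memory figure above follows from the per-core target. A minimal sketch of that arithmetic, where the function name and parameters are illustrative and not part of any OIT tool:

```python
def node_memory_gb(sockets: int, cores_per_socket: int, gb_per_core: int) -> int:
    """Return the total node memory implied by a per-core memory target."""
    return sockets * cores_per_socket * gb_per_core


# Two eight-core processors at 4GB per core, as described for current nodes.
total = node_memory_gb(sockets=2, cores_per_socket=8, gb_per_core=4)
print(total)  # 64
```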

    Partner cost for Linux cluster compute nodes is the actual cost of the node with three years of maintenance included. OIT provides chassis space and all other necessary infrastructure for Ethernet connection to the henry2 cluster.

    InfiniBand low latency interconnect options are available for partner compute nodes at additional cost.

    Management of Partner Compute Resources

    The Linux cluster uses Platform LSF for resource management and scheduling. LSF fair share scheduling is used to provide equitable access to compute resources while accounting for the resources added by partners.

    Partners have a dedicated LSF queue that provides access to their compute resources. In addition, each partner's LSF fair share value reflects that partner's participation in the overall resource. Partners can therefore use their own resources through their exclusive queue, or use general resources at a higher priority (based on the fraction of overall resources owned by the partner).
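
    The idea that priority on general resources grows with the fraction of the cluster a partner owns can be illustrated with a toy model. This is a sketch only: it is not the formula LSF uses, and the group names, core counts, base share, and usage penalty are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    owned_cores: int          # cores contributed to the cluster (0 for non-partners)
    recent_core_hours: float  # recent usage of general resources


def priority(group: Group, total_cores: int, base_share: float = 0.01) -> float:
    """Toy fair-share priority (NOT LSF's actual formula).

    The share grows with the fraction of cores owned, and the
    priority shrinks as recent usage accumulates.
    """
    share = base_share + group.owned_cores / total_cores
    return share / (1.0 + group.recent_core_hours)


total_cores = 1000  # hypothetical cluster size
groups = [
    Group("general_user", owned_cores=0, recent_core_hours=100.0),
    Group("partner_b", owned_cores=32, recent_core_hours=100.0),
    Group("partner_a", owned_cores=160, recent_core_hours=100.0),
]

# With equal recent usage, the larger contributor ranks first.
ranked = sorted(groups, key=lambda g: priority(g, total_cores), reverse=True)
print([g.name for g in ranked])
```

    With equal recent usage, the ordering depends only on the owned fraction, which matches the behavior described above in spirit. The actual values are set by the LSF configuration maintained by HPC staff.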

    All access to compute nodes is through LSF. Separate, shared login nodes provide access to all HPC compute resources.
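
    Since all work reaches the compute nodes through LSF, a job is submitted from a login node with the LSF bsub command. The sketch below only assembles a bsub argument list and does not run it. The queue name, executable, and file names are hypothetical placeholders; the real partner queue name is assigned by HPC staff.

```python
import shlex


def build_bsub_command(queue, slots, wall_minutes, output_file, command):
    """Assemble an LSF bsub argument list without executing it."""
    return [
        "bsub",
        "-q", queue,              # queue to submit to
        "-n", str(slots),         # number of job slots requested
        "-W", str(wall_minutes),  # run time limit in minutes
        "-o", output_file,        # file for job output (%J expands to the job ID)
        *command,
    ]


cmd = build_bsub_command(
    queue="partner_queue",        # hypothetical queue name
    slots=16,
    wall_minutes=120,
    output_file="job.%J.out",
    command=["./my_solver", "input.dat"],  # hypothetical executable and input
)
print(shlex.join(cmd))
```

    Building the command as a list keeps each argument separate, so it could later be passed to subprocess.run on a login node without shell quoting problems.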

    Partner Program Storage

    HPC partners purchase storage, which is accessible from HPC resources and is operated and maintained by HPC staff. Three storage service options are available:

    • Network Attached Storage: This service uses a storage server running NFS to export storage to all HPC nodes. This type of storage service is suitable for use by serial and small parallel jobs (provided the parallel jobs are not using MPI-IO).
    • Parallel File System: This service uses several storage servers (typically four) running GPFS to provide a parallel file system available to all HPC distributed memory nodes. This type of storage service is suitable for use by all jobs, including large parallel jobs and jobs using MPI-IO.
    • Network Storage with Hierarchical Storage Management: This service uses a storage server running NFS to export storage to HPC login nodes. Files that have not been recently accessed are migrated to tape, while their directory information remains on disk. Any access to a migrated file automatically recalls it from tape. This type of storage service is suitable for storing many large, infrequently used datasets. It is not suitable for use as working space for running jobs.
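
    The suitability notes for the three services can be read as a simple decision rule. The following is a minimal sketch of that rule. The function and parameter names are illustrative, and the logic reflects only the guidance stated in the list, not an official selection procedure.

```python
from enum import Enum


class StorageService(Enum):
    NAS = "Network Attached Storage (NFS)"
    PARALLEL = "Parallel File System (GPFS)"
    HSM = "Network Storage with Hierarchical Storage Management"


def suggest_storage(archival: bool, uses_mpi_io: bool, large_parallel: bool) -> StorageService:
    """Suggest a storage service from the suitability notes in the list.

    archival: large, infrequently used datasets, not working space for jobs.
    """
    if archival:
        # HSM is exported to login nodes only and migrates idle files to tape.
        return StorageService.HSM
    if uses_mpi_io or large_parallel:
        # NFS is described as unsuitable for MPI-IO and large parallel jobs.
        return StorageService.PARALLEL
    # Serial and small parallel jobs without MPI-IO.
    return StorageService.NAS


print(suggest_storage(archival=False, uses_mpi_io=True, large_parallel=False).value)
```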

    All three services include tape backup with a default policy of retaining one backup copy of each file. Partners can specify other policies (e.g., retain all versions for the last two weeks), and the resulting costs are passed through to the partner.

Last modified: April 19 2013 11:03:57.