Overview

henry2 is a Linux cluster built from IBM blade servers. IBM blade servers are housed in blade chassis, each of which holds fourteen blades. The chassis provides power, cooling, management, and network connectivity for its blade servers. [Much of the design of IBM BladeCenter takes place at IBM's Research Triangle Park facility. IBM frequently hires more new graduates from NC State than from any other US university, and a number of the BladeCenter design team members are NC State graduates.]
[Figure: Image of IBM BladeCenter E chassis] [Figure: Diagram of BladeCenter chassis]
When the central HPC service was initiated at NC State in 2003, a hardware strategy was adopted to grow the cluster incrementally rather than periodically buying large monolithic clusters. The incremental approach was a better match for NC State's HPC funding and staffing. henry2 started with 64 nodes and 128 processors in 2003. Currently, henry2 has 1163 nodes and more than 5000 processors.
henry2 nodes are typically dual-socket Xeon blade servers, with a mix of single-, dual-, and quad-core Xeon processors. Nodes typically have 2-3 GB of memory per processor core and a modest-size disk drive that holds the operating system, swap space, and a small local scratch space. Each node has two Gigabit Ethernet interfaces. On compute nodes, one interface connects to a private network linking the compute nodes to the login nodes and to cluster storage resources, and the second connects to a private network dedicated to message passing traffic. On login nodes, one interface provides access from the Internet and the second provides access to the compute nodes and storage.
Typical compute node chassis network connections use two aggregated Gigabit Ethernet links for the private HPC network connecting compute nodes to login nodes and storage, and four aggregated Gigabit Ethernet links for the message passing network.
The henry2 cluster is built around two core Ethernet switches: one dedicated to message passing traffic and the other for the remaining network connections the cluster needs. The message passing switch provides up to 384 Gigabit Ethernet ports, so with four aggregated links per chassis it can support up to 96 chassis. At fourteen blades per chassis, the current network architecture therefore supports about 1344 nodes (96 × 14). This figure is approximate because some chassis have separate low-latency networks and use only a single Gigabit Ethernet link to the core message passing switch.
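The capacity estimate above is simple port arithmetic. The sketch below reproduces it from the three figures the text gives (384 ports, four links per chassis, fourteen blades per chassis); the function names are hypothetical and only illustrate the calculation.

```python
# Port arithmetic for the henry2 message passing switch, using the
# figures quoted in the text: 384 Gigabit Ethernet ports, four
# aggregated links per chassis, fourteen blades per chassis.
PORTS_ON_MP_SWITCH = 384
LINKS_PER_CHASSIS = 4
BLADES_PER_CHASSIS = 14


def max_chassis(ports: int, links_per_chassis: int) -> int:
    """Chassis that fit on the switch if every chassis uses the same number of links."""
    return ports // links_per_chassis


def max_nodes(ports: int, links_per_chassis: int, blades_per_chassis: int) -> int:
    """Upper bound on blades reachable through the switch."""
    return max_chassis(ports, links_per_chassis) * blades_per_chassis


if __name__ == "__main__":
    print(max_chassis(PORTS_ON_MP_SWITCH, LINKS_PER_CHASSIS))  # 96
    print(max_nodes(PORTS_ON_MP_SWITCH, LINKS_PER_CHASSIS, BLADES_PER_CHASSIS))  # 1344
```

Because chassis with a separate low-latency network use only one port instead of four, the real chassis count can differ from this uniform calculation, which is why the text calls 1344 approximate.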
Communication between compute nodes is non-uniform because of henry2's network architecture. Nodes within a single chassis can communicate through the chassis Ethernet switch with no bandwidth restriction (a full gigabit per second in each direction).
[Figure: Typical network connections between compute node chassis]
Communication between blades in different chassis is limited by the four aggregated message passing links to 4 gigabits per second in each direction. For example, if eight nodes in one chassis were exchanging messages with eight nodes in another chassis, each could potentially carry a gigabit per second of traffic in each direction, for a total of 8 gigabits per second in each direction. However, the henry2 network architecture would limit the achievable rate to four gigabits per second in each direction. Message passing traffic from the other blades in each chassis that share the aggregated links could reduce the rate further.
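The eight-node example can be worked through with a small model. This is a sketch under stated assumptions, not a description of how the switches actually schedule traffic: it assumes the aggregated 4 Gb/s uplink is shared equally among the flows crossing it, and it ignores protocol overhead and how link aggregation spreads flows across the individual member links. The function names are hypothetical.

```python
def per_flow_rate_gbps(flows: int, nic_gbps: float = 1.0, uplink_gbps: float = 4.0) -> float:
    """Rate each inter-chassis flow can reach when `flows` node pairs share one chassis uplink.

    Each flow is capped by its own Gigabit Ethernet interface and, under the
    equal-sharing assumption, by its share of the aggregated uplink.
    """
    if flows <= 0:
        raise ValueError("flows must be positive")
    return min(nic_gbps, uplink_gbps / flows)


def aggregate_rate_gbps(flows: int, nic_gbps: float = 1.0, uplink_gbps: float = 4.0) -> float:
    """Total inter-chassis rate in one direction for `flows` concurrent flows."""
    return flows * per_flow_rate_gbps(flows, nic_gbps, uplink_gbps)


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, per_flow_rate_gbps(n), aggregate_rate_gbps(n))
```

With eight flows each node pair gets 0.5 Gb/s and the total stays at 4 Gb/s, matching the limit described in the text; with four or fewer flows every pair can still run at the full gigabit.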
In addition to these bandwidth effects, the network design also affects latency. A message between nodes in the same chassis traverses a single switch. A message between chassis must traverse three switches: the chassis switch in each chassis plus the core switch. Each switch adds to the time a message takes to reach its destination.
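The hop counts above can be expressed as a tiny latency model. The per-switch and endpoint latencies here are placeholder parameters, not measured henry2 values, and the function names are hypothetical; the point is only that inter-chassis messages pay for two extra switches.

```python
def switches_traversed(src_chassis: int, dst_chassis: int) -> int:
    """Switches on the message path: the shared chassis switch, or
    source chassis switch + core switch + destination chassis switch."""
    return 1 if src_chassis == dst_chassis else 3


def path_latency_us(src_chassis: int, dst_chassis: int,
                    per_switch_us: float, endpoint_us: float) -> float:
    """Hypothetical latency model: fixed endpoint cost plus a constant cost per switch."""
    return endpoint_us + switches_traversed(src_chassis, dst_chassis) * per_switch_us


if __name__ == "__main__":
    # Placeholder values for illustration only, not measurements.
    print(path_latency_us(0, 0, per_switch_us=2.0, endpoint_us=10.0))
    print(path_latency_us(0, 1, per_switch_us=2.0, endpoint_us=10.0))
```

A linear per-switch cost is a simplification; real switch latency also depends on message size and congestion.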
Storage

There are three types of file systems on the henry2 cluster. However, only two are generally accessed by users: the Network File System (NFS) and the General Parallel File System (GPFS). Directories such as /home, /usr/local, and /share are NFS-mounted file systems. /gpfs_share is a GPFS file system.
File systems on henry2 use a variety of disk arrays, typically attached to file servers by fibre channel. In general, blade servers are used as file servers on henry2. File server blades are located in chassis with an additional IO module that provides fibre channel connections to each blade server.
NFS file systems on henry2 typically have a single file server with a fibre channel connection to a disk array. NFS tends to perform poorly when accessed from a large number of nodes concurrently. The typical henry2 NFS configuration has several bottlenecks, including the single NFS server and the single fibre channel connection to the disk array.
[Figure: Typical henry2 NFS configuration]
GPFS supports concurrent parallel access to a single file from multiple nodes. However, GPFS tends to perform poorly when accessing large numbers of small files.
[Figure: henry2 GPFS configuration for /gpfs_share]
As configured on henry2, GPFS uses four servers that provide network shared disks (NSDs). Together these NSDs form a file system that is mounted on every henry2 login and compute node. GPFS IO operations are cached on the local node and then synchronized with the file system. Having multiple NSD servers supports more concurrent use and also provides resilience against server failures.
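The contrast between a single NFS server and four NSD servers can be illustrated with a simple bottleneck model: aggregate bandwidth is the smaller of what the clients request and what the servers can deliver. This is a rough sketch under stated assumptions. The bandwidth units are arbitrary and hypothetical, load is assumed to spread evenly across servers, and the model ignores the disk arrays, fibre channel links, caching, and the small-file overhead mentioned for GPFS.

```python
def aggregate_throughput(clients: int, per_client: float,
                         servers: int, per_server: float) -> float:
    """Upper bound on aggregate file system bandwidth.

    Demand grows with the number of client nodes; supply is capped by the
    combined capacity of the file servers (assumes an even load spread).
    """
    demand = clients * per_client
    supply = servers * per_server
    return min(demand, supply)


if __name__ == "__main__":
    # Hypothetical units: each client wants 1 unit, each server delivers 10.
    for clients in (5, 20, 100):
        nfs_like = aggregate_throughput(clients, 1.0, servers=1, per_server=10.0)
        gpfs_like = aggregate_throughput(clients, 1.0, servers=4, per_server=10.0)
        print(clients, nfs_like, gpfs_like)
```

With few clients both configurations keep up with demand; as the client count grows, the single-server case saturates first, which is the behavior the text describes for NFS under heavy concurrent use.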
Last modified: May 17 2012 17:19:59