UMBC High Performance Computing Facility
System Description

maya cluster

The maya cluster is a heterogeneous cluster with equipment acquired between 2009 and 2013. It contains a total of 240 nodes, 38 GPUs and 38 Intel Phi coprocessors, and over 8 TB of main memory. The name maya comes from a concept in Hinduism meaning "illusion", reflecting the idea that human experience comprehends only a tiny fragment of the fundamental nature of the universe. The previous cluster names tara and kali also originate from Hinduism. For more information about using your account on the system, see Using Your Account. All nodes run Red Hat Enterprise Linux 6.4.

A schematic of the layout of maya is given below. The system is composed of two major networks, labeled IB-QDR and IB-DDR. For more information about these networks, see the Network Hardware section below. Notice that the HPCF2013 and HPCF2009 nodes are on the network IB-QDR, while the HPCF2010 nodes are on IB-DDR. Some photos of the cluster are given below.
[Photos: HPCF2013 — racks (front, back), nodes (close up), IB-QDR switch (front); HPCF2009 — racks, racks with doors opened, nodes, IB-QDR switch (back); HPCF2010 — racks (doors open, side view, back), IB-DDR switch (back)]

The Ivy Bridge and Nehalem CPUs in maya have some features which should benefit large-scale scientific computations. Each processor has its own memory channels to its dedicated local memory, which should offer efficient memory access when multiple cores are in use. The CPUs have also been designed for optimized cache access and loop performance. For more information about the CPUs, see:

The following schematics show the architecture of the CPUs, GPUs, and Phis in the HPCF2013 equipment. The first schematic shows one of the compute nodes, which consists of two eight-core 2.6 GHz Intel E5-2650v2 Ivy Bridge CPUs. Each core of each CPU has a dedicated 32 kB of L1 and 256 kB of L2 cache. All cores of each CPU share 20 MB of L3 cache. The node's 64 GB of memory is the combination of eight 8 GB DIMMs, four of which are connected to each CPU. The two CPUs of a node are connected to each other by two QPI (quick path interconnect) links. Nodes are connected by a quad-data-rate InfiniBand interconnect.
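To see this layout from a program's point of view, the following hybrid MPI+OpenMP sketch (ours, not official HPCF example code) reports which core each thread runs on. Launched with, say, two MPI ranks per node and eight OpenMP threads per rank (the exact mpirun options and pinning settings depend on the MPI installation and are an assumption here), each rank's threads should stay on the cores of one CPU and use that CPU's local memory channels.

    /* Minimal hybrid MPI+OpenMP sketch (not official HPCF example code).
     * Intended mapping on an HPCF2013 node: one MPI rank per CPU socket,
     * eight OpenMP threads per rank, so each rank's threads share one
     * socket's L3 cache and local memory channels. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sched.h>   /* sched_getcpu(), a GNU extension */
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        #pragma omp parallel
        {
            /* Report where each thread actually runs; with proper pinning
             * all threads of a rank should stay on one socket's cores. */
            printf("rank %d of %d, thread %d of %d, on core %d\n",
                   rank, size, omp_get_thread_num(), omp_get_num_threads(),
                   sched_getcpu());
        }

        MPI_Finalize();
        return 0;
    }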

The NVIDIA K20 is a powerful general-purpose graphics processing unit (GPGPU) with 2496 computational cores, designed for efficient double-precision calculation. GPU-accelerated computing has become popular in recent years due to the GPU's ability to achieve higher performance than a general-purpose CPU on computationally intensive portions of code. The NVIDIA K20 GPU has 5 GB of onboard memory.
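To confirm what GPUs a node provides, a small device query such as the following sketch (not official HPCF example code; compile with nvcc and link against the CUDA runtime) prints each device's name, onboard memory, and multiprocessor count; on a K20 node it should report roughly 5 GB of global memory.

    /* Minimal CUDA device-query sketch, host code only (not official HPCF
     * example code). Lists each GPU on the node and its total memory. */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            fprintf(stderr, "no CUDA-capable device found\n");
            return 1;
        }
        for (int i = 0; i < count; i++) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("GPU %d: %s, %zu MB global memory, %d multiprocessors\n",
                   i, prop.name, prop.totalGlobalMem / (1024 * 1024),
                   prop.multiProcessorCount);
        }
        return 0;
    }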

The Intel Phi 5110P is a recent offering from Intel, which packages 60 cores into a single coprocessor. Each core is x86 compatible and is capable of running its own instruction stream. The x86 compatibility allows the programmer to use familiar frameworks such as MPI and OpenMP when developing code for the Phi. The Phi 5110P has 8 GB of onboard memory.
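Because the Phi cores run ordinary x86 instruction streams, standard shared-memory code carries over largely unchanged. The sketch below (not official HPCF example code) is a plain OpenMP reduction; with the Intel compiler it could be built for the host as usual or, as an assumption about the local toolchain, for native execution on the coprocessor with the -mmic flag.

    /* Minimal OpenMP sketch illustrating that ordinary shared-memory code
     * also runs on the Phi (not official HPCF example code). */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        const int n = 1000000;
        double sum = 0.0;

        /* Simple parallel reduction; on the 60-core Phi, OMP_NUM_THREADS
         * can be set much higher than on the 16-core host nodes. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += 1.0 / ((double)i + 1.0);

        printf("harmonic sum H_%d = %f computed with up to %d threads\n",
               n, sum, omp_get_max_threads());
        return 0;
    }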

CPUs, GPUs, and Phis for HPCF2013 equipment

Some HPCF technical reports contain performance studies that show how to use the cluster profitably (i.e., how to obtain good performance from the latest equipment):

The maya cluster contains several types of nodes that fall into four main categories for usage.

The maya cluster features several different computing environments. We ask that, when there is no specific need to use the Dell equipment, users run on the HPCF2010 nodes first, then on the HPCF2009 nodes. The HPCF2013 CPU-only nodes should be used when the HPCF2009 nodes are filled, and the GPU- and Phi-enabled nodes should be used only when all other nodes are unavailable. The scheduler will be set up to help enforce this order automatically.

The table below gives the hostnames of all nodes in the cluster and the hardware group and rack to which each belongs.

Hardware Group   Rack   Hostnames               Description
hpcf2013         A      maya-mgt                Management node, for admin use only
                 A      n1, n2, ..., n33        CPU-only compute nodes; n1 and n2 are currently designated as development nodes
                 B      maya-usr2               User node with Phis
                 B      n34, n35, ..., n51      Compute nodes with Phis
                 C      maya-usr1               User node with GPUs
                 C      n52, n53, ..., n69      Compute nodes with GPUs
hpcf2010         A      n70, n71, ..., n111     Compute nodes; n70 is currently designated as a development node
                 B      n112, n113, ..., n153   Compute nodes; n112 is currently designated as a development node
hpcf2009         A      n154, n155, ..., n195   Compute nodes
                 B      n196, n197, ..., n237   Compute nodes

Network Hardware

Two networks connect all components of the system:

For more information about the network components, see:

Storage

There are a few special storage systems attached to the cluster, in addition to the standard Unix file system. Here we describe the areas that are relevant to users. See Using Your Account for more information about how to access these areas.