
A distributed memory cluster is a bunch of individual computers that work together on computing tasks that are too large for any one computer. Generally the computers are connected by a very fast network and have lots of hard drive space. The individual computers in a cluster are usually either numerous and weak or few in number and very powerful. Our cluster tends toward the latter: a few powerful machines.

A cluster's architecture has several parts, and there are several things you need to know about how to interact with one:

Hardware

Nodes

A node is an individual computer that is designed to run users' programs. There are two different types of nodes: head nodes and compute nodes. They differ only in how users interact with them.

Our nodes have 13 GB of RAM, a 160 GB hard drive and two dual-core AMD Opteron processors. The AMD Opteron processors are compatible with most commercial software, including COMSOL Multiphysics, MATLAB and IDL. Essentially, the nodes are big, fast desktop machines without monitors.

Head Nodes

A head node or login node is a machine that you can directly log in to. There is currently one head node on the hpc.rs.umbc.edu cluster, logically named hpc.rs.umbc.edu, and it is identical to the other nodes except that it has 16 GB of RAM instead of 13. The head nodes have a number of purposes:

  • When you want to start a computational job (or job for short) on the cluster, you start it from a head node.
  • The head nodes allow you to monitor and interact with the computational jobs that you are running on the cluster (and ones that you're waiting to run).
  • The head nodes also allow you to develop new programs to run on the cluster.
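In practice, you reach the head node over SSH. As a minimal sketch – assuming your login name is username; substitute your own:

    ssh username@hpc.rs.umbc.edu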

We'll get into how computational jobs work in a minute.

Compute Nodes

A compute node also runs your jobs, but you can't log in to a compute node. Instead, you have to submit a computational job to the cluster, and the cluster will run it when there are available compute nodes. When the job starts to run, it will run on one or more compute nodes at a time, depending on how many you ask for.

File Servers

As I mentioned above, in a cluster, multiple machines work together on a single computing task. In order for that to be possible, the machines have to have a shared area in which to store files. One way to achieve that is to have file servers – dedicated machines whose only job is to transfer files to other machines. These file servers combine many hard drives in a manner that allows redundancy and fast data transfer. Redundancy means that one or more pieces of hardware can fail without the file server crashing or losing data. Sometimes file servers back up their data to magnetic tapes or to other off-site file servers. Other file servers don't do that since the amount of data they store is too large for that to be feasible.

On our cluster, there are several dedicated file servers, usually referred to by the nickname "Thumpers." These servers are connected to HPC by the fast InfiniBand network, but the data on them is not backed up in any way. The Thumpers contain some redundant hardware, so they can survive a limited amount of hardware failure without losing data. You also have a home directory; the data in your home directory is backed up and moderately fast to access, but you can only store about 100 MB of data there. The data in your home directory or on the Thumpers is only accessible on the HPC cluster. The Thumpers are only intended for temporary file storage, such as input and output files during computations.

For long-term storage, you have other options. The UMBC Research Data Storage network provides reliable data storage space, but it is slower than the Thumpers. You can rent that space yearly from OIT; contact your PI to see if such space is already available. You can also access your UMBC-wide AFS partition, which is where your UMBC email, your website, and your home directory on OIT lab machines are all stored. Your UMBC Research Data Storage and AFS files can be accessed from anywhere at UMBC.

See Disk Space and File Quotas for more information about these and other storage options.

Network

Computers on a cluster are connected by one or more networks. Usually these networks are much faster than the ones that you use in your home or office. They are faster in two different ways: lower latency and higher bandwidth. Latency is the amount of time it takes for a message to go from its source to its destination. Bandwidth is how much data you can send per second. There are several different types of high-speed networks and cluster designers usually pick a balance of latency and bandwidth to match the computational tasks the cluster will be performing.
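A rough model makes the difference concrete: the time to deliver a message is approximately the latency plus the message size divided by the bandwidth. With an assumed latency of 50 microseconds and a bandwidth of 1 gigabit per second (illustrative numbers, not measurements of our network), a 1 MB message takes about 50 µs + 8 Mbit / (1 Gbit/s) ≈ 8 ms, so bandwidth dominates; a 100-byte message takes about 51 µs, so latency dominates.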

Our cluster has two different networks. One is gigabit ethernet – the very same network used in some companies and in many UMBC buildings. The other is InfiniBand – a network with much lower latency and much higher bandwidth than ethernet. Our gigabit ethernet network is used primarily for system administration, while the InfiniBand network is used for nearly everything else.

More Information on Hardware

For more information on the setup of HPC:

Software

Computational Jobs

You do not run your large computational tasks directly on the cluster. Instead, you break up your compute tasks into jobs – individual problem sets or computational tasks. Then you tell the cluster that you want to run the job. That results in your job being added to the job queue. A job queue is something like a line in a grocery store – once your job is at the front of the queue, it gets to run. However, getting to the front of the queue isn't as simple as it is in a grocery store – different jobs have different priorities and they need different amounts of resources. The scheduler decides when each job will run. I'll explain all of that in more detail below.
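In day-to-day use, submitting a job boils down to one command run on the head node. As a sketch – the qsub command assumes a PBS-style batch system, and if our cluster runs a different batch system the command name will differ, but the idea is the same:

    qsub myjob.sh    # hand the job file to the queue; a job ID is printed

We'll look at what goes into myjob.sh (a hypothetical job file name) next.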

Jobs

A job is some sort of computational task that requires anywhere from one processor core on one machine all the way up to the entire cluster. That is a very open-ended and generic definition because jobs are open-ended, generic things. Pretty much anything that can run on one or more computers (without monitors) can be a job.

You describe jobs with job files. Job files have two parts: a resource usage description and the job commands. The resource usage description tells the scheduler how many machines you need, how much memory you need, how high-priority your job is, the name of your job and other information that the scheduler may need to know. The job commands are essentially your actual job: they are the portion of the job file that does all the computational work that your job needs to do.
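Here is a minimal sketch of a job file. The #PBS directives assume a PBS-style scheduler – the exact directive syntax on our cluster may differ – and myprogram is a hypothetical stand-in for your own executable:

    #!/bin/bash
    # Resource usage description, read by the scheduler:
    #PBS -N my_first_job          # the name of the job
    #PBS -l nodes=1:ppn=1         # one processor core on one machine
    #PBS -l walltime=01:00:00     # give up after one hour

    # Job commands: the part that does the actual work.
    cd ~/scratch
    ./myprogram input.dat > output.dat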

Jobs tend to fall into two different categories:

  • parallel job – a job that requires several machines communicating with each other to perform a single computational task
  • serial job – a job that performs a computational task on one machine at a time

Sometimes parallel jobs can be split up into many serial jobs. For example, suppose you have security footage that was digitized from a damaged VHS tape. You need to undo the damage by using a computationally intensive image filter on each frame of the movie. You could have several computers work together on each frame of the movie (a parallel job), or you could have each computer work on a different frame (many serial jobs). If the image filter you're using needs 50 GB of RAM, the work on one frame cannot fit in one machine's memory. You would have to make a parallel job in which several machines work together, each processing part of the frame. If the image filter only took 1 GB of RAM, an entire frame would fit on one machine. In that case, you could have each computer process one frame (many serial jobs).

Generally, if you can get away with doing a serial job, then that is what you should do. Serial jobs are easier to write – they're just like running a program on your own computer. Parallel jobs are harder because you have to use special software (typically a message-passing library such as MPI) to allow your program to communicate between the machines. Also, generally, the more machines you use, the less work you can do per machine, due to the overhead of communication.
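To make the contrast concrete, here is a sketch of the resource request and command for each kind of job. The #PBS lines again assume a PBS-style scheduler, and frame_filter and frame_filter_mpi are hypothetical programs standing in for the image filter above:

    # Serial job: one core on one machine filters one frame.
    #PBS -l nodes=1:ppn=1
    ./frame_filter frame042.png

    # Parallel job: five machines with four cores each cooperate on a
    # single frame, communicating through MPI.
    #PBS -l nodes=5:ppn=4
    mpirun -np 20 ./frame_filter_mpi frame042.png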

Whether you choose to run a serial or parallel job, you must submit it to the queue and wait for it to run. That brings us to...

The Queue

The queue contains a list of all of the queued jobs. A queued job is essentially a job that a user has asked to run but that has not yet run. Another way to look at it: a queued job is a copy of your job file stored in the cluster's memory. Once your job is allowed to start running, that copy of your job file is moved to a compute node and the commands in it are executed. That compute node is given permission to access any other compute nodes that were allocated to your job.

Jobs in the queue do not execute in the order in which they were inserted. Deciding which jobs to run, and when, can be very complicated; that is the job of the scheduler.
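You can inspect the queue yourself from the head node. A sketch, again assuming PBS-style commands (the exact commands and output format on our cluster may differ):

    qstat              # list all jobs: R means running, Q means queued
    qstat -u $USER     # list only your own jobs
    qdel 1234          # remove your job number 1234 from the queue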

The Scheduler

Scheduling jobs to run on a cluster is not as simple as it sounds. Some jobs will need only one machine and some will need more. What if you have ten queued jobs that need one machine each and one job that needs five machines? If you have five machines free, which job do you run? If you run the five-machine job, then the ten one-machine jobs will have to wait until it finishes. If you favor small jobs and run all ten of the one-machine jobs before running the five-machine job, then the five-machine job might be waiting for a long time before it gets to run. Some jobs have higher priorities than others, making the job of a scheduler even more difficult. Jobs that need the whole cluster might have to wait a long time before they can run.

Fortunately, you don't have to schedule your job yourself – a computer program called a scheduler does that for you. There are many different schedulers, and writing a good one is difficult enough that some companies charge thousands of dollars or more for commercial schedulers.

More Information on Jobs

Job submission is covered in detail in our job submission tutorial and in several other places:

Storage Space

You will need to store files on HPC in order to use them in your computational tasks. The files are all stored on file servers, as I mentioned earlier. You have several storage areas, and you can get more:

  • home directory – this is your personal area in which to store files. When you log in to HPC, you start in your home directory. You can only store 50 MB of data here. It is intended to be used only for final research results and other small but important files, as well as settings for various cluster programs. This data is thoroughly backed up, but slow to access.
  • personal scratch directory – this is another area to store your personal files. It is much faster, but it is not backed up. You can access it by typing "cd ~/scratch/" (without the quotes) when you're logged in to HPC.
  • common directory – this is an area for your research group to store files. It is fast but not backed up, just like your personal scratch directory. You can access it by typing "cd ~/common/" (without the quotes) when you're logged in to HPC. A short example session follows this list.
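As a quick sketch of moving between these areas (du is a standard command that reports how much disk space a directory uses):

    cd ~               # home directory: small, backed up
    cd ~/scratch/      # personal scratch directory: fast, not backed up
    cd ~/common/       # group common directory: fast, not backed up
    du -sh ~/scratch   # show how much space your scratch files use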

There are other storage options, and your research group may have already made other arrangements. Contact your PI for more information. Or, if you are a PI, contact the cluster's Point of Contact.

This website has more information on storage areas:
