The UMBC High Performance Computing Facility (HPCF)
is a shared resource for research at UMBC that requires
high-performance computing, in particular a parallel computer. The following
rules are intended to make this facility effective for its users and to
ensure its maintenance. For the long-term benefit of
the community of users,
it is vital that all users comply with all aspects of the rules.
Usage rules for a large computer that is shared by many users have several
aspects, and a facility that relies on active support from its users for its
maintenance adds further ones. Therefore, the
following items are grouped by their purpose.
These rules will be reviewed and updated by the HPCF Governance Committee
periodically as well as in response to issues that come to our attention and
to usage patterns. This webpage always shows the current usage rules in effect.
If you have any questions or concerns regarding the rules,
do not hesitate to contact the
chair of the user committee, Dr. Matthias Gobbert;
for technical questions, preferably write to hpc-support@lists.umbc.edu;
see the contact information on this
webpage.
We will make a distinction between users and
PIs (principal investigators).
A user is anyone with an HPCF account.
A PI is a UMBC faculty member who brings their research projects to HPCF and
sponsors users to assist the PI on these projects. Users may
be sponsored by one or more PIs, and PIs may be users themselves.
Good User Behaviors
On a day-to-day basis, it is imperative that users run their code in a
responsible fashion, so as not to hinder or damage other users' code. To
this end, the following rules must be followed at all times. Complying
with many of these common-sense rules may require you to understand
something about parallel computing or about the hardware and software
setup of the machine.
To this end, you are urged to study information on this webpage,
read e-mails to the list of users, attend user meetings and participate
actively in them, take classes that use the cluster in HPCF
(for instance, Math 447 or 627),
or ask questions using the support e-mail list
hpc-support@lists.umbc.edu.
It is always better to ask first, so that we can coordinate usage if
necessary. Remember that HPCF provides user support through RAs who
can work with users to make their use of HPCF as effective as possible.
All users must use the batch submission system of the scheduler
that is running on the user node to reserve compute nodes for their use.
You are not allowed to log in to the compute nodes for the purpose of
running jobs directly there.
Users will be notified by e-mail about issues related to the
system, such as scheduled downtime, upgrades, etc. Such mail may also
include requests for information and feedback. Users are required to
monitor their UMBC e-mail address
and to respond when contacted. This is part of the active
communication necessary for a shared resource such as this to be used
effectively by all users. The time slot for scheduled
downtime is every Tuesday evening. If a downtime is scheduled,
it will usually be announced several days in advance.
IMPORTANT NOTE: If users are observed to violate any of the above rules
or are behaving in any way that impacts other users' ability to use the
resource, the chair of the user committee has the right to terminate the
user's jobs and/or to suspend the user's account. Ordinarily, we will try
to make contact with the user first to discuss what is going on and to
try to work with the user, but if other users are impacted, the account
can be suspended first. The chair of the user committee acts
here on behalf of the entire user community; see the contact information for
details on the user committee.
Access to the Facility
This facility is a shared resource for research at UMBC that requires a
high performance parallel computer. To get an account on this facility,
please submit a completely filled-out account request form.
All accounts must be sponsored by a UMBC faculty member.
Thus, for a student to get an account, two forms need to be submitted:
one by the student, naming the sponsor, and
one by the sponsor, naming the student.
To maintain their access, users must follow all rules outlined on the
HPCF webpage at all times.
To ensure the success of this facility in the long run, it is
vital that there be demonstrated research results created on this
machine; hence PIs are required to have an ongoing program of
high performance computing research.
PIs are invited to contribute direct funding to the facility at $5,000
per node or a multiple thereof. Contributions from faculty in this way
will be bundled and used for a periodic expansion of the cluster.
Contributing this money gives these PIs priority access over other
PIs to a proportion of the cluster, in the sense explained below.
Users' access to compute nodes will be managed by job
scheduling software, called the scheduler, which reserves compute nodes for
users based on the availability of
resources in combination with a user's priority. Note that priority only
influences newly submitted jobs, not ones already running. The following
principles will guide the setup of the scheduler:
Basic scheduling is done by first-come-first-served, with
priorities adjusted by factors like job size and time waited.
The scheduler currently in use is SLURM,
developed at Lawrence Livermore National Laboratory.
SLURM prioritizes jobs using a multifactor priority plugin, which
weighs several factors together to determine each job's priority.
The priority of queued jobs incorporates
a "fair-share component". This component is determined by
the SLURM accounting group that a job is charged to.
There are two accounting groups in HPCF:
Contribution and Community.
PIs in the Contribution group have provided funding (or some other means of
contribution) to the facility. Community consists of the remaining PIs.
The relative weight of the two groups is determined by the
total financial contribution of the members of the Contribution group.
Within the Contribution group, PI groups have weights proportional
to their financial contribution to the facility.
All PI groups in the Community group are weighted equally.
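To make this hierarchy concrete, here is a minimal Python sketch of how the weights could translate into each PI group's fraction of the cluster. It assumes that the Contribution group's overall fraction is simply given as an input (the rules say it results from the group's total contribution, but not by what formula), uses the $5,000-per-node figure from above to check contribution amounts, and uses hypothetical PI names. It illustrates the proportional and equal splits described here; it is not the actual SLURM configuration on HPCF.

```python
from fractions import Fraction

NODE_PRICE = 5000  # dollars per node, as stated in the usage rules


def nodes_purchased(dollars: int) -> int:
    """Convert a contribution into nodes; it must be a multiple of $5,000."""
    if dollars <= 0 or dollars % NODE_PRICE != 0:
        raise ValueError("contributions must be a positive multiple of $5,000")
    return dollars // NODE_PRICE


def pi_shares(contribution_group_share, contributions, community_pis):
    """Return each PI group's fraction of the whole cluster.

    contribution_group_share: fraction of the cluster for the Contribution
        group as a whole (ASSUMED to be given; the rules do not state how it
        is derived from the group's total contribution).
    contributions: dict mapping Contribution PI -> dollars contributed.
    community_pis: list of Community PIs, all weighted equally.
    """
    shares = {}
    total_nodes = sum(nodes_purchased(d) for d in contributions.values())
    # Within the Contribution group: proportional to nodes (i.e., to dollars).
    for pi, dollars in contributions.items():
        shares[pi] = contribution_group_share * Fraction(
            nodes_purchased(dollars), total_nodes)
    # Within the Community group: equal split of the remainder.
    community_share = 1 - contribution_group_share
    for pi in community_pis:
        shares[pi] = community_share / len(community_pis)
    return shares


if __name__ == "__main__":
    # Hypothetical example: the Contribution group holds 3/4 of the cluster.
    shares = pi_shares(Fraction(3, 4),
                       {"pi_a": 10_000, "pi_b": 30_000},
                       ["pi_c", "pi_d", "pi_e"])
    for pi, share in shares.items():
        print(pi, share)  # pi_a 3/16, pi_b 9/16, pi_c/pi_d/pi_e 1/12 each
```

Exact fractions are used so that the shares visibly sum to one; a real scheduler works with its own normalized share values.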
The following diagram demonstrates this hierarchy.
In principle, acceptable usage levels of the cluster are proportional
to a PI's standing in this hierarchy.
This means, for example, that users in the Community group should
not monopolize large portions of the cluster for
excessively long periods of time,
causing unreasonable interference with jobs of Contribution PIs.
The scheduling mechanism on the cluster is designed accordingly to give
increased access to users of contributing PIs, while still serving the needs of our
Community users.
The scheduler tracks usage and uses a 30-day sliding time window
in the fair-share calculation of a user's priority.
That is, each PI group has in effect a monthly allocation of
the cluster expressed in terms of CPU hours
(wall time multiplied by number of cores used).
This allocation will be a nominal amount for users in the Community
group. Contribution PIs will receive an allocation proportional
to the number of nodes purchased. PIs who have used less than their
monthly allocation will have their jobs' priorities boosted, whereas PIs
who have used more than their monthly allocation will have their
priorities reduced.
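The following Python sketch illustrates the CPU-hour bookkeeping over a 30-day sliding window and the boost/reduce idea. It counts a job in the window by its end time, which is a simplification (a real accounting system may apportion jobs that straddle the window boundary). The priority factor 2^(-usage/allocation) is an assumption, modeled loosely on the shape of SLURM's classic fair-share factor: it equals 0.5 at exactly the allocation, is larger below it and smaller above it. The actual formula and parameters on HPCF may differ; see the scheduling rules page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # sliding window used in the fair-share calculation


@dataclass
class JobRecord:
    end: datetime      # time at which the job finished
    wall_hours: float  # wall time in hours
    cores: int         # number of cores used


def cpu_hours(job: JobRecord) -> float:
    """CPU hours = wall time multiplied by number of cores used."""
    return job.wall_hours * job.cores


def usage_in_window(jobs, now: datetime) -> float:
    """Total CPU hours of jobs that finished within the last 30 days.

    Simplification: a job counts entirely if it ended inside the window.
    """
    return sum(cpu_hours(j) for j in jobs if now - j.end <= WINDOW)


def priority_factor(used: float, allocation: float) -> float:
    """ASSUMED illustrative factor: 1.0 with no usage, 0.5 at exactly the
    monthly allocation, above 0.5 when under it (boost) and below 0.5 when
    over it (reduction)."""
    return 2.0 ** (-used / allocation)


if __name__ == "__main__":
    now = datetime(2024, 3, 31)  # hypothetical dates and allocation
    jobs = [
        JobRecord(datetime(2024, 3, 20), 10.0, 16),  # 160 CPU hours
        JobRecord(datetime(2024, 3, 25), 2.5, 64),   # 160 CPU hours
        JobRecord(datetime(2024, 2, 1), 100.0, 8),   # outside the window
    ]
    used = usage_in_window(jobs, now)  # 320.0
    print(used, priority_factor(used, allocation=640.0))
```

In this example the PI group has used half of a hypothetical 640 CPU-hour allocation, so its factor lies above 0.5, i.e., its new jobs would be boosted.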
Much effort has gone into designing scheduling rules that will
automatically support the objectives of the cluster. However, it is ultimately
the responsibility of each user to ensure he or she is
maintaining an appropriate usage level.
The meaning of "appropriate" varies with usage patterns of the
overall community, so we may request that you adjust your usage based on the
current situation.
Users working under several PIs should take care to charge their
computing time to the correct PI. This can be done on a per-job basis;
if no PI is specified, the job is charged to the user's default PI.
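In SLURM, the account a job is charged to can be chosen per job with the `--account` option of `sbatch`. The Python sketch below only assembles such a command line and does not submit anything. It assumes that each PI corresponds to a SLURM account on HPCF; the account name `pi_smith` and the script name `run.slurm` are hypothetical placeholders, so ask hpc-support@lists.umbc.edu for the actual account names.

```python
import shlex
from typing import List, Optional


def sbatch_command(script: str, nodes: int, tasks_per_node: int,
                   time_limit: str, account: Optional[str] = None) -> List[str]:
    """Assemble (but do not run) an sbatch command line for a batch job.

    account: SLURM account the job is charged to (ASSUMED here to be the
    PI's account). If None, no --account option is added, and SLURM charges
    the user's default account.
    """
    cmd = [
        "sbatch",
        f"--nodes={nodes}",
        f"--ntasks-per-node={tasks_per_node}",
        f"--time={time_limit}",  # wall time limit, e.g. "04:00:00"
    ]
    if account is not None:
        cmd.append(f"--account={account}")
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # "pi_smith" and "run.slurm" are hypothetical placeholders.
    cmd = sbatch_command("run.slurm", nodes=2, tasks_per_node=16,
                         time_limit="04:00:00", account="pi_smith")
    print(shlex.join(cmd))
```

On the user node, the resulting list could be passed to `subprocess.run(cmd, check=True)`; leaving out `account` lets SLURM charge the user's default account.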
Contribution users will also get increased access to running long-term
jobs. We consider a "long-term" job to be one that runs longer than overnight.
Community users will be given more limited
access to running long-term jobs.
Jobs requiring many resources (e.g., many processors) are
allowed to use less wall time, to avoid tying up significant portions
of the cluster for too long.
Additionally, if current usage patterns on the machine allow
for it, we are happy to let users run longer or larger jobs by
arrangement; contact hpc-support@lists.umbc.edu.
The above rules do not apply to system administration and testing
of the machine, including select users running jobs for the purpose of
testing, debugging, or benchmarking the system. For instance, users with
existing code may be specifically running large jobs to test a new
system or a new system configuration;
that is, it is not just the actual system administrator running
such jobs. Such efforts will be coordinated by the chair of the user
committee in collaboration with DoIT and the user committee. We
anticipate that such activity is limited to the initial phase of the
machine or after significant changes in, e.g., hardware or software.
Technical details of the scheduling system are discussed on the
scheduling rules page.
Once the principles on this page are understood, users should refer
to that page and the how to run tutorial
for further information on proper usage of the cluster.
Obligation of All Users to Help Maintain the Facility
This machine has been created with financial and non-financial support both from
faculty and from UMBC. To ensure the long-term existence of this
facility, all users have an obligation to actively help sustain it.
This obligation has financial and scientific (non-financial) aspects,
and support for both aspects is required from all users to maintain
their accounts on the systems. The requirements include the following
methods of support:
Each user must provide a title and abstract for all research
projects conducted on the facility's machines. Each project should
have its own title and abstract. This information will be posted on the
facility's webpage to demonstrate how the facility is used.
Each user is required to provide information on outcomes of the
research conducted on the facility's machines. This includes both
information on papers submitted and published and on presentations
given. We are happy to post PDF files of papers or presentations on the
facility's webpage or to link to another webpage.
Each user must acknowledge the use of this facility, for instance,
in papers and presentations.
The short form of the acknowledgment is "HPCF, NSF, UMBC".
For a full paragraph version, see Supporting Materials
under the Research tab on this webpage.
Each user (or the sponsoring PI, if the user is not a faculty
member) must be willing to participate as co-PI or co-investigator in
future grant proposals. This implies a willingness to supply short
descriptions of the research and its results and to provide the
necessary information for grant proposals (bio sketch, current/pending
support, and similar), when requested.
Each user is required to include budget requests for computational
resources in individual grant proposals. The support requested should be
commensurate with the amount of resources typically used; the cost per
node for contributing PIs given above can serve as a guide. To support
such efforts, we are ready to help with your proposal, including
drafting text, acting as co-PI/co-investigator, supplying a support
letter, or whatever way is suitable.
Contact the chair of the user committee early enough
before your proposal due date to work out details.
A standard phraseology for budget justifications is available
under Supporting Materials in the Research tab on this webpage.
All users including principal investigators must confirm when
requested that they and their research group still require the account
on the facility's machines. Specifically, at the beginning of every Fall
semester, all accounts will be reviewed to determine if they should be
continued. The purpose is to avoid large numbers of inactive accounts.
This facility is not suitable for long-term data storage; users are
required to move their data off the machine at the completion of
projects. An account cannot be kept open solely for the purpose of
access to data on the machine.
Users who wish to continue their account on the system are required
to supply evidence of outcomes of their usage of the machine, such as
publications, presentations, preprints, and grant proposals,
including funding requests for nodes on the machine. Users are required
to submit such evidence continuously throughout the year, but also
specifically at the time of account review at the beginning of the Fall
semester. If no information is received upon request or there was no
effort to help maintain the facility, the user's account, including all
accounts sponsored by the faculty member, will be suspended and/or their
priority of usage reduced. To help with the documentation of research
results, we provide as part of this webpage a Publications page where
technical reports of results can be posted, as well as webpages for each
project, where publications and presentations of the research can be
posted throughout the year.
The philosophy adopted here is one of granting an account on this
facility first and then requiring help in maintaining it, as opposed to
requiring up-front payment to use the facility or on-going charge-backs.
This approach allows
researchers to start using the facility immediately at any point in the
year and to obtain initial research results with it. In turn, building on
these results, users must then actively demonstrate
results as well as seek funding to sustain the facility.
Effectiveness of these Rules
The HPCF Governance Committee approved these usage rules and
the associated implementation in the scheduling rules in Spring 2011.
They are subject to periodic review and future changes.