Partition | Description | Walltime limits |
---|---|---|
develop | There are six nodes in the develop partition: n1, n2, n70, n112, n156, and n196. This partition is dedicated to code under development. Jobs using many cores may be tested, but run times should be kept very short. | 5 min default, 30 min max |
develop-mic | There is one node with two Intel Phi coprocessors in this develop partition: maya-usr2. This partition is dedicated to Intel Phi code under development. Jobs using many cores may be tested, but run times should be kept very short. | 5 min default, 30 min max |
batch | The majority of the compute nodes on maya are allocated to this partition. There are 229 nodes: n3, ..., n69, n71, ..., n111, n113, ..., n153, n157, ..., n195, n197, ..., n237. Jobs running on these nodes are considered "production" runs; users should have a high degree of confidence that bugs have been worked out. | 5 day maximum |
prod | A large share of the compute nodes on maya are also allocated to this partition. There are 162 nodes: n71, ..., n111, n113, ..., n153, n157, ..., n195, n197, ..., n237. Jobs running on these nodes are considered "long production" runs. Contributing members can use this partition in conjunction with the long_contrib QOS for runs of less than 45 days. The long_prod QOS should be used with this partition. | 45 day maximum |
mic | The nodes with two Intel Phi each on maya are allocated to this partition. There are 18 nodes each with 2 mic cards: n34, ..., n51. Jobs running on these nodes are considered "production" runs; users should have a high degree of confidence that bugs have been worked out. | 5 day maximum |
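As a sketch of how a partition from the table above is selected, a minimal batch script targeting the develop partition might look like the following. The job name, output file names, and executable are placeholders, not files that exist on maya:

```shell
#!/bin/bash
#SBATCH --job-name=test_job        # placeholder job name
#SBATCH --partition=develop        # target the develop partition from the table above
#SBATCH --time=00:05:00            # stay within the 5 min default / 30 min max
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16       # one task per core on a 16-core node
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err

srun ./hello_parallel              # placeholder executable
```

The script is then submitted with `sbatch run.slurm` as in the examples below.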
QOS | Wall time limit per job | CPU time limit per job | Total core limit across the QOS | Core limit per user |
---|---|---|---|---|
short | 1 hour | 1024 hours | --- | --- |
normal (default) | 4 hours | 1024 hours | --- | 256 |
medium | 24 hours | 1024 hours | 1536 | 256 |
long | 5 days | --- | 256 | 16 |
long_contrib | 5 days | --- | 768 | 128 |
long_prod | 45 days | --- | 64 | --- |
support | --- | --- | --- | --- |
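To request one of these QOS levels instead of the default, the `--qos` flag is added to the batch script. A hedged sketch (the job name, node counts, and executable are example values only):

```shell
#!/bin/bash
#SBATCH --job-name=med_job         # placeholder job name
#SBATCH --partition=batch
#SBATCH --qos=medium               # request the medium QOS (24 hour wall time limit)
#SBATCH --time=24:00:00
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16       # 64 cores total, within the 256-core per-user limit

srun ./my_program                  # placeholder executable
```

Requesting a QOS you are not authorized for, or exceeding its limits, produces the submission errors shown below.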
Number of nodes | Cores per node | Total number of cores | Wall time (hours) | CPU time (hours) |
---|---|---|---|---|
64 | 16 | 1024 | 1 | 1024 |
32 | 16 | 512 | 2 | 1024 |
16 | 16 | 256 | 4 | 1024 |
8 | 16 | 128 | 8 | 1024 |
4 | 16 | 64 | 16 | 1024 |
2 | 16 | 32 | 32 | 1024 |
1 | 16 | 16 | 64 | 1024 |
1 | 8 | 8 | 128 | 1024 |
1 | 4 | 4 | 256 | 1024 |
1 | 2 | 2 | 512 | 1024 |
1 | 1 | 1 | 1024 | 1024 |
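Every row of the table above satisfies the same identity: wall time = 1024 CPU-hours ÷ total cores. A small shell helper makes the relationship explicit; it is illustrative only and not a command provided on maya:

```shell
# max_walltime_hours: given a core count, print the longest wall time (in whole
# hours) that keeps cores x hours within the 1024 CPU-hour cap of the normal QOS.
# Hypothetical helper for illustration; not part of the cluster software.
max_walltime_hours() {
  local cores=$1
  echo $(( 1024 / cores ))
}

max_walltime_hours 64    # 1024 / 64 -> prints 16
max_walltime_hours 16    # 1024 / 16 -> prints 64
```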
[araim1@maya-usr1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Job has invalid qos
[araim1@maya-usr1 ~]$

[araim1@maya-usr1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Job violates accounting policy (job submit limit, user's size and/or time limits)
[araim1@maya-usr1 ~]$
[araim1@maya-usr1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Job violates accounting policy (job submit limit, user's size and/or time limits)
[araim1@maya-usr1 ~]$
[araim1@maya-usr1 ~]$ squeue
  JOBID PARTITION     NAME     USER  ST       TIME  NODES    QOS NODELIST(REASON)
   4278     batch  users01   araim1  PD       0:00     30 normal (AssociationResourceLimit)
   4277     batch  users01   araim1   R       2:54      2 normal n[7-8]
[araim1@maya-usr1 ~]$

[araim1@maya-usr1 ~]$ cat slurm.err
slurmd[n1]: error: *** JOB 59545 CANCELLED AT 2011-05-20T08:10:52 DUE TO TIME LIMIT ***
[araim1@maya-usr1 ~]$

[araim1@maya-usr1 ~]$ cat slurm.err
slurmd[n3]: *** JOB 4254 CANCELLED AT 2011-05-27T19:42:14 DUE TO TIME LIMIT ***
slurmd[n3]: *** STEP 4254.0 CANCELLED AT 2011-05-27T19:42:14 DUE TO TIME LIMIT ***
[araim1@maya-usr1 ~]$
[araim1@maya-usr1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Job violates accounting policy (job submit limit, user's size and/or time limits)
[araim1@maya-usr1 ~]$
[araim1@maya-usr1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Invalid account specified
[araim1@maya-usr1 ~]$

[araim1@maya-usr1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Job has invalid qos
[araim1@maya-usr1 ~]$

[araim1@maya-usr1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Invalid partition name specified
[araim1@maya-usr1 ~]$

[araim1@maya-usr1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Requested node configuration is not available
[araim1@maya-usr1 ~]$

[araim1@maya-usr1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Node count specification invalid
[araim1@maya-usr1 ~]$

[araim1@maya-usr1 ~]$ sbatch run.slurm
sbatch: unrecognized option `--ndoes=2'
sbatch: error: Try "sbatch --help" for more information
[araim1@maya-usr1 ~]$
[araim1@maya-usr1 ~]$ cat slurm.err
slurmd[n1]: error: Job 60204 exceeded 10240 KB memory limit, being killed
slurmd[n1]: error: *** JOB 60204 CANCELLED AT 2011-05-27T19:34:34 ***
[araim1@maya-usr1 ~]$
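When a job is killed for exceeding its memory limit as above, a larger per-core allocation can be requested with the standard `--mem-per-cpu` sbatch option (value in MB). A sketch; the job name, memory value, and executable are example values only:

```shell
#!/bin/bash
#SBATCH --job-name=big_mem_job     # placeholder job name
#SBATCH --partition=batch
#SBATCH --mem-per-cpu=2048         # request 2048 MB per allocated core (example value)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16

srun ./memory_hungry_program       # placeholder executable
```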
[araim1@maya-usr1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Requested node configuration is not available
[araim1@maya-usr1 ~]$

[araim1@maya-usr1 ~]$ squeue
  JOBID PARTITION     NAME     USER  ST       TIME  NODES    QOS NODELIST(REASON)
  62280   develop     SNOW   araim1  PD       0:00      1 normal (PartitionTimeLimit)
[araim1@maya-usr1 ~]$
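A PartitionTimeLimit reason means the requested wall time exceeds the partition's maximum. The limits in the partition table can be confirmed from the command line with the standard `scontrol` and `sinfo` commands (exact output fields vary by SLURM version):

```shell
# Show the full configuration of the develop partition, including its MaxTime limit.
scontrol show partition develop

# Or list the wall time limits of all partitions at once.
sinfo --format="%P %l"    # %P = partition name, %l = partition time limit
```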