festus
The cluster “festus” (btrzx24) went into operation in January 2025. It consists of two management nodes, one virtualization server, two login nodes, several storage servers, and 74 compute nodes, which are connected by a 100G InfiniBand interconnect for inter-process communication and a 25G Ethernet service network. “festus” uses Slurm (24.11) as resource manager. The ITS file server (e.g., the ITS home directory) is not mounted on the cluster for performance reasons; every user has a separate home directory (10GB) that resides on the cluster's own NFS server.
Acknowledging festus / Publications
As with other DFG-funded projects, results must be made available to the general public in an appropriate manner. Publications must contain a reference to the DFG funding (a so-called “Funding Acknowledgement”) in the language of the publication, stating the project number.
Whenever festus has been used to produce results that appear in a publication or poster, we kindly request citing the service in the acknowledgements:
Calculations were performed using the festus-cluster of the
Bayreuth Centre for High Performance Computing (https://www.bzhpc.uni-bayreuth.de),
funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 523317330.
Login
The login nodes of festus are accessible with ssh via festus.hpc.uni-bayreuth.de, and only from university networks. If you are outside the university, a VPN connection is required. If your login shell is (t)csh or ksh, you have to change it to bash or zsh in the ITS self-service portal.
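For interactive use, simply connect with ssh to festus.hpc.uni-bayreuth.de. If you need scripted access from Python, a minimal sketch using Paramiko could look as follows (assumptions: Paramiko is installed locally and you authenticate with your usual SSH key; the username bt123456 is a placeholder):

    # festus_login.py - open an SSH connection to a festus login node (sketch only)
    import paramiko

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # trust the host keys already known to your system
    client.connect("festus.hpc.uni-bayreuth.de", username="bt123456")

    # Run a harmless command on the login node, e.g. a short partition overview
    stdin, stdout, stderr = client.exec_command("sinfo --summarize")
    print(stdout.read().decode())
    client.close()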
Compute nodes
62 compute servers (“typA”)
- 2x AMD EPYC 9554 64c CPU (max. 3.75GHz, 128 cores total)
- 24x 16GB RAM (384GB total)
- 480 GB NVMe
5 compute servers (“typB”)
- 2x AMD EPYC 9684X 96c CPU (max. 3.42GHz, 192 cores total)
- 24x 64GB RAM (1536GB total)
- 480 GB NVMe
1 compute server (“typC”)
- 2x INTEL® XEON® Platinum 8480+ 56c CPU (max. 3.8GHz, 112 cores total)
- 16x 128GB RAM (2048GB total)
- 480 GB + 14TB NVMe
- 4x NVIDIA H100
1 compute server (“typD”)
- 2x INTEL® XEON® Platinum 8480+ 56c CPU (max. 3.8GHz, 112 cores total)
- 16x 128GB RAM (2048GB total)
- 480 GB + 14TB NVMe
- 4x AMD MI210
3 compute servers (“typE”)
- 2x AMD EPYC 9554 64c CPU (max. 3.75GHz, 128 cores total)
- 24x 16GB RAM (384GB total)
- ~3.84TB NVMe
- 2x NVIDIA L40
2 compute servers (“typF”)
- 2x AMD EPYC 9554 64c CPU (max. 3.75GHz, 128 cores total)
- 24x 16GB RAM (384GB total)
- ~3.84TB NVMe
- 2x AMD MI210
Partitions
Priorities are calculated with Slurm’s Multifactor Priority Plugin; the group’s/account’s financial share in the cluster and the resources it has already consumed carry the highest weights.
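Schematically (the concrete weights are site-specific and additional terms such as TRES weights exist; this is only a sketch of the documented Slurm formula), a job's priority is a weighted sum of factors that are each normalized to [0, 1]:

    priority = w_{age} f_{age} + w_{assoc} f_{assoc} + w_{fairshare} f_{fairshare}
             + w_{jobsize} f_{jobsize} + w_{partition} f_{partition} + w_{QOS} f_{QOS}

On festus the fair-share term, which combines the assigned share (here: the financial share) with the resources already consumed, carries the largest weight.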
The cluster provides three partitions; a wall time above the default must be requested explicitly at submission (see the sketch after this list).
- Wall time: 8 hours (default), 24 hours (max); nodes: typA, typB, typE, typF
- Wall time: 8 hours (default), 24 hours (max); nodes: typC, typD
- Wall time: 15 minutes (default), 90 minutes (max); restriction: max. 2 nodes per job
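As a minimal sketch of requesting the maximum wall time from Python by wrapping sbatch (job.sh is a placeholder batch script; no partition is selected, so the Slurm defaults apply):

    # submit_max_walltime.py - submit a placeholder batch script with the maximum wall time
    import subprocess

    # --time=24:00:00 requests the 24-hour maximum instead of the 8-hour default
    subprocess.run(["sbatch", "--time=24:00:00", "job.sh"], check=True)

The same can of course be done directly on the command line with sbatch --time=24:00:00 job.sh.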
Network
- Infiniband (100 Gbit/s)
- Ethernet (25 Gbit/s)
User file space
All data inside /workdir and /scratch has a limited lifetime and there are neither backups nor snapshots. Start with /workdir (NFS) and use /scratch (BeeGFS) only if you really need it.
- /groups/org-id: group directory (only for groups financially involved in the cluster)
- /home: 10GB per user
- /workdir (NFS): ~70TB; data lifetime: max. 60 days
- /scratch (BeeGFS): data lifetime: 10 days
Warning
Use /scratch only for (Intel) MPI-IO or parallel HDF5. Do not use it for jobs that perform only POSIX I/O. If you do not know whether you need /scratch, try /workdir first!
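For illustration, a minimal parallel-HDF5 write with mpi4py and h5py, i.e. the kind of I/O /scratch is intended for (a sketch only: it assumes an MPI-enabled h5py build is available, e.g. via a module or your own environment, and the path below /scratch is a placeholder):

    # parallel_write.py - every MPI rank writes its own slice of one shared HDF5 file
    from mpi4py import MPI
    import h5py
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    n_local = 1000  # number of values written by each rank

    # Collective, parallel open of a single file on the BeeGFS scratch file system
    # ("/scratch/your-user/output.h5" is a placeholder path)
    with h5py.File("/scratch/your-user/output.h5", "w", driver="mpio", comm=comm) as f:
        dset = f.create_dataset("data", shape=(size * n_local,), dtype="f8")
        # Each rank writes a disjoint, contiguous block of the dataset
        dset[rank * n_local:(rank + 1) * n_local] = np.full(n_local, rank, dtype="f8")

Launched with an MPI starter (e.g. mpirun -n 4 python parallel_write.py, or srun inside a Slurm job), all ranks write collectively into one shared file instead of producing many small independent POSIX writes.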
If you log in via ssh to a node on which one of your jobs is running, you will not see the same /tmp or /dev/shm as your job!
Node-local storage (/tmp), depending on the node type:
- typA/B: ~200GB
- typC/D: ~14TB
- typE/F: ~3.84TB
Commissioning & Extension
November 2024
Resource Manager & Scheduler
Slurm 24.11
Operating system
RHEL 9.5 / Rocky Linux 9.5