Typhon V7


Demetrius Dade

Aug 3, 2024, 6:09:31 PM
to omjochamhy

Overview
For large parallel computations and batch jobs, IAS has a 64-node Beowulf cluster named Typhon. Each node has quad 24-core 64-bit Intel Cascade Lake processors, providing a total of 6144 processor cores. Each node has 384 GB RAM (4 GB/core). For low-latency message passing, all nodes are interconnected with HDR100 InfiniBand.
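
For jobs that span nodes, message passing is typically done with MPI. The sketch below is only an illustration; it assumes mpi4py is installed in your Python environment (this announcement does not confirm that) and simply has each rank report where it is running.

# mpi_hello.py -- minimal MPI sketch, assuming mpi4py is available.
# Launch with your site's MPI launcher, for example something like:
#   mpirun -np 96 python mpi_hello.py
from mpi4py import MPI

comm = MPI.COMM_WORLD            # communicator covering every rank in the job
rank = comm.Get_rank()           # this process's rank (0 .. size-1)
size = comm.Get_size()           # total number of ranks

# Each rank reports its host; inter-node traffic rides the HDR100 InfiniBand fabric.
print(f"rank {rank} of {size} on {MPI.Get_processor_name()}")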

All nodes mount the same /home and /data filesystems as the other computers in SNS. Scratch space locations have been adjusted to make it easier to tell local and network resources apart: /scratch/lustre is the new mount point for the parallel file system, and /scratch/local/ is for node-local storage.

Login Nodes
The primary login nodes, typhon-login1 and typhon-login2, should be used for interactive work such as compiling programs and submitting jobs. Please remember that these are shared resources for all users.

The /data, /home, and /scratch file systems are available on all login and cluster nodes.

All nodes have access to our parallel filesystem through /scratch/lustre.
600 GB of local scratch is available on each node in /scratch/local/.
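
When a job needs scratch space, the choice is between the shared Lustre scratch (visible from every node) and the node-local scratch (only on that node). The sketch below shows one way to pick between them; the per-user subdirectory layout is an assumption for illustration, not a documented convention.

# scratch_paths.py -- sketch of choosing a scratch location, using the mount
# points named in this announcement. The per-user subdirectory is hypothetical.
import os
import getpass

LUSTRE_SCRATCH = "/scratch/lustre"   # shared parallel filesystem, visible on all nodes
LOCAL_SCRATCH = "/scratch/local"     # ~600 GB of node-local storage on each node

def scratch_dir(shared: bool = True) -> str:
    """Return a per-user scratch directory, creating it if needed."""
    base = LUSTRE_SCRATCH if shared else LOCAL_SCRATCH
    path = os.path.join(base, getpass.getuser())   # hypothetical per-user layout
    os.makedirs(path, exist_ok=True)
    return path

if __name__ == "__main__":
    # Use shared scratch for data other nodes must read; use local scratch for
    # temporary files that only this node needs.
    print("shared:", scratch_dir(shared=True))
    print("local :", scratch_dir(shared=False))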

Job Scheduling
The cluster determines job scheduling and priority using Fair Share, a per-user score based on past usage: the more you have run recently, the lower your score (and thus your priority) temporarily becomes.

The current maximum allowed wall time is 168 hours (7 days). Users who need to run for longer than this window should build restart-file (checkpoint) support into their jobs so that they can comply with the limit.
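
A common restart-file pattern is to write a checkpoint periodically and resume from it when the job is resubmitted. The Python sketch below illustrates the idea only; the file name, checkpoint interval, and state layout are illustrative assumptions.

# checkpoint_restart.py -- sketch of a restart-file pattern for staying within
# the 168-hour limit. Checkpoint path, interval, and state layout are examples.
import json
import os

CHECKPOINT = "checkpoint.json"     # hypothetical restart file on shared storage
CHECKPOINT_EVERY = 1000            # steps between checkpoints (assumption)
TOTAL_STEPS = 100000

def load_state() -> dict:
    """Resume from the last checkpoint if one exists, otherwise start fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"step": 0, "accumulator": 0.0}

def save_state(state: dict) -> None:
    """Write the checkpoint atomically so a killed job leaves a usable file."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)

state = load_state()
for step in range(state["step"], TOTAL_STEPS):
    state["accumulator"] += step * 1e-6   # stand-in for real work
    state["step"] = step + 1
    if state["step"] % CHECKPOINT_EVERY == 0:
        save_state(state)

save_state(state)
print("finished at step", state["step"])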
