AI/ML A100 GPU nodes available on Tinkercliffs; followup maintenance for Infer


ARC

Jul 13, 2021, 1:07:26 PM
to arc_u...@vt.edu
Four high-density GPU nodes have been added to the Tinkercliffs cluster. Each node is equipped with 8x NVIDIA A100-80G GPUs, adding 32 leading-edge GPUs to Virginia Tech's research computing resource pool. They should serve as excellent accelerators for demanding AI/ML and HPC workloads. These nodes are provisionally accessible via the a100_normal_q partition on Tinkercliffs. We are continuing to add software and test configuration options, and we invite your feedback. All are welcome to connect and try the nodes out, but be aware that their configuration may change in the coming weeks.
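
For anyone who wants a starting point, here is a minimal sketch of a helper that renders a batch script for the new partition. It assumes jobs are submitted through a Slurm-style scheduler using `#SBATCH` directives and that GPUs are requested with `--gres=gpu:N`; neither is confirmed in this announcement. The helper name `a100_batch_script` and the account value `your_allocation` are hypothetical placeholders. Only the partition name `a100_normal_q` and the 8-GPUs-per-node limit come from the text above.

```python
def a100_batch_script(command, gpus=1, hours=1, account="your_allocation"):
    """Render a Slurm-style batch script for the a100_normal_q partition.

    Assumptions (not from the announcement): Slurm directives, the
    --gres=gpu:N request syntax, and the placeholder account name.
    """
    # Each A100 node has 8 GPUs, so a single-node request cannot exceed 8.
    if not 1 <= gpus <= 8:
        raise ValueError("gpus must be between 1 and 8 for a single node")
    lines = [
        "#!/bin/bash",
        "#SBATCH --partition=a100_normal_q",  # partition named in the announcement
        f"#SBATCH --account={account}",       # hypothetical placeholder
        "#SBATCH --nodes=1",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --time={hours:02d}:00:00",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a script that requests two GPUs and runs nvidia-smi.
    print(a100_batch_script("nvidia-smi", gpus=2))
```

Because the configuration of these nodes may still change, treat the directives as a template to adjust rather than a tested recipe.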

The maintenance outage for Infer requires a followup maintenance window tomorrow morning, Wednesday, July 14, from 4:00am to 8:00am, to complete the network maintenance. The cluster has been released from reservation for anyone whose jobs can complete before 4:00am tomorrow. Scheduled jobs that cannot complete before then will continue to be held and will resume when the maintenance is completed.
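
As a rough sketch for judging whether a job fits before the window opens, the snippet below computes the longest run time that still ends before 4:00am on July 14. The year is taken from the date of this message, the 10-minute safety margin is an assumption, and the function names `max_walltime` and `as_hms` are hypothetical.

```python
from datetime import datetime, timedelta

# Start of the follow-up maintenance window (local cluster time).
MAINTENANCE_START = datetime(2021, 7, 14, 4, 0, 0)


def max_walltime(now, start=MAINTENANCE_START, margin=timedelta(minutes=10)):
    """Return the longest run time ending `margin` before `start`, or None."""
    remaining = start - now - margin
    return remaining if remaining > timedelta(0) else None


def as_hms(delta):
    """Format a timedelta as HH:MM:SS (hours are not wrapped at 24)."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    limit = max_walltime(datetime.now())
    print(as_hms(limit) if limit else "No time left before the maintenance window")
```

A job whose requested time limit is at or below the printed value should be able to finish before the window, assuming it starts right away.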

To serve GPU needs in the meantime, the V100 nodes on Cascades may also be considered in addition to the A100s on Tinkercliffs mentioned above.

Thanks for your understanding. We will work to keep the outage as brief as possible.

Best regards,

Advanced Research Computing
