Euler Maintenance - Final Stages

Euler Sysadmin

Jun 4, 2021, 11:53:55 PM
to euler...@g-groups.wisc.edu
Hello Euler Users,

As of today, Euler is ready for users to start migrating files back onto cluster storage. Other features are coming online very quickly; I will append the current status of the cluster to the end of my emails until everything is functional.
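
If you are unsure how to move your data back, one possible approach (a sketch, not an official procedure) is to copy from your local backup over SSH with rsync; the username and paths below are placeholders, and euler-login-2.wacc.wisc.edu is the working login node noted in the status section at the end of this message:

    # Copy a backed-up project directory back to your cluster home directory.
    # Replace <username> and the local path with your own values.
    rsync -avh --progress /path/to/local/backup/my_project/ \
        <username>@euler-login-2.wacc.wisc.edu:~/my_project/

scp also works for one-off copies, but rsync can resume interrupted transfers, which may be helpful while the network is still at reduced capacity.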

New account credentials will be emailed to the address saved on your account starting tomorrow (June 5th). If you do not receive your credentials by Monday (June 8th), please contact sysa...@sbel.wisc.edu as there may have been an error in your saved email address.

Once Euler is operating at around 80% of full capacity, I will start updating the documentation to reflect the new architecture of the cluster. If you would like to help create this documentation, please reach out to sysa...@sbel.wisc.edu.

Thank you, everyone, for your ongoing patience with this project. I have put in as much work as I can physically handle in order to get Euler ready for you as quickly as possible, and many of my colleagues have been helping whenever they can. I know that everyone involved is proud of their work and I hope you can all appreciate the new cluster capabilities as they begin to come online.

Regards and all my Best Wishes,
Colin Vanden Heuvel


---

Current Cluster Status:

Only one of the two configured login nodes is ready for use, so the hostname euler.wacc.wisc.edu will not always redirect to a valid server. For now, connect directly to the working login node at euler-login-2.wacc.wisc.edu.
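
For example, a direct connection looks like this (your username is a placeholder):

    # Connect directly to the working login node.
    ssh <username>@euler-login-2.wacc.wisc.edu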

Euler's network is currently operating at around 50% of its full capacity. Even so, this is about 5x greater than it was before the update, with more to come.

Roughly one third of the cluster nodes are currently online, including the majority of the file storage backend and half of the available GPUs. More GPUs will be configured over the weekend and will be available by Monday.

The modules system is not yet online, but some essential modules (such as CUDA) will become available before Monday.
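
Once the module system is back, loading CUDA should follow the usual environment-modules workflow; the exact module name and version are assumptions until the documentation is updated:

    # List available modules, then load CUDA.
    # The module name/version may differ once the module tree is rebuilt.
    module avail
    module load cuda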

The Slurm scheduler is online and accepting jobs, but due to some technical limitations, interactive jobs using srun are guaranteed to fail. This is a low-priority issue; fixing it will be deferred until the more essential cluster components are available again.
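
In the meantime, batch submission with sbatch is a reasonable workaround for the srun limitation. The script below is only a minimal sketch; the job name, time limit, and GPU request are placeholders, and the commented module load assumes the module system is back online:

    #!/bin/bash
    #SBATCH --job-name=test_job
    #SBATCH --output=test_job.out
    #SBATCH --time=00:10:00
    #SBATCH --gres=gpu:1

    # module load cuda   # once the module system is available
    nvidia-smi           # confirm the allocated GPU is visible

Save it as, say, test_job.sh and submit it with:

    sbatch test_job.sh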

The set of software currently installed on the nodes is fairly minimal. If you believe that a particular program or utility should be available by default, please contact the sysadmin by email.


