Hello Euler Users,
I've just put out the beta test for the next-generation Euler software configuration that will be deployed to the whole cluster in mid-August. Please use the
research-ng partition to test that your jobs will work on the new platform.
Answers to some anticipated questions are available below. Please reach out to
euler-...@engr.wisc.edu if you have further questions or concerns.
Regards,
Colin Vanden Heuvel
Why is this needed?
Some of you may have already discovered this, but a number of nodes (about ⅓ of the cluster) are running a different operating system than the login nodes. This was the result of a critical firmware bug on certain hardware configurations which rendered
them incompatible with the update. For security reasons, we couldn't revert to the old configuration, so a second "fallback" operating system was chosen instead.
The mismatched configurations have, predictably, caused a lot of problems. That is why we've been working diligently over the last month to find a new platform which is more stable than the current one.
When will this take effect?
This configuration will be deployed in a rolling update to all Euler nodes in mid-August (probably the week of August 11th, exact date TBA). It will not require the same amount of downtime as previous maintenance periods, and thanks to the cooperation of hardware
owners, some nodes will already be configured and ready to go in advance of the deadline.
How can users prepare for the change?
Please test your software before the deadline by running some jobs on the research-ng partition (i.e. #SBATCH -p research-ng). If you run into any problems, let us know at
euler-...@engr.wisc.edu so we can figure out what needs to be fixed before the full deployment in August. If you don't run into any problems, or if the update fixes a problem found
in the current generation of Euler software, we'd love to know that as well.
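As a starting point, a small smoke-test job along these lines may help confirm what a research-ng node provides. This is only a sketch: the partition name comes from this announcement, while the job name, time limit, output file name, and the choice of gcc and python3 as the tools to check are illustrative assumptions.

```shell
#!/bin/bash
#SBATCH -p research-ng              # beta-test partition named in this announcement
#SBATCH --job-name=ng-smoke         # hypothetical job name
#SBATCH --time=00:05:00             # assumed short limit; adjust for your workload
#SBATCH --output=ng-smoke-%j.out    # hypothetical output file (%j = job ID)

# Print the operating system the job landed on, if /etc/os-release is readable.
report_os() {
    if [ -r /etc/os-release ]; then
        # Source in a subshell so os-release variables do not leak into the job.
        ( . /etc/os-release; echo "OS: ${PRETTY_NAME:-unknown}" )
    else
        echo "OS: unknown"
    fi
}

# Print the first line of "<tool> --version", or note that the tool is missing.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: $("$1" --version 2>&1 | head -n 1)"
    else
        echo "$1: not found"
    fi
}

echo "Host: $(uname -n)"
report_os
report_tool gcc
report_tool python3

# Replace this comment with the commands your real job runs.
```

Submit it with sbatch and compare the output file against the same script run on the current partitions. Any differences in compiler or Python versions are worth including in a report to the list.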
Also, if you are a member of a lab which owns hardware on Euler, please have your PI reach out to volunteer one or more of your nodes for the beta test. If we have enough participation, we will be able to roll out the new configuration with just a few minutes
of downtime next month.
What is changing from the current generation to the next?
Euler currently uses Fedora 42, which serves as an upstream testbed for Red Hat Enterprise Linux (RHEL). This provides us with cutting-edge hardware support and access to the most recent software features. However, Fedora requires frequent major upgrades, which
can have breaking consequences, as we saw in this year's Spring Maintenance.
The next-gen version is built on openSUSE Leap 15.6, which derives its source code from SUSE Linux Enterprise (SLE) and the SUSE community's testbed distribution, openSUSE Tumbleweed. Unlike Red Hat variants, Leap supports a vast number of official and unofficial
backports from Tumbleweed, allowing it to accommodate new hardware and software while still maintaining the security and stability of a long-term support Linux distribution.
A number of system packages are likely to change versions as a result of this change. Some will be newer than the current version and some older, depending on whether recent updates have been backported to openSUSE Leap. Notably, the default compiler and Python
versions will be much older than those available on Fedora, but several alternative versions will be installed in parallel with the default, allowing compatibility with a wider variety of software. The ability to maintain multiple toolchains at once, and
for a longer amount of time, will also obviate the need to rebuild user software each year.
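To see which alternative toolchains a node offers, a short discovery script can be run inside a test job. This is a hedged sketch: the announcement does not say how the parallel versions will be exposed on Euler, so it assumes they appear as version-suffixed commands on PATH (for example gcc-13 or python3.11, following openSUSE's usual packaging convention). The function name list_versioned is hypothetical.

```shell
#!/bin/bash
# List commands found in PATH directories whose names match an extended regex.
# Assumption: alternative toolchains are installed as version-suffixed commands.
list_versioned() {
    matches="$(
        echo "$PATH" | tr ':' '\n' | while read -r dir; do
            ls "$dir" 2>/dev/null       # skip unreadable or missing directories
        done | grep -E "$1" | sort -u || true
    )"
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
    else
        echo "(none found)"
    fi
}

echo "Versioned GCC commands:"
list_versioned '^gcc-[0-9]+$'

echo "Versioned Python commands:"
list_versioned '^python3\.[0-9]+$'
```

If both lists come back as "(none found)", the alternatives may be provided some other way, in which case asking on the list is the quickest way to find out.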