[euler-users] Filesystem Issues


Euler Sysadmin

Dec 16, 2019, 12:11:05 PM
to euler...@g-groups.wisc.edu
Euler is experiencing a problem with its filesystem. I'm aware of the problem and am working on a solution.

Thanks to everyone who notified me of the issue.


Euler Sysadmin

Dec 16, 2019, 1:22:59 PM
to euler...@g-groups.wisc.edu
I have a temporary fix in place which I hope will hold out until after the new year. This fix was suggested to me by a developer who works on the filesystem software used by Euler, so it seems promising.

Should these issues persist, a more permanent solution may require additional downtime at some point during the winter break.

If you have any questions or concerns, please feel free to reach out to me.

Colin Vanden Heuvel
To view this discussion on the web visit https://groups.google.com/a/g-groups.wisc.edu/d/msgid/euler-users/b29035cd-0f2f-4c81-9929-d6827ecc907a%40sbel.wisc.edu.

Euler Sysadmin

Dec 16, 2019, 4:17:56 PM
to euler...@g-groups.wisc.edu
Unfortunately, it looks like the earlier fix won't be sufficient. I will need to take Euler offline to fix the problem, and I will notify this mailing list when it is back.

- Colin

Euler Sysadmin

Dec 18, 2019, 1:39:16 AM
to euler...@g-groups.wisc.edu
After running an (agonizingly slow) analysis of Euler's filesystem structures, I managed to come up with a plan for how to deal with the ongoing issue. Euler should be available again for a while starting at some point during Wednesday afternoon. I will send out another update once it is restored.


Euler Sysadmin

Dec 19, 2019, 12:51:37 AM
to 'Euler Sysadmin' via euler-users
Euler's file server metadata has been restored from backup, and logins will be allowed again shortly.

The backup contains a functional copy of the metadata, but it is not a complete copy. The following data was not restored:
- Symbolic links
- Data which was being written at the time of the crash
- Data which was already corrupted (due to network issues or other corruption)
- Valid data which was contained within a corrupted directory

Colin will be attempting more drastic mitigations over the winter break in the hope of preventing this issue in the future. With that in mind, Euler's job scheduler will be kept offline until the mitigations are complete, and additional downtime may be required to complete the implementation.

Your continued patience will help the changes go smoothly; it is much appreciated.

Thank you.
