[euler-users] Announcements: Higher-bandwidth File Transfers / End of support for VS Code


Colin Vanden Heuvel

Oct 30, 2023, 1:47:04 PM
to 'Colin Vanden Heuvel' via euler-users
Hello Euler users,

I have two announcements for you today.

First:

Euler's Globus endpoint is officially ready for production use and is now the recommended way to move data on and off Euler's filesystem. Globus is up to 10x faster than rsync or sftp on the login nodes, supports scheduled and automated transfers, and is interoperable with any Globus endpoint in the world, including DoIT's ResearchDrive.

Search for the "Euler Home Directories" collection to get started. More detailed instructions will be available in the CAE KB early this week.


Second:

I regret to inform you that, due to stability concerns, Visual Studio Code remote sessions will no longer be permitted on Euler beginning on Thursday, November 9th, 2023. I know that it is a popular workflow and many users will be affected, so I will do my best to preemptively answer any questions below this message. A few other IDEs have similar anti-features and might be restricted in the future, so I suggest reading section [3.] below. If you're looking for a replacement workflow, you can discuss options with your peers and share your insights on the Euler Q&A site [https://euler-answers.cae.wisc.edu/]. Some folks have already started a good discussion on the topic, and it would be great to have even more input.


Regards,
Colin Vanden Heuvel



P.S.
Explanation of the ban on Visual Studio Code (and software with similar impact on multi-user systems)

1. Overview

Visual Studio Code (henceforth "VSCode") is a powerful tool which is designed to provide a customizable environment for software development on single-user systems (workstations, laptops, etc.). It works very well for that, and many users justifiably want to be able to reproduce that workflow on remote systems as well. When those remote systems are also single-user environments, it continues to be quite effective. However, its capabilities start to become hindrances when introduced to multi-user environments.

The remote server application scales its resource usage with whatever is available on the host, and can occupy a significant CPU/memory footprint, even so much as to starve other users of the resources they need. Marketplace extensions download and install arbitrary scripts and binaries which are difficult for administrators to verify. Live scanning of projects, which would normally be stored on a local disk, is a problem for networked filesystems and can starve other applications which are using those files at the same time. Even the editor's build and debugging tool integration will consume as much of the processing power available on the system as it can, without concern for other users.

Euler is a supercomputer, and it is optimized to be used as one. Just as it isn't reasonable to use a small workstation for high-performance code, it isn't always reasonable to use Euler for everyday development. There are other systems that are more suitable for a highly-integrated development environment, and they should be used as such. By using these tools more optimally, we will be able to ensure that the specialized compute resources on Euler can be shared more equitably across the College of Engineering community.

In short, these problems are intrinsic to the design of VSCode and they can be quite harmful in their impact on any shared resources, including a system like Euler.


2. VSCode Anti-features

VSCode has a number of features which are problematic for a managed multi-user environment like Euler. These are sometimes referred to as anti-features since they are design decisions which are intentional, but undesirable for a variety of use-cases.

i. Live Filesystem Indexing

VSCode uses some relatively demanding filesystem features to watch for changed files on disk. This feature allows it to build an extremely fast cache for opening files as well as to keep track of symbols in those files for linting/debugging. It even runs regular checks to see whether a file differs from the version tracked by git. On a system with a low-latency local disk where a file is unlikely to be open in two places at once, the impact is almost unnoticeable. Euler's network filesystem, on the other hand, exposes several problems.
a. The increased latency of a network system causes these features to take longer to execute and increases system load.
b. Each read/write lock on a file or directory requires synchronization with every single node which is accessing that file or directory. This places a lot of load on the filesystem, especially when users leave their sessions running while executing jobs on compute nodes. Each time a file changes, the VSCode instance is notified that it needs to reindex the file, increasing lock contention substantially.
c. This impact is multiplied when users have projects with large HPC-sized data sets. For example, the largest project I've seen in use by VSCode has over 700k files, which introduces no small amount of overhead.
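
If you want a rough idea of how many files an indexer would have to watch in one of your own projects, a minimal sketch along these lines may help. The function name `count_files` is hypothetical and chosen for illustration only. Walking a large tree is itself a burst of metadata requests on a network filesystem, so this is something to run once, not in a loop.

```python
import os


def count_files(root):
    """Return the number of non-directory entries found under root, recursively."""
    total = 0
    # os.walk yields (directory path, subdirectory names, file names) per directory;
    # by default it does not follow symlinked directories.
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total
```

Calling `count_files(".")` from the top of a project directory returns the total for that project, which you can compare against the 700k-file example above.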

ii. Extensions

While extensions are incredibly useful for providing editor features, situations where users are able to install and run unverified programs can easily introduce instability and compromise security. VSCode extensions, much like the main application, are often built on scripts which assume full use of the underlying system resources, which is not the case for a system like Euler. Some extensions introduce computationally demanding background tasks which run without the user noticing. Others include telemetry which collects unknown data about the system and sends it out to the internet (see 2.iv below).

iii. Persistent Sessions

In order to load more quickly when users disconnect and later reconnect, VSCode's remote service continues operating even after a user has closed their SSH session. Because VSCode generates one processing thread for each logical CPU core on the system (that's 40 threads per login node), and does so for the user's extensions as well, Euler's hundreds of users have the potential to be running tens of thousands of tasks at once, all on the login nodes which are meant to be shared between users. When a user connects and all of those tasks become active, the burden can be quite noticeable.
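
To make the "tens of thousands" figure concrete, here is a back-of-the-envelope sketch. Only the 40 threads per login node comes from the text above. The session count of 300 and the assumption of exactly one extra thread pool per session for extensions are illustrative guesses, not measurements from Euler.

```python
def estimate_threads(sessions, cores_per_node=40, extension_pools_per_session=1):
    """Rough upper bound on threads left behind by persistent remote sessions.

    Assumes each session runs one server thread pool plus the given number of
    extension thread pools, each sized to the node's logical core count.
    """
    pools_per_session = 1 + extension_pools_per_session
    return sessions * pools_per_session * cores_per_node


# Hypothetical example: 300 lingering sessions on 40-core login nodes.
print(estimate_threads(300))  # prints 24000
```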

iv. Telemetry

VSCode (and many of its extensions) offers the option to send data to the publisher in order to "improve" the software. Many of these programs have the behavior enabled by default, and a great number of those fail to notify the user that the collection exists at all. While individual users have the option to consent to this type of collection on their own devices, it would be of grave concern if these services collected information about the activity of other users on the system. As it is not feasible for the UW to enter into a contract to protect user information with the publisher of each and every extension, we must take measures to prevent unauthorized data collection.


3. Other Restricted Software

Visual Studio Code is not the only piece of software which has the problems listed above.

As of the November 9th deadline, Visual Studio Code AND any forks or derivatives built from the same code base (such as VSCodium or Code-OSS), as well as extensions designed to emulate the same behavior, will no longer be allowed on Euler.

There are a few other development platforms which present similar problems. For example, JetBrains, JetBrains Remote Development, and derivatives such as PyCharm are known to cause the same types of problems as VSCode. Those platforms are deprecated on Euler effective immediately and are likely to be removed in the future as well.

Further platforms may be added to the list if they are found to have the same issues described above. Users are encouraged to contact euler-...@engr.wisc.edu if they are unable to determine for themselves if a tool they would like to use is going to misbehave.


4. Blocking Mechanism

For VSCode (and derivatives), the remote server process can be prevented from launching on Euler's login nodes using the Linux kernel's built-in security features. This is a fairly heavy-handed approach, but is similar to the behavior of common antivirus platforms. The ability to work around the block shouldn't be taken as permission to run a particular implementation. Users are still expected to ensure that they use Euler responsibly.

Please note that while we will not block users from running deprecated or discouraged software until after the announced deadline has passed, any such applications may be force-closed by an administrator if they begin to cause noticeable performance degradation on Euler.