Lustre 2.15


Zee Palmer

Jul 30, 2024, 10:49:45 PM

From the Lustre project page in the Lustre issue tracker, we can see a dashboard for the project. On that page under the "Versions: Unreleased" section, there are links to future releases and all of the issues that are either targeted to be addressed in that release, or are already addressed for the release. Here is a shortcut to Lustre 2.15.0's issue summary page:

Any issue that contains "Lustre 2.15.0" in the "Fix Version/s" field of a ticket is intended to be fixed for version 2.15.0. The Lustre 2.15.0 Issues list is fluid, and will be updated on a continual basis as resources and priorities change.

The main purpose of this post is not to explain what Lustre is, but to give an up-to-date document for installing and configuring the current latest Lustre version (2.15.4) on its supported platforms: RHEL 8.9 for servers and Ubuntu 22.04 for clients. It can be used as a basic guide for production use, but I have not installed or managed Lustre in production, and I am not touching any production-specific topics in this post.

The reason I wrote this post is that the documentation I found for installing Lustre is usually old. For example, the installation page on the wiki mentions RHEL 7, which is not supported anymore. The Lustre 101 introductory course created by the Oak Ridge Leadership Computing Facility is great, but it was also created more than five years ago.

I think the installation steps on the wiki's installation page should be slightly modified for RHEL 8.9 when installing servers. In this post, I give my version of the installation steps for Lustre servers on RHEL 8.9.

Explaining Lustre is not the main purpose of this post, but if you know nothing about it, a brief introduction will help the rest of the post make sense. If you are familiar with Lustre, you can skip this section.

Lustre is most common in HPC environments, but because it can handle very large capacities with high performance, it would not be surprising to see it anywhere large datasets are read or written by a large number of clients.

The MGS is shared by all the file systems, whereas a particular MDS or OSS serves only one particular file system. Since the metadata is separated from the actual data, all metadata operations are performed by the MDS, whereas actual data operations are performed by the OSSes.

Lustre on its own does not provide any redundancy. The targets have to provide redundancy on their own using a hardware or a software solution. For high availability or redundancy of the servers, cluster pairs (failover) are required.

Similar to striped RAID/ZFS, a file in Lustre is striped across OSTs according to the stripe_count and stripe_size options. These can be set individually for each file (otherwise the default values are inherited). Each file is striped across stripe_count OSTs, and each stripe on an OST holds stripe_size bytes.
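As a sketch, the striping layout can be set and inspected with the lfs utility on a mounted client (the directory and file paths below are hypothetical examples):

```shell
# Stripe new files under this directory across 4 OSTs, 1 MiB per stripe
# (requires a mounted Lustre client; path is an example)
lfs setstripe -c 4 -S 1M /mnt/lustre/striped_dir

# Show the stripe layout of an existing file
lfs getstripe /mnt/lustre/striped_dir/somefile
```

Setting the stripe on a directory makes new files created inside it inherit that layout.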

Lustre supports LDISKFS (which is based on ext4) and ZFS. LDISKFS support requires a Lustre patched kernel, whereas ZFS support can be added with modules (without patching the kernel). It is possible to install Lustre supporting both or only one of them.

I first installed Lustre with LDISKFS support only. It is not difficult (probably easier than ZFS support), and it might be easier to work with if you are not used to ZFS. However, it requires a patched kernel. So, you cannot freely update the kernel.

ZFS support does not require a patched kernel, and ZFS has more features. The problem is that the Lustre installation page describes installing ZFS as a DKMS package. This makes sense, since it lets the kernel be updated freely. However, RHEL 8.9 does not provide DKMS by default, so epel-release has to be installed separately. I think it is also possible to install ZFS as a kmod, but this is not mentioned in the installation page, so I am not sure whether it is supported.
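The DKMS route on RHEL 8.9 looks roughly like the following sketch (the zfs-release RPM URL follows the OpenZFS packaging convention and should be verified against the current OpenZFS documentation):

```shell
# Enable EPEL (provides dkms) and install kernel build dependencies
sudo dnf install -y epel-release
sudo dnf install -y kernel-devel kernel-headers gcc make

# Add the OpenZFS repository and install ZFS as a DKMS package
# (release RPM URL is an assumption; check the OpenZFS docs for the current one)
sudo dnf install -y https://zfsonlinux.org/epel/zfs-release-2-3$(rpm --eval "%{dist}").noarch.rpm
sudo dnf install -y zfs
```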

I decided to continue using ZFS, so I show the Lustre servers installation steps below supporting only ZFS. If you want to install LDISKFS support, it is not difficult, just skip the ZFS installation steps, and install the Lustre patched kernel and LDISKFS packages as described in the installation page on wiki.

Static IPv4: Lustre servers require a static IPv4 address (IPv6 is not supported). If you are using DHCP, set up a static IP configuration (for example with nmcli) and reboot to be sure the IP configuration works. Make sure the hostname resolves to the IP address, not to the loopback address (127.0.0.1). If required, add the hostname with the static IP to /etc/hosts.
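A minimal nmcli sketch, assuming a connection named ens192 and example addresses from a home-lab subnet:

```shell
# Switch the connection to a manual (static) IPv4 configuration
# (connection name, addresses, and hostname are examples)
sudo nmcli con mod ens192 ipv4.method manual \
    ipv4.addresses 192.168.1.10/24 ipv4.gateway 192.168.1.1 \
    ipv4.dns 192.168.1.1
sudo nmcli con up ens192

# Make sure the hostname resolves to the static IP, not 127.0.0.1
echo "192.168.1.10 mds01.lab.local mds01" | sudo tee -a /etc/hosts
getent hosts mds01
```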

Identity Management: In a cluster, all user and group IDs (UIDs and GIDs) have to be consistent. I am using an LDAP server installed on another machine in my home lab, with sssd and LDAP throughout the cluster, including the Lustre servers. For a very simple demonstration, you can just add the same users and groups to all servers and clients with the same UIDs and GIDs.
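For the simple demonstration route, running the same commands on every node keeps the IDs consistent (the names and numeric IDs below are examples):

```shell
# Create the same group and user with fixed GID/UID on every node
sudo groupadd -g 5000 hpcusers
sudo useradd -u 5001 -g 5000 -m alice

# Verify the IDs match across nodes
id alice
```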

In the next step (Install ZFS packages), kernel-devel package will be installed. If you install a different kernel version now, make sure you also install the correct kernel-devel and related (kernel-headers etc.) packages.

Not a must for a demonstration, but to protect ZFS pools from being imported simultaneously on multiple servers, a persistent hostid is required. Simply run gethostid; this will create /etc/hostid if it does not exist.
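Generating and verifying the persistent hostid is a one-liner:

```shell
# gethostid creates /etc/hostid on first run if it does not exist
gethostid
ls -l /etc/hostid
```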

The Installing the Lustre Software page also lists the lustre-resource-agents package. This package depends on the resource-agents package, which is not available in my RHEL 8.9 installation. I do not know whether this is an error, whether it changed with RHEL 8.9, or whether resource-agents is related to HA installations and my subscriptions do not cover it. In any case, it should not matter for the purpose of this post.

Configuring Lustre file systems is not difficult, but a number of commands have to be executed (think of creating each target, mounting them, and so on). To simplify this, I wrote lustre-utils.sh, which is available at lustre-utils.sh@github. It is a simple tool that runs the multiple commands needed to create and remove Lustre file systems and to start and stop the corresponding Lustre servers.

The targets are created by running lvcreate to create the logical volume and mkfs.lustre to create the actual target (which, for ZFS, implicitly calls zpool and zfs to create the ZFS pool and dataset). The targets are mounted with mount -t lustre. You can see all the parameters in lustre-utils.sh.
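Stripped of the helper script, the underlying commands look roughly like this sketch (the file system name, pool names, device paths, and MGS node are all hypothetical; see the Lustre Operations Manual for the full set of flags):

```shell
# Combined MGS/MDT on a ZFS backend (names and devices are examples)
sudo mkfs.lustre --fsname=demofs --mgs --mdt --index=0 \
    --backfstype=zfs demofs-mdt0/mdt0 /dev/sdb
sudo mkdir -p /mnt/mdt0
sudo mount -t lustre demofs-mdt0/mdt0 /mnt/mdt0

# An OST on another server, pointing at the MGS node
sudo mkfs.lustre --fsname=demofs --ost --index=0 \
    --backfstype=zfs --mgsnode=mds01@tcp demofs-ost0/ost0 /dev/sdc
sudo mkdir -p /mnt/ost0
sudo mount -t lustre demofs-ost0/ost0 /mnt/ost0
```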

I wondered what would happen if I installed linux-image-generic-hwe-22.04, which provides kernel version 6.5. I installed it, DKMS then requested linux-headers-6.5.0-27-generic, which I also installed, and this triggered the DKMS compilation. However, it resulted in the following:

The error points to a function called prandom_u32_max, which exists in kernel version 5.15 but appears to have been removed in kernel version 6.1. So it is not yet possible to build the client module on kernel version 6.1+. I do not know about kernel version 6.0.

Until today, you could set and enforce user- and group-level storage consumption using user quotas and group quotas. With project quotas, you can also set and enforce storage limits based on the number of files or the storage capacity consumed by a specific project. You can set a hard limit to prevent projects from consuming additional storage after exceeding their project quota, or a soft limit that gives users a grace period to complete their workloads before it converts into a hard limit.

Support for project quotas is now available at no additional cost on all Amazon FSx for Lustre file systems running on Lustre version 2.15. For more information about this new feature, see FSx for Lustre documentation.
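As a hedged sketch, project quotas are typically applied with the lfs utility on a mounted client (the mount point, directory, and project ID are examples):

```shell
# Tag a directory with project ID 100; -s makes new files inherit it
lfs project -s -p 100 /mnt/fsx/team-a

# Soft limit 80 GiB, hard limit 100 GiB of block usage for project 100
sudo lfs setquota -p 100 -b 80G -B 100G /mnt/fsx

# Report current usage and limits for the project
lfs quota -p 100 /mnt/fsx
```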

In this article, you learn how to download and install a Lustre client package. Once installed, you can set up client VMs and attach them to an Azure Managed Lustre cluster. Select an operating system version to see the instructions.

If you want to upgrade only the kernel and not all packages, you must, at minimum, also upgrade the amlfs-lustre-client metapackage in order for the Lustre client to continue to work after the reboot. The command should look similar to the following example:

Ubuntu 18.04 LTS reached the end of Standard Support on May 31, 2023. Microsoft recommends either migrating to the next Ubuntu LTS release or upgrading to Ubuntu Pro to gain access to extended security and maintenance from Canonical. For more information, see the announcement.

The following command installs a metapackage that keeps the version of Lustre aligned with the installed kernel. For this to work, you must use apt full-upgrade instead of apt upgrade when updating your system.
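A routine update of such a client therefore looks like this (full-upgrade, unlike upgrade, is allowed to install the new kernel together with the matching Lustre client metapackage):

```shell
# full-upgrade lets apt install new dependencies, so the kernel and
# the matching Lustre client metapackage are upgraded together
sudo apt update
sudo apt full-upgrade
```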

