Xfs_repair Device Or Resource Busy


Garoa Wolff

Jul 24, 2024, 11:36:46 AM
to tsennacorna

Originally created by Silicon Graphics, XFS is a robust, high-performance journaling filesystem that was first included in the Linux kernel in 2001. Its popularity has grown steadily since then, and by 2014 it had found its way into major Linux distributions. XFS is the default filesystem in Red Hat-based distributions such as RHEL, CentOS, and Rocky Linux. It works well with very large files and is known for its speed and robustness.

As robust as the XFS filesystem is, it is not immune to corruption. Common causes of filesystem errors or corruption include ungraceful shutdowns, NFS write errors, sudden power outages, and hardware failures such as bad blocks on the drive. Filesystem corruption can cause serious problems such as damaged regular files, and it can even leave the system unable to boot when boot files are affected.

A few tools are useful for checking filesystem errors. One of them is the fsck command (Filesystem Check). The fsck utility verifies the overall health of a filesystem: it checks for potential and existing errors, repairs them, and generates a report. The fsck command comes pre-installed in most Linux distributions, so no installation is required. Another utility for rectifying filesystem errors is xfs_repair. It is highly scalable and is designed to scan and repair very large filesystems with large numbers of inodes as efficiently as possible.






In this guide, we walk you through how to repair a corrupted XFS filesystem using the xfs_repair utility.

Step 1) Simulate File corruption

To make the most of this tutorial, we are going to simulate corruption of an XFS filesystem. Here we use an 8GB external USB drive as our block volume, which appears as /dev/sdb1.

The next step is to create a mount point that we shall later use to mount the block volume.

$ sudo mkdir /mnt/data

Next, mount the partition using the mount command.

$ sudo mount /dev/sdb1 /mnt/data

You can then verify that the partition was mounted correctly.
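The verification output is not reproduced in this post. As a minimal sketch of one way to check, assuming a Linux system with /proc/mounts, the helper below prints the device and filesystem type for a mount point. The function name mount_info and its optional second argument are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# mount_info MOUNTPOINT [TABLE]   (hypothetical helper, not part of xfsprogs)
# Prints "<device> <fstype>" for MOUNTPOINT, reading a mount table in
# /proc/mounts format. TABLE defaults to /proc/mounts.
# If the mount point appears more than once, the last entry wins.
# Returns non-zero when MOUNTPOINT is not in the table.
# Note: /proc/mounts escapes spaces in paths as \040; this sketch
# assumes a mount point without spaces.
mount_info() {
    awk -v mp="$1" '
        $2 == mp { dev = $1; type = $3 }
        END { if (dev != "") print dev, type; else exit 1 }
    ' "${2:-/proc/mounts}"
}
```

On the setup described above, running mount_info /mnt/data would be expected to report /dev/sdb1 with type xfs if the mount succeeded.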

Our partition is now successfully mounted as an xfs partition. Next, we are going to simulate filesystem corruption by trashing random filesystem metadata blocks using the xfs_db command. But before that, we need to unmount the partition.

$ sudo xfs_repair /dev/device

But before we embark on repairing the filesystem, we can perform a dry run using the -n flag as shown. A dry run provides a peek into the actions that the command will perform when it is executed.

$ sudo xfs_repair -n /dev/device

For our case, this translates to:

$ sudo xfs_repair -n /dev/sdb1

From the output, we can see some metadata errors and inode inconsistencies. The command terminates with a brief summary of the steps the actual command would have carried out. The corrective measures that would have been applied in steps 6 and 7 have been skipped.
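The dry run can also be scripted. According to the xfs_repair man page, xfs_repair -n returns status 1 when corruption was detected and 0 when none was found. The sketch below wraps that in a function; the name check_xfs and the XFS_REPAIR override variable are hypothetical, and the script assumes it is run as root against an unmounted device.

```shell
#!/bin/sh
# XFS_REPAIR is a hypothetical override for the command name, so the
# function can be exercised without touching a real device.
XFS_REPAIR="${XFS_REPAIR:-xfs_repair}"

# check_xfs DEVICE   (hypothetical helper)
# Runs a no-modify check and prints a one-line verdict on stdout.
# The tool's own report is redirected to stderr so the verdict stays
# easy to capture.
check_xfs() {
    "$XFS_REPAIR" -n "$1" >&2
    status=$?
    case $status in
        0) echo "clean" ;;
        # Status 1 in -n mode signals corruption per the man page; a
        # device that cannot be opened may also fail, so read the report.
        1) echo "problems reported" ;;
        *) echo "unexpected status $status" ;;
    esac
}
```

For example, check_xfs /dev/sdb1 could gate the real repair so that xfs_repair without -n runs only when the dry run reports problems.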

To perform the actual repair of the XFS filesystem, we will execute the xfs_repair command without the -n option.

$ sudo xfs_repair /dev/sdb1

The command detects the errors and inconsistencies in the filesystem.

For more xfs_repair options visit the man page.

$ man xfs_repair

Conclusion

That was a demonstration of how you can repair a corrupted xfs filesystem using the xfs_repair command. We hope that you are now confident in fixing a corrupted xfs filesystem in Linux.

Excellent - seems to point to something. The following is dmesg output from the host (I can no longer get into the container). Load started to increase at this point according to my monitoring and this Kernel error seems to relate to the Plex process in some way!

Looking into my media disk (denoted in the pictures above), it is possibly in a bit of a bad state. I can mount/unmount it without issue, but trying to run xfs_repair reports that the device is busy even when it is unmounted.
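A "device or resource busy" error on a device that looks unmounted often means something else still holds it. As a rough sketch, assuming a Linux host with /proc/mounts and sysfs, the function below lists two common causes: a mount entry that still exists, and kernel-level holders such as device-mapper or md devices stacked on top. The name busy_reasons and its optional arguments are hypothetical. It only sees the mount namespace it runs in, so a mount that lives inside a container may not show up from the host's table.

```shell
#!/bin/sh
# busy_reasons DEVICE [MOUNT_TABLE] [SYSFS_ROOT]   (hypothetical helper)
# Prints one line per reason found; prints nothing if none are found.
# MOUNT_TABLE defaults to /proc/mounts and SYSFS_ROOT to /sys.
busy_reasons() {
    dev=$1
    table=${2:-/proc/mounts}
    sysfs=${3:-/sys}
    name=${dev##*/}        # /dev/sdb1 -> sdb1

    # 1. The device is still listed in the mount table.
    awk -v d="$dev" '$1 == d { print "mounted at " $2 }' "$table"

    # 2. Another block device (LVM, dm-crypt, md, ...) sits on top of it.
    for h in "$sysfs/class/block/$name/holders"/*; do
        [ -e "$h" ] && echo "held by ${h##*/}"
    done
    return 0
}
```

If this prints nothing, tools such as fuser or lsof can additionally show whether a process still has the device node open.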

It is the same Kernel Oops though: a NULL pointer dereference. A quick Google search of a few different elements of the dmesg logs led me to openzfs/zfs GitHub issue #10642, "BUG: kernel NULL pointer dereference, address: 0000000000000008", which is identical, and another reporter there has the same problem with Plex.

This design document is split into seven parts. Part 1 defines what fsck tools are and the motivations for writing a new one. Parts 2 and 3 present a high level overview of how the online fsck process works and how it is tested to ensure correct functionality. Part 4 discusses the user interface and the intended usage modes of the new program. Parts 5 and 6 show off the high level components and how they fit together, and then present case studies of how each repair function actually works. Part 7 sums up what has been discussed so far and speculates about what else might be built atop online fsck.

Metadata directly supporting these functions (e.g. files, directories, space mappings) are sometimes called primary metadata. Secondary metadata (e.g. reverse mapping and directory parent pointers) support operations internal to the filesystem, such as internal consistency checking and reorganization. Summary metadata, as the name implies, condense information contained in primary metadata for performance reasons.

The filesystem check (fsck) tool examines all the metadata in a filesystem to look for errors. In addition to looking for obvious metadata corruptions, fsck also cross-references different types of metadata records with each other to look for inconsistencies. People do not like losing data, so most fsck tools also contain some ability to correct any problems found. As a word of caution: the primary goal of most Linux fsck tools is to restore the filesystem metadata to a consistent state, not to maximize the data recovered. That precedent will not be challenged here.

Filesystems of the 20th century generally lacked any redundancy in the ondisk format, which means that fsck can only respond to errors by erasing files until errors are no longer detected. More recent filesystem designs contain enough redundancy in their metadata that it is now possible to regenerate data structures when non-catastrophic errors occur; this capability aids both strategies.

System administrators avoid data loss by increasing the number of separate storage systems through the creation of backups; and they avoid downtime by increasing the redundancy of each storage system through the creation of RAID arrays. fsck tools address only the first problem.

Code is posted to the kernel.org git trees as follows: kernel changes, userspace changes, and QA test changes. Each kernel patchset adding an online repair function will use the same branch name across the kernel, xfsprogs, and fstests git repos.

The first program, xfs_check, was created as part of the XFS debugger (xfs_db) and can only be used with unmounted filesystems. It walks all metadata in the filesystem looking for inconsistencies in the metadata, though it lacks any ability to repair what it finds. Due to its high memory requirements and inability to repair things, this program is now deprecated and will not be discussed further.

The second program, xfs_repair, was created to be faster and more robust than the first program. Like its predecessor, it can only be used with unmounted filesystems. It uses extent-based in-memory data structures to reduce memory consumption, and tries to schedule readahead IO appropriately to reduce I/O waiting time while it scans the metadata of the entire filesystem. The most important feature of this tool is its ability to respond to inconsistencies in file metadata and directory tree by erasing things as needed to eliminate problems. Space usage metadata are rebuilt from the observed file metadata.

User programs suddenly lose access to the filesystem when unexpected shutdowns occur as a result of silent corruptions in the metadata. These occur unpredictably and often without warning.

Data owners cannot check the integrity of their stored data without reading all of it. This may expose them to substantial billing costs when a linear media scan performed by the storage system administrator might suffice.

System administrators cannot schedule a maintenance window to deal with corruptions if they lack the means to assess filesystem health while the filesystem is online.

This new third program has three components: an in-kernel facility to check metadata, an in-kernel facility to repair metadata, and a userspace driver program to drive fsck activity on a live filesystem. xfs_scrub is the name of the driver program. The rest of this document presents the goals and use cases of the new fsck tool, describes its major design points in connection to those goals, and discusses the similarities and differences with existing tools.

The naming hierarchy is broken up into objects known as directories and files and the physical space is split into pieces known as allocation groups. Sharding enables better performance on highly parallel systems and helps to contain the damage when corruptions occur. The division of the filesystem into principal objects (allocation groups and inodes) means that there are ample opportunities to perform targeted checks and repairs on a subset of the filesystem.

While this is going on, other parts continue processing IO requests. Even if a piece of filesystem metadata can only be regenerated by scanning the entire system, the scan can still be done in the background while other file operations continue.

In summary, online fsck takes advantage of resource sharding and redundant metadata to enable targeted checking and repair operations while the system is running. This capability will be coupled to automatic system management so that autonomous self-healing of XFS maximizes service availability.

Because it is necessary for online fsck to lock and scan live metadata objects, online fsck consists of three separate code components. The first is the userspace driver program xfs_scrub, which is responsible for identifying individual metadata items, scheduling work items for them, reacting to the outcomes appropriately, and reporting results to the system administrator. The second and third are in the kernel, which implements functions to check and repair each type of online fsck work item.
