
Can't list root directory


Gary Dale

Jan 29, 2024, 12:20:05 PM
I'm running Debian/Trixie on an AMD64 workstation. I've lost the ability
to see the root directory even when I am logged in as root (su -).

This has been happening intermittently for several months. I initially
thought it might be related to a failing NVMe drive that was part of a
RAID1 array that is mounted as "/" but I replaced the device and the
problem is still happening.

I had been able to fix it by booting to SystemRescue and running an fsck
on the device but it didn't work this time. The device checks out OK
(even when using fsck /dev/mdx -f) but I still can't list the root. "ls
-l /" just hangs, as do any attempts to see the root directory in a
graphical file manager. In dolphin this means there is nothing in the
folders - and since that is the default starting point I have to
manually enter a folder name (e.g. /home/me) in the location bar to be
able to see anything - but even then the folders panel remains empty.

Even commands like df -h hang because they can't access the root
folder. However, the system is otherwise running normally.

Strangely, in the past simply booting to a rescue shell then exiting
would also work. I'd usually try to do an fsck on the raid device but
that would always fail because it was mounted.

The only thing I noticed that was unusual was I rebooted after
installing the latest Trixie updates this morning. That took about 10
minutes to shut down - 6 of which were spent waiting for a drkonqi
process to finish. There was also a systemd message really late in the
shutdown about /dev/md0 but that's not the root device.

I'm used to Linux taking its time to shut down lately so I don't think
this was related. The systemd shutdown just seems to be easily delayed.

Any ideas on how I can restore my ability to see the root directory?

to...@tuxteam.de

Jan 29, 2024, 12:50:07 PM
On Mon, Jan 29, 2024 at 11:42:14AM -0500, Gary Dale wrote:
> I'm running Debian/Trixie on an AMD64 workstation. I've lost the ability to
> see the root directory even when I am logged in as root (su -).
>
> This has been happening intermittently for several months. I initially
> thought it might be related to failing NVME drive that was part of a RAID1
> array that is mounted as "/" but I replaced the device and the problem is
> still happening.

[...]

Anything mounted below / whose block device is taking its time?
Maybe a network device?

What does mount say?
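
One way to answer the mount question without touching any mount point is to read the kernel's mount table directly. A minimal sketch, assuming a Linux /proc filesystem; the function name and the list of filesystem types are illustrative choices, not something from this thread:

```shell
#!/bin/sh
# List mounts whose filesystem type suggests a network, FUSE, or automount backend.
# Only the mount table is read, so no mount point is accessed.
list_slow_mount_candidates() {
    # $1: mount table to read (defaults to the current process's view)
    table="${1:-/proc/self/mounts}"
    # Fields per line: device mountpoint fstype options dump pass
    awk '$3 ~ /^(nfs|nfs4|cifs|smb3|autofs)$/ || $3 == "fuse" || $3 ~ /^fuse\./ {
        print $2 " (" $3 ")"
    }' "$table"
}

list_slow_mount_candidates
```

An empty result only means that none of the listed types is mounted; a slow local block device would not show up here.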

Cheers
--
t

Hans

Jan 29, 2024, 1:00:07 PM
Hi Gary,

Before you lose any data, I suggest booting from a live Linux system. Please use
a modern live system such as Knoppix or Kali Linux.

If it is not a BIOS problem, you should see the device again and be able to
mount it. If /root is on a separate partition, you can run some filesystem
checks, such as e2fsck.

Most important: with a live system you can mount an external hard drive
and back up all files. Then, even if the /dev/nvme*** has died or is partly
broken, you may be able to restore /root to another partition.

Second: please check the ACLs. I do not believe they are the cause, but it
is worth a look. Maybe you or someone else changed them accidentally.

Third idea: is the hard drive full? In the past I had the problem of not being
able to do anything. The reason: my hard drive was completely full (some
temporary file was the cause). Deleting this big file did the trick.

Just some ideas; maybe they help.

Good luck!

Best

Hans

hw

Jan 30, 2024, 4:00:06 PM
On Mon, 2024-01-29 at 11:42 -0500, Gary Dale wrote:
> I'm running Debian/Trixie on an AMD64 workstation. I've lost the ability
> to see the root directory even when I am logged in as root (su -).
>
> This has been happening intermittently for several months. I initially
> thought it might be related to failing NVME drive that was part of a
> RAID1 array that is mounted as "/" but I replaced the device and the
> problem is still happening.
> [...]

What happens when you put the device you replaced back?

Gary Dale

Jan 31, 2024, 9:30:06 AM
How could putting a known-failing device back in help? The problem
existed before I replaced it and continues to exist after the replacement.

Gary Dale

Jan 31, 2024, 9:40:10 AM
OK, got it working again. I used tune2fs to force an fsck on every boot.
This being an NVMe device, the delay is barely noticeable.
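
For readers wondering what that tune2fs change looks like: the usual knob is the maximum mount count (-c), and a later message in this thread mentions setting the fsck count to 1. The sketch below only prints the command unless RUN=1 is set. The device name /dev/mdX is a placeholder, and the wrapper function is hypothetical:

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) the tune2fs call that sets the maximum mount count.
# With a count of 1, e2fsck considers the filesystem due for a check after every mount.
set_fsck_mount_count() {
    device="$1"   # placeholder such as /dev/mdX; substitute the real ext2/3/4 device
    count="$2"
    if [ "${RUN:-0}" = 1 ]; then
        tune2fs -c "$count" "$device"   # needs root
    else
        echo "tune2fs -c $count $device"
    fi
}

set_fsck_mount_count /dev/mdX 1
```

Passing -1 (or 0) as the count turns the mount-count check off again.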

Gary Dale

Jan 31, 2024, 9:50:06 AM
There is no problem seeing the root folder when I boot from a live distro.

fsck never finds any significant issue.

An ACL issue would be permanent. This comes and goes.

I actually doubled the size of the root device when I put in the new
NVME drive. When I set up the RAID array, I'd bought a 500G second drive
to mirror the 256G original drive. When I replaced the 256G drive, I was
able to expand the array to 500G (less a small amount for the EFI
partition). The partition has lots of free space.

As I said, running an fsck seems to fix the issue temporarily. I now run

The Wanderer

Jan 31, 2024, 10:00:06 AM
On 2024-01-29 at 11:42, Gary Dale wrote:

> I'm running Debian/Trixie on an AMD64 workstation. I've lost the
> ability to see the root directory even when I am logged in as root
> (su -).
>
> This has been happening intermittently for several months. I
> initially thought it might be related to failing NVME drive that was
> part of a RAID1 array that is mounted as "/" but I replaced the
> device and the problem is still happening.
>
> I had been able to fix it by booting to SystemRescue and running an
> fsck on the device but it didn't work this time. The device checks
> out OK (even when using fsck /dev/mdx -f) but I still can't list the
> root. "ls -l /" just hangs, as do any attempts to see the root
> directory in a graphical file manager. In dolphin this means there is
> nothing in the folders - and since that is the default starting point
> I have to manually enter a folder name (e.g. /home/me) in the
> location bar to be able to see anything - but even then the folders
> panel remains empty.
>
> Even running commands like df -h hang because they can't access the
> root folder. However the system is otherwise running normally.

I'm not sure it'll help lead to anything, but out of curiosity and/or as
a possible diagnostic: when the problem is manifesting, what happens if
you run 'stat /'? Does it report data (similar to what you'd get from
'stat' on another directory), or does it hang, or give errors, or...?

My thought is that this will give information about the filesystem
object that is the root directory, without trying to also access
information about the *contents* of that directory. If the one succeeds
where the other fails, that might help narrow down where the actual
issue is.
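
The two-step check described above could be sketched like this, assuming GNU coreutils (stat, ls, timeout). The function name and the 5-second limit are arbitrary choices for illustration. Note that a process stuck in an uninterruptible wait may not react to the timeout either:

```shell
#!/bin/sh
# Probe a directory in two steps: the directory object itself, then its entry names.
probe_dir() {
    dir="$1"
    limit="${2:-5}"   # seconds to wait before giving up
    if timeout "$limit" stat -- "$dir" >/dev/null 2>&1; then
        echo "stat: ok"
    else
        echo "stat: failed or timed out"
    fi
    # ls -f lists the entry names unsorted and without the long-format details
    if timeout "$limit" ls -f -- "$dir" >/dev/null 2>&1; then
        echo "readdir: ok"
    else
        echo "readdir: failed or timed out"
    fi
}

probe_dir /
```

If both steps succeed while "ls -l /" still hangs, the problem is more likely one of the entries than the root directory itself.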

--
The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw


Max Nikulin

Jan 31, 2024, 12:10:06 PM
On 29/01/2024 23:42, Gary Dale wrote:
> "ls -l /" just hangs

It may dereference symlinks, call stat, etc. to colorize output. Could it
be that you have automount points or something related to network
mounts?

Does "echo /*" hang?

Even the bash prompt may do some funny stuff. I would try it from "dash".

Can you install strace? E.g. copy the files over while booted from live media.
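
A small sketch of the glob test, run in a child /bin/sh (which is dash on a default Debian install) so that bash prompt hooks, aliases and ls colorization stay out of the picture. The function name and the time limit are made up for illustration:

```shell
#!/bin/sh
# Expand "*" inside a directory from a plain POSIX shell, under a time limit.
glob_names() {
    dir="$1"
    limit="${2:-5}"   # seconds; arbitrary
    # The glob is expanded by the child shell, which only has to read the directory
    timeout "$limit" sh -c 'cd "$1" && echo *' sh "$dir"
}

glob_names /
```

This prints the names relative to the directory, so for / it shows the same entries as "echo /*" without the leading slash.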

hw

Jan 31, 2024, 4:30:05 PM
It sounded like you were able to list the root directory (at least
sometimes) before you did the replacement. Manually failing the
device (perhaps after adding it back first) could make a difference.

I've seen such indefinite hangs only when an NFS share has become
unreachable after it had been mounted. You could use clonezilla to
make a copy and then perhaps convert the file system to btrfs.

Do you still have the problem when you remove one of the NVME storage
things? Perhaps you have the equivalent of a bad SATA cable or the
mainboard doesn't like it when you access two of those at the same
time, or something like that. Even simple network cables can behave
very strangely, and NVME may be a bit more complicated than that.

Running fsck on every boot to work around an issue like this is
certainly a bad idea. Doesn't fsck report anything? If it really
makes a difference in itself rather than creating some side effect
that leads to the root directory being readable, it should report
something. Perhaps you need to increase its verbosity.

If there's no report then it would look like a side effect and raise
the question what side effect it might be. Does fsck run before the
RAID has been brought up or after? Is the RAID up when booting is
completed? What does mdadm say about the device(s)? Can you still
list the root directory when you manually fail either drive? What
exactly are the circumstances under which you can and cannot list the
root directory?
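
As a starting point for the RAID questions, the kernel's own summary in /proc/mdstat can be condensed to one line per array. A rough sketch; the function name is hypothetical, and it relies on the status line ending in a member map such as [UU], where "_" marks a missing member:

```shell
#!/bin/sh
# Summarize md arrays from /proc/mdstat: one line per array with its member map.
report_md_state() {
    stat_file="${1:-/proc/mdstat}"
    [ -r "$stat_file" ] || { echo "no md status available"; return 0; }
    awk '
        /^md[^ ]* :/ { name = $1 }
        # First line after the array header that carries a map like [UU] or [U_]
        name != "" && /\[[U_]+\]/ {
            state = ($NF ~ /_/) ? "degraded" : "ok"
            print name ": " state " " $NF
            name = ""
        }
    ' "$stat_file"
}

report_md_state
```

For per-member details, "mdadm --detail" on the array device gives the fuller picture.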

You need to do some investigating and ask questions like those ...

Loren M. Lang

Feb 1, 2024, 2:50:06 AM
Also, instead of doing "ls -l /" which will stat() every child folder under root, try "/bin/ls -f /" and see if that is successful. That will only do a readdir() on root itself. Also, it might be interesting to get a log of "strace ls -l /" to confirm exactly where the hang happens.
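
To narrow down which entry is responsible, the names from "ls -f" can be stat-ed one at a time under a time limit. This is only a sketch assuming GNU coreutils; the function name and the 5-second limit are illustrative, and names containing newlines are not handled:

```shell
#!/bin/sh
# Stat each entry of a directory under a time limit and report those that fail to answer.
find_unresponsive_entries() {
    dir="$1"
    limit="${2:-5}"   # seconds per entry; an arbitrary choice
    # ls -f prints the names unsorted; "." and ".." are skipped below
    ls -f -- "$dir" | while IFS= read -r name; do
        case "$name" in .|..) continue ;; esac
        if ! timeout "$limit" stat -- "${dir%/}/$name" >/dev/null 2>&1; then
            echo "no answer: ${dir%/}/$name"
        fi
    done
}

find_unresponsive_entries /
```

No output means every entry answered in time. For the strace log, "strace -o ls.trace ls -l /" writes the system calls to a file, and the last lines of that file show where the listing stopped.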

-Loren

--
Sent from my Nexus 4 with K-9 Mail. Please excuse my brevity.

Gary Dale

Feb 2, 2024, 9:50:07 AM
Thanks everyone for the suggestions. I'll retune the array to not fsck
every boot and see if the problem recurs so I can try your suggestions.

Gary Dale

Feb 17, 2024, 11:10:06 AM
Thanks Loren. /bin/ls -l works. The strace shows the hang is on
/keybase. The strace itself hung really badly - Ctrl-C wouldn't kill it.
I've set the fsck count to 1 again, so I can reboot and take a look at it.
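
Since the trace points at /keybase, a next step could be to check what, if anything, is mounted there, again using only the mount table so the path itself is never touched. A sketch with a hypothetical function name; whether /keybase is a FUSE mount on this machine is an assumption the thread does not confirm:

```shell
#!/bin/sh
# Report the filesystem type mounted at a path, using only the kernel mount table.
mount_type_of() {
    path="$1"
    table="${2:-/proc/self/mounts}"
    # The last matching line wins, since later mounts shadow earlier ones
    awk -v p="$path" '
        $2 == p { t = $3 }
        END { if (t != "") print t; else print "not a mount point" }
    ' "$table"
}

mount_type_of /keybase
```

A type of fuse (or fuse.something) would fit the earlier guesses in this thread about automounts and network-style mounts hanging the listing.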