Alt-F file system tool reports md1, but md1 does not appear in the RAID tool

W. Tango Foxtrot

Nov 18, 2021, 11:19:23 AM
to Alt-F
Details of this process are in a separate thread, but for simplicity the background overview is:

This post is about a possible bug in Alt-F discovered on a migrated RAID set (failed RAID 1 disk replaced, new disk partitioned, new degraded RAID created, data copied, old RAID destroyed, new RAID rebuilt).
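
In rough mdadm terms, the process would look something like this (a sketch only, with placeholder partition names, not the exact commands that were used):

  mdadm --create /dev/mdNEW --level=1 --raid-devices=2 /dev/sdX3 missing   # new, degraded RAID1 on the replacement disk
  (copy the data from the old array onto the new one)
  mdadm --stop /dev/mdOLD                                                  # destroy the old array
  mdadm --zero-superblock /dev/sdY3                                        # wipe the old member's RAID superblock
  mdadm /dev/mdNEW --add /dev/sdY3                                         # add it to the new array, which then rebuilds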

After the above process, a no-longer-existing RAID (md1) appears in the Alt-F "Filesystem maintenance" tool, but does not appear in the "RAID" tool. Further, a partition that exists on one disk (sda2) does not appear in that filesystem tool either.

I have tried using mdadm to remove md1, but it is not found.

Here are the details.

First, the screens from Alt-F:

Notice that in this first image device md1 is shown and device sda2 is missing.
Screenshot (1213).png
Notice that in this RAID menu image md1 is missing:
Screenshot (1214).png
Notice that in these partition tables, on both drives, sdx1, 2, and 3 all exist:
Screenshot (1215).png
Screenshot (1216).png

So there is some error/bug which shows md1 as existing, but it cannot be removed, and I surmise that is why sda2 is also not visible (as it is "in" md1), just as sda3 and sdb3 do not appear, since they are components of md0.
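
One way to test that surmise would be to ask mdadm what superblock, if any, sda2 still carries (just a thought; the related result shows up further below when I try to zero it):

  mdadm --examine /dev/sda2        # prints the md superblock stored on the partition, if one exists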

At the CLI I tried to remove md1 and the superblocks associated with it, but I was not successful and the results are inconsistent.

First, the /dev listing, where both md0 and md1 are shown:

[root@NF-NAS1]# ls /dev
blkid.tab           loop7               mtd5                ptyp3               sg0
blkid.tab.old       md                  mtd5ro              ptyp4               sg1
console             md0                 mtdblock0           ptyp5               tty
cpu_dma_latency     md1                 mtdblock1           ptyp6               ttyS0
device              mdev.seq            mtdblock2           ptyp7               ttyS1
event0              mem                 mtdblock3           ram0                ttyp0
full                memory_bandwidth    mtdblock4           ram1                ttyp1
kmsg                mtd0                mtdblock5           random              ttyp2
log                 mtd0ro              network_latency     rtc0                ttyp3
loop-control        mtd1                network_throughput  sda                 ttyp4
loop0               mtd1ro              null                sda1                ttyp5
loop1               mtd2                port                sda2                ttyp6
loop2               mtd2ro              ptmx                sda3                ttyp7
loop3               mtd3                pts                 sdb                 urandom
loop4               mtd3ro              ptyp0               sdb1                zero
loop5               mtd4                ptyp1               sdb2
loop6               mtd4ro              ptyp2               sdb3

Next, I try to get details of md1:

[root@NF-NAS1]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdb3[2] sda3[0]
      1952763532 blocks super 1.0 [2/2] [UU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

unused devices: <none>

This tells me md1 doesn't exist. BUT it appears in /dev.
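
As far as I understand it, only arrays the md driver actually knows about show up under /sys/block, so comparing that with /dev should tell whether md1 is just a leftover device node:

  ls /sys/block/                   # md arrays the kernel actually knows about appear here, alongside sda, sdb, loop*, etc.
  ls -l /dev/md0 /dev/md1          # the device nodes, whether stale or not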

So, newbie that I am, I try to remove it (with Google's help): stop, remove, check:

[root@NF-NAS1]# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
[root@NF-NAS1]# mdadm --remove /dev/md1
[root@NF-NAS1]# mdadm --remove /dev/md1
[root@NF-NAS1]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdb3[2] sda3[0]
      1952763532 blocks super 1.0 [2/2] [UU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

unused devices: <none>
[root@NF-NAS1]# mdadm --zero-superblock /dev/sda2
mdadm: Unrecognised md component device - /dev/sda2

mdadm reports that md1 stopped. I use the remove command, which either worked or failed silently; I tried it twice. I check mdstat, and as before it reports no md1, but a listing of /dev shows it is still there:

[root@NF-NAS1]# ls /dev
blkid.tab           loop7               mtd5                ptyp3               sg0
blkid.tab.old       md                  mtd5ro              ptyp4               sg1
console             md0                 mtdblock0           ptyp5               tty
cpu_dma_latency     md1                 mtdblock1           ptyp6               ttyS0
device              mdev.seq            mtdblock2           ptyp7               ttyS1
event0              mem                 mtdblock3           ram0                ttyp0
full                memory_bandwidth    mtdblock4           ram1                ttyp1
kmsg                mtd0                mtdblock5           random              ttyp2
log                 mtd0ro              network_latency     rtc0                ttyp3
loop-control        mtd1                network_throughput  sda                 ttyp4
loop0               mtd1ro              null                sda1                ttyp5
loop1               mtd2                port                sda2                ttyp6
loop2               mtd2ro              ptmx                sda3                ttyp7
loop3               mtd3                pts                 sdb                 urandom
loop4               mtd3ro              ptyp0               sdb1                zero
loop5               mtd4                ptyp1               sdb2
loop6               mtd4ro              ptyp2               sdb3

So maybe the remove failed because md1 is "busy", i.e. something is mounted on it, so I check for that:

[root@NF-NAS1]# mount
tmpfs on /rootmnt type tmpfs (rw,relatime)
/dev/root on /rootmnt/ro type squashfs (ro,relatime)
aufs on / type aufs (rw,relatime,si=247513ee)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
tmpfs on /tmp type tmpfs (rw,relatime,size=152576k)
devpts on /dev/pts type devpts (rw,relatime,mode=600)
/dev/loop0 on /rootmnt/sqimage type squashfs (ro,relatime)
/dev/md0 on /mnt/md0 type ext4 (rw,relatime,data=ordered)
/dev/sdb2 on /mnt/sdb2 type ext4 (rw,relatime,data=ordered)

So I have failed. My goal is simply to be able to access sda2.
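
If it is of any use, a harmless way to see what is actually on sda2 (nothing below writes to the disk) would be:

  blkid                            # lists any filesystem or swap signatures found, sda2 included
  cat /proc/swaps                  # shows whether sda2 is already in use as swap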

Would any of you more experienced gods please help this newbie?
Thank you

W. Tango Foxtrot

Nov 18, 2021, 1:27:46 PM
to Alt-F
I decided to wipe the drive in question and start from scratch. This has resolved the mysterious md1 issue for me. This experience, however, shows that there is some kind of bug or inconsistency in the Alt-F software or its underlying Linux build. I hope the information above helps developers resolve the bug/issue so that other users won't end up in the same spot.

Joao Cardoso

Nov 21, 2021, 11:19:41 PM
to Alt-F
Hi,

Yes, you found some misbehavior in Linux's hotplug device-name handling and in mdadm. It propagates up to Alt-F.

The Filesystems webUI is known (to me) to suffer from browser cache effects, as re-displaying the page sometimes leads to different info being shown. If you are expecting something and you don't see it, or you see something that you don't expect, you have to re-display the page :-) That deserves attention.

But worse is the existence of ghost device names. md1 exists as a device node but not as a valid md device. That's why it is displayed in the Filesystems webUI with no FS, but not in the RAID webUI. Notice also that md1 is the suggested device name for a new RAID array, meaning that it does not exist as an array (mdstat also shows that).

You are right, sda2 isn't displayed in Filesystems because it belongs to md1 (as swap). But how, if md1 does not exist as an md array? The mdadm (and RAID webUI) Examine (component) and Detail (array) outputs could give more insight, but it is now academic.
The only certainty is the existence of md device nodes that don't have a valid md array.

If faced with this issue again, with valuable data on the array, use "mdadm --examine /dev/component_name" and "mdadm --detail /dev/md_name" to try to figure it out. mdstat is also invaluable. But all software has bugs (not Alt-F's, in this case).
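
For example (component_name and md_name are placeholders, as above):

  mdadm --examine /dev/component_name   # reads the md superblock recorded on the component itself
  mdadm --detail /dev/md_name           # queries the assembled array for its state and member list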

PS-the "device busy" error when manipulating md arrays is generally due to a component being already auto-mounted as an ext2/3/4 filesystem, of belonging to another md array, or LVM, or other higher level block device.
"ls  /sys/block/md_device/slaves/" and "ls  /sys/block/md_device/holders/" gives some insights about that. You can use it also with bottom base devices, as in "ls /sys/block/sda/sda2/holders".

Thanks for the report and apologies for the late reply.
João