Software RAID won't persist after shutdown


ace402

Feb 10, 2021, 11:27:26 AM2/10/21
to esos-users
Hi, I'm new to ESOS and trying to learn so perhaps I have just missed something. I have what I believe is a simple use case and it's not working as I expected.

I have ESOS 3.0.6 running in a VM, it has a bunch of virtual shared disks attached. Using the TUI, I do Software RAID - Add Array - Select the first 4 sequential disks (sda, sdb, sdc, sdd) - Name "array10" - RAID level "raid10" - Chunk size "64k"

I wait a bit (10 GB disks), checking the status using Software RAID - Linux MD Status. When the progress completes, it says:

md127: active raid10 sda[3] sdc[2] sdb[1] sdd[0]
20953088 blocks super 1.2 64k chunks 2 near-copies [4/4] [UUUU]

Now I sync config using System - Sync Configuration. This has worked for everything else so far, like password, date/time, network, colour.

I do Interface - Exit to Shell. In the shell, regardless of whether I do "poweroff" or "reboot", when ESOS comes back, the RAID array is gone!

I did notice that after "poweroff" I see these messages:

stopping MD RAID arrays...
mdadm: stopped /dev/md/array10

But when ESOS boots back up and I log in, Software RAID - Linux MD Status no longer shows an array. I expected that my array would persist, especially since I did a simple case using the TUI. Did I miss a step or is this intended? In the meantime, I will attempt to do a persistent md array using CLI.

ace402

Feb 10, 2021, 4:50:57 PM2/10/21
to esos-users
I am new to mdadm, but here are some basic findings:

After creating the RAID 10 array in the TUI, /etc/mdadm.conf is not updated. It seems this is a necessary step as per other guides I found, but updating it manually does not fix the issue - the update to /etc/mdadm.conf survives the shutdown, but /proc/mdstat does not.
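For reference, the manual update amounts to something like this (the ARRAY line format is what "mdadm --detail --scan" emits; the UUID below is a placeholder, not my real one):

```shell
# Sketch of the manual mdadm.conf update described above. The UUID is a
# placeholder; on a live system the line comes from mdadm itself:
#   mdadm --detail --scan >> /etc/mdadm.conf
array_line='ARRAY /dev/md/array10 metadata=1.2 name=esos:array10 UUID=00000000:00000000:00000000:00000000'
echo "$array_line"
```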

Manually running mdadm --create doesn't help either.

Interestingly, if I create a zfs zpool, it survives poweroff and reboot.

Steve Jones

Feb 10, 2021, 6:33:46 PM2/10/21
to esos-...@googlegroups.com
It's been a long time since I've done anything w/ mdadm, and I've
never done anything w/ mdadm and ESOS in the same box (always just
done raw block storage on ESOS and redundancy in software on the
hosts) but I remember even on a raw fresh install linux box having
issues like this w/ mdadm. I never really looked into it, but it
scared me away from mdadm completely. I really hope someone can help solve your problem. I'll be lurking to hopefully resolve this years-long nagging problem in the back of my brain!
> --
> You received this message because you are subscribed to the Google Groups "esos-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to esos-users+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/esos-users/aa8ab34b-4556-49d6-8998-5664e6f040e7n%40googlegroups.com.

Marc Smith

Feb 11, 2021, 11:15:38 AM2/11/21
to esos-...@googlegroups.com
Andrew: Can you share your '/var/log/boot' file? In start() of
'rc.mdraid' it uses "mdadm --assemble --scan" which should
scan/assemble using all visible block devices.
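For a quick check of what actually assembled after boot, something like this over /proc/mdstat works (sample input here is the status quoted in the first post, with /proc/mdstat's usual spacing; not ESOS-specific):

```shell
# Print array name and RAID level from /proc/mdstat-style output.
# On a live system: parse_mdstat < /proc/mdstat
parse_mdstat() { awk -F'[ :]+' '/^md/ {print $1, $3}'; }
parse_mdstat <<'EOF'
md127 : active raid10 sda[3] sdc[2] sdb[1] sdd[0]
      20953088 blocks super 1.2 64K chunks 2 near-copies [4/4] [UUUU]
EOF
```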

--Marc

Andrew K

Feb 11, 2021, 12:08:58 PM2/11/21
to esos-users
Hi Marc,

Here is the file, thanks for your consideration

Andrew K

Feb 11, 2021, 12:11:24 PM2/11/21
to esos-users
Having some trouble attaching the file, maybe this will work.
boot.txt

Alnitak

Feb 12, 2021, 4:54:57 AM2/12/21
to esos-...@googlegroups.com
You might try using fdisk to set the partition type to Linux RAID (0xfd, I believe).

Andrew K

Feb 12, 2021, 11:19:43 AM2/12/21
to esos-users
I tried this: created the RAID10 array with mdadm via the TUI as originally described

Dropped to CLI and typed: fdisk /dev/md127

Entered command: t (to change partition type)

It says: No partition is defined yet! (Can't continue)

I should note that originally, before I found the earliest point where this workflow failed, I had created the RAID10 array, then a PV on top of that, a VG on top of that, and an LV on top of that (all with LVM via the TUI). On reboot it had all disappeared, I think because that lowest-level array disappears.

Danilo Godec

Feb 12, 2021, 11:30:28 AM2/12/21
to esos-...@googlegroups.com
You need to create partitions on the physical disks and these need to be of the type linux raid ('fd').
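If you want to script it instead of walking through fdisk interactively, something like this works (assuming sfdisk is available; this version only prints the commands rather than running them):

```shell
# Dry-run sketch: "type=fd" as sfdisk input creates one whole-disk
# partition marked Linux raid autodetect ('fd'). Printed here instead
# of executed; drop the outer echo to run it for real.
partition_cmd() { echo "echo 'type=fd' | sfdisk $1"; }
for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
  partition_cmd "$d"
done
```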

     Danilo

Andrew K

Feb 12, 2021, 2:11:56 PM2/12/21
to esos-users
Hi Danilo, it's my impression that an LVM Physical Volume (PV) is a partition. Even after I create the PV on top of the array, fdisk does not see it as a valid partition. I wonder if this is another symptom of an underlying issue?

Alnitak

Feb 12, 2021, 11:44:51 PM2/12/21
to esos-...@googlegroups.com
An LVM PV can be just about any block device, not just a partition.

The way I would typically do this is to create a partition on each disk (sda1, sdb1, sdc1, etc.), mark them as partition type fd, create the RAID(s), then use those md devices for LVM if desired.

Though I've had issues in the past mixing lvm and mdraid. Usually if I intend to use lvm, I just use lvm to do the raid.

If you use them both and the initrd and/or the services load the wrong one first, your raid won't be detected because one will be dependent on the other being detected first.
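Roughly, the LVM-only route would look like this (hypothetical device/VG/LV names; shown as a dry run that just prints the commands rather than executing them):

```shell
# Dry-run sketch of doing the RAID in LVM itself instead of mdraid.
# Device names and vg0/lv0 are hypothetical; -i is stripes, -m mirrors.
cmds='pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
vgcreate vg0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
lvcreate --type raid10 -i 2 -m 1 -L 9G -n lv0 vg0'
printf '%s\n' "$cmds"
```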

Andrew K

Feb 15, 2021, 2:16:49 PM2/15/21
to esos-users
Hey Alnitak,

Thank you for your guidance. I'm very pleased to report that your suggestion worked!

I cleared all LVM structures and wiped all my disks to start from scratch.

I started my process by using fdisk to create partitions that spanned the entire disk, e.g.

fdisk /dev/sda (start interactive mode)
n (new partition)
p (primary)
1 (partition #)
<blank> (default behaviour, use first available sector as start boundary)
<blank> (default behaviour, use last sector as end boundary)
w (write)

I did this for 4 disks. I then created the mdadm raid 10 array on top of these 4 partitions:

mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm --detail --scan | tee -a /etc/mdadm.conf

Then using the TUI, Sync Configuration

After poweroff, and booting back up, md0 has survived! I also checked that it survives reboot.

So, as you suggested, the issue for me was creating the array using whole disks. I'm not sure if there is something specific to my setup that caused this (using ESXi shared virtual disks), but if this is reproducible for all setups, I would recommend addressing these issues in the TUI:
  • When creating an mdadm array with the TUI (Software RAID -> Add array) it only allows you to use whole disks. I tried to use the TUI after my fdisk steps, but the list only had whole disks, not the partitions
  • After successfully creating the persistent mdadm array as described above, the TUI is in a weird state: if I do (Software RAID -> Remove array), it says "No running md arrays were detected", but if I do (LVM -> Add Physical Volume) it shows md0 in the list
  • Online guides for creating mdadm arrays suggest that mdadm.conf should be updated with the new array. The TUI doesn't do this, but I verified that the array persists even if mdadm.conf is not updated

Thanks again!