
jessie won't install/boot on a Dell Poweredge R815


Jeffrey Mark Siskind

Jun 19, 2016, 3:10:04 PM
I am attempting to install jessie on a Dell Poweredge R815. It has been
running wheezy reliably for years. And running squeeze reliably for years
before that. But no matter what I try it won't install or boot.

I have tried two ways.

1. I attempt a fresh install from a USB dongle. It gets all the way to
installing grub and then fails.

2. I do a fresh install of wheezy from a USB dongle. It boots wheezy just fine.
I do nothing but

nano /etc/apt/sources.list
(change all instances of wheezy to jessie, save, and exit)
apt-get update
apt-get dist-upgrade
(It upgrades without error. I answer the default to all questions.)
/sbin/reboot

Then it fails to reboot and goes into the initramfs. I have a picture of
the screen if anybody wishes.
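The sources.list edit in the steps above can also be scripted. A minimal sketch, assuming GNU sed and that the file names the release only by codename; switch_release is a hypothetical helper name, not a Debian tool:

```shell
# switch_release FILE OLD NEW
# Replace every whole-word occurrence of codename OLD with NEW in FILE,
# keeping a backup copy as FILE.bak. (\b is a GNU sed extension.)
switch_release() {
    sed -i.bak "s/\b$2\b/$3/g" "$1"
}

# Example (as root):
#   switch_release /etc/apt/sources.list wheezy jessie
#   apt-get update && apt-get dist-upgrade
```

This only reproduces the edit; the jessie release notes describe the full recommended upgrade procedure.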

I can reliably install and run wheezy over and over. I have not been able to
install or boot jessie despite numerous attempts.

Any suggestions?

Jeff (http://engineering.purdue.edu/~qobi)

Jan Bakuwel

Jun 19, 2016, 5:20:04 PM
Hi Jeffrey,
Two things come to mind, one being potential lack of disc space. I think
Jessie needs more than Wheezy if you selected the "standard utilities"
or whatever it's called (bottom line) when you're asked what to install.
I use a "rescue/boot manager" partition for many of my systems, whose
only function is to chainload one of a few other operating systems. That
way I don't have to throw away my old boots before I try the new.
Installing Jessie on that 1G partition is only possible if the only
thing I select during install is the SSH server.

The other thing you may want to have a look at is the output on tty4
(Alt F4), perhaps that reveals why grub is not able to finish.

cheers,
Jan

Ramon Diaz-Uriarte

Jun 19, 2016, 6:10:03 PM

Dear All,

It is my understanding that both systemd per se from v227 and plymouth
will cache passwords[1]. However, there is no caching of LUKS passwords in
my setting, a laptop with two encrypted partitions, corresponding to root
and swap, and where both share the passphrase.

I am using systemd 230-2 and plymouth 0.9.2-3+b1 and running kernel
linux-image-4.6.0-1-amd64 (kernel 4.5 behaves the same way). Trying with or
without plymouth makes no difference (i.e., I am always asked for both
passwords).

I wonder if there is something I need to set/unset, or if I need to create
some (which?) script in /etc/systemd/system.

My /etc/crypttab is

crypt-sda5 UUID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx none luks
crypt-sda2 UUID=yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy none luks


And my /etc/fstab is

proc /proc proc defaults 0 0
UUID=zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz /boot ext3 defaults 0 2
/dev/mapper/crypt-sda5 / ext4 errors=remount-ro,user_xattr 0 1
/dev/mapper/crypt-sda2 none swap sw 0 0


Best,



[1] Changes in v227:
https://lists.freedesktop.org/archives/systemd-devel/2015-October/034509.html,
or for instance the step-by-step instructions on setting full disk
encryption at https://thesimplecomputer.info/full-disk-encryption-with-ubuntu

--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdi...@gmail.com
ramon...@iib.uam.es

http://ligarto.org/rdiaz

deloptes

Jun 20, 2016, 2:10:03 AM
Jeffrey Mark Siskind wrote:

> I am attempting to install jessie on a Dell Poweredge R815. It has been
> running wheezy reliably for years. And running squeeze reliably for years
> before that. But no matter what I try it won't install or boot.
>

why is an upgrade not an option?

Ramon Diaz-Uriarte

Jun 20, 2016, 4:00:03 AM
Thank you. Yes, I replied with the idea of removing the reference but
... obviously I didn't. I'll wait a couple of days before trying again
(I don't want to look like a spammer).

Best,

On Mon, Jun 20, 2016 at 1:54 AM, David Wright <deb...@lionunicorn.co.uk> wrote:
> (off-list)
>
> On Sun 19 Jun 2016 at 23:43:37 (+0200), Ramon Diaz-Uriarte wrote:
>>
>> Dear All,
> [..]
>
> Some people may miss this posting because they're not interested
> in "jessie won't install/boot on a Dell Poweredge R815" under
> which subject you've threaded it.
>
> It's the header line
> References: <E1bEhme-...@upplysingaoflun.ecn.purdue.edu>
> that's doing you no favours.
>
> Cheers,
> David.



--
Ramon Diaz-Uriarte
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Phone: +34-91-732-8000 ext. 3019
Fax: +-34-91-224-6972

Sven Hartge

Jun 20, 2016, 4:50:04 AM
Upgrade to what? He wants to install Jessie, you can't get a newer
stable Debian than that.

Grüße,
Sven.

--
Sigmentation fault. Core dumped.

Michael Lange

Jun 20, 2016, 7:20:04 AM
On Mon, 20 Jun 2016 10:43:35 +0200
Sven Hartge <sv...@svenhartge.de> wrote:

> deloptes <delo...@gmail.com> wrote:
> > Jeffrey Mark Siskind wrote:
>
> >> I am attempting to install jessie on a Dell Poweredge R815. It has
> >> been running wheezy reliably for years. And running squeeze reliably
> >> for years before that. But no matter what I try it won't install or
> >> boot.
>
> > why is an upgrade not an option?
>
> Upgrade to what? He wants to install Jessie, you can't get a newer
> stable Debian than that.

I guess he meant a dist-upgrade from an installed wheezy to jessie, if
jessie won't do a fresh install.

Regards

Michael


.-.. .. ...- . .-.. --- -. --. .- -. -.. .--. .-. --- ... .--. . .-.

He's dead, Jim.
-- McCoy, "The Devil in the Dark", stardate 3196.1

Brian

Jun 20, 2016, 7:40:04 AM
On Mon 20 Jun 2016 at 13:06:30 +0200, Michael Lange wrote:

> On Mon, 20 Jun 2016 10:43:35 +0200
> Sven Hartge <sv...@svenhartge.de> wrote:
>
> > deloptes <delo...@gmail.com> wrote:
> > > Jeffrey Mark Siskind wrote:
> >
> > >> I am attempting to install jessie on a Dell Poweredge R815. It has
> > >> been running wheezy reliably for years. And running squeeze reliably
> > >> for years before that. But no matter what I try it won't install or
> > >> boot.
> >
> > > why is an upgrade not an option?
> >
> > Upgrade to what? He wants to install Jessie, you can't get a newer
> > stable Debian than that.
>
> I guess he meant a dist-upgrade from an installed wheezy to jessie, if
> jessie won't do a fresh install.

I think the OP's attempt at a dist-upgrade was described in item 2 of
his first mail.

Michael Lange

Jun 20, 2016, 8:20:04 AM
On Mon, 20 Jun 2016 12:39:34 +0100
Brian <ad...@cityscape.co.uk> wrote:

> > > > why is an upgrade not an option?
> > >
> > > Upgrade to what? He wants to install Jessie, you can't get a newer
> > > stable Debian than that.
> >
> > I guess he meant a dist-upgrade from an installed wheezy to jessie, if
> > jessie won't do a fresh install.
>
> I think the OP's attempt at a dist-upgrade was described in item 2 of
> his first mail.

Oh, yes, sure (^.^);

Ok, then a few questions to the OP that come to mind, since we still don't
seem to know why the jessie boot actually fails:
Did you try to boot different kernel versions, does the old kernel from
wheezy also fail to boot (just to rule out a problem of jessie's kernel
with that particular machine)?
What's the contents of your sources.list file? Sometimes I myself
experienced problems with a dist-upgrade when mirrors like backports or
multimedia were active during that process. (ok, since a fresh install
seems to fail also, that's probably not the issue here)
Maybe you could post the exact error messages that show up during the
failed boot?

Regards

Michael

.-.. .. ...- . .-.. --- -. --. .- -. -.. .--. .-. --- ... .--. . .-.

Death. Destruction. Disease. Horror. That's what war is all about.
That's what makes it a thing to be avoided.
-- Kirk, "A Taste of Armageddon", stardate 3193.0

Don Armstrong

Jun 20, 2016, 2:10:04 PM
On Sun, 19 Jun 2016, Jeffrey Mark Siskind wrote:
> 2. I do a fresh install of wheezy from a USB dongle. It boots wheezy just fine.
> I do nothing but
>
> nano /etc/apt/sources.list
> (change all instances of wheezy to jessie, save, and exit)
> apt-get update
> apt-get dist-upgrade
> (It upgrades without error. I answer the default to all questions.)
> /sbin/reboot
>
> Then it fails to reboot and goes into the initramfs. I have a picture of
> the screen if anybody wishes.

It would be useful to see that screen (or better, the console output
as text directly from the DRAC in an e-mail.)

I'm guessing this is a "cannot find root filesystem" issue; it's also
possible that you're missing the appropriate driver for however the
disks are attached to that R815.

--
Don Armstrong https://www.donarmstrong.com

Life would be way easier
if I were easier.
-- a softer world #473
http://www.asofterworld.com/index.php?id=473

deloptes

Jun 20, 2016, 3:00:04 PM
The problem is the kernel and some other changes that cause troubles.

Upgrade usually is done by

apt-get update
apt-get upgrade
apt-get dist-upgrade

I failed today to upgrade wheezy to jessie on a RAID system as well.

The kernel/initramfs is the key to this, and perhaps eliminating systemd the
first time you boot after the upgrade.

In the initramfs shell I usually
1. check if disks are found (/dev/[hs]d* might be missing).
2. mount the root partition (in the example to dir called new) and
3. run

cd /new
exec /usr/sbin/chroot . /bin/sh <<- EOF >dev/console 2>&1
exec /sbin/init ${CMDLINE}
EOF

4. when the system is up, update the initramfs
update-initramfs -u

This magic has always worked

It is a bit more complicated if you use raid, lvm and luks, but still it
comes to the magic at the end

I hope this helps

regards

Brian

Jun 20, 2016, 4:20:05 PM
On Mon 20 Jun 2016 at 20:53:55 +0200, deloptes wrote:

> Brian wrote:
>
> > On Mon 20 Jun 2016 at 13:06:30 +0200, Michael Lange wrote:
> >
> >> On Mon, 20 Jun 2016 10:43:35 +0200
> >> Sven Hartge <sv...@svenhartge.de> wrote:
> >>
> >> > deloptes <delo...@gmail.com> wrote:
> >> > > Jeffrey Mark Siskind wrote:
> >> >
> >> > >> I am attempting to install jessie on a Dell Poweredge R815. It has
> >> > >> been running wheezy reliably for years. And running squeeze reliably
> >> > >> for years before that. But no matter what I try it won't install or
> >> > >> boot.
> >> >
> >> > > why is an upgrade not an option?
> >> >
> >> > Upgrade to what? He wants to install Jessie, you can't get a newer
> >> > stable Debian than that.
> >>
> >> I guess he meant a dist-upgrade from an installed wheezy to jessie, if
> >> jessie won't do a fresh install.
> >
> > I think the OP's attempt at a dist-upgrade was described in item 2 of
> > his first mail.
>
> The problem is the kernel and some other changes that cause troubles.

Really?

You can deduce that from the sparse information provided by the OP?

I like "some other changes". Push something completely unspecified into
a discussion and we all nod our heads at the wisdom of the statement.

Jochen Spieker

Jun 20, 2016, 5:00:04 PM
deloptes:
>
> Upgrade usually is done by
>
> apt-get update
> apt-get upgrade
> apt-get dist-upgrade

No. You upgrade to a new stable release by reading and following the
release notes.

J.
--
I am heading for the loony bin.
[Agree] [Disagree]
<http://archive.slowlydownward.com/NODATA/data_enter2.html>

Jeffrey Mark Siskind

Jun 21, 2016, 4:10:05 PM
My posting has not appeared on debian-{boot,kernel,user}. I think it is
because of the attachments. I have removed them. I'll send the screen images
to people individually if they request them.
-------------------------------------------------------------------------------
I am cross posting this to debian-{boot,kernel,user}. I had replied to a reply
to my original post on debian-{boot,kernel} with a to: to the replier and a
cc: to debian-{boot,kernel} apparently it didn't get posted. So I am reposting
this there. And I am posting this on debian-user to provide more information
to all of the responders to my post there. My original post was short, just to
raise the issue. This post is longer, to provide all of the details that I
have.

Thanks to everyone for your help.

Some background. I have 23 machines.

11 Dell T5500 each has 4 disks
4 HP DL165 each has 3 disks
4 Dell Poweredge R815 each has 6 disks
4 Dell Poweredge C6145 each has 4 disks

All were purchased around 2011. All have been running wheezy reliably for
years and running squeeze reliably for years before that. The initial install
about 5 years ago was squeeze, with the squeeze installer. And then a
dist-upgrade to wheezy a few years later.

All machines within a class have the same hardware and have their disks
partitioned identically. The disks were partitioned at the time of the initial
install of squeeze about five years ago by the squeeze installer. All the
machines have SATA disks but different classes of machines have different
numbers of disks of different sizes. The disks on the T5500s and C6145s are
the same.

Dell T5500
    sd[a-d]1  md0  RAID1  ext4  /
    sd[a-d]2  md1  RAID5  ext4  /aux
    sd[a-d]3  swap
DL165
    sd[a-c]1  md0  RAID1  ext3  /
    sd[a-c]2  md1  RAID5  ext3  /aux
    sd[a-c]3  swap
R815
    sd[a-f]1  md0  RAID1  ext3  /
    sd[a-f]2  md1  RAID5  ext3  /aux
    sd[a-f]3  swap
C6145
    sd[a-d]1  md0  RAID1  ext3  /
    sd[a-d]2  md1  RAID5  ext3  /aux
    sd[a-d]3  swap

The reason that the T5500s have ext4 and the others do not is that the
machines were purchased at slightly different times and ext4 became available.

I first tried to do a dist-upgrade from wheezy to jessie on one machine of
each class. But the dist-upgrade hung on 3 of the 4 machine types. I didn't
save the details from that. But what I decided to do was a fresh install on
one machine of each class. That fresh install succeeded on the T5500, the
DL165, and the C6145. So I upgraded all of the T5500s, all of the DL165s, and
all of the C6145s with a fresh install of jessie. That was successful. There
was (and still is) a minor issue with the C6145s. I will discuss that
later. But the attempted fresh install to one R815 has not been successful.

For the fresh installs, I am using the jessie installer on USB, built as
described below. I attempt to preserve the existing disk partitioning. I also
attempt to preserve the existing md1 /aux. These are my long-term data storage
and collectively have about 100 terabytes of data. I reformat md0 /, keeping
it as ext3 on the DL165s, R815s, and C6145s and keeping it as ext4 on the
T5500s.

On the R815, I first tried to do a fresh install from USB. (That was after the
unsuccessful attempt at a dist-upgrade from a wheezy installation that had
been running for years.) I tried that about 8 times, all unsuccessful. But it
fails in slightly different ways each time. That nondeterministic behavior,
described below, leads me to believe that there is a bug. After that, I tried
unsuccessfully to boot from a live wheezy. (See my other posts to
debian-user.) After that, I was successful in doing a fresh install of wheezy.
That install was a minimal install. I did nothing but the fresh install from
USB and I deselected all of the options for additional software to install.
After that minimal install of wheezy, all I did was:

nano /etc/apt/sources.list
(change all wheezy to jessie)
apt-get update
apt-get dist-upgrade
(answer default to all questions)
/sbin/reboot

The dist-upgrade did not complain and did not give any errors. But upon
reboot, it entered the initramfs. A screen picture is enclosed below.

I am only posting the part below because it has not previously been posted. To
the readers of debian-user, there have been posts to debian-{boot,kernel}
that may answer some of your questions and provide more information. I am not
reposting those. Likewise, to the readers of debian-{boot,kernel}, there have
been posts to debian-user that may answer some of your questions and provide
more information. I am not reposting those.

From: deloptes <delo...@gmail.com>
I failed today to upgrade wheezy to jessie on raided system as well.

Please note that all of the above systems have / as md0 RAID1. The fresh
install of jessie was successful on all but the R815s.
--------------------------------------------------------------------------------
>     Then it fails to reboot and goes into the initramfs. I have a picture of
>     the screen if anybody wishes.

Yes please.  Also please use the 'rescue' boot option which enables
more verbose logging to the screen.

Thanks for your help.

Here is a screen picture.

This is after (a) a fresh install of wheezy followed by (b) an apt-get
dist-upgrade to jessie followed by (c) /sbin/reboot.

The above picture was taken before your email. I have since reinstalled a
fresh wheezy. I can redo the apt-get dist-upgrade to jessie and reboot with
the rescue boot option and take a new picture if you wish. But before I do so,
please let me know what else you would like me to do as part of the same
experiment. The experiment will take several hours (including the subsequent
reinstall of a fresh wheezy). So let's maximize the amount of information gain
with this experiment.

I conjecture that the jessie kernel has difficulty accessing the MD array on
disk. The same problem occurs when I attempt a direct fresh install of jessie
with the installer.

The machine has six disks, all ST9500530NS SATA. These have about 500GB each.
They all are partitioned identically with three partitions. sd[a-f]1 is RAID1
md0 ext3 mounted as /. sd[a-f]2 is RAID5 md1 ext3 mounted as /aux. sd[a-f]3
is swap.

Enclosed below is the output of fdisk on one disk. It is not from the
particular machine in question because that machine is not currently on the
net and I am offsite. But it is from another R815 purchased at the same time
that is running wheezy. All six disks on all four R815s are partitioned
identically. I partitioned them only once when I did a fresh install of
squeeze (with the squeeze installer) when I purchased the machines in about
2011.

When I fresh install either wheezy or jessie, I keep md1 and reformat
md0. When I apt-get dist-upgrade from wheezy to jessie, there is no reformat.

Here is what happens that is strange. When I do a fresh install of jessie, one
of the first things that the installer does is probe for hardware to try to
find the ISO. I have done this about 10 times. Sometimes (about 3 or 4) it
succeeds in finding the ISO. Sometimes (the rest) it comes up with a red
screen and claims that it can't find the ISO. In all cases, I am booting the
installer from the same USB dongle with the same data on it. I made the dongle
as follows:

# cd /tmp
# wget http://ftp.nl.debian.org/debian/dists/jessie/main/installer-amd64/current/images/hd-media/boot.img.gz
# wget http://cdimage.debian.org/cdimage/unofficial/non-free/cd-including-firmware/8.5.0+nonfree/amd64/iso-cd/firmware-8.5.0-amd64-netinst.iso
# zcat boot.img.gz >/dev/sdf
# mount /dev/sdf /mnt
# cp firmware-8.5.0-amd64-netinst.iso /mnt/.
# umount /mnt

(I actually have two such dongles, identical brand and size, with identical
data installed on them by the above. Sometimes I use one and sometimes the
other.)

When it does find the ISO, it proceeds through the entire install without
issue until it gets to installing grub. Below are the answers that I give to
the installer. Somewhere in there, I forget exactly where but before the
network configuration, it asks which network device to use. The R815 has 4
identical ethernet ports. I select:

eth0: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet

When it gets to installing grub, I switch to ctrl-alt-f2 and type

cat /proc/mdstat

Every time so far, md1 has all 6 components. But md0 has only some of the
components, sometimes 5/6, sometimes 4/6, and sometimes 1/6. And every time it
is a different set of components. Even though, just a few minutes earlier, I
was running wheezy and md0 had all 6 components. I do

mdadm /dev/md0 --add <each of the missing components one by one>

but it refuses. I forget the error. If I redo a fresh wheezy install after a
failed jessie install, I get to the same place and do the same thing and it
does successfully add the missing components. I wait about a half an hour and
the array is successfully rebuilt. I then do

chroot target
grub-install /dev/sda
...
grub-install /dev/sdf

and it works. But if I attempt the grub-install in the jessie installer it
refuses. I forget the error.
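The md0 membership check described above (cat /proc/mdstat) can be condensed to one line per degraded array. A rough sketch, assuming the usual /proc/mdstat layout, where the status line carries an [expected/active] pair such as [6/5], and assuming awk is available in the shell being used; degraded_arrays is a made-up helper name:

```shell
# degraded_arrays [FILE]
# Print "mdN active/expected" for every array that has fewer active members
# than it expects. FILE defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                 # remember the current array
        match($0, /\[[0-9]+\/[0-9]+\]/) {           # e.g. "[6/5]"
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name, n[2] "/" n[1]
        }
    ' "${1:-/proc/mdstat}"
}
```

Running it at the point where grub fails, and again under wheezy, would give a compact record of which arrays were short and by how much.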

In the jessie installer, no matter what I try, md0 has missing components, I
can't add them, and I can't install grub. If I go back to ctrl-alt-f1, it asks
what device to install grub to. I select sda. And I get a red screen that says
something like

Unable to install GRUB in /dev/sda
Executing 'grub-install /dev/sda' failed.
This is a fatal error.

If I look at ctrl-alt-f4, there are messages about unable to read block 2048
or 2052 or 2056 on /dev/sd[a-f]. But there is no hardware problem. Because
right after this, I redo a fresh reinstall of wheezy from USB, rebuild md0 as
part of the process, install grub on all 6 drives as part of the process, and
everything works.
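One way to test whether those blocks are readable is to read them directly with dd. A sketch, assuming the block numbers in the log are 512-byte sector numbers (that unit is an assumption; the log may count in other units); read_sectors is a hypothetical helper name:

```shell
# read_sectors DEVICE START COUNT
# Read COUNT 512-byte sectors starting at sector START and print the number
# of bytes actually read. Fewer than COUNT*512 bytes suggests a read problem;
# dd's own messages are discarded here, so check the kernel log for details.
read_sectors() {
    dd if="$1" bs=512 skip="$2" count="$3" 2>/dev/null | wc -c
}

# Example (as root): read 8 sectors at sector 2048 on each disk
#   for d in a b c d e f; do read_sectors /dev/sd$d 2048 8; done
```

Comparing the result under the wheezy and jessie kernels on the same disks would show whether the read failures follow the kernel or the hardware.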

It is not just the jessie installer. If I do a fresh install of wheezy and get
a fully working wheezy with all six components of md0 and grub installed on
all 6 drives, and all I do is an apt-get dist-upgrade to jessie, I get no
errors during the upgrade. And after the upgrade, before reboot, all 6
components of md0 are there. (That is still running the wheezy kernel.) All I
do is /sbin/reboot and then it comes up in the initramfs. And if I then do a
fresh reinstall of wheezy, I need to rebuild md0.

So it seems to me that something in the jessie kernel is broken, probably
related to the disk driver.

Also note that I upgraded to the latest BIOS. But the same exact problems
occurred both before the BIOS upgrade and after.

> booting jessie also takes hours to do systemd
> configuration of the network

FYI, here is a screen picture where it takes minutes for systemd to bring up
the network. Note that I am not using DHCP. As per the enclosed, each host has
a fixed IPv4 address. There are fixed DNS servers. I am at a university and IT
services maintains the network for thousands of machines. I do not observe
issues bringing up the network when running wheezy.

Jeff (http://engineering.purdue.edu/~qobi)
--------------------------------------------------------------------------------
default Install
default English
default United States
default American English
Go Back
default Configure network manually
128.46.115.211
default netmask
default gateway
128.210.11.57 128.210.11.5 128.46.154.76
default hostname
default domain name
root password
root password
Jeffrey Mark Siskind
qobi
password
password
default Eastern
Manual
RAID1 #1
Ext3 journaling file system
Format the partition: yes, format it
Mount point: /
Done setting up the partition
RAID5 #1
Ext3 journaling file system
default Format the partition: no, keep existing data
Mount point: /aux
Done setting up the partition
Finish partitioning and write changes to disk
Yes
default United States
default ftp.us.debian.org
default blank
Yes
uncheck all
Yes
/dev/sda
Continue
-------------------------------------------------------------------------------
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000080

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048    78319615    39158784   fd  Linux raid autodetect
/dev/sda2        78319616   859570175   390625280   fd  Linux raid autodetect
/dev/sda3       859570176   976771071    58600448   82  Linux swap / Solaris

Don Armstrong

Jun 21, 2016, 4:30:03 PM
On Tue, 21 Jun 2016, Jeffrey Mark Siskind wrote:
> Please note that all of the above systems have / as md0 RAID1. The fresh
> install of jessie was successfull on all but the R815s.
> --------------------------------------------------------------------------------
> >     Then it fails to reboot and goes into the initramfs. I have a picture of
> >     the screen if anybody wishes.
>
> Yes please.  Also please use the 'rescue' boot option which enables
> more verbose logging to the screen.
>
> Thanks for your help.
>
> Here is a screen picture.

Could you upload this to an image paste site or send it along (or use a
serial console to get it as text?)

> I conjecture that the jessie kernel has difficulty accessing the MD
> array on disk. The same problem occurs when I attempt a direct fresh
> install of jessie with the installer.

Which add-in card are you using on the R815s? What does the kernel
output while it is detecting the disks and partitions? Do all of the
drives show up properly? Are the blocksizes correct for the partitions?

When the boot fails, can you read from the underlying block devices? Do
the block devices get detected after the boot fails? Does specifying
delay=20 or similar result in a successful boot?
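A sketch of how such a delay could be made persistent, assuming the parameter meant here is the kernel's rootdelay= option and that the system uses the stock /etc/default/grub layout; add_rootdelay is a hypothetical helper name. For a one-off test, the same parameter can instead be appended to the kernel line from the GRUB menu editor.

```shell
# add_rootdelay FILE SECONDS
# Append rootdelay=SECONDS to the GRUB_CMDLINE_LINUX_DEFAULT line in FILE.
add_rootdelay() {
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 rootdelay=$2\"/" "$1"
}

# Example (as root):
#   add_rootdelay /etc/default/grub 20
#   update-grub
```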

> Here is what happens that is strange. When I do a fresh install of jessie, one
> of the first things that the installer does is probe for hardware to try to
> find the ISO. I have done this about 10 times. Sometimes (about 3 or 4) it
> succeeds in finding the ISO. Sometimes (the rest) it comes up with a red
> screen and claims that it can't find the ISO. In all cases, I am booting the
> installer from the same USB dongle with the same data on it. I made the dongle
> as follows:
>
> # cd /tmp
> # wget http://ftp.nl.debian.org/debian/dists/jessie/main/installer-amd64/current/images/hd-media/boot.img.gz
> # wget http://cdimage.debian.org/cdimage/unofficial/non-free/cd-including-firmware/8.5.0+nonfree/amd64/iso-cd/firmware-8.5.0-amd64-netinst.iso
> # zcat boot.img.gz >/dev/sdf
> # mount /dev/sdf /mnt
> # cp firmware-8.5.0-amd64-netinst.iso /mnt/.

You can actually just cat firmware-8.5.0-amd64-netinst.iso > /dev/sdf;

> Every time so far, md1 has all 6 components. But md0 has only some of
> the components, sometimes 5/6, sometimes 4/6, and sometimes 1/6. And
> every time it is a different set of components. Even though, just a
> few minutes earlier, I was running wheezy and md0 had all 6
> components. I do
>
> mdadm /dev/md0 --add <each of the missing components one by one>
>
> but it refuses. I forget the error.

The error would be useful to know. Most likely one or more of them
dropped out of the array for some reason and you're booting off of one
which has a lower event count and it won't assemble.

But it could be any number of things.

The output of mdadm --examine /dev/sd[abcdef]1; when md0 fails to
assemble would also be useful.
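To compare event counts across the members at a glance, the --examine output can be filtered down to one line per device. A minimal sketch, assuming the output format in which each device section starts with a "/dev/...:" line and contains an "Events : N" line; event_counts is a made-up helper name:

```shell
# event_counts
# Read `mdadm --examine` output on stdin and print "<device> <events>"
# for each member, so that stale members stand out.
event_counts() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }   # section header, e.g. "/dev/sda1:"
        $1 == "Events" { print dev, $3 }              # line of the form "Events : 1042"
    '
}

# Example (as root):
#   mdadm --examine /dev/sd[abcdef]1 | event_counts
```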

--
Don Armstrong https://www.donarmstrong.com

S: Make me a sandwich
B: What? Make it yourself.
S: sudo make me a sandwich
B: Okay.
-- xkcd http://xkcd.com/c149.html

deloptes

Jun 21, 2016, 6:30:04 PM
Don Armstrong wrote:

> The error would be useful to know. Most likely one or more of them
> dropped out of the array for some reason and you're booting off of one
> which has a lower event count and it won't assemble.
>
> But it could be any number of things.
>
> The output of mdadm --examine /dev/sd[abcdef]1; when md0 fails to
> assemble would also be useful.

In my case it is Dell OptiPlex 7xx - I have it under the desk for 2y now -
but it looks like it is 5y old.
When I looked into the drives they were detected but md disks seemed to be
messed up and not easily recoverable.

What I observed was that only raid0 was loaded but not raid1. After removing
raid0 and loading raid1 I was able to see at least the partitions of the
drives but I did not have time to go further, so as I had to do a lot in
the office and @home I just shut it down. I hope I'll have some time next
week to play with that. Good that I do not need a remote machine at the
moment.

Jeffrey Mark Siskind

Jun 21, 2016, 7:00:04 PM
Thanks for your help.

> Here is a screen picture.

Could you upload this to an image paste site or send it along (or use a
serial console to get it as text?)

http://upplysingaoflun.ecn.purdue.edu/~qobi/20160619_140357.jpg

(The other screen picture of a machine (not an R815) that does boot but that
takes a really long time to bring up the network is at

http://upplysingaoflun.ecn.purdue.edu/~qobi/IMG-20160609-WA0000.jpeg

)

> I conjecture that the jessie kernel has difficulty accessing the MD
> array on disk. The same problem occurs when I attempt a direct fresh
> install of jessie with the installer.

Which add-in card are you using on the R815s?

I don't believe that I have any add-in cards. The machine was purchased
straight from Dell. It has six SATA disks and 4 gigabit ethernet ports. It has
four 12-core AMD CPUs and 128GB RAM. The output of lspci on an identical
machine purchased at the same time that is still running wheezy is enclosed
below.

What does the kernel
output while it is detecting the disks and partitions? Do all of the
drives show up properly? Are the blocksizes correct for the partitions?

I don't know how to get this info when in the initramfs after boot. If you
tell me what commands I should give I will redo this exercise. Right now, I
have a fresh minimal wheezy reinstalled. But after the reinstall of wheezy,
everything works. I did not repartition either during the (re)install of
jessie or during the (re)install of wheezy. I go back and forth. The
(re)install of wheezy works and the (re)install of jessie does not.

When the boot fails, can you read from the underlying block devices? Do
the block devices get detected after the boot fails?

I don't know what one can do at the initramfs command prompt. If you give
me some commands, I will try them out and post the output.
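As a starting point, whether all six disks were detected can be checked from /proc/partitions using plain shell only. A sketch, assuming the disks are named sda through sdf as elsewhere in this thread; missing_disks is a hypothetical helper, written without awk or grep so that it should run in a minimal shell:

```shell
# missing_disks FILE DISK...
# Print each DISK name that does not appear in the name column of FILE
# (normally /proc/partitions).
missing_disks() {
    partitions_file=$1
    shift
    for disk in "$@"; do
        found=no
        while read -r major minor blocks name; do
            if [ "$name" = "$disk" ]; then
                found=yes
            fi
        done < "$partitions_file"
        if [ "$found" = no ]; then
            echo "$disk"
        fi
    done
}

# Example at the (initramfs) prompt:
#   missing_disks /proc/partitions sda sdb sdc sdd sde sdf
#   cat /proc/mdstat
```

No output means every listed disk was seen by the kernel; cat /proc/mdstat then shows which arrays, if any, were assembled.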

Does specifying delay=20 or similar result in a successful boot?

I will try this.

> I made the dongle
> as follows:
>
> # cd /tmp
> # wget http://ftp.nl.debian.org/debian/dists/jessie/main/installer-amd64/current/images/hd-media/boot.img.gz
> # wget http://cdimage.debian.org/cdimage/unofficial/non-free/cd-including-firmware/8.5.0+nonfree/amd64/iso-cd/firmware-8.5.0-amd64-netinst.iso
> # zcat boot.img.gz >/dev/sdf
> # mount /dev/sdf /mnt
> # cp firmware-8.5.0-amd64-netinst.iso /mnt/.

You can actually just cat firmware-8.5.0-amd64-netinst.iso > /dev/sdf;

Please see my other post to debian-user

subject: how to make bootable live wheezy USB that doesn't use isohybrid

One of the exercises I tried was when the machine failed to boot after a fresh
USB-install of jessie, I tried to boot a live wheezy from USB by using a USB
dongle that I made by catting the isohybrid live wheezy ISO to the USB. But
the BIOS failed to detect the USB as bootable. I haven't tried to do that with
the netinst ISO but I suspect that it also won't be detected as bootable. But
when I build the USB dongle as per above it is detected by the BIOS as bootable.

> Every time so far, md1 has all 6 components. But md0 has only some of
> the components, sometimes 5/6, sometimes 4/6, and sometimes 1/6. And
> every time it is a different set of components. Even though, just a
> few minutes earlier, I was running wheezy and md0 had all 6
> components. I do
>
> mdadm /dev/md0 --add <each of the missing components one by one>
>
> but it refuses. I forget the error.

The error would be useful to know. Most likely one or more of them
dropped out of the array for some reason and you're booting off of one
which has a lower event count and it won't assemble.

But it could be any number of things.

The output of mdadm --examine /dev/sd[abcdef]1; when md0 fails to
assemble would also be useful.

I will try to get this info. It will require me to redo the exercise of a
fresh jessie install from USB. I'll have to take and post screen pictures
because I have no way to capture the console output. (I guess that I could use
iDRAC but I don't know how to and would have to learn.) If you let me know all
of the info you would like me to collect, I will try to collect it all in the
same retry of the fresh install.

But again note, that I do not believe that there are any disk hardware
errors. And I do not believe that there are any data errors in the layout of
the ext3 file system, the layout of the md0 raid array, or the partition
tables. The reason is that after the failed jessie install, I reinstall a
fresh wheezy from USB. I don't repartition. And I don't rebuild md1 and don't
rebuild /aux. But I do rebuild md0 and / as part of the fresh install. And it
works. I have done this over and over, switching between wheezy and jessie,
about a half dozen times. Each time, the jessie install leaves a different
collection of md0 components out. And each time, as part of the wheezy
install, I add them back in.

Thanks for your help.
Jeff (http://engineering.purdue.edu/~qobi)
--------------------------------------------------------------------------------
qobi@upplysingaoflun>lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 port A) (rev 02)
00:02.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port B)
00:03.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port C)
00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port D)
00:09.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port H)
00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
00:12.0 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.1 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0 USB OHCI1 Controller
00:12.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.1 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0 USB OHCI1 Controller
00:13.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller (rev 3d)
00:14.3 ISA bridge: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC host controller
00:14.4 PCI bridge: Advanced Micro Devices [AMD] nee ATI SBx00 PCI to PCI Bridge
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:1a.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:1a.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:1a.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:1a.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:1a.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:1b.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:1b.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:1b.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:1b.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:1b.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:1c.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:1c.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:1c.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:1c.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:1c.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:1d.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:1d.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:1d.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:1d.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:1d.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:1e.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:1e.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:1e.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:1e.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:1e.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:1f.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:1f.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:1f.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:1f.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:1f.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
04:00.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
04:01.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
04:04.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
04:05.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
05:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
0a:03.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)
20:00.0 Host bridge: Advanced Micro Devices [AMD] nee ATI RD890 Northbridge only dual slot (2x8) PCI-e GFX Hydra part (rev 02)
20:02.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port B)
20:03.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port C)
20:0b.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (NB-SB link)
qobi@upplysingaoflun>

Ramon Diaz-Uriarte

Jun 22, 2016, 3:40:05 AM
Dear All,

(I originally sent this by replying to, and thus threading under, another
thread; doing it right now :-)


It is my understanding that both systemd per se starting on v227 and

Jonathan Dowland

Jun 22, 2016, 7:20:03 AM
Hi,


On Wed, Jun 22, 2016 at 09:14:21AM +0200, Ramon Diaz-Uriarte wrote:
> It is my understanding that both systemd per se starting on v227 and
> plymouth will cache passwords[1]. However, there is no caching of LUKS
> passwords in my setting, a laptop with two encrypted partitions,
> corresponding to root and swap, and where both share the passphrase.
snip
> I wonder if there is something I need to set/unset, or if I need to create
> some (which?) script in /etc/systemd/system.

Looking at the manpage[1], it would appear you need to specify the argument
"--keyname=somename" to the systemd-ask-password process in order for it to
try and cache the passphrases. You would need to use the same keyring name
for each invocation and the subsequent invocations need to also have
--accept-cached.

At boot time you aren't invoking systemd-ask-password yourself, so we need
to figure out what calls it and how to configure *that* to pass the keyname
argument through.

I haven't tested it, but if you copy and override
/lib/systemd/system/systemd-ask-password-console.service to /etc/systemd/system
and add the two arguments, that might work. (you might also need to regenerate
the initramfs).


[1] https://www.freedesktop.org/software/systemd/man/systemd-ask-password.html#

--
Jonathan Dowland
Please do not CC me, I am subscribed to the list.

Don Armstrong

Jun 22, 2016, 12:00:04 PM
On Tue, 21 Jun 2016, Jeffrey Mark Siskind wrote:
> http://upplysingaoflun.ecn.purdue.edu/~qobi/20160619_140357.jpg

Are you certain that there isn't a PERC H700 in this machine? [Sort of
odd that mpt2sas is triggering a state error in your screenshot if there
actually isn't one.]

> I don't believe that I have any add-in cards. The machine was
> purchased straight from Dell. It has six SATA disks and 4 gigabit
> ethernet ports. It has four 12-core AMD CPUs and 128GB RAM. The output
> of lspci on an identical machine purchased at the same time that is
> still running wheezy is enclosed below.

OK. This:

> 00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]

makes me think that the SATA controller is in IDE/Legacy mode instead of
AHCI. In theory, this shouldn't matter, but it's possible that this is
also a problem. I'd try switching it in the bios and see what happens.
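
A small sketch for confirming which mode the onboard controller reports before and after the BIOS change. The helper name sata_mode is hypothetical; it simply pulls the bracketed mode out of an lspci line like the one quoted above.

```shell
# Read lspci output on stdin and print the mode shown in brackets after
# "SATA Controller", e.g. "IDE mode" or "AHCI mode".
sata_mode() {
    sed -n 's/.*SATA Controller \[\([A-Za-z]* mode\)\].*/\1/p'
}

# Intended use:
#   lspci | sata_mode
# To see which kernel driver is bound to the controller at slot 00:11.0:
#   lspci -k -s 00:11.0
```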

> What does the kernel output while it is detecting the disks and
> partitions?

Remove the quiet option from the kernel command line by editing it in grub.

> Do all of the drives show up properly?

echo /dev/sd*; should give you an idea of what is there in the initramfs.

> When the boot fails, can you read from the underlying block
> devices?

more /dev/sda; should work, I believe.
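
Since more prints raw binary to the console, a read test with dd may be easier to interpret. This is only a sketch: check_readable is a hypothetical helper, and it assumes the initramfs shell provides dd (the busybox one normally does).

```shell
# Try to read the first 1 MiB of each given device and report the result.
check_readable() {
    for dev in "$@"; do
        if dd if="$dev" of=/dev/null bs=1024 count=1024 2>/dev/null; then
            echo "$dev: readable"
        else
            echo "$dev: READ FAILED"
        fi
    done
}

# Intended use at the (initramfs) prompt:
#   check_readable /dev/sd[a-f]
```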

> I don't know what one can do at the initramfs command prompt. If you give
> me some commands, I will try them out and post the output.
>
> Does specifying delay=20 or similar result in a successful boot?

> I will try this.

This should actually be rootdelay=20; sorry.
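
For a one-off test the parameter can be typed into the kernel line from the grub menu. If it helps, one way to make it persistent is sketched below. The helper name append_kernel_args is hypothetical, and the sketch assumes the stock Debian /etc/default/grub layout with a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line and GNU sed.

```shell
# Append extra kernel arguments to GRUB_CMDLINE_LINUX_DEFAULT.
# $1 = grub defaults file (normally /etc/default/grub), $2 = arguments to append.
append_kernel_args() {
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $2\"|" "$1"
}

# Intended use (as root), followed by regenerating grub.cfg:
#   append_kernel_args /etc/default/grub "rootdelay=20"
#   update-grub
```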

> I will try to get this info. It will require me to redo the exercise
> of a fresh jessie install from USB. I'll have to take and post screen
> pictures because I have no way to capture the console output.

I believe the R815 still has a serial port; you can just plug in a
serial cable and append an appropriate serial tty option to the kernel
command line to get output as text.
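
A sketch of what such an option could look like. The port (ttyS0) and speed (115200) are assumptions and have to match the BIOS serial settings; serial_console_args is a hypothetical helper that only builds the string.

```shell
# Build kernel console arguments that keep the VGA console and add a serial one.
# $1 = serial port index (0 for ttyS0), $2 = baud rate.
serial_console_args() {
    echo "console=tty0 console=ttyS$1,${2}n8"
}

serial_console_args 0 115200

# On the machine at the other end of the cable, the output could be logged with
# something like the following (the device name is an assumption):
#   screen -L /dev/ttyUSB0 115200
```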

> But again note, that I do not believe that there are any disk hardware
> errors. And I do not believe that there are any data errors in the
> layout of the ext3 file system, the layout of the md0 raid array, or
> the partition tables. The reason is that after the failed jessie
> install, I reinstall a fresh wheezy from USB. I don't repartition.
> And I don't rebuild md1 and don't rebuild /aux. But I do rebuild md0
> and / as part of the fresh install. And it works.

Yes; it's possible that a change in one of the drivers between the
wheezy and jessie kernels is exposing a firmware bug (or there's a bug
in the kernel itself) which is causing this issue.

What I'm trying to do is get enough information so that the error is
obvious.


--
Don Armstrong https://www.donarmstrong.com

What I can't stand is the feeling that my brain is leaving me for
someone more interesting.

Ramon Diaz-Uriarte

Jun 22, 2016, 3:20:05 PM

On Wed, 22-06-2016, at 12:55, Jonathan Dowland <jm...@debian.org> wrote:
> Hi,
>
>
> On Wed, Jun 22, 2016 at 09:14:21AM +0200, Ramon Diaz-Uriarte wrote:
>> It is my understanding that both systemd per se starting on v227 and
>> plymouth will cache passwords[1]. However, there is no caching of LUKS
>> passwords in my setting, a laptop with two encrypted partitions,
>> corresponding to root and swap, and where both share the passphrase.
> snip
>> I wonder if there is something I need to set/unset, or if I need to create
>> some (which?) script in /etc/systemd/system.
>
> Looking at the manpage[1], it would appear you need to specify the argument
> "--keyname=somename" to the systemd-ask-password process in order for it to
> try and cache the passphrases. You would need to use the same keyring name
> for each invocation and the subsequent invocations need to also have
> --accept-cached.
>
> At boot time you aren't invoking systemd-ask-password yourself, so we need
> to figure out what calls it and how to configure *that* to pass the keyname
> argument through.
>
> I haven't tested it, but if you copy and override
> /lib/systemd/system/systemd-ask-password-console.service to /etc/systemd/system
> and add the two arguments, that might work. (you might also need to regenerate
> the initramfs).


Thanks, but it does not seem to work.

- I copied /lib/systemd/system/systemd-ask-password-console.service to
/etc/systemd/system (i.e., it is not a symlink)

- I added --keyname=cryptsetup --accept-cached at the end of ExecStart

- Regenerated initramfs

- s2disk and then boot. I am still asked for both passwords.


- Note I am not using plymouth at the moment, but I understand this should
work without plymouth.

Best,



>
>
> [1] https://www.freedesktop.org/software/systemd/man/systemd-ask-password.html#

deloptes

Jun 22, 2016, 3:30:05 PM
Don Armstrong wrote:

> Yes; it's possible that a change in one of the drivers between the
> wheezy and jessie kernels is exposing a firmware bug (or there's a bug
> in the kernel itself) which is causing this issue.
>
> What I'm trying to do is get enough information so that the error is
> obvious.

Sorry to bother you again. I turned on the Dell OptiPlex today and gathered
some information.

The problem as I see it is with the raid created by the installer in wheezy,
but I don't have time or opportunity to verify my statement.
However I see that there is only sda and sdb, while I have sd[ab]1,2,3,5 and
fdisk shows this correctly.

Jeffrey, can you check this on your Dell system?

Here I see the following

deloptes

Jun 22, 2016, 3:40:04 PM
Sorry, the previous message went out incomplete because of some shortcut I
pressed by mistake.

Here is what I found

lrwxrwxrwx 1 root root 9 Jun 22 14:16 ata-WDC_WD800GD-75FLC3_WD-WMAKE1962410 -> ../../sda
lrwxrwxrwx 1 root root 9 Jun 22 14:16 ata-WDC_WD800JD-75JNC0_WD-WCAM97914701 -> ../../sdb
lrwxrwxrwx 1 root root 10 Jun 22 14:16 dm-name-isw_dgebjhdbhb_Volume0 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Jun 22 14:16 dm-name-isw_dgebjhdbhb_Volume0p1 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Jun 22 14:16 dm-name-isw_dgebjhdbhb_Volume0p2 -> ../../dm-2
lrwxrwxrwx 1 root root 10 Jun 22 14:16 dm-name-isw_dgebjhdbhb_Volume0p3 -> ../../dm-3
lrwxrwxrwx 1 root root 10 Jun 22 14:16 dm-uuid-DMRAID-isw_dgebjhdbhb_Volume0 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Jun 22 14:16 dm-uuid-part1-DMRAID-isw_dgebjhdbhb_Volume0 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Jun 22 14:16 dm-uuid-part2-DMRAID-isw_dgebjhdbhb_Volume0 -> ../../dm-2
lrwxrwxrwx 1 root root 10 Jun 22 14:16 dm-uuid-part3-DMRAID-isw_dgebjhdbhb_Volume0 -> ../../dm-3
lrwxrwxrwx 1 root root 10 Jun 22 14:16 raid-isw_dgebjhdbhb_Volume0-part1 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Jun 22 14:16 raid-isw_dgebjhdbhb_Volume0-part2 -> ../../dm-2
lrwxrwxrwx 1 root root 10 Jun 22 14:16 raid-isw_dgebjhdbhb_Volume0-part3 -> ../../dm-3

mdadm --examine --scan
ARRAY metadata=imsm UUID=4613f991:8bbd4593:72c1388f:0b91f6a7
ARRAY /dev/md/Volume0 container=4613f991:8bbd4593:72c1388f:0b91f6a7 member=0 UUID=546fe9f1:4a141c96:5d18debe:ee4cb184
ARRAY metadata=imsm UUID=4613f991:8bbd4593:72c1388f:0b91f6a7
ARRAY /dev/md/Volume0 container=4613f991:8bbd4593:72c1388f:0b91f6a7 member=0 UUID=546fe9f1:4a141c96:5d18debe:ee4cb184

dmesg

[ 4.734576] ata3: SATA link down (SStatus 4 SControl 300)
[ 4.734619] ata4: SATA link down (SStatus 0 SControl 300)
[ 4.916035] usb 2-1: new high-speed USB device number 2 using ehci-pci
[ 5.044071] ata1.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 5.044082] ata1.01: SATA link down (SStatus 4 SControl 300)
[ 5.044206] ata2.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 5.044218] ata2.01: SATA link down (SStatus 4 SControl 300)
[ 5.049227] usb 2-1: New USB device found, idVendor=0ff9, idProduct=1003
[ 5.049231] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 5.049234] usb 2-1: Product: RunDisk
[ 5.049237] usb 2-1: Manufacturer: USBDisk
[ 5.049240] usb 2-1: SerialNumber: ABCDEFGH
[ 5.052207] ata2.00: LPM support broken, forcing max_power
[ 5.052321] ata2.00: ATA-6: WDC WD800JD-75JNC0, 06.01C06, max UDMA/100
[ 5.052324] ata2.00: 156250000 sectors, multi 8: LBA
[ 5.053266] ata1.00: ATA-6: WDC WD800GD-75FLC3, 32.08G32, max UDMA/133
[ 5.053270] ata1.00: 156250000 sectors, multi 8: LBA48
[ 5.056016] tsc: Refined TSC clocksource calibration: 2925.981 MHz
[ 5.060131] ata2.00: LPM support broken, forcing max_power
[ 5.060261] ata2.00: configured for UDMA/100
[ 5.069165] ata1.00: configured for UDMA/133
[ 5.069291] scsi 0:0:0:0: Direct-Access ATA WDC WD800GD-75FL 8G32 PQ: 0 ANSI: 5
[ 5.069470] sd 0:0:0:0: [sda] 156250000 512-byte logical blocks: (80.0 GB/74.5 GiB)
[ 5.069497] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 5.069512] sd 0:0:0:0: [sda] Write Protect is off
[ 5.069514] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 5.069534] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.069610] scsi 1:0:0:0: Direct-Access ATA WDC WD800JD-75JN 1C06 PQ: 0 ANSI: 5
[ 5.069792] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 5.069794] sd 1:0:0:0: [sdb] 156250000 512-byte logical blocks: (80.0 GB/74.5 GiB)
[ 5.069836] sd 1:0:0:0: [sdb] Write Protect is off
[ 5.069839] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 5.069865] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.090456] sda: sda1 sda2 sda3 < sda5 >
[ 5.090855] sd 0:0:0:0: [sda] Attached SCSI disk
[ 5.107116] sdb: sdb1 sdb2 sdb3 < sdb5 >
[ 5.107497] sd 1:0:0:0: [sdb] Attached SCSI disk

Lennart Sorensen

Jun 22, 2016, 3:50:04 PM
Oh dear, that means you are using intel fake raid. I had no end of
trouble when I tried to do that, and often had to manually start the
raid in the initramfs before the boot would continue.

Unless you are sharing the drive with windows I would highly recommend
avoiding that, and doing the software raid purely in linux. It is much
simpler and much better supported.
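
A quick way to check whether a machine's arrays carry Intel firmware RAID (imsm) metadata, as the mdadm --examine --scan output quoted earlier in the thread does. The helper name uses_imsm is hypothetical; it only looks for the metadata=imsm tag.

```shell
# Succeed if the `mdadm --examine --scan` output on stdin mentions imsm metadata.
uses_imsm() {
    grep -q 'metadata=imsm'
}

# Intended use (as root):
#   mdadm --examine --scan | uses_imsm && echo "Intel firmware RAID metadata present"
```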

--
Len Sorensen

Don Armstrong

Jun 22, 2016, 4:00:04 PM
On Wed, 22 Jun 2016, Jeffrey Mark Siskind wrote:
> and attempted
>
> mdadm /dev/md0 --add /dev/sda1
> mdadm /dev/md0 --add /dev/sdb1
> mdadm /dev/md0 --add /dev/sdc1
> mdadm /dev/md0 --add /dev/sdd1
> mdadm /dev/md0 --add /dev/sde1
> mdadm /dev/md0 --add /dev/sdf1
>
> but these all failed.

This is the wrong command; it should be mdadm --assemble /dev/md0
/dev/sd[abcdef]1;

And that should only be done if the md0 device doesn't show up in the
initrd when you cat /proc/mdstat.
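
That check can be scripted roughly as follows. The helper name md_present is hypothetical; it assumes the usual /proc/mdstat layout where each array starts a line as "mdN : ...".

```shell
# Succeed if the named md device is listed in an mdstat-format file.
# $1 = array name (e.g. md0), $2 = file to read (defaults to /proc/mdstat).
md_present() {
    grep -q "^$1 :" "${2:-/proc/mdstat}"
}

# Intended use at the initramfs prompt: assemble only when md0 is absent.
#   md_present md0 || mdadm --assemble /dev/md0 /dev/sd[abcdef]1
```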

What's happened is that the raid1 device now has 12 drives instead of 6,
which basically isn't going to work at all.

You should be able to just directly reinstall jessie on this machine;
I'd also zero out the superblocks on the devices in /dev/md0, and then
assuming that the syncing has proceeded enough, you should be able to
install grub with an appropriate rootdelay and get it to boot. (Again,
in theory.)
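
For the superblock step, mdadm has a --zero-superblock option. The sketch below only prints the commands instead of running them, because they destroy the array metadata on the listed partitions. print_zero_superblock_cmds is a hypothetical helper, and it assumes the array is stopped first, for instance from the installer shell before partitioning.

```shell
# Print (do not run) the commands that would stop an array and wipe the md
# superblock on each listed member. Destructive if executed.
# $1 = array device, remaining arguments = member partitions.
print_zero_superblock_cmds() {
    array=$1; shift
    echo "mdadm --stop $array"
    for part in "$@"; do
        echo "mdadm --zero-superblock $part"
    done
}

# Example:
#   print_zero_superblock_cmds /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
```

Reviewing the printed list before pasting it guards against wiping the md1 members by accident.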

--
Don Armstrong https://www.donarmstrong.com

The computer allows you to make mistakes faster than any other
invention, with the possible exception of handguns and tequila
-- Mitch Ratcliffe

Jeffrey Mark Siskind

Jun 22, 2016, 4:10:04 PM
Are you certain that there isn't a PERC H700 in this machine? [Sort of
odd that mpt2sas is triggering a state error in your screenshot if there
actually isn't one.]

There could be one. But I probably don't use it. I use software RAID. Dell
wouldn't sell an R815 without an OS. I think I purchased it with RHEL which
may have needed the PERC H700. But I never even booted RHEL. The first thing I
did was a fresh install of squeeze, or maybe wheezy.

OK. This:

> 00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]

makes me think that the SATA controller is in IDE/Legacy mode instead of
AHCI. In theory, this shouldn't matter, but it's possible that this is
also a problem. I'd try switching it in the bios and see what happens.

I'll do that in a bit. Before I got your current post, I tried some things in
response to your previous post. I'll report on that here and then go back and
try the new things.

Here is what I did.

I had a fresh minimal USB install of wheezy running. That install was done
with debian-wheezy-DI-b1-amd64-netinst.iso from Jul 15 2012. I also put the
non-free firmware on the USB. When I did that, I unchecked all of the boxes
during the install for any extra packages. The only thing that I installed
after that was

apt-get install less

I then did

nano /etc/apt/sources.list
(change all wheezy to jessie)
apt-get update
apt-get dist-upgrade

I answered all of the defaults.

(default) all
(default) no
(default) cron
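
The manual edit in nano can also be done non-interactively. A minimal sketch, assuming GNU sed; switch_release is a hypothetical helper that keeps a .bak copy of the file.

```shell
# Replace one release codename with another in an APT sources file.
# $1 = sources file, $2 = old codename, $3 = new codename.
switch_release() {
    sed -i.bak "s/\b$2\b/$3/g" "$1"
}

# Intended use (as root), followed by the same two apt-get commands:
#   switch_release /etc/apt/sources.list wheezy jessie
#   apt-get update && apt-get dist-upgrade
```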

I captured this with

script -t 2>upgrade-jessie1 time -a ~/upgrade-jessie1.script

(My mistake. I forgot a period between upgrade-jessie1 and time.)

http://upplysingaoflun.ecn.purdue.edu/~qobi/time
http://upplysingaoflun.ecn.purdue.edu/~qobi/upgrade-jessie1

You can see that it all worked.

You can see that at the end I did

apt-get install firmware-linux

dpkg-reconfigure grub-pc
# default
# default
# check all /dev/sd?

and it all worked.

You can also see that at the end I did

cat /proc/mdstat

and all 6 components of both md0 and md1 were there.

Then I did

/sbin/reboot

The first reboot failed. It gave a screen similar to the one that you
already saw.

Then I did a second reboot, with delay=20. That did the same.

Then I did a third reboot, with rootdelay=20. That worked. I got a login
prompt, logged in, and got a root shell.

At that point, I did a

cat /proc/mdstat

and all 6 components of both md0 and md1 were there.

Then I did a

dpkg-reconfigure grub-pc

My intent was to add rootdelay=20 to the command line. But I got lots of
errors while doing so. I realized that I should have done this under script.
So I did

script -t 2>upgrade-jessie2.time -a ~/upgrade-jessie2.script

(this time with the period) and redid

dpkg-reconfigure grub-pc

and also did

cat /proc/mdstat

and attempted

mdadm /dev/md0 --add /dev/sda1
mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md0 --add /dev/sdc1
mdadm /dev/md0 --add /dev/sdd1
mdadm /dev/md0 --add /dev/sde1
mdadm /dev/md0 --add /dev/sdf1

but these all failed.

http://upplysingaoflun.ecn.purdue.edu/~qobi/upgrade-jessie2.script
http://upplysingaoflun.ecn.purdue.edu/~qobi/upgrade-jessie2.time

The machine is now in the state left at the end of the above script. If you
want me to do some more things in this state, let me know. Or I can do a fresh
USB install of wheezy and rebuild md0.

> What does the kernel output while it is detecting the disks and
> partitions?

Remove the quiet option from the kernel command line by editing it in grub.

I will do this next time.

> Do all of the drives show up properly?

echo /dev/sd*; should give you an idea of what is there in the initramfs.

I will do this next time.

> When the boot fails, can you read from the underlying block
> devices?

more /dev/sda; should work, I believe.

I will do this next time.

> I don't know what one can do at the initramfs command prompt. If you give
> me some commands, I will try them out and post the output.
>
> Does specifying delay=20 or similar result in a successful boot?

> I will try this.

This should actually be rootdelay=20; sorry.

Done. See above.

> I will try to get this info. It will require me to redo the exercise
> of a fresh jessie install from USB. I'll have to take and post screen
> pictures because I have no way to capture the console output.

I believe the R815 still has a serial port; you can just plug in a
serial cable and append an appropriate serial tty option to the kernel
command line to get output as text.

I figured out how to use script. That will work for most situations.

What I'm trying to do is get enough information so that the error is
obvious.

Thanks. Let me know what you want me to try next. Do you still wish me to do
the following?

> What does the kernel output while it is detecting the disks and
> partitions?

Remove the quiet option from the kernel command line by editing it in grub.

> Do all of the drives show up properly?

echo /dev/sd*; should give you an idea of what is there in the initramfs.

> When the boot fails, can you read from the underlying block
> devices?

more /dev/sda; should work, I believe.

Should it be with or without rootdelay=20?

Jeff (http://engineering.purdue.edu/~qobi)

Jeffrey Mark Siskind

Jun 22, 2016, 5:30:04 PM
> and attempted
>
> mdadm /dev/md0 --add /dev/sda1
> mdadm /dev/md0 --add /dev/sdb1
> mdadm /dev/md0 --add /dev/sdc1
> mdadm /dev/md0 --add /dev/sdd1
> mdadm /dev/md0 --add /dev/sde1
> mdadm /dev/md0 --add /dev/sdf1
>
> but these all failed.

This is the wrong command; it should be mdadm --assemble /dev/md0
/dev/sd[abcdef]1;

And that should only be done if the md0 device doesn't show up in the
initrd when you cat /proc/mdstat.

What's happened is that the raid1 device now has 12 drives instead of 6,
which basically isn't going to work at all.

You can see from the transcript that md0 is there and has only 6 drives. Just
that 5 of the six are marked as failed. And you can see that it refused to do
the mdadm --add.

http://upplysingaoflun.ecn.purdue.edu/~qobi/upgrade-jessie2.script

root@verstand:~# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 sda2[0] sdf2[5] sdd2[4] sdc2[3] sde2[2] sdb2[1]
1953118720 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]

md0 : active raid1 sda1[6](F) sdd1[8](F) sdb1[7](F) sde1[9](F) sdc1[10] sdf1[11](F)
39157688 blocks super 1.2 [6/1] [__U___]

unused devices: <none>
root@verstand:~# mdadm /dev/md0 --add /dev/sda1
mdadm: Cannot open /dev/sda1: Device or resource busy
root@verstand:~# mdadm /dev/md0 --add /dev/sdb1
mdadm: Cannot open /dev/sdb1: Device or resource busy
root@verstand:~# mdadm /dev/md0 --add /dev/sdd1
mdadm: Cannot open /dev/sdd1: Device or resource busy
root@verstand:~# mdadm /dev/md0 --add /dev/sde1
mdadm: Cannot open /dev/sde1: Device or resource busy
root@verstand:~# mdadm /dev/md0 --add /dev/sdf1
mdadm: Cannot open /dev/sdf1: Device or resource busy
root@verstand:~# mdadm /dev/md0 --add /dev/sdc1
mdadm: Cannot open /dev/sdc1: Device or resource busy
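
A sketch for listing the members that /proc/mdstat marks as failed, which is the (F) flag in the output above. failed_members is a hypothetical helper. The commented follow-up rests on an assumption that has not been tested on this machine: a member marked (F) is still held by the array, which would explain "Device or resource busy", and it usually has to be removed before it can be added again.

```shell
# Print "array member" for every member flagged (F) in an mdstat-format file.
# $1 = file to read (defaults to /proc/mdstat).
failed_members() {
    awk '$2 == ":" && $1 ~ /^md/ {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
                sub(/\[.*/, "", $i)    # strip "[6](F)" to leave the partition name
                print $1, $i
            }
    }' "${1:-/proc/mdstat}"
}

# Possible follow-up for each reported member (assumption, see above):
#   mdadm /dev/md0 --remove /dev/sda1
#   mdadm /dev/md0 --add /dev/sda1
```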

You should be able to just directly reinstall jessie on this machine;

In earlier posts I explained how this fails. If I do a direct install from
USB, I observe two kinds of errors.

1. Sometimes, but not every time, (it is nondeterministic) after the first 3
questions, the installer complains that it can't find the ISO.
2. Whenever it does find the ISO, the install progresses without error all
the way to the grub install and then complains that it can't install grub.
I've tried several different things. Sometimes, I just answer sda to the
grub install question. (Actually sometimes sdb, because if I plug the USB
into the front port, the USB gets sdg and the drives get sd[a-f] but if I
plug the USB into the back port, the USB gets sda and the drives get
sd[b-g].) But this always fails. Sometimes, I go into ctrl-alt-f2 and do
chroot target
grub-install /dev/sda
...
grub-install /dev/sdf
(or b-g as appropriate)
but this also fails. At that point, I have no way to install grub. (If I
abort the install, the machine is unbootable.) Whenever I'm in this state
I do cat /proc/mdstat and it shows that some components of md0 are failed
or missing. Some are present. This is nondeterministic. Which components
are present and which are missing changes each time I attempt this. If I
attempt to do mdadm --add I get errors. If I reinstall fresh wheezy from
USB and then in wheezy do mdadm --add, it works and rebuilds the
array. When it is done it has all 6 components. And then I immediately do
a fresh install of jessie from USB and the same problem happens.
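
Because the drive letters shift depending on which port the USB dongle is in, the internal disks could be picked out by their sysfs "removable" flag rather than by name. This is only a sketch: fixed_disks is a hypothetical helper, and it assumes the USB stick reports removable=1, which most do but not all.

```shell
# List sd* block devices whose sysfs "removable" flag is 0.
# $1 = sysfs block directory (defaults to /sys/block).
fixed_disks() {
    root=${1:-/sys/block}
    for d in "$root"/sd*; do
        [ -r "$d/removable" ] || continue
        if [ "$(cat "$d/removable")" = "0" ]; then
            echo "/dev/$(basename "$d")"
        fi
    done
}

# Intended use from the installer shell, with the new system mounted at /target:
#   for disk in $(fixed_disks); do chroot /target grub-install "$disk"; done
```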

I'd also zero out the superblocks on the devices in /dev/md0,

What command?

Jeff (http://engineering.purdue.edu/~qobi)

Jared_D...@dell.com

Jun 22, 2016, 5:40:04 PM
> Are you certain that there isn't a PERC H700 in this machine? [Sort of
> odd that mpt2sas is triggering a state error in your screenshot if there
> actually isn't one.]
>
> There could be one. But I probably don't use it. I use software RAID. Dell
> wouldn't sell an R815 without an OS. I think I purchased it with RHEL which
> may have needed the PERC H700. But I never even booted RHEL. The first
> thing I did was a fresh install of squeeze, or maybe wheezy.

We definitely sell PowerEdge systems without an OS and have for quite a while. However, we do limit configuration for higher end systems to include hardware RAID.

There's definitely a PERC controller in there based on <https://lists.debian.org/debian-user/2016/06/msg00934.html>
"05:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)"

I'm not seeing the subvendor/subsystem ID's there but it's presumably the PERC 6/i. If you're really not using it at all, you might be able to pull it out if the driver for it is causing problems. However, I suspect you need it to connect to the drive backplane. Stuart (CCed) may be able to offer some more insight into driver issues you might see.
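
The subsystem IDs can be read with lspci in verbose, numeric mode. A minimal sketch, using the slot 05:00.0 from the listing quoted above; subsystem_of is a hypothetical helper that just extracts the "Subsystem:" line.

```shell
# Read `lspci -vnn -s <slot>` output on stdin and print the Subsystem line, if any.
subsystem_of() {
    sed -n 's/^[[:space:]]*Subsystem: //p'
}

# Intended use:
#   lspci -vnn -s 05:00.0 | subsystem_of
```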

The SATA controller should only really be in use by the optical drive if present. Some of the mid-tier systems of that generation support SATA drives connected directly to a controller on the motherboard, but support for that under Linux was spotty from my recollection.
--
Jared Domínguez
OS Architect
Linux Engineering
Dell | Client Software Group

Jeffrey Mark Siskind

Jun 22, 2016, 5:40:04 PM
I conjecture that there may be two to five separate issues.

1. Setting up md0 upon boot takes a long time. rootdelay=20 fixes this.
2. There is a problem writing to disk. Perhaps just writing to certain blocks.
Because even when the machine boots with rootdelay=20, and md0 has all 6
components, grub-install fails and causes md0 to drop some/most of its
components.

Both of these are observed with a dist-upgrade from a fresh USB install
of wheezy to jessie. Separate from this, there are two other errors observed
with a direct fresh USB install of jessie.

3. Can't find the ISO.
4. grub-install
This may be the same as (2) above.

This is yet distinct from the fact that

5. a fresh direct USB install of jessie on the Dell Poweredge C6145s takes a
really long time (an hour) for each hardware probe (three times, once
before finding the ISO, once before partitioning, and once before grub
install).

Jeff (http://engineering.purdue.edu/~qobi)

Jeffrey Mark Siskind

Jun 22, 2016, 6:20:04 PM
> Are you certain that there isn't a PERC H700 in this machine? [Sort of
> odd that mpt2sas is triggering a state error in your screenshot if there
> actually isn't one.]
>
> There could be one. But I probably don't use it. I use software RAID. Dell
> wouldn't sell an R815 without an OS. I think I purchased it with RHEL which
> may have needed the PERC H700. But I never even booted RHEL. The first
> thing I did was a fresh install of squeeze, or maybe wheezy.

> We definitely sell PowerEdge systems without an OS and have for quite a
> while. However, we do limit configuration for higher end systems to include
> hardware RAID.

My apologies. I may misremember. I purchased the machines (twelve T5500s,
four R815s, and four C6145s) about 5 years ago and don't remember precisely
the arrangements. I'd have to check archived email to know for sure.

The machines were purchased through ECN (Purdue's Engineering IT services). I'm
a lowly professor. But I software-maintain my own machines. I definitely
didn't spec out a hardware RAID controller. The mechanisms by which one was
included are unclear at this point.

> There's definitely a PERC controller in there based on <https://lists.debian.org/debian-user/2016/06/msg00934.html>
> "05:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)"
>
> I'm not seeing the subvendor/subsystem ID's there but it's presumably the
> PERC 6/i. If you're really not using it at all, you might be able to pull
> it out if the driver for it is causing problems. However, I suspect you
> need it to connect to the drive backplane. Stuart (CCed) may be able to
> offer some more insight into driver issues you might see.
>
> The SATA controller should only really be in use by the optical drive if
> present. Some of the mid-tier systems of that generation support SATA
> drives connected directly to a controller on the motherboard, but support
> for that under Linux was spotty from my recollection.

My T5500s have optical drives. But neither my R815s nor my C6145s have optical
drives. All my machines have SATA drives. The R815s in question each have
six ST9500530NS drives. They have been running squeeze and then wheezy with
software RAID for 5 years since purchase.

Now that I have someone from Dell on the line who appears to be
Debian-friendly, it would be nice if you made firmware upgrades
Debian-friendly. I have been able to apply

R815_BIOS_JF8YH_LN_3.2.2.BIN

but have not been able to apply

ESM_Firmware_7N76T_LN32_1.07_A00.BIN
ESM_Firmware_J7YYK_LN32_2.85_A00.BIN
SATA_FRMW_LX_R300994.BIN

(I don't even know if either of the ESM upgrades are for my hardware. But the
shell scripts don't run.)

Jeff (http://engineering.purdue.edu/~qobi)

Jared_D...@dell.com

Jun 22, 2016, 6:30:06 PM
Dell Customer Communication
> > Are you certain that there isn't a PERC H700 in this machine? [Sort of
> > odd that mpt2sas is triggering a state error in your screenshot if there
> > actually isn't one.]
> >
> > There could be one. But I probably don't use it. I use software RAID. Dell
> > wouldn't sell an R815 without an OS. I think I purchased it with RHEL
> which
> > may have needed the PERC H700. But I never even booted RHEL. The first
> > thing I did was a fresh install of squeeze, or maybe wheezy.
>
> We definitely sell PowerEdge systems without an OS and have for quite a
> while. However, we do limit configuration for higher end systems to
> include
> hardware RAID.
>
> My apologies. I may misremember. I purchased the machines (twelve
> T5500s, four R815s, and four C6145s) about 5 years ago and don't remember
> precisely the arrangements. I'd have to check archived email to know for
> sure.
>
> The machines were purchased through ECN (Purdue's Engineering IT
> services). I'm a lowly professor. But I software-maintain my own machines. I
> definitely didn't spec out a hardware RAID controller. The mechanisms by
> which one was included are unclear at this point.

It looks like Stuart is out of office, but I'll try to remember to ping him when he's back.

> There's definitely a PERC controller in there based on
> <https://lists.debian.org/debian-user/2016/06/msg00934.html>
> "05:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008
> PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)"
>
> I'm not seeing the subvendor/subsystem ID's there but it's presumably the
> PERC 6/i. If you're really not using it at all, you might be able to pull
> it out if the driver for it is causing problems. However, I suspect you
> need it to connect to the drive backplane. Stuart (CCed) may be able to
> offer some more insight into driver issues you might see.
>
> The SATA controller should only really be in use by the optical drive if
> present. Some of the mid-tier systems of that generation support SATA
> drives connected directly to a controller on the motherboard, but support
> for that under Linux was spotty from my recollection.
>
> My T5500s have optical drives. But neither my R815s nor my C6145s have
> optical drives. All my machines have SATA drives. The R815s in question each
> have six ST9500530NS drives. They have been running squeeze and then
> wheezy with software RAID for 5 years since purchase.
>
> Now that I have someone from Dell on the line who appears to be Debian-
> friendly, it would be nice if you made firmware upgrades Debian-friendly. I
> have been able to apply

If you have an iDRAC, it's possible to apply those updates out-of-band through the Lifecycle Controller, using either WS-MAN or racadm.

I'm told (I work on client platforms (laptops/desktops/etc) now so haven't checked) that DUPs (the .BIN files) built after December 2014 should work on Debian and Ubuntu now, though only PowerEdge 12G/13G were tested. Also, not all types of DUPs have been tested, but the BIOS and iDRAC DUPs should work pretty well. More obscure stuff like Qlogic DUPs may not work. I'm not working in that area so am just relaying information and don't know much more than that.

> R815_BIOS_JF8YH_LN_3.2.2.BIN
>
> but have not been able to apply
>
> ESM_Firmware_7N76T_LN32_1.07_A00.BIN
> ESM_Firmware_J7YYK_LN32_2.85_A00.BIN
> SATA_FRMW_LX_R300994.BIN
>
> (I don't even know if either of the ESM upgrades are for my hardware. But
> the shell scripts don't run.)

ESM = Embedded Server Management. "ESM" updates are for updating the iDRAC.

> Jeff (http://engineering.purdue.edu/~qobi)

deloptes

Jun 23, 2016, 2:40:03 AM
Lennart Sorensen wrote:

> Unless you are sharing the drive with windows I would highly recommend
> avoiding that, and doing the software raid purely in linux.  It is much
> simpler and much better supported.

Hi,
this is not a critical machine. I let debian configure the raid at
installation time. Usually I do it myself and I use software raid - as you
said it is in many ways better.

What should I do to revive the raid and be able to boot? You said you have
done this.

thanks

deloptes

Jun 23, 2016, 3:20:05 PM
Just FYI

I managed to solve this on the OptiPlex today - it took me 3h and several
reboots.

The thing is that the BIOS is locked by the admins and the built-in raid
controller cannot be deactivated. So apparently during the upgrade (or maybe
even before, I don't know) the array info from the built-in controller was
dumped into mdadm.conf, which led to a broken initrd.

When in the initramfs shell I had to unload all related modules because I
had only sda and sdb (no partitions). After loading ata_piix I got all the
partitions, and loading raid1 + ext4 was enough to start the raid and mount
root.
mdadm -A -R /dev/md0 etc

then
mount /dev/md2 /root
mount -o bind /proc /root/proc
sys, run, dev ...

cd /root

exec /sbin/chroot . /bin/sh <<- EOF >dev/console 2>&1
exec /sbin/init ${CMDLINE}
EOF

Then install the 3.16 kernel and run update-initramfs, but first remove the
wrong ARRAY info from mdadm.conf
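
A rough sketch of how one might spot the stale ARRAY lines before rebuilding the initrd. This is not what was actually run here. It assumes the output of mdadm --detail --scan was saved to a file once the real arrays were assembled; the helper name stale_arrays and the file paths are made up for illustration.

```shell
# stale_arrays CONF SCAN
# Print every ARRAY line in CONF whose UUID does not appear in SCAN.
# SCAN is assumed to be a saved copy of `mdadm --detail --scan` output.
stale_arrays() {
    conf=$1 scan=$2
    grep '^ARRAY' "$conf" | while IFS= read -r line; do
        # Pull the UUID=... value out of the ARRAY line.
        uuid=$(printf '%s\n' "$line" | sed -n 's/.*UUID=\([0-9a-fA-F:]*\).*/\1/p')
        [ -n "$uuid" ] || continue
        # Keep the line only if no assembled array carries that UUID.
        grep -qF "UUID=$uuid" "$scan" || printf '%s\n' "$line"
    done
}

# Hypothetical usage (paths are examples only):
#   mdadm --detail --scan > /tmp/scan.txt
#   stale_arrays /etc/mdadm/mdadm.conf /tmp/scan.txt
```

Anything it prints is only a candidate for removal; the file still has to be edited by hand before running update-initramfs -u.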

DONE.

messy

regards

Jeffrey Mark Siskind

Jun 24, 2016, 6:40:04 PM
Please note that booting with rootdelay=20 does not solve the problem. It only
masks it.

1. If I attempt a fresh USB install of jessie, when md0 is correctly built
before the install, the process of doing the fresh install breaks
md0. When it gets to grub install, components of md0 are missing (even
though all six components were present before the install). And
grub-install fails. At this point it is impossible to complete the install
and produce a bootable system.

2. If I do a fresh minimal USB install of wheezy, rebuilding md0 in the
process, and then do a dist-upgrade to jessie, I can manually add
rootdelay=20 in grub and boot into jessie with all six components of md0
present. But if I then run dpkg-reconfigure grub-pc after boot, it gives
errors, drops components of md0, precludes me from adding them back, fails
to install grub, and leaves the machine in an unbootable state.

I fear that there is a problem writing to disk. Unless the writes that
dpkg-reconfigure grub-pc does are somehow special, ordinary writes to disk
may also corrupt the disk, even if I boot with rootdelay=20.

Please let me know what new information you would like me to gather.

Jeff (http://engineering.purdue.edu/~qobi)

Steve McIntyre

Jun 24, 2016, 7:00:04 PM
Ummm. Checking back up-thread, I can see that you're using md0 across
more than 4 disks and you're trying to boot off it with
grub-pc. You're hitting BIOS limitations here - the BIOS is only
capable of accessing 4 disks. I'm *guessing* that maybe the newer grub
in jessie is just being pickier about checking BIOS access to those
disks. Try just using 4 of the disks for md0, and I'd expect it to
work.

--
Steve McIntyre, Cambridge, UK. st...@einval.com
"Arguing that you don't care about the right to privacy because you have
nothing to hide is no different than saying you don't care about free
speech because you have nothing to say."
-- Edward Snowden

Ben Hutchings

Jun 27, 2016, 8:30:04 AM
On Mon, 2016-06-27 at 08:07 -0400, Jeffrey Mark Siskind wrote:
[...]
>     Whenever I observe any of the behavior reported in this email, it is
>     almost always associated with dmesg reporting the same error on the same
>     sector 2056 (sometimes 2058 or 2062). Given the dozens of attempted
>     reinstalls and reboots, at this point, I have seen this on almost all, if
>     not all, of the six disks on each of the four machines. I don't believe
>     that 24 disks all have the same bad sectors.

The first partition probably starts at an offset of 1MB, which is 2048
sectors.  So these errors are presumably occurring while reading a
filesystem label near the start of that partition, which is pretty much
the first thing that will happen after the array is assembled.
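
To make the arithmetic concrete, here is a small sketch that turns the reported dmesg lines into byte offsets from the partition start. It assumes 512-byte sectors and a partition starting at sector 2048, neither of which has been confirmed for these disks; io_error_offsets is a made-up helper name.

```shell
# io_error_offsets [PART_START]
# Read kernel log lines on stdin. For each "I/O error, dev X, sector N" line,
# print the device, the sector, and how many bytes past PART_START it lies.
# Assumes 512-byte sectors; PART_START defaults to 2048 (a 1 MiB offset).
io_error_offsets() {
    start=${1:-2048}
    sed -n 's|.*I/O error, dev \([a-z]*\), sector \([0-9]*\).*|\1 \2|p' |
    while read -r dev sector; do
        echo "$dev sector $sector = partition start + $(( ($sector - $start) * 512 )) bytes"
    done
}

# Hypothetical usage:
#   dmesg --level=err | io_error_offsets
```

For sector 2056 this gives 4096 bytes, so the failing request is 4 KiB into the partition. If the arrays use md metadata version 1.2 (an assumption, not checked here), that is also where md keeps its superblock.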

[...]
>  D. In step (4), there appears to be nondeterminism in the serial numbers of
>     the disks that get reported in the menu of options of where to install
>     grub. Sometimes, the disks get reported as ata-*, sometimes as scsi-*.
>     Note that all of my disks are SATA so the ones reported as scsi-* are
>     clearly in error. If I do fresh installs multiple times on the same
>     machine, each time it reports different serial numbers for the disks.

Linux uses an ATA/SCSI translation layer (libata), so that each ATA
drive is also seen as a SCSI drive and has two such identifiers.  The
non-determinism in which identifiers are shown might be a bug in the
installer, or it might be caused by failure of ID commands to the
drives.
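
A quick way to see the two identifiers side by side on a running system is to group the by-id symlinks by the device they point at. This is a sketch only: ids_by_device is a made-up helper, and it assumes udev populates /dev/disk/by-id with relative symlinks such as ../../sda.

```shell
# ids_by_device [DIR]
# For every symlink in DIR (default /dev/disk/by-id), print
# "<target device> <link name>", sorted so that all names belonging
# to the same drive end up on adjacent lines.
ids_by_device() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        printf '%s %s\n' "$(basename "$(readlink "$link")")" "$(basename "$link")"
    done | sort
}

# Hypothetical usage:
#   ids_by_device
```

If a drive shows both an ata-* and a scsi-* name, the scsi-* entry is the same disk seen through the translation layer, not a misdetected second device.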

[...]
> Note that there is a lot of nondeterministic behavior (all cases above where I
> say "sometimes"). In all cases, I do exactly the same thing over and over to
> the same machine and get different behavior.

This is an unfortunate effect of doing multiple things in parallel,
which is really the only way to make them go fast.

I think most of the problems you're still having must be caused by a
bug in the RAID driver, mpt2sas (or its firmware, if that's not
embedded in the BIOS).

Ben.

--

Ben Hutchings
Humour is the best antidote to reality.

Jeffrey Mark Siskind

Jun 27, 2016, 8:30:04 AM
I'd like to thank everyone for helping out.

Here is an update on installing jessie on R815s.

I succeeded in installing on three of my four R815s. But I am holding off on
the last because it is my file server and there are still issues. Please read
on. I don't believe that the problem is solved and there may be a bug lurking
that can lead to data loss.

Here is what I did.

1. Before the install, while still running wheezy, I upgraded the BIOS.
R815_BIOS_JF8YH_LN_3.2.2.BIN
This seemed to alleviate the problem of the jessie installer failing to
find the ISO. More on this later.

2. Before the install, while still running wheezy, I reduced the number of
components of md0 from 6 to 4. This was in response to Steve's suggestion.
mdadm /dev/md0 --fail /dev/sdf1
mdadm /dev/md0 --fail /dev/sde1
mdadm /dev/md0 --remove /dev/sdf1
mdadm /dev/md0 --remove /dev/sde1

3. I did a fresh USB install of jessie. More on this later.

4. When it asked about which devices to install grub, I answered "manual" and
then typed /dev/sdb. More on this later.

5. After the fresh install, I rebooted, and in grub, I added rootdelay=20.
This was in response to Don's suggestion.

6. After the reboot, I ran my standard post-install script. Among other
things, this installs numerous packages, makes a small number of mods to
/etc, and does a dpkg-reconfigure grub-pc. When it did that, I specified
only the 4 drives with active components of md0 and added rootdelay=20.

7. I rebooted. More on this later.
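
For anyone wanting to make the rootdelay=20 from steps 5 and 6 persistent by hand rather than through dpkg-reconfigure, here is a minimal sketch. It assumes the stock Debian /etc/default/grub layout with a double-quoted GRUB_CMDLINE_LINUX line and GNU sed; add_kernel_param is a made-up helper, not part of any package.

```shell
# add_kernel_param FILE PARAM
# Append PARAM to the GRUB_CMDLINE_LINUX="..." line in FILE unless it is
# already there. Assumes the value is wrapped in double quotes.
add_kernel_param() {
    file=$1 param=$2
    # Nothing to do if the parameter is already on the line.
    grep -q "^GRUB_CMDLINE_LINUX=.*$param" "$file" && return 0
    # First expression handles an empty value, second appends to a non-empty one.
    sed -i \
        -e "s|^GRUB_CMDLINE_LINUX=\"\"|GRUB_CMDLINE_LINUX=\"$param\"|" -e t \
        -e "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"|GRUB_CMDLINE_LINUX=\"\1 $param\"|" \
        "$file"
}

# Hypothetical usage (as root), followed by regenerating grub.cfg:
#   add_kernel_param /etc/default/grub rootdelay=20
#   update-grub
```

This only changes the kernel command line. It does not touch where grub itself gets installed, which is the part that goes wrong in the issues below.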

Now for the issues.

A. Even after the BIOS upgrade, when it no longer fails to find the ISO,
during the installer phase where it searches for an ISO, I notice
nondeterministic behavior. Sometimes it searches sdb{1,2,3}, sdc{1,2,3},
sdd{1,2,3}, sde{1,2,3}, sdf{1,2,3}, sdg{1,2,3}, sd{a,b,c,d,e,f,g} and
eventually finds an ISO (sda is the USB dongle). Sometimes it finds the
ISO right away without any searching. This doesn't cause problems but I
believe that it is symptomatic of other problems.

B. I'm not sure that reducing the number of components of md0 to 4 and/or
adding rootdelay=20 really solved the problem. I think it just reduced the
likelihood of occurrence. On one of the machines (arivu), during the
reboot in step (7), at an early phase of the boot, the machine first
reported that it found all 4 components of md0 and all 6 components of md1.
Then at a later phase it reported that there were errors on 3 of the 4
components. After the machine came up, md0 had only one component. Three
of the four components were in failed (F) state. I did mdadm --remove to
them and then mdadm --add to them. This doesn't happen all of the time. But
it happens some of the time.


qobi@upplysingaoflun>all-n-3g dmesg --level=err
upplysingaoflun:
verstand:
arivu:
[ 28.012558] mpt2sas0: fault_state(0x265d)!
[ 29.231355] end_request: I/O error, dev sdb, sector 2056
[ 29.231600] end_request: I/O error, dev sdc, sector 2056
[ 29.231773] end_request: I/O error, dev sde, sector 2056
[ 29.232020] end_request: I/O error, dev sda, sector 2056
perisikan:
[ 13.035132] mpt2sas0: fault_state(0x265d)!
[ 28.600099] mpt2sas0: fault_state(0x265d)!
qobi@upplysingaoflun>

qobi@upplysingaoflun>all-n-3g "dmesg --level=warn|fgrep -i error|fgrep -v ACPI"
upplysingaoflun:
verstand:
arivu:
[ 29.231430] md: super_written gets error=-5, uptodate=0
[ 29.231670] md: super_written gets error=-5, uptodate=0
[ 29.231869] md: super_written gets error=-5, uptodate=0
[ 29.232117] md: super_written gets error=-5, uptodate=0
perisikan:
qobi@upplysingaoflun>

(These are my four R815s. upplysingaoflun is the file server that has not
been updated. The other three have.) Note that one machine reports no
"mpt2sas0: fault_state(0x265d)" errors, one machine reports one, and one
machine reports two. Note that the machine that dropped three components
of md0 during boot reported I/O errors on all 4 disks with the 4
components of md0. I don't believe that there really are faulty disks.
Whenever I observe any of the behavior reported in this email, it is
almost always associated with dmesg reporting the same error on the same
sector 2056 (sometimes 2058 or 2062). Given the dozens of attempted
reinstalls and reboots, at this point, I have seen this on almost all, if
not all, of the six disks on each of the four machines. I don't believe
that 24 disks all have the same bad sectors.

C. In step (3), sometimes, but not always, during the install, I get a screen
that says that some partition failed. It offers a menu of two options. I
select "retry". Sometimes, but not always, this causes md0 to drop
components in the installer, which I fix by going to ctrl-alt-f2 during
the install and doing mdadm --remove and mdadm --add.

D. In step (4), there appears to be nondeterminism in the serial numbers of
the disks that get reported in the menu of options of where to install
grub. Sometimes, the disks get reported as ata-*, sometimes as scsi-*.
Note that all of my disks are SATA so the ones reported as scsi-* are
clearly in error. If I do fresh installs multiple times on the same
machine, each time it reports different serial numbers for the disks.

E. In step (4), it appears that if I select the menu item "sdb", it reports
that it tries to install on "md0" and then gives a red error screen. At
that point, I go to ctrl-alt-f2 and observe that it has dropped many
components of md0, usually all but one.

F. In step (4), sometimes, but not always, I get warning screens about EFI.

G. In step (4), if I select "manual" and then type "sdb", it appears to work.
But sometimes, but not always, I get warning screens about EFI.

Note that there is a lot of nondeterministic behavior (all cases above where I
say "sometimes"). In all cases, I do exactly the same thing over and over to
the same machine and get different behavior.
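
Since several of the issues above (B, C, and E) come down to md0 silently losing components, a small check after each boot or install step may help catch it early. A sketch, assuming the usual /proc/mdstat layout (an "mdN : ..." line followed by a status line containing [total/active]); md_summary is a made-up helper name.

```shell
# md_summary [FILE]
# Summarise each array in FILE (default /proc/mdstat): how many components
# are active out of the configured total, and which ones are marked (F).
md_summary() {
    awk '
        /^md[0-9]+ :/ {
            name = $1; failed = ""
            for (i = 3; i <= NF; i++)
                if ($i ~ /\(F\)$/) {
                    dev = $i
                    sub(/\[.*/, "", dev)      # strip "[2](F)" from "sdc1[2](F)"
                    failed = failed " " dev
                }
        }
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            printf "%s: %d of %d active; failed:%s\n", name, n[2], n[1], (failed == "" ? " none" : failed)
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Hypothetical usage:
#   md_summary
```

It only reads the status file, so it is safe to run on a machine in the half-broken state described in B.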

Jeff (http://engineering.purdue.edu/~qobi)

Jeffrey Mark Siskind

Jun 27, 2016, 12:20:04 PM
> The non-determinism in which identifiers are shown might be a bug in the
> installer, or it might be caused by failure of ID commands to the
> drives.
>
> I think most of the problems you're still having must be caused by a
> bug in the RAID driver, mpt2sas (or its firmware, if that's not
> embedded in the BIOS).

Thanks. Please let me know how I can report the potential bug(s) and what I
can do to help track them down.

Jeff (http://engineering.purdue.edu/~qobi)

deloptes

Jun 27, 2016, 2:20:04 PM
Ben Hutchings wrote:

> I think most of the problems you're still having must be caused by a
> bug in the RAID driver, mpt2sas (or its firmware, if that's not
> embedded in the BIOS).

I had big issues with mptsas and 3.16 in jessie, so I am still using
3.2.0-4-rt-amd64

rootdelay=15 did not help in 3.16 but works in 3.2

Jeffrey Mark Siskind

Jun 27, 2016, 6:10:05 PM
> I had big issues with mptsas and 3.16 in jessie, so I am still using
> 3.2.0-4-rt-amd64

Will jessie run with 3.2.0-4-rt-amd64? If so, where do I get it and how do I
install it on a fresh jessie install that wasn't dist-upgraded from wheezy?

Jeff (http://engineering.purdue.edu/~qobi)

deloptes

Jun 27, 2016, 7:00:04 PM
Yes I run it with that kernel since wheezy. You can get it from wheezy
https://packages.debian.org/wheezy/linux-image-3.2.0-4-rt-amd64

https://packages.debian.org/wheezy/linux-headers-3.2.0-4-rt-amd64
https://packages.debian.org/wheezy/linux-image-3.2.0-4-rt-amd64-dbg

Here is what I have and a bit of background

# uname -a
Linux lisa 3.2.0-4-rt-amd64 #1 SMP PREEMPT RT Debian 3.2.68-1+deb7u4 x86_64
GNU/Linux

# cat /etc/debian_version
8.5

- Disk controller is mptsas here not mpt2sas as you posted - no idea what is
the difference.

- disabled systemd and using old init. When I upgraded (as this is a
production machine I tested up front as you did), I found out I had a problem
with the kernel and did not have the opportunity to work on the issue. I don't
have another machine with this configuration, so I just kept the old working
kernel. I later tried to move to systemd, as it worked on other machines, but
I hit an issue with the raid and missing root and told the kernel to use the
old init. I just hope Debian keeps the old initrc style longer, and God bless
the wise people there.

# grep init /etc/default/grub
GRUB_CMDLINE_LINUX="ro rootdelay=15 splash quite init=/lib/sysvinit/init

I also do not understand why you should install and not upgrade as you
already narrowed the problem.
I never do a fresh install - much easier is bootstrap + copy (gzip or
whatever), besides this obsoletes the question how the kernel would go
there. Of course this applies only if you can boot from usb. But if you can
boot from usb, you can also "repair" things easier.

Jonathan Dowland

Jun 28, 2016, 6:50:05 AM
On Wed, Jun 22, 2016 at 08:57:19PM +0200, Ramon Diaz-Uriarte wrote:
> Thanks, but it does not seem to work.

I'm sorry to hear that. I will have a go at reproducing this but it will take
me a little time to set up some VMs.

Ben Hutchings

Jun 28, 2016, 3:10:04 PM
On Tue, 2016-06-28 at 00:49 +0200, deloptes wrote:
> Jeffrey Mark Siskind wrote:
>
> >    I had big issues with mptsas and 3.16 in jessie, so I am still using
> >    3.2.0-4-rt-amd64
> >
> > Will jessie run with 3.2.0-4-rt-amd64? If so, where do I get it and how do
> > I install it on a fresh jessie install that wasn't dist-upgraded from
> > wheezy?
> >
> >     Jeff (http://engineering.purdue.edu/~qobi)
>
> Yes I run it with that kernel since wheezy. You can get it from wheezy
> https://packages.debian.org/wheezy/linux-image-3.2.0-4-rt-amd64

Any particular reason why you use the -rt variant?

> https://packages.debian.org/wheezy/linux-headers-3.2.0-4-rt-amd64
> https://packages.debian.org/wheezy/linux-image-3.2.0-4-rt-amd64-dbg

The proper way is to add the wheezy-security suite to
/etc/apt/sources.list.  (All updates to wheezy now go to the wheezy-
security suite.)
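
As a sketch of what that could look like: the helper below appends a line to an apt sources file only if it is not already present. The helper name ensure_line and the drop-in file name are made up; the deb line is the wheezy security entry as I understand it and should be checked against your mirror setup before use.

```shell
# ensure_line FILE LINE
# Append LINE to FILE unless an identical line is already present.
ensure_line() {
    file=$1 line=$2
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Hypothetical usage (as root); file name and suite line are assumptions:
#   ensure_line /etc/apt/sources.list.d/wheezy-security.list \
#       'deb http://security.debian.org/ wheezy/updates main'
#   apt-get update
#   apt-get install linux-image-3.2.0-4-amd64
```

Going through apt this way means the old kernel keeps receiving security updates, which a one-off download of the .deb would not give you.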

> Here is what I have and bit of background
>
> # uname -a
> Linux lisa 3.2.0-4-rt-amd64 #1 SMP PREEMPT RT Debian 3.2.68-1+deb7u4 x86_64
> GNU/Linux
>
> # cat /etc/debian_version
> 8.5
>
> - Disk controller is mptsas here not mpt2sas as you posted - no idea what is
> the difference.
[...]

So far as I can see, mptsas is for SAS 1.0 (3 Gbps) controllers and
mpt2sas is for SAS 2.0 (6 Gbps) controllers.  They are two entirely
separate drivers, probably with different sets of bugs.

Ben.

--

Ben Hutchings
If at first you don't succeed, you're doing about average.

deloptes

Jun 28, 2016, 4:10:04 PM
Ben Hutchings wrote:

> On Tue, 2016-06-28 at 00:49 +0200, deloptes wrote:
>> Jeffrey Mark Siskind wrote:
>>
>> > I had big issues with mptsas and 3.16 in jessie, so I am still using
>> > 3.2.0-4-rt-amd64
>> >
>> > Will jessie run with 3.2.0-4-rt-amd64? If so, where do I get it and how
>> > do I install it on a fresh jessie install that wasn't dist-upgraded
>> > from wheezy?
>> >
>> > Jeff (http://engineering.purdue.edu/~qobi)
>>
>> Yes I run it with that kernel since wheezy. You can get it from wheezy
>> https://packages.debian.org/wheezy/linux-image-3.2.0-4-rt-amd64
>
> Any particular reason why you use the -rt variant?
>

I wanted to benchmark some operations there, but as you heard it's a bit of a
pain to bring it up after kernel upgrades, so I still have not had the
opportunity to do something and see if -rt really matters in my case.

>> https://packages.debian.org/wheezy/linux-headers-3.2.0-4-rt-amd64
>> https://packages.debian.org/wheezy/linux-image-3.2.0-4-rt-amd64-dbg
>
> The proper way is to add the wheezy-security suite to
> /etc/apt/sources.list.  (All updates to wheezy now go to the wheezy-
> security suite.)
>

>> Here is what I have and bit of background
>>
>> # uname -a
>> Linux lisa 3.2.0-4-rt-amd64 #1 SMP PREEMPT RT Debian 3.2.68-1+deb7u4
>> x86_64 GNU/Linux
>>
>> # cat /etc/debian_version
>> 8.5
>>
>> - Disk controller is mptsas here not mpt2sas as you posted - no idea what
>> is the difference.
> [...]
>
> So far as I can see, mptsas is for SAS 1.0 (3 Gbps) controllers and
> mpt2sas is for SAS 2.0 (6 Gbps) controllers.  They are two entirely
> separate drivers, probably with different sets of bugs.
>

Might be separate drivers but we have similar problems after upgrades from
wheezy to jessie.
I think OP might have one of my 2 cases (systemd vs init or kernel). Of
course it could be anything.

I have 2 small Geode machines that I need to upgrade as well and I'll plan a
weekend for that. There, for example, no recent kernel runs - the latest
working one is 2.6.26. It took me about a month of investigation - but it's
been doing what it should since 2007. I think the time spent has paid off
already.

Ben Hutchings

Jun 28, 2016, 4:10:04 PM
Please test a more recent kernel version, like Linux 4.6
(available as linux-image-4.6.0-0.bpo.1-amd64 in jessie-backports).

Then use 'reportbug' to submit a bug report against that package if the
bug is still present, or the jessie package if it's fixed there, giving
a summary of the problems you've described.

Ramon Diaz-Uriarte

Jul 21, 2016, 7:10:05 AM
OK, I guess it must not be a big deal for most people anyway. I am
getting used to typing the password twice :-)