CoreOS boots into emergency shell

kun...@shenzhou-ucar.com

Apr 26, 2017, 3:26:51 AM
to CoreOS User
Recently, my CoreOS server rebooted abnormally with "BUG: unable to handle kernel NULL pointer dereference at 0000000000000024 [...]" in the pstore files, and then booted into emergency mode.

According to the journal, it rebooted on Mar 25 at 00:02:40:

00:02:40 localhost systemd-journald[378]: Runtime journal (/run/log/journal/) is 8.0M, max 4.0G, 3.9G free.  
00:04:28 systemd[1]: Started Emergency Shell.
00:04:28 systemd[1]: Reached target Emergency Mode.
00:06:37 systemd[1]: Reached target Remote File Systems.
00:06:38 systemd[1]: Startup finished in 8.096s (kernel) + 17.660s (initrd) + 3min 40.917s (userspace) = 4min 6.673s.

The boot then stalled here until 07:58:04, when I logged in to the server through the console and pressed Enter, after which the server resumed booting:

07:58:04 systemd[1]: Stopping LVM2 PV scan on device 8:96...
07:58:04 systemd[1]: Stopping LVM2 PV scan on device 8:112...
07:58:04 systemd[1]: Stopping LVM2 PV scan on device 8:48...
07:58:04 systemd[1]: Starting Clean up broken links in /etc/ssl/certs...
07:58:04 systemd[1]: Starting Create missing system files...
07:58:04 systemd[1]: Starting Activation of LVM2 logical volumes...
07:58:04 systemd[1]: Stopping LVM2 PV scan on device 8:32...
07:58:04 systemd[1]: Stopped target Emergency Mode.
07:58:04 systemd[1]: Stopping Emergency Shell...
07:58:04 systemd[1]: Stopping LVM2 PV scan on device 8:16...
07:58:04 systemd[1]: Stopping LVM2 PV scan on device 8:80...
07:58:04 systemd[1]: Stopping LVM2 PV scan on device 8:64...
07:58:04 systemd[1]: Stopped Emergency Shell.
...

I then rebooted the server, and it entered emergency mode again. It needs human intervention to boot normally.

Container Linux Version

NAME=CoreOS
ID=coreos
VERSION=1185.5.0
VERSION_ID=1185.5.0
BUILD_ID=2016-12-07-0937
PRETTY_NAME="CoreOS 1185.5.0 (MoreOS)"
ANSI_COLOR="1;32"

Environment
Bare metal, Dell PowerEdge R730

Does anyone know what's wrong? How can I get the server to boot without intervention?

Alex Crawford

Apr 26, 2017, 6:51:05 PM
to kun...@shenzhou-ucar.com, CoreOS User
On 04/26, kun...@shenzhou-ucar.com wrote:
> Recently, My coreos server rebooted abnormally with "BUG: unable to handle
> kernel NULL pointer dereference at 0000000000000024 balabala" in the pstore
> files, then it booted into Emergency mode.

Do you have the full stack trace from the kernel panic?

> From journal, it rebooted at Mar 25 00:02:40:
>
> 00:02:40 localhost systemd-journald[378]: Runtime journal
> (/run/log/journal/) is 8.0M, max 4.0G, 3.9G free.
> 00:04:28 systemd[1]: Started Emergency Shell.
> 00:04:28 systemd[1]: Reached target Emergency Mode.
> 00:06:37 systemd[1]: Reached target Remote File Systems.
> 00:06:38 systemd[1]: Startup finished in 8.096s (kernel) + 17.660s (initrd)
> + 3min 40.917s (userspace) = 4min 6.673s.
>
> Then stopped here, until 07:58:04, I logged to this server through console,
> and pressed an Enter, server resumed booting:

Can you attach the full boot logs? `journalctl -b --no-pager`

-Alex

kun...@shenzhou-ucar.com

Apr 27, 2017, 4:39:55 AM
to CoreOS User, kun...@shenzhou-ucar.com
Thank you, Alex. The stack trace from 00:02 wasn't saved in pstore, so I have attached the trace file from another reboot at 08:47, along with the full boot logs from 00:02, 08:31, and 08:49.
logs.tgz

Alex Crawford

Apr 27, 2017, 5:57:22 PM
to kun...@shenzhou-ucar.com, CoreOS User
Ignition is failing to start. It's failing to parse the given Ignition
Config:

invalid character '\x1f' looking for beginning of value

Since you are manually continuing the boot, Ignition never marks itself
as completed and so it runs on the next reboot. You can use the online
validator [1] to validate your Ignition Config.

-Alex

[1]: https://coreos.com/validate

kun...@shenzhou-ucar.com

Apr 27, 2017, 9:49:10 PM
to CoreOS User, kun...@shenzhou-ucar.com
On Friday, April 28, 2017 at 5:57:22 AM UTC+8, Alex Crawford wrote:
> Ignition is failing to start. It's failing to parse the given Ignition
> Config:

Thanks. But I don't use Ignition to boot the system, and I validated my cloud-config file with the online validator; it's OK.
Where can I find the Ignition config on the system?

Alex Crawford

May 2, 2017, 12:34:56 AM
to kun...@shenzhou-ucar.com, CoreOS User
On 04/27, kun...@shenzhou-ucar.com wrote:
> Thanks. But I don't use ignition to boot the system, and also validate my
> cloud-config file with online validator, it's ok.
> Where can I find the config about ignition on the system ?

If you are on bare metal, it's using the config passed via the
"coreos.config.url" kernel parameter (you can read /proc/cmdline to see
the parameters). Ignition uses a different parameter name from
cloud-config. If you are truly using a cloud-config (via
"cloud-config-url"), Ignition shouldn't run.

Can you show the contents of /proc/cmdline?

-Alex
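
The distinction between the two kernel parameters can be checked mechanically. What follows is a minimal shell sketch, not part of the original message: the function name `config_source` is hypothetical, and it only looks for the two parameter names mentioned above. It makes no claim about how Ignition or coreos-cloudinit themselves decide whether to run.

```shell
#!/bin/sh
# Hypothetical helper: report which provisioning parameter, if any,
# appears on a kernel command line. With no argument (or an empty one),
# it reads the running kernel's /proc/cmdline.
config_source() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    # Kernel parameters are space-separated; rely on word splitting here.
    for param in $cmdline; do
        case "$param" in
            coreos.config.url=*) echo "ignition: ${param#*=}"; return 0 ;;
            cloud-config-url=*)  echo "cloud-config: ${param#*=}"; return 0 ;;
        esac
    done
    echo "none"
}
```

Calling `config_source` with no argument on the affected host inspects the live command line; passing a string lets you check a saved copy of /proc/cmdline instead.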

kun...@shenzhou-ucar.com

May 2, 2017, 2:09:21 AM
to CoreOS User, kun...@shenzhou-ucar.com
Why can't I find anything about Ignition in the log files?

Isn't a log entry like 'dev-datavg01-lvol1.device: Job dev-datavg01-lvol1.device/start failed with result 'timeout'' the culprit?

Alex Crawford

May 2, 2017, 3:22:17 AM
to kun...@shenzhou-ucar.com, CoreOS User
On 05/01, kun...@shenzhou-ucar.com wrote:
> Why can’t I find anything about ignition in the log files ?

You should be able to. I saw a few entries from Ignition when I read
through it. Try using `journalctl -b -t ignition`.

> Wasn't the logs like 'dev-datavg01-lvol1.device: Job
> dev-datavg01-lvol1.device/start failed with result 'timeout'' the culprit ?

That doesn't look good either, but it shouldn't be causing the emergency
shell to be invoked.

-Alex

kun...@shenzhou-ucar.com

May 4, 2017, 2:14:45 AM
to CoreOS User, kun...@shenzhou-ucar.com


On Tuesday, May 2, 2017 at 12:34:56 PM UTC+8, Alex Crawford wrote:
> Can you show the contents of /proc/cmdline?

core@ad10 ~/files $ cat /proc/cmdline 
rootflags=rw mount.usrflags=ro BOOT_IMAGE=/coreos/vmlinuz-b mount.usr=PARTUUID=e03dd35c-7c2d-4a47-b3fe-27f15780a57c rootflags=rw mount.usrflags=ro consoleblank=0 root=LABEL=ROOT console=ttyS0,115200n8 console=tty0 verity.usrhash=23e97170d442ca4e7c190734c1e8599305384d3e907c1840650aa8b1ab4e7607 

I'm not using Ignition to boot the system, and I haven't set up matchbox either. I also have user_data under the /var/lib/coreos-install directory and can't find anything about Ignition in the logs. 'journalctl -b -t ignition' outputs nothing, while 'journalctl -b -t coreos-cloudinit' shows a lot.

Alex Crawford

May 4, 2017, 5:06:16 AM
to kun...@shenzhou-ucar.com, CoreOS User
On 05/03, kun...@shenzhou-ucar.com wrote:
> I‘m not using ignition to boot the system, and doesn't setup matchbox
> either. also I have user_data under /var/lib/coreos-install directory and
> can't find anything about ignition from logs. 'journalctl -b -t ignition'
> output nothing, while 'journalctl -b -t coreos-cloudinit' shows a lot.

Oh jeez. I'm sorry, I mixed up the logs I was reading. As you mentioned,
there is no mention of Ignition in the logs. Let's start over.

I looked through the three sets of logs and it looks like in
reboot000240 and reboot083158, the boot paused as you described (based
on the timestamps). The last one, reboot084943, looks like it booted
without pause though. Can you confirm that?

It looks like the emergency shell is being invoked because
export-gluster-sdb4.mount is failing. I assume this is a dependency of
multi-user.target. Is this using Wants= or Requires=?

I also just noticed you filed a bug on GitHub
(https://github.com/coreos/bugs/issues/1919). Sorry that went
unanswered. Let's go ahead and move this conversation over there since
it will be a bit more visible to others who may have this problem.

-Alex
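
The Wants= versus Requires= question can be answered on the host itself. What follows is a minimal sketch, not part of the original message. It assumes `systemctl show -p Requires -p Wants <target>` prints one `Property=unit unit ...` line per property. The function name `dep_kind` is hypothetical. That the mount hangs off multi-user.target, rather than some other target, is the guess made in the message above and should be verified.

```shell
#!/bin/sh
# Hypothetical helper: read `systemctl show` output on stdin and print the
# name of each dependency property (e.g. Requires, Wants) that lists the
# given unit. Prints nothing if the unit is not listed.
dep_kind() {
    unit="$1"
    # Split each line at the first "=": property name, then the unit list.
    while IFS='=' read -r prop values; do
        for u in $values; do
            [ "$u" = "$unit" ] && echo "$prop"
        done
    done
    return 0
}

# Example usage on the affected host:
# systemctl show -p Requires -p Wants multi-user.target | dep_kind export-gluster-sdb4.mount
```

If this prints `Requires`, a failure of the mount unit also fails the start of the target. With `Wants`, the target is still started even when the mount fails.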