
GRUB really slow to boot


Greg Wooledge

Dec 18, 2021, 11:10:05 AM
Today I rebooted my machine for the first time in quite a while, after
the kernel update that was released along with Debian 11.2.

When it reached the GRUB screen, I pressed Enter, and nothing happened
as far as I could see. I was initially worried that it had stopped
seeing my USB keyboard (a thing that I've experienced with GRUB and
certain USB slots on certain machines in the past). This keyboard
plugged into this same USB slot had worked in previous versions of GRUB
on this machine, though.

The next thing I observed was that after 5 seconds, it still hadn't
booted, nor had the countdown ("will automatically boot in 5s" or whatever)
advanced. It appeared to be hung.

I waited a bit longer, and the 5s changed to 4s. It just took a really
long time (like 15+ seconds for each second on the timer).

Eventually, after a minute or two, the system booted. Everything is
working normally now, post-GRUB.

Has anyone experienced this, or does anyone have ideas about how to
prevent it happening again? I am not interested in trial and error
for this, because it's far too annoying and disruptive. But if there
are well-known ideas about things I could try (e.g. "grub 2.04 is known
to have bugs on Intel motherboards, revert to 2.03") then I'm game.

I Googled it, and the only hits I found were for people reporting slow
interactivity with GRUB on high-resolution displays. I don't think my
monitor is high resolution, and this has NEVER been a problem on ANY
previous boot, with this same computer and monitor. I have not changed
any hardware. Only software versions. (Of course, I can't rule out
hardware going bad.)

Here's the monitor, from xdpyinfo:

screen #0:
  dimensions:    1920x1080 pixels (508x285 millimeters)
  resolution:    96x96 dots per inch

Here's the other hardware:

unicorn:~$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:591f] (rev 05)
00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 630 [8086:5912] (rev 04)
00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af]
00:15.0 Signal processing controller [1180]: Intel Corporation 200 Series PCH Serial IO I2C Controller #0 [8086:a2e0]
00:15.1 Signal processing controller [1180]: Intel Corporation 200 Series PCH Serial IO I2C Controller #1 [8086:a2e1]
00:16.0 Communication controller [0780]: Intel Corporation 200 Series PCH CSME HECI #1 [8086:a2ba]
00:17.0 SATA controller [0106]: Intel Corporation 200 Series PCH SATA controller [AHCI mode] [8086:a282]
00:1c.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #5 [8086:a294] (rev f0)
00:1d.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #15 [8086:a29e] (rev f0)
00:1e.0 Signal processing controller [1180]: Intel Corporation 200 Series/Z370 Chipset Family Serial IO UART Controller #0 [8086:a2a7]
00:1f.0 ISA bridge [0601]: Intel Corporation 200 Series PCH LPC Controller (H270) [8086:a2c4]
00:1f.2 Memory controller [0580]: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller [8086:a2a1]
00:1f.3 Audio device [0403]: Intel Corporation 200 Series PCH HD Audio [8086:a2f0]
00:1f.4 SMBus [0c05]: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller [8086:a2a3]
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 10)
03:00.0 Network controller [0280]: Intel Corporation Dual Band Wireless-AC 3168NGW [Stone Peak] [8086:24fb] (rev 10)

Here are the GRUB versions:

unicorn:~$ dpkg -l grub\* | grep -v ^un
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                  Version      Architecture Description
+++-=====================-============-============-=================================================================
ii  grub-common           2.04-20      amd64        GRand Unified Bootloader (common files)
ii  grub-efi-amd64        2.04-20      amd64        GRand Unified Bootloader, version 2 (EFI-AMD64 version)
ii  grub-efi-amd64-bin    2.04-20      amd64        GRand Unified Bootloader, version 2 (EFI-AMD64 modules)
ii  grub-efi-amd64-signed 1+2.04+20    amd64        GRand Unified Bootloader, version 2 (amd64 UEFI signed by Debian)
ii  grub2-common          2.04-20      amd64        GRand Unified Bootloader (common files for version 2)

The last time I booted, when everything was normal:

reboot system boot 5.10.0-10-amd64 Sat Dec 18 06:17 still running
[...]
reboot system boot 5.10.0-9-amd64 Sat Oct 9 11:38 - 10:14 (69+23:36)

According to /var/log/dpkg.log.5.gz GRUB was updated to version 2.04-20
back in July, so the current version of GRUB was in place for both boots.
Which I guess makes this either an intermittent problem, or a failing
hardware problem, or it's caused by some package whose name doesn't
begin with "grub".
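
For anyone wanting to repeat that check, here is a minimal sketch that pulls GRUB upgrade records out of the rotated dpkg logs. It assumes the usual dpkg.log line layout (date, time, action, package, old version, new version); the function name grub_upgrades is made up for illustration.

```sh
#!/bin/sh
# List every "upgrade" action on a package whose name starts with "grub".
# Usage: grub_upgrades [logdir]   (logdir defaults to /var/log)
grub_upgrades() {
    dir=${1:-/var/log}
    for f in "$dir"/dpkg.log*; do
        [ -e "$f" ] || continue
        # Rotated logs are gzip-compressed; current ones are plain text.
        case $f in
            *.gz) gzip -dc -- "$f" ;;
            *)    cat -- "$f" ;;
        esac
    done |
        awk '$3 == "upgrade" && $4 ~ /^grub/ { print $1, $2, $4, $5, "->", $6 }' |
        sort
}
```

Calling grub_upgrades with no argument reads /var/log; comparing its dates with the boot dates from last(1) shows whether GRUB changed between two boots.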

James Dutton

Dec 18, 2021, 5:30:04 PM
Hi,

This is most likely a failing disk.
Please post the output of:
smartctl -a /dev/sda

or whatever your disk device name is, if not sda

Kind Regards

James

Greg Wooledge

Dec 18, 2021, 5:40:05 PM
On Sat, Dec 18, 2021 at 10:23:54PM +0000, James Dutton wrote:
> Hi,
>
> This is most likely a failing disk.
> Please post the output of:
> smartctl -a /dev/sda
>
> or whatever your disk device name is, if not sda


smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-10-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Toshiba 3.5" DT01ACA... Desktop HDD
Device Model: TOSHIBA DT01ACA100
Serial Number: Y78SML4NS
LU WWN Device Id: 5 000039 fd3d8d58f
Firmware Version: MS2OA800
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Dec 18 17:38:18 2021 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 7313) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 122) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   099   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0027   142   100   054    Pre-fail  Always       -       71
  3 Spin_Up_Time            0x0023   127   100   024    Pre-fail  Always       -       180 (Average 180)
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       46
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   118   100   020    Pre-fail  Offline      -       33
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       34433
 10 Spin_Retry_Count        0x0033   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
185 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       65535
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   059   000    Old_age   Always       -       33 (Min/Max 32/37)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       58
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       58
194 Temperature_Celsius     0x0022   181   150   000    Old_age   Always       -       33 (Min/Max 20/41)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
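
When skimming a report like this for signs of a failing disk, the raw values of attributes 5, 197 and 198 (reallocated, pending and offline-uncorrectable sectors) are a common first stop. That choice is a rule of thumb, not something stated in this thread. A minimal sketch, assuming the ten-column attribute table printed by smartctl -A; the function name smart_sector_warnings is hypothetical:

```sh
#!/bin/sh
# Read `smartctl -A` (or -a) output on stdin and print attributes 5, 197
# and 198 when their raw value is non-zero.
# Exit status: 0 if all three are zero or absent, 1 otherwise.
smart_sector_warnings() {
    awk '
        # $1 is the attribute ID, $2 its name, $10 the raw value.
        $1 ~ /^(5|197|198)$/ && $10 + 0 > 0 {
            printf "%s %s raw=%s\n", $1, $2, $10
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Typical use would be smartctl -A /dev/sda | smart_sector_warnings, run as root. On the output above it would print nothing, since all three raw values are 0.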

Dan Ritter

Dec 18, 2021, 6:10:05 PM
Greg Wooledge wrote:
> Today I rebooted my machine for the first time in quite a while, after
> the kernel update that was released along with Debian 11.2.

...

> Eventually, after a minute or two, the system booted. Everything is
> working normally now, post-GRUB.

I upgraded three machines to that kernel, today. Two of them
went smoothly. One of them took a ridiculously long time to
boot.

I suspect it is some interesting interaction specific to machine
quirks and this release and/or kernel.

-dsr-

James Dutton

Dec 18, 2021, 6:50:04 PM
Disk looks OK to me.
Next, check no USB devices are connected while it boots.
Disable "quiet" boot mode, so you can see all the boot up messages.
This will give you an idea where it is going slow.

Greg Wooledge

Dec 18, 2021, 7:00:04 PM
On Sat, Dec 18, 2021 at 11:42:23PM +0000, James Dutton wrote:
> Disk looks OK to me.
> Next, check no USB devices are connected while it boots.
> Disable "quiet" boot mode, so you can see all the boot up messages.
> This will give you an idea where it is going slow.

"quiet" is a kernel parameter. It's passed to the kernel. It does
nothing until the kernel is executed, as far as I understand things.

The symptoms I experienced were BEFORE the kernel was executed. During
GRUB itself. While sitting at the GRUB menu.

Once the kernel started running, everything was within normal expectations.
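
Since "quiet" only matters once the kernel is running, whether it was in effect for the current boot can be read back from /proc/cmdline. A small sketch; the helper name has_kernel_param is made up for illustration:

```sh
#!/bin/sh
# Succeed if the exact word PARAM appears on the kernel command line.
# Usage: has_kernel_param PARAM [cmdline-file]
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each word on its own line, then look for a whole-line literal match.
    tr -s ' \t' '\n\n' < "$file" | grep -qxF -- "$param"
}
```

For example, has_kernel_param quiet && echo "quiet is set" reports on the running kernel. It says nothing about what GRUB did before the kernel started.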

Felix Miata

Dec 18, 2021, 10:10:06 PM
Greg Wooledge composed on 2021-12-18 17:39 (UTC-0500):

> === START OF INFORMATION SECTION ===
> Model Family: Toshiba 3.5" DT01ACA... Desktop HDD
> Device Model: TOSHIBA DT01ACA100

I have the same model in a little-used SFF Dell GX745 test box:
https://paste.debian.net/1224017/
Similarities and differences are interesting. SMART Attributes Data Structure for
yours is longer. Serial numbers suggest mine is newer, but firmware suggests yours
is newer. I checked https://www.toshiba-storage.com/ trying to find updated
firmware, but can't even locate the model number there. Mine's running on an old
ICH8 SATA 2.0 controller, so it feels sluggish running @3.0 Gb/s. Mine's hours are
nominal, so if you're interested in acquiring another, make an offer. :)

Oh, and boot speed has been totally normal, though patience-trying compared to my NVMe PCs.

Maybe it's worth a read of my experience with Grub(1) sloth:
https://www.linuxquestions.org/questions/linux-general-1/grub-legacy-delay-of-more-than-2-minutes-loading-initrd-from-ext4-filesystem-4175599620/
It could be Grub2 is capable of similar delays from a poor BIOS INT13* implementation.

Felix Miata

Tim Woodall

Dec 19, 2021, 2:30:04 AM
The problem, as described, is during the grub countdown. It hasn't even
committed to booting a particular OS at this point, let alone a
particular kernel version.

I don't know how grub does its timings. Does the motherboard have an
RTC backed by a coin cell, and is that battery completely dead, leading
to bad calibration? But I'm making wild guesses with absolutely no
knowledge of how grub works under the hood.

Check if the kernel log jumps from 1/1/70 to today as it boots. That
would point to the RTC being bad when the kernel first starts.
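
A rough way to run that check is to grep the logs for lines stamped in the first hour of 1 January. This is only a sketch: it assumes traditional syslog timestamps (month, day, time, no year), it will also match genuine New Year's Day entries, and the function name epoch_lines is invented here.

```sh
#!/bin/sh
# Print log lines stamped 00:xx on 1 January, which is how a clock that
# started from the Unix epoch shows up in traditional syslog format.
# Usage: epoch_lines LOGFILE...
epoch_lines() {
    # "Jan +1" allows both "Jan  1" (space-padded day) and "Jan 1".
    grep -hE '^Jan +1 00:' -- "$@"
}
```

Something like epoch_lines /var/log/syslog /var/log/daemon.log would do for the current logs. No output suggests the clock was already sane when logging started.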

Brian

Dec 19, 2021, 7:10:05 AM
On Sat 18 Dec 2021 at 11:08:37 -0500, Greg Wooledge wrote:

> Today I rebooted my machine for the first time in quite a while, after
> the kernel update that was released along with Debian 11.2.
>
> When it reached the GRUB screen, I pressed Enter, and nothing happened
> as far as I could see. I was initially worried that it had stopped
> seeing my USB keyboard (a thing that I've experienced with GRUB and
> certain USB slots on certain machines in the past). This keyboard
> plugged into this same USB slot had worked in previous versions of GRUB
> on this machine, though.
>
> The next thing I observed was that after 5 seconds, it still hadn't
> booted, nor had the coundown ("will automatically boot in 5s" or whatever)
> advanced. It appeared to be hung.
>
> I waited a bit longer, and the 5s changed to 4s. It just took a really
> long time (like 15+ seconds for each second on the timer).
>
> Eventually, after a minute or two, the system booted. Everything is
> working normally now, post-GRUB.

Some of my machines are booted with

set timeout=5

menuentry 'Debian bullseye on 5740' {
    linux /vmlinuz root=LABEL=5740 ro quiet
    initrd /initrd.img
}

Perhaps you could try with this; maybe "root=/dev/sdaX" is more
convenient. Also test with /vmlinuz.old and /initrd.img.old.
Remove the first line to simplify the file.

--
Brian.

James Dutton

Dec 19, 2021, 8:00:04 AM
On Sat, 18 Dec 2021 at 23:54, Greg Wooledge <gr...@wooledge.org> wrote:
> The symptoms I experienced were BEFORE the kernel was executed. During
> GRUB itself. While sitting at the GRUB menu.
>
> Once the kernel started running, everything was within normal expectations.
>
Sounds like a race condition or infinite loop in grub somewhere.
I have seen articles about it that describe it as a slow display in grub.
No solutions though. I suggest you take this up with the grub developers.
There might be a debug mode for grub, so that you can help track down
the problem for them.
One question, does it boot faster if you just press enter at the grub
menu, and don't wait for the counter?

Greg Wooledge

Dec 19, 2021, 9:20:04 AM
On Sun, Dec 19, 2021 at 12:54:04PM +0000, James Dutton wrote:
> One question, does it boot faster if you just press enter at the grub
> menu, and don't wait for the counter?

On Sat, Dec 18, 2021 at 11:08:37AM -0500, Greg Wooledge wrote:
[...]
> When it reached the GRUB screen, I pressed Enter, and nothing happened
> as far as I could see. I was initially worried that it had stopped
> seeing my USB keyboard (a thing that I've experienced with GRUB and
> certain USB slots on certain machines in the past). This keyboard
> plugged into this same USB slot had worked in previous versions of GRUB
> on this machine, though.
>
> The next thing I observed was that after 5 seconds, it still hadn't
> booted, nor had the coundown ("will automatically boot in 5s" or whatever)
> advanced. It appeared to be hung.
>
> I waited a bit longer, and the 5s changed to 4s. It just took a really
> long time (like 15+ seconds for each second on the timer).
>
> Eventually, after a minute or two, the system booted. Everything is
> working normally now, post-GRUB.
[...]

Curt

Dec 19, 2021, 9:20:04 AM
Did we see /etc/default/grub? Could this resolution bug lead to a
resolution via a new resolution for the New Year?

https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/480159

11 years old, but still extant.

Greg Wooledge

Dec 19, 2021, 9:30:05 AM
On Sun, Dec 19, 2021 at 02:17:17PM -0000, Curt wrote:
> Did we see /etc/default/grub? Could this resolution bug lead to a
> resolution via a new resolution for the New Year?
>
> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/480159
>
> 11 years old, but still extant.

I've used GRUB interactively on this machine with this monitor before,
without running into that issue. I mentioned that in my original post.

This is the first time I've ever experienced this symptom on this machine,
which is what really has me confused.

Here's my /etc/default/grub:

# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
# info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
#GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX=""

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"

Greg Wooledge

Dec 19, 2021, 9:30:05 AM
On Sun, Dec 19, 2021 at 07:19:40AM +0000, Tim Woodall wrote:
> Check if the kernel log jumps from 1/1/70 to today as it boots. That
> would point to the RTC being bad when the kernel first starts.

Not sure which log I'd need to look at for this information. dmesg only
reports time in relative seconds from the kernel's boot.

/var/log/kern.log.1 shows this:

Dec 18 09:58:48 unicorn kernel: [6031224.812397] xor: automatically using best checksumming function avx
Dec 18 09:58:48 unicorn kernel: [6031224.944868] Btrfs loaded, crc32c=crc32c-intel
Dec 18 11:17:28 unicorn kernel: [ 0.000000] microcode: microcode updated early to revision 0xea, date = 2021-01-05
Dec 18 11:17:28 unicorn kernel: [ 0.000000] Linux version 5.10.0-10-amd64 (debian-...@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.84-1 (2021-12-08)

Of course, the human-readable timestamps on the left come from syslog,
which is a userspace process that is launched after the clock is
initialized from whatever sources (not sure exactly when NTP kicks in).

The one-hour jump is, I think, an artifact of daylight saving time
ending between my last two boots. The 09:58 lines are from the syslog
which was initialized while I was on daylight saving time (in October),
and the 11:17 lines are from the syslog that was launched on standard
time in December. I see similar time shenanigans in the output of last:

greg pts/5 :pts/0:S.1 Sat Dec 18 10:18 still logged in
greg pts/4 :pts/0:S.0 Sat Dec 18 10:18 still logged in
greg tty1 Sat Dec 18 11:17 still logged in
reboot system boot 5.10.0-10-amd64 Sat Dec 18 06:17 still running
tester tty2 Mon Nov 22 11:18 - 11:18 (00:00)
greg pts/12 :pts/2:S.8 Sat Oct 9 15:39 - 10:14 (69+19:35)

The "reboot" line shows 6:17 which might be accurate UTC. The line
immediately above that is 11:17 (5-hour offset for US/Eastern), and
the line above that is 10:18 (a weird one-hour jump backward which I
can only guess is somehow related to daylight saving time).

James Dutton

Dec 19, 2021, 9:40:05 AM
Looks like the fix is this:
# If you need to disable
# gfxpayload=keep on your system, just add this line (uncommented) to
# /etc/default/grub:
#
# GRUB_GFXPAYLOAD_LINUX=text

So, try just adding the above, then run "update-grub" to activate the change.
The problem seems to be that some GPUs have faulty UEFI graphics support,
and switching grub to "text" mode works around the problem.

There is even a set of already blacklisted GPUs in this file:
/boot/grub/gfxblacklist.txt
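
A sketch of making that edit repeatably, for anyone who wants to try it. Nobody in the thread has confirmed that it cures this particular slowdown. The helper name set_grub_default is made up, and GNU sed is assumed for the in-place edit.

```sh
#!/bin/sh
# Set KEY=VALUE in a GRUB defaults file: replace an existing uncommented
# assignment, or append a new line. Running it twice changes nothing more.
# Usage: set_grub_default KEY VALUE [file]
set_grub_default() {
    key=$1
    value=$2
    file=${3:-/etc/default/grub}
    if grep -q "^${key}=" "$file"; then
        # "|" is the sed delimiter, so VALUE must not contain "|", "&" or "\".
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

As root, set_grub_default GRUB_GFXPAYLOAD_LINUX text followed by update-grub would apply the suggestion. As far as the GRUB manual describes it, GRUB_GFXPAYLOAD_LINUX controls the video mode handed to the kernel, while GRUB_TERMINAL=console is the setting that makes the menu itself text-only, so that may be the one to try if the slowness is in the menu.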

Darac Marjal

Dec 19, 2021, 9:50:05 AM

On 18/12/2021 16:08, Greg Wooledge wrote:
> Today I rebooted my machine for the first time in quite a while, after
> the kernel update that was released along with Debian 11.2.
>
> When it reached the GRUB screen, I pressed Enter, and nothing happened
> as far as I could see. I was initially worried that it had stopped
> seeing my USB keyboard (a thing that I've experienced with GRUB and
> certain USB slots on certain machines in the past). This keyboard
> plugged into this same USB slot had worked in previous versions of GRUB
> on this machine, though.
>
> The next thing I observed was that after 5 seconds, it still hadn't
> booted, nor had the coundown ("will automatically boot in 5s" or whatever)
> advanced. It appeared to be hung.
>
> I waited a bit longer, and the 5s changed to 4s. It just took a really
> long time (like 15+ seconds for each second on the timer).
>
> Eventually, after a minute or two, the system booted. Everything is
> working normally now, post-GRUB.
>
> Has anyone experienced this, or does anyone have ideas about how to
> prevent it happening again? I am not interested in trial and error
> for this, because it's far too annoying and disruptive. But if there
> are well-known ideas about things I could try (e.g. "grub 2.04 is known
> to have bugs on Intel motherboards, revert to 2.03") then I'm game.

Not a definitive answer here, but to me, this sounds like the sort of
behaviour a program would have when having to process lots of
interrupts. You say that pressing Enter does nothing and that the
countdown happens really slowly. Imagine you had a stuck key - something
which was repeatedly sending keypresses to GRUB, but which weren't
triggering the "cancel timer" branch. Something like CTRL or Shift,
maybe. Or an ACPI key etc.

I see that you say you're not interested in trial-and-error and I can
understand that. If you can, try using a different keyboard. Or just
unplug the keyboard entirely (You may need to configure your BIOS to
allow booting without the keyboard or just allow the BIOS enough time to
see the keyboard and THEN unplug it before GRUB sees it).

https://sources.debian.org/src/grub2/2.04-20/grub-core/normal/menu.c/#L601
seems to be the loop of code that GRUB executes while waiting for a key.
I can see some functions there that, if not written carefully, COULD
take some time to return.


Tim Woodall

Dec 19, 2021, 10:10:05 AM
On Sun, 19 Dec 2021, Greg Wooledge wrote:

> On Sun, Dec 19, 2021 at 07:19:40AM +0000, Tim Woodall wrote:
>> Check if the kernel log jumps from 1/1/70 to today as it boots. That
>> would point to the RTC being bad when the kernel first starts.
>
> Not sure which log I'd need to look at for this information. dmesg only
> reports time in relative seconds from the kernel's boot.

This is the sort of thing I meant (this is a rpi which doesn't have an
RTC at all so this is expected) - I've just rebooted to show this:

This is from daemon.log but any log should do.


Dec 19 14:52:42 rpi4-minimal haveged: haveged: Stopping due to signal 15
Jan 1 00:00:23 rpi4-minimal haveged: haveged starting up
...
Jan 1 00:00:23 rpi4-minimal ntpd[1522]: Listening on routing socket on fd #24 for interface updates
Jan 1 00:00:23 rpi4-minimal ntpd[1522]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Jan 1 00:00:23 rpi4-minimal ntpd[1522]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Jan 1 01:00:27 rpi4-minimal dbus-daemon[1664]: [session uid=103 pid=1659] Activating service name='org.xfce.Xfconf' requested by ':1.1' (uid=103 pid=1645 comm="x-window-manager ")
Jan 1 01:00:27 rpi4-minimal dbus-daemon[1664]: [session uid=103 pid=1659] Successfully activated service 'org.xfce.Xfconf'
Dec 19 14:53:40 rpi4-minimal dhclient[1176]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6

The clock corrects itself when ntp starts. I'm not exactly clear why the
clock jumps by an hour first but I don't really care...

>
> /var/log/kern.log.1 shows this:
>
> Dec 18 09:58:48 unicorn kernel: [6031224.812397] xor: automatically using best c
> hecksumming function avx
> Dec 18 09:58:48 unicorn kernel: [6031224.944868] Btrfs loaded, crc32c=crc32c-int
> el
> Dec 18 11:17:28 unicorn kernel: [ 0.000000] microcode: microcode updated earl
> y to revision 0xea, date = 2021-01-05
> Dec 18 11:17:28 unicorn kernel: [ 0.000000] Linux version 5.10.0-10-amd64 (de
> bian-...@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld
> (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.84-1 (2021-12-08)
>
> Of course, the human-readable timestamps on the left come from syslog,
> which is a userspace process that is launched after the clock is
> initialized from whatever sources (not sure exactly when NTP kicks in).

Looks to me at first glance that the clock was right - so scratch my
guess that it could be a dead battery on the motherboard.

Tim.

David Wright

Jan 2, 2022, 9:50:05 PM
On Sat 18 Dec 2021 at 11:08:37 (-0500), Greg Wooledge wrote:
> Today I rebooted my machine for the first time in quite a while, after
> the kernel update that was released along with Debian 11.2.

Mine's a new installation. I've run buster from an external drive
for a while, but have recently installed bullseye on its SSD.

> When it reached the GRUB screen, I pressed Enter, and nothing happened
> as far as I could see. I was initially worried that it had stopped
> seeing my USB keyboard (a thing that I've experienced with GRUB and
> certain USB slots on certain machines in the past). This keyboard
> plugged into this same USB slot had worked in previous versions of GRUB
> on this machine, though.

Mine's a laptop: HP Spectre x360 Convertible 15-bl012dx.

> The next thing I observed was that after 5 seconds, it still hadn't
> booted, nor had the coundown ("will automatically boot in 5s" or whatever)
> advanced. It appeared to be hung.

Snap. It's happened maybe three or four times (one gets no record,
of course.)

> I waited a bit longer, and the 5s changed to 4s. It just took a really
> long time (like 15+ seconds for each second on the timer).

I'm afraid I just assumed it was permanently hung when Enter did
nothing, so I just force-powered off and started over.

> Eventually, after a minute or two, the system booted. Everything is
> working normally now, post-GRUB.
>
> Has anyone experienced this, or does anyone have ideas about how to
> prevent it happening again? I am not interested in trial and error
> for this, because it's far too annoying and disruptive. But if there
> are well-known ideas about things I could try (e.g. "grub 2.04 is known
> to have bugs on Intel motherboards, revert to 2.03") then I'm game.

I haven't really looked. (I've been sort of off the grid over Christmas.)

> I Googled it, and the only hits I found were for people reporting slow
> interactivity with GRUB on high-resolution displays. I don't think my
> monitor is high resolution, and this has NEVER been a problem on ANY
> previous boot, with this same computer and monitor. I have not changed
> any hardware. Only software versions. (Of course, I can't rule out
> hardware going bad.)

This laptop does have a very high resolution: 3840x2160, which
means using a magnifying glass. I started by typing
setfont /usr/share/consolefonts/Lat15-TerminusBold32x16.psf.gz
blind, then sticking Xscale="0.5" Yscale="0.5" in .xsession,
but lastly, after editing the Grub screen to add video=960x540
and finding that that fixed everything, I just added
GRUB_CMDLINE_LINUX="video=960x540"
to /etc/default/grub, and removed the scaling.

When it first stalled, I had just installed bullseye, and I only
had /boot/efi/EFI/debian, so I booted the installer's rescue,
and reinstalled Grub with the removable device path (whatever
that means), which wrote /boot/efi/EFI/BOOT.

However, stalling has reoccurred just a couple of times since then.
I just assumed the EFI might be slightly flaky. The laptop has a
few faults (which is why I've inherited it), like poor socketry
all round (HDMI, USB3, USB-C x 2), non-functional trackpad "buttons",
and somewhat unreliable keys.

Cheers,
David.

David Wright

Jan 20, 2022, 11:30:06 AM
On Sun 02 Jan 2022 at 20:49:12 (-0600), David Wright wrote:
> On Sat 18 Dec 2021 at 11:08:37 (-0500), Greg Wooledge wrote:
> > Today I rebooted my machine for the first time in quite a while, after
> > the kernel update that was released along with Debian 11.2.
>
> Mine's a new installation. I've run buster from an external drive
> for a while, but have recently installed bullseye on its SSD.
>
> > When it reached the GRUB screen, I pressed Enter, and nothing happened
> > as far as I could see. I was initially worried that it had stopped
> > seeing my USB keyboard (a thing that I've experienced with GRUB and
> > certain USB slots on certain machines in the past). This keyboard
> > plugged into this same USB slot had worked in previous versions of GRUB
> > on this machine, though.
>
> Mine's a laptop: HP Spectre x360 Convertable 15-bl012dx.
>
> > The next thing I observed was that after 5 seconds, it still hadn't
> > booted, nor had the coundown ("will automatically boot in 5s" or whatever)
> > advanced. It appeared to be hung.
>
> Snap. It's happened maybe three or four times (one gets no record,
> of course.)
>
> > I waited a bit longer, and the 5s changed to 4s. It just took a really
> > long time (like 15+ seconds for each second on the timer).
>
> I'm afraid I just assumed it was permanently hung when Enter did
> nothing, so I just force-powered off and started over.

My assumption was correct: it sits on 5s for half an hour,
with the fan running fast. It's also more consistent now, so
the only way to boot it successfully is to press Escape F9
immediately at power-on, and then choose anything in the menu
except "Windows Boot Manager".

BTW I have no idea what this last item is: there is no Windows
on the SSD, but presumably something in the UEFI is hanging on
to some other thing. Selecting it goes to a blue screen that says
it's resetting the boot something, but then Grub pops up and
hangs as usual.

Darac's suggestion was that GRUB was being swamped with interrupts. Mine
would be the opposite: with an apparently dead keyboard and no timer
countdown, perhaps their interrupts aren't being generated or serviced.

FYI

# ls -GlgR /boot/efi/
/boot/efi/:
total 4
drwx------ 4 4096 Dec 15 11:10 EFI

/boot/efi/EFI:
total 8
drwx------ 2 4096 Dec 15 11:10 BOOT
drwx------ 2 4096 Dec 12 20:17 debian

/boot/efi/EFI/BOOT:
total 2636
-rwx------ 1 934240 Jan 8 13:24 BOOTX64.EFI
-rwx------ 1 84648 Jan 8 13:24 fbx64.efi
-rwx------ 1 1672576 Jan 8 13:24 grubx64.efi

/boot/efi/EFI/debian:
total 3472
-rwx------ 1 108 Jan 8 23:57 BOOTX64.CSV
-rwx------ 1 84648 Jan 8 23:57 fbx64.efi
-rwx------ 1 117 Jan 8 23:57 grub.cfg
-rwx------ 1 1672576 Jan 8 23:57 grubx64.efi
-rwx------ 1 845480 Jan 8 23:57 mmx64.efi
-rwx------ 1 934240 Jan 8 23:57 shimx64.efi
#

Menu:

OS Boot Manager (UEFI) - debian (<Disk Model as given by partitioners>)
OS Boot Manager (UEFI) - Windows Boot Manager (<ditto>)
Boot From EFI File

When bought, with W10, the middle entry was the sole entry.
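
Entries in that firmware menu normally live in NVRAM rather than on the disk, which would explain a "Windows Boot Manager" item surviving without any Windows on the SSD. On a running system, efibootmgr lists them. A sketch that reduces its output to number and label; it assumes the usual "BootNNNN* label" line shape, with a tab before any device path, and the function name list_boot_entries is invented:

```sh
#!/bin/sh
# Read `efibootmgr` output on stdin and print "NNNN<TAB>label" per entry.
list_boot_entries() {
    awk '/^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][* ] / {
        num  = substr($0, 5, 4)      # the four hex digits after "Boot"
        rest = substr($0, 11)        # everything after "BootNNNN* "
        sub(/\t.*/, "", rest)        # drop the device path, if one is printed
        print num "\t" rest
    }'
}
```

Running efibootmgr | list_boot_entries as root shows which number belongs to the stale entry. efibootmgr -b NNNN -B deletes an entry by number, though some firmware recreates entries on its own.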

Cheers,
David.