Bug#1036019: debian-installer: Broken X display with QEMU under UEFI with cirrus and std graphics

Cyril Brulebois

May 13, 2023, 4:31:32 AM
Package: debian-installer
Version: 20230427
Severity: important
X-Debbugs-Cc: debia...@lists.debian.org, debian...@lists.debian.org, debi...@lists.debian.org

Hi everyone,

I'm reaching out to all the aforementioned teams because I know nothing
about UEFI, kernel-side DRM modules, or X drivers, and I'd like to get
some feedback here.

If you need a TL;DR, you can skip to “Proposal plan for d-i”, which is
about my plans for the very next few hours, unless someone tells me the
proposal is crazy, unsafe, etc.


Backstory
=========

Since we've been hitting and/or (re)discovering UEFI-specific issues
lately (#1033913), I decided to spend some time extending my usual
tests, traditionally run under QEMU with default settings, therefore
booted under BIOS, to also run them under UEFI (meaning also testing
Secure Boot without having to switch to baremetal).

I've been kindly pointed by regular image testers to the following page:
https://wiki.debian.org/SecureBoot/VirtualMachine

But I was a little shocked to discover a broken X display when booting
under UEFI! It seems I'm not the only one since that page has the
following, even if there are no references to any bug reports:

    -vga virtio - The Debian installer seems to have difficulties
                  working with the standard VGA driver (and virtio
                  should anyway have better performance)

The test setup is described at the very end of this report, with my
current test target being specifically netboot/gtk/mini.iso for amd64.


Kernel-side
===========

The fb-modules udeb hasn't changed much since 4+ years, with some DRM
modules getting added alongside existing ones, leading to the following
contents in Bullseye (5.10.178-3):

./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/drm_kms_helper.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/drm.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/virtio/virtio-gpu.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/media/cec/core/cec.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/video/fbdev/vga16fb.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/video/vgastate.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/virtio/virtio_dma_buf.ko

and the following contents in Bookworm (6.1.27-1):

./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm_kms_helper.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm_shmem_helper.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/virtio/virtio-gpu.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/video/fbdev/vga16fb.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/video/vgastate.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/virtio/virtio_dma_buf.ko

Those contents are defined via those files in linux.git:

kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/amd64/fb-modules
#include <fb-modules>

vesafb ?
vga16fb

kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/fb-modules
# We don't include all DRM drivers here as on many platforms we can
# call system firmware to get hold of a simple framebuffer

drm
drm_kms_helper
virtio-gpu ?


X-side
======

Now, we know that the contents of xserver-xorg-core-udeb have changed a
little between Bullseye and Bookworm (#1035014), but that doesn't seem
to be a factor here.

I've tested 3 netboot/gtk/mini.iso to assess the situation:

- mini-20210731+deb11u8.iso from Bullseye 11.7
- mini-20230427.iso from D-I Bookworm RC 2
- mini-daily.iso from D-I daily builds (downloaded today)

If people want to replicate those tests, they're available at:
https://people.debian.org/~kibi/bug-drm-vs-uefi/

Or:

wget https://deb.debian.org/debian/dists/bullseye/main/installer-amd64/20210731+deb11u8/images/netboot/gtk/mini.iso -O mini-20210731+deb11u8.iso
wget https://deb.debian.org/debian/dists/bookworm/main/installer-amd64/20230427/images/netboot/gtk/mini.iso -O mini-20230427.iso
wget https://d-i.debian.org/daily-images/amd64/daily/netboot/gtk/mini.iso -O mini-daily.iso


Via QEMU, under BIOS and UEFI, results are:

+-------------+-----------------+-----------------+-----------------+
| Graphics    |  Bullseye 11.7  |  Bookworm RC 2  |  Daily builds   |
+-------------+--------+--------+--------+--------+--------+--------+
|             |  BIOS  |  UEFI  |  BIOS  |  UEFI  |  BIOS  |  UEFI  |
+-------------+--------+--------+--------+--------+--------+--------+
|             |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
| -vga std    |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
| -vga cirrus |   OK   |   OK   |   OK   |  KO-S  |   OK   |  KO-S  |
| -vga qxl    |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
| -vga virtio |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
| -vga vmware |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
+-------------+--------+--------+--------+--------+--------+--------+

Here, we see that RC 2, which had xserver-xorg-core-udeb without
modesetting_drv.so (#1035014), actually performs exactly like the
daily builds, where it's been added back.

In the table:
- no options and -vga std grouped together since that seemed to be the
default, confirmed by identical test results; then other -vga sorted
alphabetically.
- KO-G is for garbled: https://people.debian.org/~kibi/bug-drm-vs-uefi/screenshot-std-garbled.png
- KO-S is for split: https://people.debian.org/~kibi/bug-drm-vs-uefi/screenshot-cirrus-split.png

X seems to work in both the garbled case and in the split case (bottom
is the rest of the GRUB prompt, top is the actual GTK window), and one
can navigate the menus using arrows, and also type “fre” to get to the
French entry. I didn't go through a single full install though (even if
that'd be definitely doable, a manual speedrun isn't unheard of…).

I didn't try to extract any logs, but I can definitely do that for
further investigation. My first instinct, as it happens quite a lot, was
wondering whether we could be missing modules on the kernel side, that's
why I started this report by listing the contents of the fb-modules udebs.

Now, there are dedicated DRM modules for various hardware, including…
bochs and cirrus, so I've tried including them in a mini.iso, which can
also be found in the same directory:
https://people.debian.org/~kibi/bug-drm-vs-uefi/mini-hackhackhack.iso

Nasty code:
https://salsa.debian.org/installer-team/debian-installer/-/commit/9fceca63273d0b501ea64d7b719acafc93a5b7fa

(Only tested with a manual netboot-gtk build on amd64.)

Instead of “going the correct way” (meaning patching linux.git, then
rebuilding linux-signed-amd64 to get an updated fb-modules udeb), I've
investigated a nasty but apparently effective approach that could be
used *if* we wanted to add those modules in RC 3. It's very nasty but
doesn't depend on a new round of linux upload, lengthy builds (looking
at you, mips*), manual steps for signing etc. And *if* we want to try
that approach, I'd very much prefer doing that in RC 3, and either
profit, or revert in RC 4… instead of only trying in RC 4, possibly
breaking the graphical installer right when entering the “nobody move!”
stage of the freeze.

Note that I've “resolved” the module dependencies manually, and also
included vboxvideo.ko along the way, which has the same dependencies.
We've had some (unfortunately vague) reports from VirtualBox users,
maybe they're hitting the same kind of issues… But at this point, this
is really a shot in the dark (no pun intended — at least initially).

At least for a friend of mine who was nice enough to run a few tests
under VirtualBox, d-i seems to work fine, with or without the hack, on
both Windows and Mac Intel hosts, so it doesn't appear to regress
obviously…


Questions
=========

- Is it really to be expected that X and standard drivers would regress
this way when moving from Bullseye to Bookworm?
- Or is it expected to require specific kernel modules while that wasn't
the case before? I've discovered this in VM environments, but maybe
similar things could be happening on bare metal as well, and maybe
some more modules should be considered for inclusion?
- Is it acceptable to just bundle bochs, cirrus, and vboxvideo for the
time being (i.e. RC 3, RC 4, 12.0.0), be it via the nasty approach
or via a proper linux fb-modules inclusion?
- Or does shipping those few modules risk breaking the kernel and/or X
on other platforms? (I'd definitely hope not!)
- Should I extract some dmesg/X logs from the KO-G/KO-S cases, so that
one has a chance of understanding what's happening? Since it's likely
to be a little annoying, I'd be happy to take a full list of cells in
the big matrix for which it would make sense to have logs. Another
reason why I haven't started there is that I don't expect us to find
it reasonable to hotpatch the X server at this very late stage of the
freeze, if that was deemed to be a problem in X. Adding some specific
kernel modules seems much more targeted and way less risky… (even if
that might just be a workaround and not a long-term fix).


Wild guess
==========

One obvious difference between BIOS and UEFI booting is the bootloader,
ISOLINUX vs. GRUB. It might be that the latter leaves the graphics stack
in a particular state that no longer pleases the default things in
the kernel and/or X, while that wasn't an issue in Bullseye?


Proposal plan for d-i (Bookworm RC 3, RC 4, and 12.0.0)
=====================

Unless I receive strong negative feedback before Monday (May 15th),
I plan on including the nasty approach in RC 3, and to revert it
altogether in RC 4 if big bad regressions are reported:
https://salsa.debian.org/installer-team/debian-installer/-/commit/9fceca63273d0b501ea64d7b719acafc93a5b7fa

As a side note, keeping the bundling in src:debian-installer for the
next few weeks makes us autonomous: we can enable and disable those
extra modules without requiring a new linux upload… so it's nasty but I
actually thought about the few advantages we were getting out of this!

We should also be OK legal-wise, given we already have linux in
Built-Using via its udebs, so copying things around from linux-image
wouldn't change anything there, would it?

Of course in the long run, if having those modules is desired, it will
be better to have them merged in linux and to drop the nasty code, e.g.
in a point release.


Test reproducibility
====================

All tests were performed on an amd64 Bullseye host, with a Bullseye set
of qemu packages. I've installed ovmf from Bookworm though, as enabling
UEFI support was preventing me from being able to boot directly from the
ISO, and would mean going through the UEFI menu to select the boot disk
every single time. The big matrix above was built with that Bookworm
ovmf package, and unless someone insists I should redo all the tests
with the Bullseye one, I don't plan on spending time on this.

- BIOS:

kvm -m 1G -cdrom mini-<TEST>.iso [-vga <GRAPHICS>]

- UEFI:

cp /usr/share/OVMF/OVMF_CODE_4M.ms.fd /tmp/code.fd
cp /usr/share/OVMF/OVMF_VARS_4M.ms.fd /tmp/vars.fd
kvm -m 1G -machine q35,smm=on -pflash /tmp/code.fd -pflash /tmp/vars.fd -cdrom mini-<TEST>.iso [-vga <GRAPHICS>]

(q35,smm=on satisfies Secure Boot related hardware requirements.)
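
The two invocations above can be generated for every cell of the test matrix. The following Python sketch only builds the argument lists; the function name `qemu_argv` and its `firmware` parameter are illustrative assumptions, while the flags and paths are the ones quoted above.

```python
def qemu_argv(iso, firmware="bios", graphics=None):
    """Build the kvm command line for one cell of the test matrix.

    firmware: "bios" or "uefi" (hypothetical parameter, not a QEMU option).
    graphics: value passed to -vga (e.g. "std", "cirrus"), or None to
              leave QEMU's default untouched.
    """
    argv = ["kvm", "-m", "1G"]
    if firmware == "uefi":
        # /tmp/code.fd and /tmp/vars.fd are the copies of the OVMF images
        # prepared with cp in the UEFI recipe above.
        argv += ["-machine", "q35,smm=on",
                 "-pflash", "/tmp/code.fd",
                 "-pflash", "/tmp/vars.fd"]
    elif firmware != "bios":
        raise ValueError(f"unknown firmware: {firmware}")
    argv += ["-cdrom", iso]
    if graphics is not None:
        argv += ["-vga", graphics]
    return argv


if __name__ == "__main__":
    # Print one command per (firmware, graphics) pair for a given image.
    for fw in ("bios", "uefi"):
        for vga in (None, "std", "cirrus", "qxl", "virtio", "vmware"):
            print(" ".join(qemu_argv("mini-daily.iso", fw, vga)))
```

Returning a list rather than a string keeps the result usable with `subprocess.run` without shell quoting concerns.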

Thanks for your time and your feedback. Hopefully this is my very last
overlong report for this release cycle… Once again, I thought I'd err on
the side of exhaustiveness.

I might still follow up with some more test results from earlier D-I
Bookworm releases (Alpha 1, Alpha 2, RC 1) which might help narrow down
what changed between Bullseye and (current) Bookworm. But that might
happen after RC 3 is published.


Cheers,
--
Cyril Brulebois (ki...@debian.org) <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant

Ben Hutchings

May 13, 2023, 6:30:05 PM
On Sat, 2023-05-13 at 10:22 +0200, Cyril Brulebois wrote:
[...]
> Kernel-side
> ===========
>
> The fb-modules udeb hasn't changed much since 4+ years, with some DRM
> modules getting added alongside existing ones, leading to the following
> contents in Bullseye (5.10.178-3):
[...]
> Those contents are defined via those files in linux.git:
>
> kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/amd64/fb-modules
> #include <fb-modules>
>
> vesafb ?
> vga16fb
>
> kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/fb-modules
> # We don't include all DRM drivers here as on many platforms we can
> # call system firmware to get hold of a simple framebuffer

To expand on this comment, in the case of UEFI boot the efifb driver
should provide a simple framebuffer, and on BIOS vesafb should do it.
Those are both built-in on x86, and efifb is also built-in on arm64 and
armhf.


[...]
> X-side
> ======

Both of the kernel drivers are old-style framebuffer drivers so in
Xorg, the appropriate generic driver is "fbdev", not "modesetting".

> Now, we know that the contents of xserver-xorg-core-udeb have changed a
> little between Bullseye and Bookworm (#1035014), but that doesn't seem
> to be a factor here.
>
> I've tested 3 netboot/gtk/mini.iso to assess the situation:
>
> - mini-20210731+deb11u8.iso from Bullseye 11.7
> - mini-20230427.iso from D-I Bookworm RC 2
> - mini-daily.iso from D-I daily builds (downloaded today)
>
> If people want to replicate those tests, they're available at:
> https://people.debian.org/~kibi/bug-drm-vs-uefi/
>
> Or:
>
> wget https://deb.debian.org/debian/dists/bullseye/main/installer-amd64/20210731+deb11u8/images/netboot/gtk/mini.iso -O mini-20210731+deb11u8.iso
> wget https://deb.debian.org/debian/dists/bookworm/main/installer-amd64/20230427/images/netboot/gtk/mini.iso -O mini-20230427.iso
> wget https://d-i.debian.org/daily-images/amd64/daily/netboot/gtk/mini.iso -O mini-daily.iso

These all include fbdev_drv.so, and Xorg.log shows that the fbdev
driver is being used.

So I suppose there's a regression in either efifb or fbdev_drv.

> Via QEMU, under BIOS and UEFI, results are:
>
> +-------------+-----------------+-----------------+-----------------+
> | Graphics    |  Bullseye 11.7  |  Bookworm RC 2  |  Daily builds   |
> +-------------+--------+--------+--------+--------+--------+--------+
> |             |  BIOS  |  UEFI  |  BIOS  |  UEFI  |  BIOS  |  UEFI  |
> +-------------+--------+--------+--------+--------+--------+--------+
> |             |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
> | -vga std    |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
> | -vga cirrus |   OK   |   OK   |   OK   |  KO-S  |   OK   |  KO-S  |
> | -vga qxl    |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> | -vga virtio |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> | -vga vmware |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> +-------------+--------+--------+--------+--------+--------+--------+

I started testing with QEMU and OVMF from unstable, and I'm instead
seeing Xorg failing to start in the same cases you see glitches. The
relevant error message seems to be this one:
http://codesearch.debian.net/show?file=xorg-server_2%3A21.1.7-3%2Fhw%2Fxfree86%2Ffbdevhw%2Ffbdevhw.c&line=504

[...]
> Questions
> =========
>
> - Is it really to be expected that X and standard drivers would regress
> this way when moving from Bullseye to Bookworm?

No.

> - Or is it expected to require specific kernel modules while that wasn't
> the case before? I've discovered this in VM environments, but maybe
> similar things could be happening on bare metal as well, and maybe
> some more modules should be considered for inclusion?

No.

> - Is it acceptable to just bundle bochs, cirrus, and vboxvideo for the
> time being (i.e. RC 3, RC 4, 12.0.0), be it via the nasty approach
> or via a proper linux fb-modules inclusion?
> - Or does shipping those few modules risk breaking the kernel and/or X
> on other platforms? (I'd definitely hope not!)

I would not expect so. They get used on the installed system, so they
probably work.



[...]
> Proposal plan for d-i (Bookworm RC 3, RC 4, and 12.0.0)
> =====================
>
> Unless I receive strong negative feedback before Monday (May 15th),
> I plan on including the nasty approach in RC 3, and to revert it
> altogether in RC 4 if big bad regressions are reported:
> https://salsa.debian.org/installer-team/debian-installer/-/commit/9fceca63273d0b501ea64d7b719acafc93a5b7fa
>
> As a side note, keeping the bundling in src:debian-installer for the
> next few weeks makes us autonomous: we can enable and disable those
> extra modules without requiring a new linux upload… so it's nasty but I
> actually thought about the few advantages we were getting out of this!
>
> We should also be OK legal-wise, given we already have linux in
> Built-Using via its udebs, so copying things around from linux-image
> wouldn't change anything there, would it?
>
> Of course in the long run, if having those modules is desired, it will
> be better to have them merged in linux and to drop the nasty code, e.g.
> in a point release.
[...]

Definitely.

I will spend some time investigating this, but I doubt I'll come up
with a better fix in time.

Ben.

--
Ben Hutchings
Life would be so much easier if we could look at the source code.

Cyril Brulebois

May 13, 2023, 11:40:04 PM
Hi Ben,

Thanks for all those details!

Ben Hutchings <b...@decadent.org.uk> (2023-05-14):
> >
> > +-------------+-----------------+-----------------+-----------------+
> > | Graphics    |  Bullseye 11.7  |  Bookworm RC 2  |  Daily builds   |
> > +-------------+--------+--------+--------+--------+--------+--------+
> > |             |  BIOS  |  UEFI  |  BIOS  |  UEFI  |  BIOS  |  UEFI  |
> > +-------------+--------+--------+--------+--------+--------+--------+
> > |             |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
> > | -vga std    |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
> > | -vga cirrus |   OK   |   OK   |   OK   |  KO-S  |   OK   |  KO-S  |
> > | -vga qxl    |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> > | -vga virtio |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> > | -vga vmware |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> > +-------------+--------+--------+--------+--------+--------+--------+
>
> I started testing with QEMU and OVMF from unstable, and I'm instead
> seeing Xorg failing to start in the same cases you see glitches. The
> relevant error message seems to be this one:
> http://codesearch.debian.net/show?file=xorg-server_2%3A21.1.7-3%2Fhw%2Fxfree86%2Ffbdevhw%2Ffbdevhw.c&line=504

Checking RC 1, I'm seeing OK results for both `-vga std` (or no options)
and `-vga cirrus`. I should note GRUB itself is “text-like” with RC 1,
while it's “graphical” with RC 2.

Reverting the following commit in debian-installer.git and building a
netboot-gtk image against unstable gives me a working graphical
installer with `-vga std` (or no options) and `-vga cirrus`. I didn't
check the rest of the matrix though.
https://salsa.debian.org/installer-team/debian-installer/-/commit/a4dc8c0fe7ad1a0c1506125ad9985f78819a1bb2

So it looks to me the GRUB config fix uncovered a pre-existing bug, and
the linux version bump (6.1.20-1 → 6.1.20-2) between RC 1 and RC 2 isn't
a factor (xserver-xorg-* udebs didn't change).

Interestingly, switching to the bullseye branch and cherry-picking the
same GRUB config fix there, and rebuilding d-i against current bullseye,
I'm getting exactly the same problem: KO-G for std, KO-S for cirrus!

So it looks like this might be a rather old issue, rather than a
regression during the Bookworm release cycle.


Also, I should note that while my focus was on netboot-gtk mini.iso
(because it's much quicker to rebuild/tweak than a netinst image), I'm
replicating those results with the netinst images:
- Bullseye has a “text-like” GRUB, all good.
- Bookworm RC 1 has a “text-like” GRUB, all good.
- Bookworm RC 2 has a “graphical” GRUB, issues!

> > Questions
> > =========
> >
> > - Is it really to be expected that X and standard drivers would regress
> > this way when moving from Bullseye to Bookworm?
>
> No.
>
> > - Or is it expected to require specific kernel modules while that wasn't
> > the case before? I've discovered this in VM environments, but maybe
> > similar things could be happening on bare metal as well, and maybe
> > some more modules should be considered for inclusion?
>
> No.
>
> > - Is it acceptable to just bundle bochs, cirrus, and vboxvideo for the
> > time being (i.e. RC 3, RC 4, 12.0.0), be it via the nasty approach
> > or via a proper linux fb-modules inclusion?
> > - Or does shipping those few modules risk breaking the kernel and/or X
> > on other platforms? (I'd definitely hope not!)
>
> I would not expect so. They get used on the installed system, so they
> probably work.

Copy all!

Note for a further session: instead of debugging d-i itself, it should
be possible to reproduce those issues in the installed system, by
keeping only a specific list of kernel modules and X drivers. Of course,
that means having GRUB in “graphical” mode as well (a quick check
suggests installing desktop-base, without plymouth*, is sufficient for
that part).

As a very quick experiment, I tried:
- installing xfce4 and desktop-base;
- rebooting;
- X doesn't start directly, one needs to run startxfce4 from the
console.

Then:
- manually removing all X drivers except fbdev_drv.so;
- manually removing both tiny/ drivers (bochs and cirrus);
- rebuilding the initramfs;
- rebooting.

This gives me the following:
- std: black screen, not even seeing a console prompt;
- cirrus: “garbled/split” screen symptoms in the console, and in X;
- qxl: all good in the console and in X.

Interestingly, purging desktop-base gets me back to a “text-only” GRUB
prompt, but both std and cirrus are exhibiting “garbled/split” screen
symptoms in the console and in X.

I'll stop here, I just wanted to confirm one could reproduce those
issues within the installed system, which should almost always be a
debug-friendlier environment than d-i…

> > Proposal plan for d-i (Bookworm RC 3, RC 4, and 12.0.0)
> > =====================
> >
> > Unless I receive strong negative feedback before Monday (May 15th),
> > I plan on including the nasty approach in RC 3, and to revert it
> > altogether in RC 4 if big bad regressions are reported:
> > https://salsa.debian.org/installer-team/debian-installer/-/commit/9fceca63273d0b501ea64d7b719acafc93a5b7fa
> >
> > As a side note, keeping the bundling in src:debian-installer for the
> > next few weeks makes us autonomous: we can enable and disable those
> > extra modules without requiring a new linux upload… so it's nasty but I
> > actually thought about the few advantages we were getting out of this!
> >
> > We should also be OK legal-wise, given we already have linux in
> > Built-Using via its udebs, so copying things around from linux-image
> > wouldn't change anything there, would it?
> >
> > Of course in the long run, if having those modules is desired, it will
> > be better to have them merged in linux and to drop the nasty code, e.g.
> > in a point release.
> [...]
>
> Definitely.
>
> I will spend some time investigating this, but I doubt I'll come up
> with a better fix in time.

Thanks so much for your feedback.

It'd probably be even less likely to get to the bottom of it (at least
as far as I would be able to investigate myself) in a timely manner
given the absence of a known good version to use in a bisect session to
track down when that “regression” was introduced…

Monitoring the ppc64el situation (#1033058) was already on my radar for
upcoming point releases, I've added this new issue to the list:
https://salsa.debian.org/installer-team/debian-installer/-/issues/3

Maybe I'll end up learning about FB after all… twice the fun!

Cyril Brulebois

May 14, 2023, 5:42:22 AM
Cyril Brulebois <ki...@debian.org> (2023-05-14):
> Also, I should note that while my focus was on netboot-gtk mini.iso
> (because it's much quicker to rebuild/tweak than a netinst image), I'm
> replicating those results with the netinst images:
[…]
> - Bookworm RC 1 has a “text-like” GRUB, all good.
> - Bookworm RC 2 has a “graphical” GRUB, issues!

While adjusting my “nasty” approach to make sure it would build on all
three modified archs (amd64, arm64, i386), it occurred to me that:

- Of course the trivial patch wouldn't work, because some builds aren't
“pure GTK” builds, like cdrom-xen, and that one would also need the
686-pae flavour on i386.

- Of course it wouldn't work on arm64 either, since that one doesn't
ship vboxvideo.ko.

- And more importantly, we have the fb-modules udeb in various places,
including for builds that aren't about the graphical installer…


And at this point, it seems fair to say that at least the Linux kernel
isn't perfect, as problems show up even without X in the picture!

- With Bookworm RC 1 netinst amd64 (again under UEFI), switch from
default “Graphical install” to “Install”: the text installer shows
up with both std and cirrus.

- With Bookworm RC 2 netinst amd64 (again under UEFI), switch from
default “Graphical install” to “Install”: the screen is garbled
with std, split with cirrus.


This is easily confirmed:

- Triggering a debian-installer “netboot” build (not “netboot-gtk”):
the resulting mini.iso exhibits the same problems as Bookworm RC 2
using “Install”, with both std and cirrus.

- Patching that “netboot” build to benefit from the extra DRM modules
makes those issues go away, with both std and cirrus.

- Alternatively, not patching the “netboot” build but reverting the same
patch as mentioned before makes those issues go away, with both std and
cirrus:
https://salsa.debian.org/installer-team/debian-installer/-/commit/a4dc8c0fe7ad1a0c1506125ad9985f78819a1bb2


For the very short term (RC 3), I think I'll implement the following:

1. Consider archs with the graphical installer (that's been my main
focus until a few hours ago, when I started realizing the console
without X was also impacted), even if other archs include fb-modules
as well.
This means: amd64, arm64, i386. Those happen to also do EFI/SB.

2. Hardcode the list of modules to be added:
drm_shmem_helper.ko
drm_ttm_helper.ko
drm_vram_helper.ko
tiny/bochs.ko
tiny/cirrus.ko
ttm/ttm.ko
vboxvideo/vboxvideo.ko [!arm64, i.e. amd64 and i386 only]

3. For each of these 3 archs, deploy each of these modules. Do that for
each build that includes drm.ko (which should be synonymous with
fb-modules being deployed, given drm.ko is mandatory in the common
fb-modules file, included from the arch-specific ones in src:linux),
and do that without a condition on GTK detection or /usr/bin/Xorg's
presence.

This should be targeted enough (touching 3 archs, two of which are getting
a lot of attention; leaving all others entirely untouched), yet generic
enough to work around issues that show up in both text and graphical
versions of the installer, by patching all relevant builds (netboot,
netboot-gtk, those used by debian-cd, etc.).
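
The selection logic of steps 1 to 3 can be sketched as follows. This is a Python illustration only, not the actual debian-installer build code; the names `GRAPHICAL_ARCHS`, `EXTRA_DRM_MODULES`, and `extra_modules` are hypothetical, while the architectures and module paths are the ones listed above.

```python
# Architectures that ship the graphical installer (step 1).
GRAPHICAL_ARCHS = ("amd64", "arm64", "i386")

# Hardcoded list of extra modules (step 2), relative to drivers/gpu/drm/.
EXTRA_DRM_MODULES = (
    "drm_shmem_helper.ko",
    "drm_ttm_helper.ko",
    "drm_vram_helper.ko",
    "tiny/bochs.ko",
    "tiny/cirrus.ko",
    "ttm/ttm.ko",
    "vboxvideo/vboxvideo.ko",
)

# vboxvideo.ko is not shipped on arm64, so it must be skipped there.
NOT_ON_ARM64 = frozenset({"vboxvideo/vboxvideo.ko"})


def extra_modules(arch, build_has_drm):
    """Return the extra modules to deploy for one build (step 3).

    build_has_drm: whether the build already includes drm.ko, used as a
    proxy for "fb-modules is deployed". No GTK or Xorg check is made.
    """
    if arch not in GRAPHICAL_ARCHS or not build_has_drm:
        return []
    return [m for m in EXTRA_DRM_MODULES
            if not (arch == "arm64" and m in NOT_ON_ARM64)]


if __name__ == "__main__":
    for arch in ("amd64", "arm64", "i386", "ppc64el"):
        print(arch, extra_modules(arch, build_has_drm=True))
```

Keying on the presence of drm.ko rather than on GTK detection is what lets the same rule cover both netboot and netboot-gtk builds.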

I'll push a v2 of my nasty branch once I've performed some clean-up and
some more testing.

Ben Hutchings

May 14, 2023, 1:51:22 PM
Control: tag -1 patch

On Sun, 2023-05-14 at 00:21 +0200, Ben Hutchings wrote:
[...]
> So I suppose there's a regression in either efifb or fbdev_drv.

I'm not spotting any functional changes in fbdev or the submodules it
depends on between bullseye and bookworm. So this implicates either
efifb or, as you mentioned, GRUB.

> > Via QEMU, under BIOS and UEFI, results are:
> >
> > +-------------+-----------------+-----------------+-----------------+
> > | Graphics    |  Bullseye 11.7  |  Bookworm RC 2  |  Daily builds   |
> > +-------------+--------+--------+--------+--------+--------+--------+
> > |             |  BIOS  |  UEFI  |  BIOS  |  UEFI  |  BIOS  |  UEFI  |
> > +-------------+--------+--------+--------+--------+--------+--------+
> > |             |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
> > | -vga std    |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
> > | -vga cirrus |   OK   |   OK   |   OK   |  KO-S  |   OK   |  KO-S  |
> > | -vga qxl    |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> > | -vga virtio |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> > | -vga vmware |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> > +-------------+--------+--------+--------+--------+--------+--------+
>
> I started testing with QEMU and OVMF from unstable, and I'm instead
> seeing Xorg failing to start in the same cases you see glitches. The
> relevant error message seems to be this one:
> http://codesearch.debian.net/show?file=xorg-server_2%3A21.1.7-3%2Fhw%2Fxfree86%2Ffbdevhw%2Ffbdevhw.c&line=504
[...]

I tested with QEMU from bullseye and OVMF from unstable, and again I
saw Xorg failing to start, rather than glitches. Weird.

I also patched the kernel to report the internal screen_info structure
and the fb_var_screeninfo structure passed in and out of
FBIOPUT_VSCREENINFO. The key difference is:

- With -vga qxl, screen_info says 32 bpp, X wants 32 bpp, the kernel
agrees with that.
- With -vga std or -vga cirrus screen_info says 24 bpp, X wants 32
bpp, and the kernel says 24 bpp.

I think the problem is this GRUB has native drivers for Bochs and
Cirrus that reprogram the framebuffer bit depth, and the kernel is then
confused about what the bit depth is supposed to be. With QXL, GRUB
doesn't have a native driver so it doesn't reconfigure the framebuffer.
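
To illustrate why a disagreement on bit depth corrupts the picture, the Python sketch below computes the byte offset of a pixel in a packed linear framebuffer under each assumption. It is a simplified model (no line padding, hypothetical helper name `pixel_offset`), not code from the kernel, GRUB, or X.

```python
def pixel_offset(x, y, width, bpp):
    """Byte offset of pixel (x, y) in a packed linear framebuffer.

    Assumes no padding at the end of a line, i.e. the line length in
    bytes is exactly width * bpp / 8.
    """
    return (y * width + x) * (bpp // 8)


if __name__ == "__main__":
    # One party lays pixels out at 24 bpp, the other addresses them
    # at 32 bpp, on the same 800-pixel-wide mode.
    for x, y in [(0, 0), (1, 0), (0, 1), (799, 599)]:
        at24 = pixel_offset(x, y, 800, 24)
        at32 = pixel_offset(x, y, 800, 32)
        print(f"({x},{y}): 24 bpp -> byte {at24}, 32 bpp -> byte {at32}")
```

Only the very first pixel lands at the same address in both layouts; every other pixel is read from or written to the wrong bytes, and the error grows with each line.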

Unfortunately, with Secure Boot we have to use a monolithic GRUB build
so I can't easily exclude video_bochs and video_cirrus to see if that
improves matters.

But what does work for me is:

--- a/build/boot/x86/grub/grub-efi.cfg
+++ b/build/boot/x86/grub/grub-efi.cfg
@@ -5,7 +5,7 @@ else
 fi
 
 if loadfont $font ; then
-  set gfxmode=800x600
+  set gfxmode=800x600x32
   set gfxpayload=keep
   insmod efi_gop
   insmod efi_uga
--- END ---

A full patch is attached.

This works for me with all the QEMU graphics devices. But I haven't
tested on real hardware.


Ben.

--
Ben Hutchings
Absolutum obsoletum. (If it works, it's out of date.) - Stafford Beer
0001-Always-use-32-bpp-for-GRUB-EFI-graphical-menu-Closes.patch

Ben Hutchings

May 14, 2023, 2:40:06 PM
On Sun, 2023-05-14 at 19:40 +0200, Ben Hutchings wrote:
[...]
> This works for me with all the QEMU graphics devices. But I haven't
> tested on real hardware.

Now tested successfully on 2 custom desktops:

- Asus P8Z68-V LX motherboard, Intel Core i5 2500 CPU, integrated GPU
- ASRock B450 PRO4, AMD Ryzen 5 3600 CPU, Radeon RX580 GPU

and 2 laptops:

- Lenovo ThinkPad T420, Intel Core i5 2nd gen CPU, integrated GPU
- Lenovo ThinkPad T460, Intel Core i5 6th gen CPU, integrated GPU

Cyril Brulebois

May 17, 2023, 6:30:06 AM
Control: tag -1 - patch

Hi,

Thanks for the proposed patch but as discussed elsewhere it seemed too
risky to force 32 bpp on everyone, so I went for what looked like the
least risky (adding bochs.ko and cirrus.ko, manually and for the time
being).

Ben Hutchings <b...@decadent.org.uk> (2023-05-14):
> I think the problem is this GRUB has native drivers for Bochs and
> Cirrus that reprogram the framebuffer bit depth, and the kernel is then
> confused about what the bit depth is supposed to be. With QXL, GRUB
> doesn't have a native driver so it doesn't reconfigure the framebuffer.

I've spent some time trying to reproduce these issues under UEFI but
without Secure Boot, and I failed. So I've moved to learning how to sign
a Linux kernel (certutil, pesign, mokutil, etc.), and I've added some
debugging information in various places.

Under Secure Boot, with the default QEMU driver (std aka. bochs),
initialization happens via:

drivers/firmware/efi/libstub/x86-stub.c

and its setup_graphics() that grabs the screen info part of boot params
and starts by zero-ing it:

si = &boot_params->screen_info;
memset(si, 0, sizeof(*si));

before trying efi_setup_gop() and setup_uga() in turn; the former being
current, the latter being the old standard.

Moving on to:

drivers/firmware/efi/libstub/gop.c

we see that its efi_setup_gop() calls setup_gop(), which in turn calls
find_gop(). That last one gets hold of a suitable GOP pointer:
https://uefi.org/specs/UEFI/2.10/12_Protocols_Console_Support.html#graphics-output-protocol

The rest of setup_gop() then uses information contained within that
structure to derive all relevant information, filling the screen_info
structure. That structure is then trusted by efifb, which can do nothing
else but fail miserably…

The si (screen_info) is set starting here:
https://elixir.bootlin.com/linux/v6.1.27/source/drivers/firmware/efi/libstub/gop.c#L534

Adding some debug, here's what I get with GRUB set to 800x600x24:

info->version: 0
info->horizontal_resolution: 1024
info->vertical_resolution: 768
info->pixel_format: 1
info->pixel_information.red_mask: 0
info->pixel_information.green_mask: 0
info->pixel_information.blue_mask: 0
info->pixel_information.reserved_mask: 0
info->pixels_per_scan_line: 1024

Let's see:

- Of course width, height, and pixels_per_scan_line are incorrect.

- pixel_format 1 means PIXEL_BGR_RESERVED_8BIT_PER_COLOR aka
PixelBlueGreenRedReserved8BitPerColor in the spec, which means:

A pixel is 32-bits and byte zero represents blue, byte one
represents green, byte two represents red, and byte three is
reserved. This is the definition for the physical frame buffer.
The byte values for the red, green, and blue components represent
the color intensity. This color intensity value range from a
minimum intensity of 0 to maximum intensity of 255.

- And masks are all 0.

So for this particular GRUB configuration to work, I've verified that
fixing all those fields was leading to a correct display via efifb
(having dropped bochs.ko to stick to efifb):

info->horizontal_resolution = 800;
info->vertical_resolution = 600;
info->pixels_per_scan_line = 800;

info->pixel_format = PIXEL_BIT_MASK;
info->pixel_information.red_mask = 0x00ff0000;
info->pixel_information.green_mask = 0x0000ff00;
info->pixel_information.blue_mask = 0x000000ff;
info->pixel_information.reserved_mask = 0x00000000;

Setting PIXEL_BIT_MASK means masks become relevant, and the number of
bits set in each mask is summed to determine the actual color depth,
instead of a hardcoded 32, giving me (and efifb) 24. And even:

efifb: mode is 800x600x24, linelength=2400, pages=1
efifb: scrolling: redraw
efifb: Truecolor: size=0:8:8:8, shift=0:16:8:0

instead of the dreaded:

efifb: mode is 1024x768x32, linelength=4096, pages=1
efifb: scrolling: redraw
efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
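
The way those two efifb reports follow from the GOP mode information can be sketched in Python. This is a simplified stand-in based on one reading of the pixel setup logic in drivers/firmware/efi/libstub/gop.c, not the kernel code itself; the names `find_bits` and `pixel_layout` and the dictionary layout are assumptions made for illustration. The pixel format values 0, 1, and 2 follow the UEFI specification's enumeration order.

```python
PIXEL_RGB_RESERVED_8BIT_PER_COLOR = 0
PIXEL_BGR_RESERVED_8BIT_PER_COLOR = 1
PIXEL_BIT_MASK = 2


def find_bits(mask):
    """Return (position, size) of the run of set bits, or (0, 0) if empty."""
    if mask == 0:
        return 0, 0
    pos = (mask & -mask).bit_length() - 1  # index of the lowest set bit
    size = mask.bit_length() - pos         # highest set bit index - pos + 1
    return pos, size


def pixel_layout(pixel_format, pixels_per_scan_line, masks=None):
    """Derive depth, line length, and channel layout from GOP mode info.

    masks: dict with keys "red", "green", "blue", "reserved"; only
    consulted when pixel_format is PIXEL_BIT_MASK.
    """
    if pixel_format == PIXEL_BIT_MASK:
        rsvd = find_bits(masks["reserved"])
        red = find_bits(masks["red"])
        green = find_bits(masks["green"])
        blue = find_bits(masks["blue"])
        # Depth is the sum of the channel sizes, not a fixed value.
        depth = rsvd[1] + red[1] + green[1] + blue[1]
        linelength = pixels_per_scan_line * depth // 8
    elif pixel_format in (PIXEL_RGB_RESERVED_8BIT_PER_COLOR,
                          PIXEL_BGR_RESERVED_8BIT_PER_COLOR):
        is_rgb = pixel_format == PIXEL_RGB_RESERVED_8BIT_PER_COLOR
        red = (0 if is_rgb else 16, 8)
        blue = (16 if is_rgb else 0, 8)
        green = (8, 8)
        rsvd = (24, 8)
        depth = 32                      # hardcoded for both 8-bit formats
        linelength = pixels_per_scan_line * 4
    else:
        raise ValueError(f"unsupported pixel format: {pixel_format}")
    channels = (rsvd, red, green, blue)  # same order as efifb's log line
    return {
        "depth": depth,
        "linelength": linelength,
        "size": tuple(c[1] for c in channels),
        "shift": tuple(c[0] for c in channels),
    }


if __name__ == "__main__":
    fixed = {"red": 0x00ff0000, "green": 0x0000ff00,
             "blue": 0x000000ff, "reserved": 0x00000000}
    print(pixel_layout(PIXEL_BIT_MASK, 800, fixed))
    print(pixel_layout(PIXEL_BGR_RESERVED_8BIT_PER_COLOR, 1024))
```

With the corrected fields the sketch yields depth 24 and a line length of 2400, and with the values reported by the firmware it yields depth 32 and a line length of 4096, in line with the two efifb outputs quoted above.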

Now, where to go from here? It seems pretty clear to me at this point
that the Linux kernel only relies on information that was obtained via
that GOP pointer, and does its best afterward.

The way the function call works seems pretty similar to what happens in
GRUB, so I'd think that the problem is likely *not* in the kernel, but
rather:

- GRUB fails to set mode information properly.

- OVMF drops the ball and returns some default information.


> Unfortunately, with Secure Boot we have to use a monolithic GRUB build
> so I can't easily exclude video_bochs and video_cirrus to see if that
> improves matters.

Applying my new pesign skills on GRUB is the next step, but I have to
spend some time on another topic before Bookworm… It is possible that
trying to build a debug-enabled OVMF package might yield interesting
results, since AFAIUI that's the one implementing the back and forth…
If that's indeed the case, it should be easy to see what's written by
GRUB vs. what's read by Linux?