On Fri, Oct 12, 2018 at 02:12:08PM -0700,
alist...@gmail.com wrote:
> I agree that we shouldn't trust vendor firmware. It is always buggy at best.
>
> The problem that I see with starting coreboot or u-boot (the last stage
> bootloaders) in M mode is that we then end up with duplicated work between
> the projects.
>
> This is where a model like Arm Trusted Firmware fits in well. We can see
> something like this:
> 1. CPU boots up and ROM does its thing
> 2. ROM boots into an open source firmware (ATF equivalent) in M mode
> - This will end up being pretty SoC specific
> 3. The firmware boots into u-boot or coreboot in S mode (or H mode when
> that exists)
> - Hopefully this can then be as generic as possible
> 4. From there we boot the system
>
> The advantage I see with this is that we only need to implement the M mode
> functionality once and then coreboot, u-boot and anything else can utilise
> it. It also keeps bigger later stage boot loaders out of M mode which has a
> mild security benefit of smaller attack surfaces.
>
> The other aim is to allow a single RISC-V image to boot on a variety of
> different SoCs. The more generic we can make the last stage boot loaders
> the more achievable this is.
Hello,
I'd like to chime in here with some thoughts about
firmware/bootloader interfaces from a Linux distribution's point
of view. This isn't strictly about M-mode vs. S-mode, but
nonetheless related. I've changed the subject of the email to
show that this is somewhat of a side-thread, but IMHO important
nonetheless in the context of the existing discussion.
As an ideal, Linux distributions would like to see a situation
where (provided that the hardware is technically capable of
running the distribution), a single installer image can be used
on all systems and the resulting installed system "just works".
Having to build per-machine-type installer images is a nightmare
and doesn't scale - I have been doing that for ARM-based systems
for quite some time, so I can speak from first-hand experience
here.
Unfortunately, in reality this ideal isn't achieved anywhere, not
even on x86, which is commonly mentioned as the platform where
"things just work". Before laying out what I would like to
see standardized on RISC-V, I'll try to describe some of the
experiences that Debian developers have had on x86 and ARM, as
those are usually the two platforms that come up as examples in
discussions related to boot interfaces. This hopefully helps to
explain why I would like to see certain things standardized in a
certain way.
- x86 has classic BIOS and UEFI, the latter in 32bit and 64bit
variants, sometimes in severely broken hybrid forms
(64bit CPUs combined with 32bit-only UEFI implementations that
require dirty hacks to boot a 64bit OS).
Even when declaring the classic BIOS to be legacy and just
concentrating on UEFI with ACPI, there are still lots of broken
implementations in the wild that require tons of workarounds on
multiple levels, be it the plethora of in-kernel workarounds
for broken ACPI implementations or the broken boot handling in
quite a number of existing UEFI implementations that make the
theoretical advantages of having a proper bootloader interface
defined in the UEFI specification often moot in practice. Add
to that the problem that in practice many UEFI and ACPI
implementations don't do what the spec says but instead do what
works with Windows. As a consequence, the Linux kernel had to
move to emulating the Windows behaviour instead of doing things
the way the spec actually prescribes, because otherwise it
wouldn't work properly on real-world systems. And then there
are of course such great things as real-world UEFI
implementations that kill themselves (as in converting a
working laptop into a paperweight) as soon as one actually uses
the interfaces that the UEFI spec provides but that aren't
commonly used by Windows. Remember the "Linux kills laptops"
uproar? In fact that was "broken UEFI kills laptops" - Linux
called a
UEFI function to write a crash dump into a UEFI NVRAM variable
in a perfectly spec-compliant way, but a number of existing
UEFI implementations didn't properly handle large writes
resulting in a tight loop on UEFI initialization at the next
reboot, turning the laptop into a paperweight. Windows doesn't
do large writes by default, so the problem usually didn't show
up under Windows. You can probably imagine the reactions that
Linux distributions faced, with people claiming "You have killed
my laptop!".
- On armhf (32bit) platforms, there traditionally hasn't been a
standard firmware interface. The SoC boot ROM usually loads a
bitstream from a vendor-specific location (either from SPI NOR
flash or from a vendor-defined MMC block) and executes that,
whatever is there. In the embedded space, vendors have usually
forked u-boot and hacked in support for their SoC, defining a
vendor-specific method for loading the kernel, and left it at
being incompatible with the rest of the world.
In mainline u-boot there have been efforts since (IIRC) 2014 to
standardize the boot interface that is provided to the later
stages. This goes under the label of "distro bootcommand" and
provides a vendor-neutral boot interface to e.g. Linux
distributions. With this, one can boot the same installer
image on all armhf platforms that use a u-boot with "distro
bootcommand" support. All platforms that are newly added to
mainline u-boot are expected to implement "distro bootcommand"
support and many existing platforms have been converted to it
in the meantime.
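To make this a bit more concrete (the path and the fields below
are only an illustration; details depend on the platform and the
image layout): a u-boot with "distro bootcommand" support scans
the boot devices for a boot menu file like /extlinux/extlinux.conf
or /boot/extlinux/extlinux.conf (falling back to a boot.scr script
or an EFI binary), so a distribution only has to ship something
along these lines instead of any board-specific boot logic:

  default debian

  label debian
      menu label Debian GNU/Linux
      linux /vmlinuz
      initrd /initrd.img
      fdtdir /dtbs
      append root=/dev/mmcblk0p2 ro

The firmware itself picks the matching device-tree file from the
fdtdir directory, which is a large part of what makes a single
image work across different boards.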
- On arm64 there are two worlds - the embedded space, using
u-boot+device-tree (usually combined with the "ARM Trusted
Firmware" (ATF)), and the server space with either UEFI+ACPI or
UEFI+device-tree. Pure enterprise distributions like RHEL
usually only target UEFI+ACPI on arm64 and ignore everything
else. Community distributions like Debian usually target all
of the aforementioned options: u-boot+device-tree,
UEFI+device-tree and UEFI+ACPI.
ATF is published under a BSD license by ARM, but there are two
big problems with ATF in practice:
- There are two incompatible ATF interface definitions, an
older and a newer one. Unfortunately some SoC vendors use
the old one while others use the new one :-(.
- The "upstream" ATF originally didn't have support for any
hardware besides some reference systems from ARM (similar to
the situation with Tianocore in the UEFI world). SoC vendors
usually used the BSD-licensed ATF as the base for a
closed-source fork that included hardware support for their
specific SoC and didn't contribute back anything to the
upstream ATF, and of course also often haven't pulled in any
later ATF bug fixes. In the meantime, open-source developers
have contributed (partially reverse-engineered) support for
some arm64 platforms to upstream ATF, but those are still
rather few.
The problem that Linux distributions have with the
closed-source vendor forks of ATF is that they a) are often
broken and b) aren't stored in some form of onboard memory
(like an SPI NOR flash) but are shipped by the vendors as
part of their SD card boot images. In consequence, Linux
distributions would have to include those proprietary ATF
builds in their boot images, which they often can't for legal
(nobody except the SoC vendor has the right to distribute the
proprietary code) and/or policy reasons (Debian doesn't ship
any software that isn't Open Source as part of the main
distribution). Exactly the same problem exists with the code
that initializes the DRAM controller. As with ATF, the actual
implementation is usually proprietary (at least until some
open-source developer manages to reverse-engineer how the DRAM
controller needs to be set up). In contrast to PCs, where DRAM
initialization is always done by code stored in an onboard
flash chip, on arm platforms the DRAM init code is often
shipped as part of the SD card boot image. In consequence, the
corresponding systems cannot be supported by Debian.
Alex Graf from SUSE has been working on making it possible to
install distributions that expect UEFI, but can use device-tree
instead of ACPI (which is a configuration that is covered by
the UEFI spec) on systems using u-boot as their boot firmware
by adding a UEFI emulation layer on top of u-boot. The
important point here is that this layer implements the
UEFI boot services, but not the UEFI runtime services, i.e.
it can be used to load a UEFI-GRUB, but it cannot be used
with operating systems that expect to be able to call
UEFI functions during runtime, like writing to UEFI NVRAM
variables, so this is not a replacement for a full UEFI
implementation.
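To illustrate the boot-services/runtime-services split, here is a
minimal sketch in C (gnu-efi style; not code from any of the
projects mentioned above): a loader like UEFI-GRUB is just a UEFI
application that works entirely through the boot services, so a
firmware that provides only those is enough to start it:

  #include <efi.h>
  #include <efilib.h>

  /* Everything this application needs is reachable through
     systab->BootServices; systab->RuntimeServices is never used. */
  EFI_STATUS efi_main(EFI_HANDLE image, EFI_SYSTEM_TABLE *systab)
  {
      InitializeLib(image, systab);
      Print(L"hello from a boot-services-only UEFI application\n");
      /* A real loader would now use systab->BootServices to open
         files, load a kernel, fetch the memory map and finally
         call ExitBootServices().  An OS that wants to write UEFI
         NVRAM variables at runtime additionally needs the runtime
         services, which is exactly what such an emulation layer
         does not offer. */
      return EFI_SUCCESS;
  }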
For the "low-level" part of system firmware (i.e. the part that
handles stuff like initial voltage regulator setup, DRAM
initialization, etc.), I would like to see the following things
standardized for RISC-V as part of the unix platform spec:
* The platform vendor must provide DRAM initialization code
with a standardized interface in an on-board memory, so that
an OS provider doesn't have to ship the code as part of the
OS image, but can call it when needed.
The reason for this is outlined above in the section about
ATF - most DRAM init code is only provided under NDA by the
DRAM controller IP vendor, so OS providers may not be able to
use it if it isn't already available on the system.
* Define a standard partition table layout that is taken into
account by anything on the system, including the boot
firmware. My proposal would be GPT because it's flexible,
supported by about everything and the kitchen sink, and
supports large volumes (the classic BIOS partition table format
tops out at 2 TiB with 512-byte sectors).
* If a SoC supports loading its raw firmware bitstream from a
block device (typically SD/eMMC) by loading a certain amount
of bytes from a certain offset, define _one_ standard offset
and length where it is to be loaded from, making sure that it
is compatible with the chosen partition table layout. The
reason for that is that without such a definition, one would
again risk incompatibilities between the OS, userland
applications and possible alternative firmwares.
SoCs can usually boot their firmware from multiple places,
selectable by external pins or in a fixed order. When using
a fixed order, usually removable storage (SD card) has
highest priority, then comes eMMC, then SPI NOR flash. This
is really nice because it makes it easy to "unbrick" a broken
system, e.g. if flashing a new firmware into the onboard NOR
flash has failed and the system doesn't come up anymore.
Code is only loaded from a device if it has a valid header
(usually some magic number and a CRC), so if there is no
valid header on removable media because nobody has installed
a firmware image there, the boot process simply continues
with the in-flash vendor code. If there is a valid firmware
image on the block device, it gets loaded and executed. This
makes it possible either to fix the broken NOR flash contents
or to just use this firmware as a flexible boot-time replacement
for the
firmware in the NOR flash. Usually the area the SoC loads
its firmware from is somewhere after the partition table but
below 1MB (where normally the first partition starts for
BIOS-style and GPT-style partition tables). In the arm
world, a number of SoCs have placed their firmware location
as a 32kB block at an 8kB offset from the beginning of the
device. Unfortunately, while this works for systems using
BIOS-style partition tables, it breaks for systems using
GPT-style partition tables, because those are larger (see the
worked example below). The maximum length of the firmware
boot block needs to be specified as well, because firmware
like u-boot usually uses the space between the first-stage
boot image and the start of the first partition for its own
(non-SPL) code and data, so it needs to know from which
offset onwards blocks can be used without interfering.
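As a worked example of why the 8k/32k convention and GPT don't
mix (assuming 512-byte sectors and GPT's default 128 partition
entries of 128 bytes each), a few lines of C:

  #include <stdio.h>

  int main(void)
  {
      /* GPT: protective MBR (LBA 0), GPT header (LBA 1) and the
         partition entry array (LBAs 2-33), so the on-disk GPT
         structures extend up to byte 34 * 512 = 17408. */
      unsigned gpt_end  = (2 + (128 * 128) / 512) * 512;
      unsigned fw_start = 8 * 1024;              /* 8k offset  */
      unsigned fw_end   = fw_start + 32 * 1024;  /* 32k length */

      printf("GPT structures end at byte %u\n", gpt_end);
      printf("firmware occupies bytes %u-%u\n", fw_start, fw_end - 1);
      printf("overlap: %s\n", fw_start < gpt_end ? "yes" : "no");
      return 0;
  }

A classic BIOS-style partition table only occupies the first 512
bytes of the device, which is why the 8k offset never was a
problem there; with GPT the firmware blob lands right in the
middle of the partition entry array.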
For the "high-level" part of boot firmware (the part that
implements the "user-facing" part of the boot process and
implements methods to boot from SATA disks, network interfaces,
provide system configuration options, etc.), finding a reasonable
standard that covers all use cases is really difficult.
I can understand the "simply go with UEFI and ACPI" argument that
Richard Jones made in his talk during the Barcelona workshop,
but only when limited to the premise under which it was made,
i.e. "RedHat is only interested in the server market and nothing
else".
One can surely with good reason argue "UEFI+ACPI is used on x86
and arm64, do the same for RISC-V" as there is existing support
for UEFI-based systems in many operating systems and bootloaders.
On the other hand, there is a lot to be said against UEFI+ACPI,
including that it is way too complex (not only for the embedded
Linux use case, but also too complex for reasonably creating
secure and bug-free implementations in the server space), has a
large attack surface (see Ron Minnich's talks), and reality shows
that things that are fine in theory are way too often broken in
real-world implementations (which I believe is also to a certain
extent the fault of UEFI's enormous complexity).
ACPI is a no-go from my point of view; I've seen way too many
broken ACPI implementations in the PC world that cannot be fixed
by the user and in practice usually don't get fixed by the
manufacturers. Things are different if you "live" in the
high-end server market (both on x86 and on ARM), but for
commodity hardware my experience has been rather devastating.
If one wanted to standardize on using UEFI at all, then IMHO
only in the form of UEFI+device-tree.
For the embedded Linux use case, UEFI+ACPI isn't even worth
a single thought - from my experience with the embedded Linux
world on ARM platforms I am sure that there isn't a snowball's
chance in hell that platform vendors for the embedded Linux
market would be willing or capable of providing a reasonable
UEFI+ACPI implementation for their systems.
U-boot on the other hand is a rather nice firmware and bootloader
for embedded systems, but it is definitely not designed for the
server world. For the embedded space, u-boot can surely provide
a reasonably easy and platform-neutral interface for booting an
operating system.
The obvious compromise could indeed be an interface standard that
defines some stripped-down UEFI boot services as the common
denominator. This would allow hardware vendors to choose between
various existing solutions, i.e. between using either some
Tianocore-based UEFI implementation with their own HAL, using
coreboot for the lower layer with a Tianocore-based payload
providing the UEFI interface to the outside, or using a
u-boot-based implementation with u-boot's UEFI emulation layer.
The latter would provide the standardized interface for embedded
use cases without significant cost (the code size increase due to
the emulation layer is rather negligible).
Comments welcome :).
Regards,
Karsten
--
Per Section 28(4) of the German Federal Data Protection Act
(Bundesdatenschutzgesetz), I object to the use and disclosure of
my personal data for purposes of advertising and of market or
opinion research.