Machine Configuration Description

Karsten Merker

May 14, 2017, 12:55:59 PM
to isa...@groups.riscv.org, sw-...@groups.riscv.org
[crossposted to sw-dev and isa-dev as the privileged spec is
usually discussed on isa-dev, but the specific topic is largely
software/firmware-related and therefore also on-topic on sw-dev]

Hello everybody,

I'd like to revisit the "Machine Configuration Description"
chapter in the privileged ISA spec. Currently it states "To
reduce porting effort for OS boots, we have reverted back to
using Device Trees to communicate platform information to the
kernel, so this chapter is out of date. Config string was
designed for other uses in addition, but for now, we are staying
with a standard device tree model."

I'd like to work on that chapter, but I would like to gather
consensus about a number of points before attempting to do
anything in this direction:

- The result of the last discussion on sw-dev was that allowing
different formats of the "Machine Configuration Description"
doesn't make sense: different vendors would end up using
different formats, which would mean that any "consumer" of the
description (be it firmware, a bootloader or an operating
system) would need to implement support for all of them.

- Many people didn't like the flattened devicetree binary format
(dtb) and preferred the devicetree source (dts) format instead.
Can we agree that dts is _the_ _one-and-only_ format for the
"Machine Configuration Description" which gets embedded into
the hardware? While I personally tend to prefer dtb (for which
we already have a rather compact, BSD-licensed parser library),
the majority of people on isa-dev seemed to prefer a textual
format during the previous discussions, so that appears to be
the way to go.

- The privileged spec currently doesn't define how a firmware or
operating system actually finds the "Machine Configuration
Description", it just says "The platform must describe how to
locate a pointer to find this string, for example, by
specifying a fixed physical address at which the pointer
resides". This means that any software that tries to find this
information requires platform-specific knowledge, which I
consider to be a bad thing. The idea behind the "Machine
Configuration Description" is that it allows a standardized way
to gather all information that is necessary to bring up a
system, which gets kind of thwarted by the fact that one needs
platform-specific knowledge to find this information in the
first place.

The (now-removed) SBI chapter had defined an SBI call
"sbi_get_config" that would return a pointer to the "Machine
Configuration Description" for the current platform, so one
could argue that the standardization across all platforms
doesn't happen at the hardware level but at the SBI level.
This would be ok if the corresponding SBI call were mandatory
for all RISC-V platforms, but with the privileged spec v1.10
this isn't the case. Therefore the privileged spec would need
to be amended in this regard.

Handling this at the SBI level would also mean that SBI
implementations necessarily have to be platform-dependent. If
access to the "Machine Configuration Description" were defined
at the hardware level, there would be no need for
platform-specific SBI implementations. AFAICS all
information that is needed by the SBI can be derived from the
devicetree, so we could have one single open-source SBI
implementation that could run on all platforms if we had a
standardized location for the machine description. This would
provide the advantage of everybody being able to profit from bug
fixes and updates to the SBI implementation without being
dependent on a specific vendor (assuming the SBI is placed in
flash and not in mask ROM). There is one practical limitation to
this approach, though: the privileged spec currently makes the
reset vector and the memory map completely platform-specific,
i.e. such an implementation could only use PC-relative addressing
and has the problem of not having any scratch RAM available from
the start. Is there a particular reason for not specifying a
standard memory map (at least as far as the reset vector and a
certain minimum amount (say 64kB) of SRAM for firmware/bootloader
purposes are concerned)?
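
To illustrate the fixed-address scheme mentioned above, here is
a minimal sketch in C, assuming a hypothetical platform that
publishes the pointer at a fixed physical address; the address,
names and types are purely illustrative and not taken from any
spec:

#include <stdint.h>

/* Hypothetical fixed physical address at which this particular
 * platform stores a pointer to its Machine Configuration
 * Description.  Every platform may choose a different address,
 * which is exactly the portability problem described above. */
#define CONFIG_DESC_PTR_ADDR 0x100cUL   /* illustrative value */

static const void *find_machine_config(void)
{
    /* The consumer must be built with this platform-specific
     * constant; there is no architecture-defined way to
     * discover it at run time. */
    return (const void *)*(volatile uintptr_t *)CONFIG_DESC_PTR_ADDR;
}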

- Another topic that I would consider important is how to handle
multiple layers of machine configuration information. Let's say
we have a SoC that includes a devicetree for its internal
peripherals, but there are further peripherals on
non-discoverable buses (I2C, SPI, etc.) on the mainboard or on
add-on-cards in the style of BeagleBone Capes or RaspberryPi
HATs. Such peripherals could e.g. include a battery-backed
RTC chip, a number of I/O controllers, temperature or voltage
sensors, an SPI-driven LCD or an I2C-controlled touchscreen.
These are common components, some of which exist on most modern
systems (e.g. battery-backed RTC and temperature/voltage
sensors on desktop/server-class hardware, I2C-controlled
touchscreens on tablets, I2C- and SPI-attached I/O controllers
in the embedded systems field), so this is definitely not an
uncommon scenario.

Generally all such components can be described in the form of an
external devicetree fragment (be it in source form or as a
binary devicetree overlay) that could e.g. be stored in a
small 1-wire or I2C eeprom. The question is: how to find these
fragments/overlays? For components that are directly on the
mainboard, this could be rather easily handled by a
platform-specific SBI that has this information hardcoded, but
this doesn't solve the problem for the general case (add-on
cards or other hotpluggable components). I suppose that any
solution that one could come up with would pose the problem of
being inadequate for a certain class of system, so it might
make sense to include something like that into the planned
system class specifications that are being worked on at the
foundation. Thoughts and ideas welcome :-).

Regards,
Karsten
--
Pursuant to § 28(4) of the German Federal Data Protection Act
(Bundesdatenschutzgesetz), I object to the use and disclosure of
my personal data for purposes of advertising and of market or
opinion research.

Karsten Merker

May 14, 2017, 7:32:54 PM
to sw-...@groups.riscv.org, isa...@groups.riscv.org, Karsten Merker
On Sun, May 14, 2017 at 01:55:28PM -0700, Stefan O'Rear wrote:
> On Sun, May 14, 2017 at 9:55 AM, Karsten Merker <mer...@debian.org> wrote:
> > [crossposted to sw-dev and isa-dev as the privileged spec is
> > usually discussed on isa-dev, but the specific topic is largely
> > software/firmware-related and therefore also ontopic on sw-dev]
>
> Removing isa-dev because the software interfaces are going to be split
> out of the privileged spec and moved to their own SBI spec fairly
> soon.

[Crossposting one-time to both sw-dev and isa-dev with a
followup-to sw-dev to avoid the thread ending up in two split
threads on both lists - please honour the followup-to and send
replies only to sw-dev]
> Our current branches of spike, rocket-chip, pk/bbl, jor1k, and Linux
> all use dtb. This is no longer up for negotiation. Qemu is still on
> config-string. dts is not used and will not be used.
>
> > - The privileged spec currently doesn't define how a firmware or
> > operating system actually finds the "Machine Configuration
> > Description", it just says "The platform must describe how to
> > locate a pointer to find this string, for example, by
> > specifying a fixed physical address at which the pointer
> > resides". This means that any software that tries to find this
> > information requires platform-specific knowledge, which I
> > consider to be a bad thing. The idea behind the "Machine
> > Configuration Description" is that it allows a standardized way
> > to gather all information that is necessary to bring up a
> > system, which gets kind of thwarted by the fact that one needs
> > platform-specific knowledge to find this information in the
> > first place.
>
> The S-mode and M-mode entry points are invoked with a0 = hart id, a1 =
> physical-address pointer to a flattened device tree structure. I
> thought this was documented in 1.10 but apparently not.

As just discussed on IRC: the above information (both the
decision of going with dtb all the way as well as the convention
of how to pass the dtb) never made it to the sw-dev list and
doesn't show up in the just-released privileged spec. The last
official position posted to sw-dev from the foundation's side
was a plan to go with dts...
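
As a concrete sketch of the convention Stefan describes: under
the RISC-V calling convention, a0 and a1 arrive as the first two
integer arguments, so the entry point can be pictured in C
roughly as follows (the function name and types are illustrative,
not from any spec):

#include <stdint.h>

/* Entry point reached with a0 = hart id and a1 = physical
 * address of the flattened device tree (DTB), per the
 * convention quoted above. */
void entry(uintptr_t hartid /* a0 */, const void *dtb /* a1 */)
{
    /* ... validate the DTB magic, parse the device tree,
     * boot ... */
}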

I am personally happy about the decision to use dtb and about the
existence of a convention for passing the dtb, but the lack of
communication about such important decisions bothers me. In
every free-software project that I have been involved in, all
important decisions have been posted to the corresponding project
mailing list, and I am rather surprised that that doesn't seem to
happen here. How is somebody interested in making RISC-V a
success on the software side supposed to stay up to date and put
work into things that need to get done if important decisions
don't get communicated on the primary medium for that purpose,
i.e. on the sw-dev list? Stefan O'Rear pointed out to me that
some of the relevant information could be gathered from scraping
the repository-local issue trackers of various repository forks,
but that really shouldn't be the way a project like this works.

Please hold the relevant discussions on the mailing list, and if
discussions come up elsewhere, at least post the results here so
that people don't end up doing needless work that can better be
spent elsewhere.

As a side point to this: we have a patches mailing list
(pat...@groups.riscv.org) onto which all patches for RISC-V
related software infrastructure code (i.e. binutils, gcc,
kernel, llvm etc.) and the corresponding review comments should
be CCed. Unfortunately the patches list isn't yet listed on
the RISC-V website - who has access to the website and could
add the corresponding information to https://riscv.org/mailing-lists/?
The list info is:

Patches List, pat...@groups.riscv.org
* Subscribe: patches+...@groups.riscv.org
* Archive: http://groups.google.com/a/groups.riscv.org/forum/#!forum/patches

Jacob Bachmeyer

May 15, 2017, 12:57:56 AM
to Karsten Merker, isa...@groups.riscv.org
Karsten Merker wrote:
> [crossposted to sw-dev and isa-dev as the privileged spec is
> usually discussed on isa-dev, but the specific topic is largely
> software/firmware-related and therefore also ontopic on sw-dev]
>
> Hello everybody,
>
> I'd like to revisit the "Machine Configuration Description"
> chapter in the privileged ISA spec. Currently it states "To
> reduce porting effort for OS boots, we have reverted back to
> using Device Trees to communicate platform information to the
> kernel, so this chapter is out of date. Config string was
> designed for other uses in addition, but for now, we are staying
> with a standard device tree model."
>
> I'd like to work on that chapter, but I would like to gather
> consensus about a number of points before attempting to do
> anything in this direction:
>
> - Result of the last discussion on sw-dev was that allowing
> different formats of the "Machine Configuration Description"
> doesn't really make sense as that would end up with different
> vendors using different formats which would mean that any
> "consumer" of the description (be it firmware, a bootloader or
> an operating system) would need to implement support for all of
> them, and that really doesn't make sense.
>

I believe that a similar consensus was reached on isa-dev: we need a
single standard format.

> - Many people didn't like the flattened devicetree binary format
> (dtb) and preferred the devicetree source (dts) format instead.
> Can we agree that dts is _the_ _one-and-only_ format for the
> "Machine Configuration Description" which gets embedded into
> the hardware? While I personally tend to prefer dtb (for which
> we already have a rather compact, BSD-licensed parser library),
> the majority of people on isa-dev seemed to prefer a textual
> format during the previous discussions, so that appears to be
> the way to go.
>

As I understand it, the only consensus reached so far on isa-dev was
that the format should be human-readable text. The primary reason that
consensus could not be reached between dts and config string was that we
did not have a clear, definitive specification for dts when this was
last discussed.
One of the purposes of SBI was to provide functionality for which the
existence is standard, but the means is implementation-specific. SBI
had a combination of HAL and hypervisor functionality that should
probably be better split into two modules, but the combination logically
makes sense as "SBI". My understanding is exactly that the SBI is
standard, but the SBI implementation is hardware-specific. In other
words, the SBI is the layer where hardware differences are
"papered-over" to provide "standard RISC-V".

(On a side note, I earlier suggested an "SBI virtio interface",
which at a minimum provided access to the config string; the name
turned out to be amazingly poor, since it has nothing to do with
OASIS VirtIO.)
The simple solution would be to extend dts with some kind of reference
type, permitting the board configuration to indicate "also read device X
if present" within hardware definitions. One way to do this would be to
add descriptions for outboard ROMs (I2C/SPI/etc.) with subkeys that
indicate "device is optional; ignore persistent read failure" and "this
device contains additional configuration information at offset X; read it".


In summary, I believe that a consensus was reached that dts is
acceptable, binary formats are a very bad idea, and that we could not
move forward to standardizing on dts without a definitive specification
for dts. I do not recall any serious objections to dts over config
string or vice versa, only that the consensus was: the configuration
data shall be text, preferably ASCII (to avoid Unicode weirdness in
parsers).


-- Jacob

Olof Johansson

May 15, 2017, 4:04:33 PM
to jcb6...@gmail.com, Karsten Merker, isa...@groups.riscv.org
The self-description of, for example, add-on cards is best
pushed down to whatever add-on-card standard is used on a
particular system. Trying to direct this from the architecture
is not the right way to do it.

Self-describing buses such as USB and PCI-e already handle this (with
some complexities added when it's no longer "pure" adapters being
used, i.e. off-bus GPIO or power controls, etc). So what you need to
describe is only the USB or PCI-e host controller, not the devices on
the bus.

In either case, leave that up to the people who at some point will
want to standardize on said connectors.


-Olof

Jacob Bachmeyer

May 15, 2017, 6:49:27 PM
to Olof Johansson, Karsten Merker, isa...@groups.riscv.org
This is not relevant to self-describing buses like USB and PCI; those
will need only a bus controller node in the platform description, with
other keys mapping bus ports to relevant GPIO or power controls.

The issue here is divisions between, for example, peripherals internal
to a SoC and peripherals on the surrounding board. The goal is to allow
the board to have its own EEPROM describing the board peripherals, which
may include further non-discoverable bus ports on the board, modules for
which can have their own EEPROMs and so forth. We need *some* level of
standardization here so that the people standardizing those connectors
can simply "drop-in" a configuration supplement that standard software
will *already* understand.

As an example (probably with bogus syntax, but I hope the idea is clear):

[In on-SoC ROM]
spi@0 {
    ...
    rom@0 {
        chip-select = gpio@3,4
        #optional
        #include
    }
    ...
}

[In on-board ROM spi@0/rom@0]
port@0 {
    ...
    bus i2c@1 {
        rom@43 {
            #optional
            #include
        }
    }
    ...
    bus map {
        i2c@0 = i2c@1
        ...
    }
    ...
}

[In ROM on a module in port@0]
i2c@0 {
    sensor@0x4c {
        type = lm90
    }
}


-- Jacob

Olof Johansson

May 15, 2017, 7:09:57 PM
to Jacob Bachmeyer, Karsten Merker, isa...@groups.riscv.org
The standard way of doing this has always been that the people who
design and develop the board- or system-level product handle the
top-level description of the hardware. They need to develop and test
things anyway; having them make the changes and construct the system
description makes sense.

You are likely already going to have a system-level firmware on the
system, stored on flash (usually SPI or eMMC on today's systems).
Adding a bunch of little SPI/I2C flash chips on the board just to
configure each bus is a substantial cost increase that doesn't make
sense here. It's a lot more sane to have a full system- or board-level
description that's passed down. It'd be based on the SoC- or
module-level description with additions for what's been attached.

For off-board peripherals it's a little bit different (a great analogy
here is the BeagleBone capes and how they handle that with capemgr and
other components, to pick the right fragments).

>
> As an example (probably with bogus syntax, but I hope the idea is clear):

If you want to engage in device-tree discussions I think you might
find that it's useful to read up more on how they work, their syntax,
how they're normally used today (and the best practices people already
use for some of these use cases), and how others have looked to solve
the problem.

> [In on-SoC ROM]
> spi@0 {
> ...
> rom@0 {
> chip-select = gpio@3,4

What does gpio@3,4 mean in this pseudo-syntax?

Is it expected to hard code a pin? What if my design needs that pin
for something else and I'm short on GPIOs? And why would this be
hard-coded in ROM?

> #optional
> #include
> }
> ...
> }
>
> [In on-board ROM spi@0/rom@0]
> port@0 {
> ...
> bus i2c@1 {
> rom@43 {
> #optional
> #include
> }
> }
> ...
> bus map {
> i2c@0 = i2c@1

What does bus map mean?

> ...
> }
> ...
> }
>
> [In ROM on a module in port@0]
> i2c@0 {
> sensor@0x4c {
> type = lm90
> }
> }

What if the sensor is on address 43?


-Olof

Jacob Bachmeyer

May 15, 2017, 7:36:52 PM
to Olof Johansson, Karsten Merker, isa...@groups.riscv.org
The goal is *one* board-level description ROM, at most, per board.
There would be a ROM internal to a SoC, and it would refer to a board ROM.

> For off-board peripherals it's a little bit different (a great analogy
> here is the beaglebone capes and how they handle that with capemgr and
> other components, to pick the right fragments).
>
>
>> As an example (probably with bogus syntax, but I hope the idea is clear):
>>
>
> If you want to engage in device-tree discussions I think you might
> find that it's useful to read up more on how they work, their syntax,
> how they're normally used today (and the best practices people already
> use for some of these use cases), and how others have looked to solve
> the problem.
>

I will, but I needed to explain why we have an issue with this on the
list now, not after reading up on device-tree.

>> [In on-SoC ROM]
>> spi@0 {
>> ...
>> rom@0 {
>> chip-select = gpio@3,4
>>
>
> What does gpio@3,4 mean in this pseudo-syntax?
>
> Is it expected to hard code a pin? What if my design needs that pin
> for something else and I'm short on GPIOs? And why would this be
> hard-coded in ROM?
>

It is a pin designation: "gpio@3" refers to a device elsewhere in
the tree, and pin 4 of that device would be designated as "board
description ROM chip select" on the SoC pinout.

>> #optional
>> #include
>> }
>> ...
>> }
>>
>> [In on-board ROM spi@0/rom@0]
>> port@0 {
>> ...
>> bus i2c@1 {
>> rom@43 {
>> #optional
>> #include
>> }
>> }
>> ...
>> bus map {
>> i2c@0 = i2c@1
>>
>
> What does bus map mean?
>

The "bus map" that I introduced inside a "port" description allows a
port to contain multiple buses, but the "port buses" are numbered
differently from the "board buses". A "port@1" might map "i2c@0 =
i2c@2" instead. This way, the configuration ROM on a module connected
to the port can refer to devices on "i2c@0" with no concern about which
port that module is actually installed in.

>> ...
>> }
>> ...
>> }
>>
>> [In ROM on a module in port@0]
>> i2c@0 {
>> sensor@0x4c {
>> type = lm90
>> }
>> }
>>
>
> What if the sensor is on address 43?
>

Are you asking about a conflict between the module ROM at address 43 and
a sensor at address 43? That would be an incorrect module design;
address 43 was a placeholder for an address that would be chosen by the
people who specify the actual port interface. In other words, modules
of that hypothetical type are defined to have a configuration ROM on
their first I2C bus at address 43.


-- Jacob

Olof Johansson

May 15, 2017, 7:45:15 PM
to Jacob Bachmeyer, Karsten Merker, isa...@groups.riscv.org
You've missed the forest for all the trees.

If you have one board-level ROM, then you just put the whole
description of this system, including the board-level components, in
that ROM. That's likely the same ROM that you'll store your
(upgradeable) firmware in as well, so you have a reasonable amount of
space for this.

Should there be a reasonable way for someone to construct that ROM
using the base contents for the SoC? Sure! But those are
implementation details that likely don't have to be part of the
architecture and will possibly vary with the firmware software
stack you choose to use anyway.


-Olof

Jacob Bachmeyer

May 15, 2017, 9:15:56 PM
to Olof Johansson, Karsten Merker, isa...@groups.riscv.org
This works well for simple embedded systems with soldered-in SoCs, but
does not work with interchangeable SoCs, where the board has a socket
and the user can install any of a variety of SoCs on that board.


-- Jacob

Olof Johansson

May 16, 2017, 12:28:12 AM
to jcb6...@gmail.com, Karsten Merker, isa...@groups.riscv.org
Ah, custom white box configs. They're quite a challenge, and there are
a lot of problems to solve (and specifications to write) for that to
work well. Form factors, pinouts, thermal constraints, physical
design, power constraints, etc, etc. Most of this is probably best
captured in system-level requirements and things such as machine class
definitions and standards. Sockets also have several drawbacks: price,
pin density, through-hole designs causing limitations for board
routing, and a few more.

And when it comes to producing such generic board/CPU combos, I expect
most manufacturers to want to verify and test for the supported
combinations anyway, so it's going to be hard to set up for a
future-proof standard.

But, let's get back to the original question we were discussing: Even
if all those problems were solved, I'm still not seeing how a
clear-text representation of the system is to be preferred over a
machine-readable one. Especially when considering the need to do
string parsing and processing in low-level code to deal with it. Happy
to help solve problems with using DTB if any are found though!


-Olof

Samuel Falvo II

May 16, 2017, 8:37:48 AM
to Olof Johansson, Jacob Bachmeyer, Karsten Merker, RISC-V ISA Dev
On Mon, May 15, 2017 at 1:04 PM, Olof Johansson <ol...@lixom.net> wrote:
> The self-description of for example add-on cards is a standard that is
> best pushed down to whatever add-on card standard is used on a
> particular system. Trying to direct this from the architecture is not
> the right way to do it.

I have been saying this from the beginning. At worst, any official
position on standardization can be a recommendation; at best, it's
probably better to create a separate platform reference standard (a la
PReP) which applies to motherboards as a whole, and not so much to
RISC-V in particular. The ISA standard should not dictate everything;
it's an instruction set architecture, after all.


--
Samuel A. Falvo II

Jacob Bachmeyer

May 16, 2017, 6:27:00 PM
to Olof Johansson, Karsten Merker, isa...@groups.riscv.org
Olof Johansson wrote:
> On Mon, May 15, 2017 at 6:15 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> Olof Johansson wrote:
>>
>> [ earlier messages snipped ]
>>> You've missed the forest for all the trees.
>>>
>>> If you have one board-level ROM, then you just put the whole
>>> description of this system, including the board-level components, in
>>> that ROM. That's likely the same ROM that you'll store your
>>> (upgradeable) firmware in as well, so you have a reasonable amount of
>>> space for this.
>>>
>>> Should there be a reasonable way for someone to construct that ROM
>>> using the base contents for the SoC? Sure! But that's implementation
>>> details that likely doesn't have to be part of the architecture and
>>> will possibly vary for the firmware software stack you choose to use
>>> anyway.
>>>
>> This works well for simple embedded systems with soldered-in SoCs, but does
>> not work with interchangeable SoCs, where the board has a socket and the
>> user can install any of a variety of SoCs on that board.
>>
>
> Ah, custom white box configs. They're quite a challenge, and there are
> a lot of problems to solve (and specifications to write) for that to
> work well. Form factors, pinouts, thermal constraints, physical
> design, power constraints, etc, etc. Most of this is probably best
> captured in system-level requirements and things such as machine class
> definitions and standards. Sockets also have several drawbacks: price,
> pin density, through-hole designs causing limitations for board
> routing, and a few more.
>

Yet sockets are still the standard for PCs, and there are developing
standards, like EOMA-68, that feature a processor module with some
peripherals that plugs into an independently-designed board with more
peripherals. I expect more of this space to be explored over time. The
importance of standardizing configuration is that new hardware
platforms will require minimal work to port software, since the software
will already understand modular configuration.

> And when it comes to producing such generic board/CPU combos, I expect
> most manufacturers to want to verify and test for the supported
> combinations anyway, so it's going to be hard to set up for a
> future-proof standard.
>

Future-proofing here as I see it is more along the lines of software
being able to work on hardware that has not yet been designed, ideally
with no changes.

> But, let's get back to the original question we were discussing: Even
> if all those problems were solved, I'm still not seeing how a
> clear-text representation of the system is to be preferred over a
> machine-readable one. Especially when considering the need to do
> string parsing and processing in low-level code to deal with it. Happy
> to help solve problems with using DTB if any are found though!
>

But ASCII text *is* machine-readable: how else would this email reach
the list, compilers work, or DTS be translated into DTB? :)

Joke aside, that seems to be the big concern: Why not use a format that
is *both* machine-readable and human-readable?

On a more serious note, how do DTB readers handle malformed inputs?
With text, the possibilities for incorrect input are limited, since the
stream is parsed as a stream. DTB combines strings, for example, so
what happens if a string reference points to something other than a
string, or to a non-existent string?


-- Jacob

Olof Johansson

May 16, 2017, 7:47:06 PM
to Jacob Bachmeyer, Karsten Merker, isa...@groups.riscv.org
I'll focus this discussion on DTS vs DTB, so I'm not going to keep on
elaborating on generic machine description comments above. It's an
interesting topic but it's orthogonal to the representation format of
the description.

>> But, let's get back to the original question we were discussing: Even
>> if all those problems were solved, I'm still not seeing how a
>> clear-text representation of the system is to be preferred over a
>> machine-readable one. Especially when considering the need to do
>> string parsing and processing in low-level code to deal with it. Happy
>> to help solve problems with using DTB if any are found though!
>>
>
>
> But ASCII text *is* machine-readable: how else would this email reach the
> list, compilers work, or DTS be translated into DTB? :)
>
> Joke aside, that seems to be the big concern: Why not use a format that is
> *both* machine-readable and human-readable?

Because the drawbacks of keeping it human readable outweigh the so far
very limited benefits.

> On a more serious note, how do DTB readers handle malformed inputs? With
> text, the possibilities for incorrect input are limited, since the stream is
> parsed as a stream. DTB combines strings, for example, so what happens if a
> string reference points to something other than a string, or to a
> non-existent string?

Untrusted input is untrusted input no matter the format: You'd need to
do appropriate bounds checking and validate data, just like with all
other programming. The same is true for string parsing and copying.



-Olof

Allen J. Baum

May 16, 2017, 9:11:21 PM
to Olof Johansson, Jacob Bachmeyer, Karsten Merker, isa...@groups.riscv.org
At 4:47 PM -0700 5/16/17, Olof Johansson wrote:
> >> On Mon, May 15, 2017 at 6:15 PM, Jacob Bachmeyer <jcb6...@gmail.com>
> >> wrote:
> > Joke aside, that seems to be the big concern: Why not use a format that is
>> *both* machine-readable and human-readable?
>
>Because the drawbacks of keeping it human readable outweigh the so far
>very limited benefits.

Um, could you quantify what the drawbacks are?
And I do mean quantify. Yes, I understand that the representation
will be larger, and the parser might take a bit more code, but how
much more of each, and what is the cost of that increase? ROM is
pretty cheap and small, on-chip - unless we're talking really
substantial (>64KB range) sizes.
I can see that very small IoT apps will be very cost-sensitive -
but they are also likely to require much smaller ROMs.
So, what exactly are you thinking here?

--
**************************************************
* Allen Baum tel. (908)BIT-BAUM *
* 248-2286 *
**************************************************

Olof Johansson

May 16, 2017, 10:59:34 PM
to Allen J. Baum, Jacob Bachmeyer, Karsten Merker, isa...@groups.riscv.org
Hi,

On Tue, May 16, 2017 at 6:11 PM, Allen J. Baum
<allen...@esperantotech.com> wrote:
> At 4:47 PM -0700 5/16/17, Olof Johansson wrote:
>> >> On Mon, May 15, 2017 at 6:15 PM, Jacob Bachmeyer <jcb6...@gmail.com>
>> >> wrote:
>> > Joke aside, that seems to be the big concern: Why not use a format that is
>>> *both* machine-readable and human-readable?
>>
>>Because the drawbacks of keeping it human readable outweigh the so far
>>very limited benefits.
>
> Um, could you quantify what the drawbacks are?

What comes to mind at the moment:

* Dealing with all the variations of text formats and getting it
right in ALL firmware implementations
- whitespace
- various ways of specifying data values (cells, arrays of
different element sizes, etc)
- integer specifications (hex, octal, decimal)
- comments (single and multi-line, nested and not, etc)
- syntax verification
* DTS doesn't have a header that specifies how large the text is;
instead you need to process the whole stream (while keeping context)
* Lack of versioning support in the data format (DTB has a version
field in the header)
* Overhead of parsing text format to build the data structure and
value representations
* Having to build the binary representation anyway, since that is the
format the OS will need (at least for Linux)
* Locking into a language format that nobody considers to be truly
standardized in the same way as the binary representation is
* (Possibly requiring custom extensions to handle connecting external
information)

Benefits:
* I can read it without running dtc -I dtb -O dts on the data first.
* ...

> And I do mean quantify. Yes, I understand that the representation will be larger, and the parser might take a bit more code, but how much more of each, and what is the cost of that increase? ROM is pretty cheap and small, on-chip - unless we're talking really substantial (>64KB range) sizes.

For some of the larger devicetrees in the kernel sources, decompiled
DTBs are ~40% larger, growing from about 88kB to 120kB (these are
mostly for TI's DRA7 platforms).

Small platforms (such as some of the stm32 platforms) grow by about
20% depending on the platform.

These are decompiled binaries, not the original sources which might be
more verbose, have comments, etc.

In an out-of-tree product tree I am working with, there's one DTB that
grows from 185kB to 251kB when decompiled.

None of the above covers the code and runtime cost, of course, just
the size differences of the data.

> I can see very small IOT apps will be very cost-sensitive - but they are also likely to require much smaller ROMs
> So, what exactly are you thinking here?

For tiny IOT platforms where every byte counts, you're likely using
custom-built firmware that isn't trying to be generic and probe at
runtime. There, you might use the machine description on the toolchain
side instead (to configure your build). Those might make sense to have
in text/source format.


-Olof

Jacob Bachmeyer

May 16, 2017, 11:02:53 PM
to Olof Johansson, Karsten Merker, isa...@groups.riscv.org
Olof Johansson wrote:
> On Tue, May 16, 2017 at 3:26 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> Olof Johansson wrote:
>>
>>> And when it comes to producing such generic board/CPU combos, I expect
>>> most manufacturers to want to verify and test for the supported
>>> combinations anyway, so it's going to be hard to set up for a
>>> future-proof standard.
>>>
>> Future-proofing here as I see it is more along the lines of software being
>> able to work on hardware that has not yet been designed, ideally with no
>> changes.
>>
>
> I'll focus this discussion on DTS vs DTB, so I'm not going to keep on
> elaborating on generic machine description comments above. It's an
> interesting topic but it's orthogonal to the representation format of
> the description.
>

I agree with focusing the discussion, since consensus as I perceive it
is that the DeviceTree data model is acceptable, but we do have a
requirement that whatever representation we use be able to express a
modular configuration.

>> On a more serious note, how do DTB readers handle malformed inputs? With
>> text, the possibilities for incorrect input are limited, since the stream is
>> parsed as a stream. DTB combines strings, for example, so what happens if a
>> string reference points to something other than a string, or to a
>> non-existent string?
>>
>
> Untrusted input is untrusted input no matter the format: You'd need to
> do appropriate bounds checking and validate data, just like with all
> other programming. The same is true for string parsing and copying.
>

It has been some time since I read the Inferno specs and I may be
misremembering, but I believe that Inferno took an interesting approach
to the problem. While Java relies on a bytecode verifier, Inferno's Dis
VM uses an instruction set that simply cannot express operations that
would break the system's process isolation. In short, the risks of
handling untrusted input can be reduced by carefully defining that input
format such that no valid syntax can express forbidden semantics.

DTB deduplicates strings, which implies the existence of pointer-like
string references or the use of a string table. How can a DTB reader
distinguish a valid string reference from an invalid string reference
and what happens if an invalid reference goes unnoticed?


-- Jacob

Jacob Bachmeyer

May 16, 2017, 11:53:17 PM
to Olof Johansson, Allen J. Baum, Karsten Merker, isa...@groups.riscv.org
Olof Johansson wrote:
> Hi,
>
> On Tue, May 16, 2017 at 6:11 PM, Allen J. Baum
> <allen...@esperantotech.com> wrote:
>
>> At 4:47 PM -0700 5/16/17, Olof Johansson wrote:
>>
>>>>> On Mon, May 15, 2017 at 6:15 PM, Jacob Bachmeyer <jcb6...@gmail.com>
>>>>> wrote:
>>>>>
>>>> Joke aside, that seems to be the big concern: Why not use a format that is
>>>> *both* machine-readable and human-readable?
>>>>
>>> Because the drawbacks of keeping it human readable outweigh the so far
>>> very limited benefits.
>>>
>> Um, could you quantify what the drawbacks are?
>>
>
> What comes to mind at the moment:
>

Most of the items on this list actually are good points, and I
will address them individually.

> * Dealing with all the variations of text formats and getting it
> right in ALL firmware implementations
> - whitespace
> - various ways of specifying data values (cells, arrays of
> different element sizes, etc)
> - integer specifications (hex, octal, decimal)
> - comments (single and multi-line, nested and not, etc)
> - syntax verification
>

It was previously agreed that canonicalization would be expected, so all
whitespace is ASCII space (2/0) and newline (0/10), data value formats
would be standardized, hex/octal/decimal is pretty trivial to support,
comments would either be standard or removed entirely, and syntax
verification is the responsibility of the implementor, so a reader can
simply report "unparsable input" if given bad input. (There is no
excuse for bad input other than corrupted data due to hardware failure.)

> * DTS doesn't have a header that specifies how large the text is;
> instead you need to process the whole stream (while keeping context)
>

Adding a "/length:X/" marker would not be a problem.

> * Lack of versioning support in the data format (DTB has a version
> field in the header)
>

Previously, a "/dts-v1/" marker was proposed and I do not recall any
complaint.

> * Overhead of parsing text format to build the data structure and
> value representations
>

How much overhead is this and how does it compare to merging DTB blobs?

> * Having to build the binary representation anyway, since that is the
> format the OS will need (at the least Linux)
>

That is what bootloaders are for: bridging gaps between firmware and
operating systems.

> * Locking into a language format that nobody considers to be truly
> standardized in the same way as the binary representation is
>

Then, at worst, we have to standardize our own subset.

> * (Possibly requiring custom extensions to handle connecting external
> information)
>

How does DTB handle connecting external information without custom
extensions?

> Benefits:
> * I can read it without running dtc -I dtb -O dts on the data first.
> * ...
>

I recall that at least one person on this list is developing a kernel
that really does prefer its system description be passed in as text. (I
think it is a Plan 9-alike.)

There is also the question of "how bad can bad get?": is it possible to
construct a DTB that "dtc -I dtb -O dts" will mis-parse, either by
accident or malice?

>> And I do mean quantify. Yes, I understand that the representation will be larger, and the parser might take a bit more code, but how much more of each, and what is the cost of that increase? ROM is pretty cheap and small, on-chip - unless we're talking really substantial (>64KB range) sizes.
>>
>
> For some of the larger devicetrees in the kernel sources, decompiled
> DTBs are ~40% larger, growing from about 88kB to 120kB (these are
> mostly for TIs DRA7 platforms).
>
> Small platforms (such as some of the stm32 platforms) grow by about
> 20% depending on the platform.
>
> These are decompiled binaries, not the original sources which might be
> more verbose, have comments, etc.
>
> In an out-of-tree product tree I am working with, there's one DTB that
> grows from 185kB to 251kB when decompiled.
>
> None of the above covers the code and runtime cost, of course, just
> the size differences of the data.
>

I will admit that size difference may be a concern, although I must ask
(at the risk of opening another can of worms) how well generic data
compression like deflate and LZO compress that text.

There is also another issue here: For your analysis, you are using DTBs
from the Linux kernel sources, where quality control is strict, and from
a project you are working on, where the people are competent and
trustworthy. In both cases, I presume that the DTBs are being produced
using a standard "dtc" tool. Please consider the potential for mischief
with non-standard "dtc-like" tools. I believe that similar issues have
been observed "in the wild" with ACPI, another format that uses binary
blobs.

Whatever is adopted to describe RISC-V configuration, there will be
bargain-basement boards with $DEITY-knows-what in the configuration
ROM. I argue that keeping that "lowest passable bar" as sane as
possible is important. I generally believe that producing
syntactically-valid-but-semantically-bogus text is more difficult than
producing (possibly deliberately obfuscated) binary blobs.

>> I can see very small IOT apps will be very cost-sensitive - but they are also likely to require much smaller ROMs
>> So, what exactly are you thinking here?
>>
>
> For tiny IOT platforms where every byte counts, you're likely using
> custom-built firmware that isn't trying to be generic and probe at
> runtime. There, you might use the machine description on the toolchain
> side instead (to configure your build). Those might make sense to have
> in text/source format.

Exactly, the lowest of the low end will probably hardwire configuration,
compile it into the code, and not actually follow any of the standards
for hardware description. These will probably not be running Linux,
either. (RV32E cannot support a supervisor, per "3.3 RV32E Extensions"
in the user ISA spec.)


-- Jacob

Olof Johansson

May 17, 2017, 3:36:34 AM
to Jacob Bachmeyer, Allen J. Baum, Karsten Merker, isa...@groups.riscv.org
Hi,
The problem is that when you introduce those restrictions, you all
of a sudden have a third format: It's not DTS, and it's not DTB.
It's something new that lacks both the flexibility of DTS and the
compact representation of DTB. So you might as well go with DTB.

I haven't been able to see the point in trying to differentiate in
this specific area -- there are already-adopted workable solutions
available and doing this special custom solution doesn't solve any
fundamental problems with those solutions.

>> * DTS doesn't have a header that specifies how large the text is;
>> instead you need to process the whole stream (while keeping context)
>>
>
>
> Adding a "/length:X/" marker would not be a problem.

Again, now you're no longer talking DTS, but a brand new standard.

>
>> * Lack of versioning support in the data format (DTB has a version
>> field in the header)
>>
>
> Previously, a "/dts-v1/" marker was proposed and I do not recall any
> complaint.

Right, you'd need a marker to specify that this is _not_ dts-v1.

>> * Overhead of parsing text format to build the data structure and
>> value representations
>>
>
> How much overhead is this and how does it compare to merging DTB blobs?

I don't know -- would you mind measuring or estimating it?

>> * Having to build the binary representation anyway, since that is the
>> format the OS will need (at the least Linux)
>>
>
> That is what bootloaders are for: bridging gaps between firmware and
> operating systems.

But why add the gap in the first place just for the sake of it? It
makes little sense to me.

>> * Locking into a language format that nobody considers to be truly
>> standardized in the same way as the binary representation is
>>
>
> Then, at worst, we have to standardize our own subset.

We are already at worst, given the above additional restrictions. It
really boils down to needless differentiation.

>> * (Possibly requiring custom extensions to handle connecting external
>> information)
>>
>
> How does DTB handle connecting external information without custom
> extensions?

Check out how capemgr handles it -- it's not an extension to the base
description, but instead it has information in the snippet that is
grafted in.

>> Benefits:
>> * I can read it without running dtc -I dtb -O dts on the data first.
>> * ...
>>
>
>
> I recall that at least one person on this list is developing a kernel that
> really does prefer its system description be passed in as text. (I think it
> is a Plan 9-alike.)

Hmm, seems apt to quote you above about bootloaders and bridging gaps. :-)

> There is also the question of "how bad can bad get?": is it possible to
> construct a DTB that "dtc -I dtb -O dts" will mis-parse, either by accident
> or malice?

I'm not sure what you're getting at here. If a provided DTB is
corrupted, parsing it will at some point fail or produce invalid
results.

The string references are all offsets into the string block, for
example. So the worst thing that would happen (if you have bounds
checking for offsets that reference past the end of the string block)
is that a "bad pointer" into the string block would land in the middle
of a string and thus be interpreted as something that it isn't -- it'd
still be properly \0-terminated, etc.

Have you seen the actual description of the format? It's quite
straightforward; the main structure definition is really an in-order
traversal of the tree.

See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/devicetree/booting-without-of.txt?id=HEAD
for the format, section 1, 3 and 4.
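
For orientation while reading that document, the fixed-size
header it describes has this layout (field names as used in
libfdt; all fields are big-endian; a non-normative sketch, not a
substitute for the linked spec):

#include <stdint.h>

struct fdt_header {
    uint32_t magic;             /* 0xd00dfeed */
    uint32_t totalsize;         /* total size of the blob in bytes */
    uint32_t off_dt_struct;     /* offset of the structure block */
    uint32_t off_dt_strings;    /* offset of the strings block */
    uint32_t off_mem_rsvmap;    /* offset of the memory reserve map */
    uint32_t version;           /* format version (17 at the time) */
    uint32_t last_comp_version; /* last compatible version */
    uint32_t boot_cpuid_phys;   /* id of the booting CPU */
    uint32_t size_dt_strings;   /* size of the strings block */
    uint32_t size_dt_struct;    /* size of the structure block */
};

/* The structure block is a sequence of 32-bit big-endian tokens
 * describing the in-order tree traversal Olof mentions: */
enum {
    FDT_BEGIN_NODE = 0x1,   /* followed by the node name */
    FDT_END_NODE   = 0x2,
    FDT_PROP       = 0x3,   /* followed by value length + name offset */
    FDT_NOP        = 0x4,
    FDT_END        = 0x9,
};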

>>> And i do mean quantify. Yes, I understand that the representation will be
>>> larger, and the parser might take a bit more code, but how much more of
>>> each, and what is the cost of that increase? ROM is pretty cheap and small,
>>> on-chip - unless we're talking really substantial (>64KB range) sizes.
>>>
>>
>>
>> For some of the larger devicetrees in the kernel sources, decompiled
>> DTBs are ~40% larger, growing from about 88kB to 120kB (these are
>> mostly for TIs DRA7 platforms).
>>
>> Small platforms (such as some of the stm32 platforms) grow by about
>> 20% depending on the platform.
>>
>> These are decompiled binaries, not the original sources which might be
>> more verbose, have comments, etc.
>>
>> In an out-of-tree product tree I am working with, there's one DTB that
>> grows from 185kB to 251kB when decompiled.
>>
>> None of the above covers the code and runtime cost, of course, just
>> the size differences of the data.
>>
>
>
> I will admit that size difference may be a concern, although I must ask (at
> the risk of opening another can of worms) how well generic data compression
> like deflate and LZO compress that text.

Then you're back to binary format, so previous arguments would be null
and void. We'd be going in circles.

> There is also another issue here: For your analysis, you are using DTBs
> from the Linux kernel sources, where quality control is strict, and from a
> project you are working on, where the people are competent and trustworthy.
> In both cases, I presume that the DTBs are being produced using a standard
> "dtc" tool. Please consider the potential for mischief with non-standard
> "dtc-like" tools. I believe that similar issues have been observed "in the
> wild" with ACPI, another format that uses binary blobs.

The kernel sources include DTS (actually, DTS that is passed through
cpp, then to dtc). So far we've done quite well on ARM with it,
across the whole range of system quality delivered there, and there's
been little need for people to innovate in DTB-producing tooling; most
vendors work with the tools included in the kernel (or distributed
separately).

Where I've seen vendors do crazy stuff is when they don't really
understand the spirit of how to describe things in DT, but that can be
taught. Once there are good examples to base off of things tend to
stabilize.

> Whatever is adopted to describe RISC-V configuration, there will be
> bargain-basement boards with $DEITY-knows-what in the configuration ROM. I
> argue that keeping that "lowest passable bar" as sane as possible is
> important. I generally believe that producing
> syntactically-valid-but-semantically-bogus text is more difficult than
> producing (possibly deliberately obfuscated) binary blobs.

Take a look at the data format at the link above, it's really simple
and it's not really something where you can go all that far off the
tracks.

As mentioned already, on ARM we've seen quite the range of vendors
already, and it hasn't been an issue.

>>> I can see very small IOT apps will be very cost-sensitive - but they are
>>> also likely to require much smaller ROMs
>>> So, what exactly are you thinking here?
>>>
>>
>>
>> For tiny IOT platforms where every byte counts, you're likely using
>> custom-built firmware that isn't trying to be generic and probe at
>> runtime. There, you might use the machine description on the toolchain
>> side instead (to configure your build). Those might make sense to have
>> in text/source format.
>
>
> Exactly, the lowest of the low end will probably hardwire configuration,
> compile it into the code, and not actually follow any of the standards for
> hardware description. These will probably not be running Linux, either.
> (RV32E cannot support a supervisor, per "3.3 RV32E Extensions" in the user
> ISA spec.)

I'm glad to see that we are agreeing about something!


-Olof

Olof Johansson

May 17, 2017, 3:46:46 AM
to Jacob Bachmeyer, Karsten Merker, isa...@groups.riscv.org
A string reference is an offset into the block of strings -- they're
really just concatenated strings with \0 in between.

So, as long as the reference is in the range of said string block, the
worst that would happen if it's pointing to something random, is that
you'd pick up the last bits of a valid string through it. No more, no
less.
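
A sketch of that bounds check (a minimal reader fragment,
assuming the header has already been validated; the header-field
byte offsets follow the DTB layout, with off_dt_strings at byte
12 and size_dt_strings at byte 32; the helper names are
illustrative):

#include <stdint.h>
#include <stddef.h>

/* Read a big-endian 32-bit value from the blob. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Resolve a string reference (an offset into the strings block),
 * rejecting references past the end of the block. */
static const char *dtb_string(const uint8_t *dtb, uint32_t stroff)
{
    uint32_t off  = be32(dtb + 12);   /* off_dt_strings */
    uint32_t size = be32(dtb + 32);   /* size_dt_strings */

    if (stroff >= size)
        return NULL;    /* reference outside the string block */
    /* An offset landing mid-string still yields a \0-terminated
     * suffix, as described above: bogus output, but contained. */
    return (const char *)(dtb + off + stroff);
}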

You could try to validate a string by making sure that the character
_before_ the one referenced is \0; but that would actually reduce
potential compactness, since without that verification you can share
string suffixes without duplicating them ("bcd" would really just be a
reference one character into a pre-existing "abcd").

The same is essentially true for the cleartext format; if you have
random character garbling on a string, you wouldn't necessarily be
able to detect it, and you'd use the result as if it was valid.


-Olof

kr...@berkeley.edu

May 17, 2017, 6:49:07 AM
to isa...@groups.riscv.org, sw-...@groups.riscv.org


I put a post up on sw-dev for this thread, and would like to move this
strand back to sw-dev, where it belongs.

Krste

Jacob Bachmeyer

May 17, 2017, 6:37:00 PM
to Olof Johansson, Karsten Merker, isa...@groups.riscv.org
So DTB uses a string table; that is better than potentially "wandering"
references.

> So, as long as the reference is in the range of said string block,
> the worst that would happen if it's pointing to something random, is
> that you'd pick up the last bits of a valid string through it. No
> more, no less.

So bogus output, but no chance of corrupting internal state. That is good.

> You could try to validate a string by making sure that the character
> _before_ the one referenced is \0; but that would actually reduce
> potential compactness, since without that verification you can share
> string suffixes without duplicating them ("bcd" would really just be
> a reference one character into a pre-existing "abcd").

Permitting shared suffixes is fine; my larger concern was about the
possibility of "string references" that do not actually refer to
strings. Glad to see that DTB is sane in this regard.

> The same is essentially true for the cleartext format; if you have
> random character garbling on a string, you wouldn't necessarily be
> able to detect it, and you'd use the result as if it was valid.

Yes, bogus output is produced but there is no chance to send the parser
"off the rails".


-- Jacob

Jacob Bachmeyer

May 17, 2017, 9:45:23 PM
to Olof Johansson, Allen J. Baum, Karsten Merker, isa...@groups.riscv.org
Part of this is an example of path-dependence: RISC-V began with its
own custom "configuration string" format, and pushing for a change as
far as text->binary will meet significant resistance simply due to the
extent of the change.

>>> * DTS doesn't have a header that specifies how large the text is;
>>> instead you need to process the whole stream (while keeping context)
>> Adding a "/length:X/" marker would not be a problem.
>>
> Again, now you're no longer talking DTS, but a brand new standard.
>

To say that adding a header makes a DTS utterance somehow no longer a
DTS utterance makes no sense. It is like saying that an HTTP
Content-Length header preceding a ZIP archive somehow makes the response
entity no longer a ZIP archive. Read the header, then call your DTS parser.

>>> * Lack of versioning support in the data format (DTB has a version field in the header)
>> Previously, a "/dts-v1/" marker was proposed and I do not recall any
>> complaint.
>>
> Right, you'd need a marker to specify that this is _not_ dts-v1.
>

It would be a restricted subset of dts-v1, which any dts-v1 parser can
read, so yes, it would be dts-v1.

>>> * Overhead of parsing text format to build the data structure and value representations
>> How much overhead is this and how does it compare to merging DTB blobs?
>>
> I don't know -- would you mind measuring or estimating it?
>

I do not know how to estimate that and do not have time right now to
write a DTB merge tool that would run in an early-boot environment.
(Which means linear processing into output buffers, rather than
unflattening the trees.)

>>> * Having to build the binary representation anyway, since that is the format the OS will need (at the least Linux)
>> That is what bootloaders are for: bridging gaps between firmware and
>> operating systems.
>>
>
> But why add the gap in the first place just for the sake of it? It
> makes little sense to me.
>

For a distinction between configuration data burned into ROM (possibly
mask ROM) and boot protocols that can change with a kernel and
bootloader update.

>>> * Locking into a language format that nobody considers to be truly
>>> standardized in the same way as the binary representation is
>>>
>> Then, at worst, we have to standardize our own subset.
>>
>
> We are already at worst, given the above additional restrictions. It
> really boils down to needless differentiation.
>

And path-dependence--RISC-V started with an ISC-like platform
configuration format.

>>> * (Possibly requiring custom extensions to handle connecting external
>>> information)
>> How does DTB handle connecting external information without custom
>> extensions?
>>
>
> Check out how capemgr handles it -- it's not an extension to the base
> description, but instead it has information in the snippet that is
> grafted in.
>

Are BeagleBone capes hot-pluggable? It appears not, so these reasonably
correspond. However, according to <URL:http://elinux.org/Capemgr>
capemgr loads a DTB object from the filesystem at runtime, choosing the
file based on information in an EEPROM. The approach envisioned for
RISC-V is to skip the intermediate step and simply store that
information in the EEPROM and merge it into the main device tree at boot
instead of reading a file after the system is up.

Of course, locating that outboard configuration EEPROM is not as easy as
BeagleBone has it, either--BeagleBone has a single "cape" port,
standardized as part of the hardware, while the RISC-V ISA must support
a wide variety of different hardware. Consider a RISC-V "BeagleBone",
not unlike the ARM-based "Arduino" boards: the board ROM must describe
the "cape" port, including the buses that are connected to it, and how
to identify a module installed on that port, including indicating that
the config EEPROM stores a "foreign" configuration descriptor in
BeagleBone cape format.

Ideally, the same model should apply to DRAM SPD ROMs on RISC-V: the
board configuration ROM maps the appropriate bus ports to the memory
slots and indicates configuration ROMs in SPD format at the relevant
addresses.

>>> Benefits:
>>> * I can read it without running dtc -I dtb -O dts on the data first.
>>> * ...
>>>
>> I recall that at least one person on this list is developing a kernel that
>> really does prefer its system description be passed in as text. (I think it
>> is a Plan 9-alike.)
>>
>
> Hmm, seems apt to quote you above about bootloaders and bridging gaps. :-)
>

Fair enough, but the point is that either way we slice this, *someone*
will have a gap to bridge. (On the other hand, there could be an
argument here for using DTB. Is DTB->DTS simpler than DTS->DTB?)

>> There is also the question of "how bad can bad get?": is it possible to
>> construct a DTB that "dtc -I dtb -O dts" will mis-parse, either by accident
>> or malice?
>>
>
> I'm not sure what you're getting at here. If a provided DTB is
> corrupted, parsing it will at some point fail or produce invalid
> results.
>

What I am getting at is the possibility of a vendor attempting to
"cheat" and obfuscate the DTB blob. Is such a thing possible or will
dtc correctly decode any DTB acceptable to the Linux kernel?

> The string references are all offsets into the string block, for
> example. So the worst thing that would happen (if you have bounds
> checking for offsets that reference past the end of the string block)
> is that a "bad pointer" into the string block would land in the middle
> of a string and thus be interpreted as something that it isn't -- it'd
> still be properly \0-terminated, etc.
>

So the Linux kernel and all other common DTB readers have that bounds
checking? What do they do if given invalid input? Fail the entire
parse? Skip the node with the bad item? Something else? If we adopt
DTB as the RISC-V configuration format, can we mandate that all readers
perform this bounds checking with specific behavior on failure or would
we then be adopting a new format that is not quite DTB?
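
To make the bounds check in question concrete, here is a minimal sketch
in C (the function and parameter names are made up, not taken from any
existing reader):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Resolve a property-name offset into the DTB string block.  Returns
 * NULL if the offset lies outside the block or if the string it names
 * is not NUL-terminated within the block. */
static const char *strblock_get(const char *strblk, size_t strblk_size,
                                uint32_t nameoff)
{
    if (nameoff >= strblk_size)
        return NULL;                    /* offset past end of block */
    if (memchr(strblk + nameoff, '\0', strblk_size - nameoff) == NULL)
        return NULL;                    /* string runs off the block */
    return strblk + nameoff;
}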

> Have you seen the actual description of the format? It's quite
> straightforward, the main structure definition is really an in-order
> traversal of the tree.
>
> See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/devicetree/booting-without-of.txt?id=HEAD
> for the format, section 1, 3 and 4.
>

Reading that, I now have more questions and some comments. (In order
with the spec you provided; line numbers are in [square brackets]. I am
reading blob 280d283304bb82d8b6b210beb97fb954d25c756d from
<URL:https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/booting-without-of.txt>.)

How stable is this spec? Would we need to transcribe it into our own
standards or could we simply reference it? Would we need to effectively
write at least partial OpenFirmware bindings for RISC-V in order to use
this? Some of those "ToDo" items make me nervous, like "Add some
definitions of interrupt tree" and "Add some definitions for PCI host
bridges". Those are big gaps.

** Chapter I

Are these existing boot protocols the reason that people wanted the
supervisor to be able to turn off paging?

** Chapter II

[346] That spec says the DeviceTree block must be in RAM; we are talking
about defining a ROM format. While simple ARM bootloaders can copy a
DTB from ROM to RAM, we also need the capability to merge multiple "DTB
fragments" from different ROMs.

[370] I like the "boot_cpuid_phys" field; it looks like a good place to
put a hart ID on entry to a supervisor. Can we use hart IDs as
"physical CPU numbers"?

[396] Byte-swap is rather tedious on RISC-V, enough that an instruction
for it is expected to be in the "B" extension. Could we use a variant
format with little-endian encoding and the same magic number? (That
magic number reads as 0xedfe0dd0 if byte-swap is required. PowerPC is
big-endian, so the kernel code must already be able to support
native-endian DTB.)

[419] The memory map lists reservations rather than available regions.
How does the kernel get the actual memory map? Does it scan the entire
tree before initializing the allocator?

[438] Do I correctly gather that we would need to use at least version
17, since we must splice device trees from multiple sources?

[600] Along the lines of my "bad input" concerns, what happens if a node
references a phandle, but no node actually has that phandle? In the
example, what if a node references phandle <5> when the highest phandle
actually defined is <4>? (A range check on phandles is not sufficient,
since phandles can be sparse.)

[682] Do I correctly understand that inserting a subtree read from
another source is essentially text insertion? That would address one of
my concerns. What edits to the inserted block could be required?
Renumbering phandles? (That can be done linearly by renumbering them
all while copying nodes to RAM.) Remapping strings? (Could get
interesting if merging duplicates, but the string block only needs to be
linear at the end of the process and multiple string blocks could simply
be concatenated and string references adjusted.)
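
As a deliberately simplified sketch of those two linear fix-ups (the
names are made up, and rewriting *references* to phandles still requires
knowing which property values hold phandles in the first place):

#include <stdint.h>

/* When fragment B's string block is appended after fragment A's, every
 * nameoff in B's structure block shifts by a constant. */
static uint32_t splice_nameoff(uint32_t nameoff_b, uint32_t strblk_a_size)
{
    return nameoff_b + strblk_a_size;   /* B's offset 0 lands at A's old end */
}

/* Renumber B's phandles into a range above A's highest phandle, applied
 * to each phandle while copying B's nodes to RAM. */
static uint32_t splice_phandle(uint32_t phandle_b, uint32_t max_phandle_a)
{
    return phandle_b + max_phandle_a;   /* keeps them unique, possibly sparse */
}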

** Chapter III

[703] Could we use "riscv," instead of "linux," as a prefix on quasi-OF
property names in ROM rather than hardwiring property names that refer
to a specific supervisor? The translation is trivial and I really do
not like the idea of using Linux's vendor tag in standard RISC-V
configuration ROMs. (Or have these "linux,*" properties become
quasi-standard? Do other DTB-using supervisors also recognize them?)

[731] Since the base word size in RISC-V is 32 bits, 32-bit cells are a
good fit, I admit. I would also suggest retaining big-endian cell order
even with native-endian cells, both to simplify translation and because
big-endian is easier to read into a register piecemeal (load one
element; shift running value left; OR new element into running value) on
RV64 and RV128. In other words, the format is cell-based, except for
unit names and property data, and those are padded to the next cell
boundary. The only problem with native-endian cells is that the
equivalence between cells and byte arrays changes, but a property SHOULD
be defined as either a cell-list or byte-array anyway, not both. Using
native-endian cells would change that SHOULD to MUST.
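
For illustration, the piecemeal read described above on RV64, as a C
sketch (the helper name is made up; cells are assumed native-endian,
with only the cell *order* big-endian):

#include <stdint.h>

/* Assemble a 64-bit value from two cells stored most-significant first:
 * load a cell, shift the running value left, OR the new cell in. */
static uint64_t read_two_cells(const uint32_t *cell)
{
    uint64_t v = cell[0];        /* most-significant cell */
    v = (v << 32) | cell[1];     /* shift left, OR in the next cell */
    return v;
}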

[829] I like restricting the structural text to an ASCII subset; this is
good.

[847] I will just ask: How does the RISC-V PLIC fit into the OF
interrupt tree specification?

[890] How is a hierarchy of modules containing cores containing harts
expressed in the CPU subtree? "/cpus/module@0/core@0/hart@0"? This is
important because the expectation is that CPU modules will have their
own configuration ROMs describing the processors, while the board ROM
describes the onboard peripherals and bus ports and outboard ROMs on
otherwise non-discoverable buses describe expansion hardware. The boot
firmware will splice it all together.

[929] I presume that systems with dynamic CPU clocks simply omit
"clock-frequency"?

[947] Can memory nodes additionally be placed underneath CPU nodes to
represent module-local, core-local, or hart-local memory?

[984] Do I correctly assume that "/chosen" will never appear in
configuration ROM?

[1168] Do I correctly understand that there is no indication in DTB
whether a given property is a cell-list or a byte-array that happens to
have a length divisible by 4? That a DTB->DTS translation must either
*know* that a given property is actually a byte-array or be able to
infer it from the length? And that every property value is either null,
a string, a cell-list, or a byte-array? Are strings stored as
byte-arrays? (And yes, storing cells as big-endian values does reduce
the cell-list/byte-array distinction to whitespace and extra "base
markers".)

>>>> And i do mean quantify. Yes, I understand that the representation will be
>>>> larger, and the parser might take a bit more code, but how much more of
>>>> each, and what is the cost of that increase? ROM is pretty cheap and small,
>>>> onchip - unless we're talking really substantial (>64KB range) sizes.
>>>>
>>> For some of the larger devicetrees in the kernel sources, decompiled
>>> DTBs are ~40% larger, growing from about 88kB to 120kB (these are
>>> mostly for TI's DRA7 platforms).
>>>
>>> Small platforms (such as some of the stm32 platforms) grow by about
>>> 20% depending on the platform.
>>>
>>> These are decompiled binaries, not the original sources which might be
>>> more verbose, have comments, etc.
>>>
>>> In an out-of-tree product tree I am working with, there's one DTB that
>>> grows from 185kB to 251kB when decompiled.
>>>
>>> None of the above covers the code and runtime cost, of course, just
>>> the size differences of the data.
>>>
>> I will admit that size difference may be a concern, although I must ask (at
>> the risk of opening another can of worms) how well generic data compression
>> like deflate and LZO compress that text.
>>
>
> Then you're back to binary format, so previous arguments would be null
> and void. We'd be going in circles.
>

There is a difference between a binary format (compressed text) that is
processed with generic tools and a binary format (DTB) that is processed
with special tools, and I did say that introducing compression would
open another can of worms. :)

>> There is also another issue here: For your analysis, you are using DTBs
>> from the Linux kernel sources, where quality control is strict, and from a
>> project you are working on, where the people are competent and trustworthy.
>> In both cases, I presume that the DTBs are being produced using a standard
>> "dtc" tool. Please consider the potential for mischief with non-standard
>> "dtc-like" tools. I believe that similar issues have been observed "in the
>> wild" with ACPI, another format that uses binary blobs.
>>
>
> The kernel sources include DTS (actually, DTS that is passed through
> cpp, then to dtc). So far we've done quite well on ARM with it, with
> all the range of quality of systems delivered there. So far, there's
> been little need for people to innovate in DTB-producing tooling; most
> vendors work with the tools included in the kernel (or distributed
> separately).
>

While the availability of a free reference dtc implementation will
probably help, I expect that the bar for RISC-V vendors will be somewhat
lower than it is for ARM vendors, unless there are "pirate" ARM chips
already out there. As I understand it, one of the goals is that anyone
can download Rocket, tweak it, synthesize it, and start cranking out chips.

A lack of an actual need to innovate will not stop mischievous vendors
from doing so anyway. NIH is out there. (You could even argue that the
original RISC-V config string is itself an example of NIH.) :)

> Areas where I've seen vendors do crazy stuff is when they don't really
> understand the spirit of how to describe things in DT, but that can be
> taught. Once there are good examples to base off of things tend to
> stabilize.
>

So we would then need to write at least partial OpenFirmware bindings
for RISC-V?

>> Whatever is adopted to describe RISC-V configuration, there will be
>> bargain-basement boards with $DEITY-knows-what in the configuration ROM. I
>> argue that keeping that "lowest passable bar" as sane as possible is
>> important. I generally believe that producing
>> syntactically-valid-but-semantically-bogus text is more difficult than
>> producing (possibly deliberately obfuscated) binary blobs.
>>
>
> Take a look at the data format at the link above, it's really simple
> and it's not really something where you can go all that far off the
> tracks.
>

How robust are the existing DTB readers? What about blatant structural
errors: OF_DT_END_NODE at top-level, OF_DT_END with nodes still open,
very deeply nested nodes (stack overrun possible?), and such? I agree
that DTB is fairly simple, but when I hear "binary format" I still get
nervous thinking of the monstrosities that have come from Redmond.

> As mentioned already, on ARM we've seen quite the range of vendors
> already, and it hasn't been an issue.
>

This is somewhat reassuring, but I still expect RISC-V vendors to face a
lower bar. That lowered bar is a good thing in general, since it lowers
the cost of using RISC-V in larger systems where the processor is not
itself the product, but it also opens the door to new lows for processor
vendors.


-- Jacob

Sean Halle

unread,
May 18, 2017, 5:05:36 AM5/18/17
to Karsten Merker, RISC-V ISA Dev
May I make an orthogonal request, outside the debate over format, about
the content of the configuration data?

There is a separate discussion about CSRs that talks about implementing
them "for real", versus via trap handler, versus wiring them to zero.  I'd
like to be sure that this aspect is covered in the configuration data.  It
may be important for the configuration data to state which of the three
modes of CSR implementation applies:
1) fully implemented
2) emulated via trap handler
3) present, but permanently zero

This comes into play, for example, with the performance counters.  It may
be necessary to "implement" the performance counter CSRs so that a binary
that accesses them won't crash.  But a given processor may take any one of
those three approaches.  Without the configuration information, the
executable may access the CSR "successfully" and believe that the CSR is
there, but it is always tied to zero, so the code behaves poorly.

For example, think about an advanced compiler or JIT that relies on
profile information collected through the CSRs: it has to know which
flavor of CSR implementation is in the hardware.
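
A purely illustrative devicetree-source fragment for this (the property
name "riscv,hpmcounter-impl" is made up here, not an existing binding):

cpus {
        cpu@0 {
                device_type = "cpu";
                compatible = "riscv";
                /* hypothetical: one of "hardware", "trap-emulated",
                   or "hardwired-zero" */
                riscv,hpmcounter-impl = "trap-emulated";
        };
};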

Thanks,

Sean
Intensivate
http://intensivate.com

P.S. I'm not on sw-dev, and this seems to land on the isa-dev side of the machine configuration topic.


Karsten Merker

unread,
May 18, 2017, 12:13:11 PM5/18/17
to Jacob Bachmeyer, Benjamin Herrenschmidt, Olof Johansson, Allen J. Baum, Karsten Merker, isa...@groups.riscv.org
On Wed, May 17, 2017 at 08:45:19PM -0500, Jacob Bachmeyer wrote:
> Olof Johansson wrote:
> >I haven't been able to see the point in trying to differentiate
> >in this specific area -- there are already-adopted workable
> >solutions available and doing this special custom solution
> >doesn't solve any fundamental problems with those solutions.
>
> Part of this is an example of path-dependence: RISC-V began
> with its own custom "configuration string" format, and pushing
> for a change as far as text->binary will meet significant
> resistance simply due to the extent of the change.

Well, whether using dtb is a "change" or not somewhat depends on
the personal viewpoint. Neither the original RISC-V config
string nor dtb have ever been formally specified in a RISC-V
specification document. They both are formats that were
predominantly used in certain "circles" - the original RISC-V
config string format was largely an idea from the "hardware
development" side of things while dtb is largely the standard
format and natural choice for people coming from the "bootloader"
or "operating system" side. The latter - at least that is my
impression from the past discussion on sw-dev - largely appear to
consider the original RISC-V config string format to be an
attempt to change away from cross-platform established practice.
So both sides in this discussion can validly claim that the "other"
side is trying to change things away from established practice :-).

> ** Chapter II

I am not a specialist for devicetree, but I'll try to comment on
your questions to the best of my knowledge. Olof and Benjamin
will hopefully correct me if I get things wrong.

> [346] That spec says the DeviceTree block must be in RAM; we are
> talking about defining a ROM format. While simple ARM
> bootloaders can copy a DTB from ROM to RAM, we also need the
> capability to merge multiple "DTB fragments" from different ROMs.

As I understand it, the DTB must be available somewhere in the
processor address space, but not _necessarily_ in RAM, although
having it in RAM is recommended. The wording is "in main memory"
which I understand as "is directly CPU-addressable" in contrast to
"sits on an SPI flash chip that is not memory-mapped".

Any merge process of different fragments (be it dtb or dts) needs
RAM anyway, so this point is somewhat moot, though.

> [396] Byte-swap is rather tedious on RISC-V, enough that an
> instruction for it is expected to be in the "B" extension. Could
> we use a variant format with little-endian encoding and the same
> magic number? (That magic number reads as 0xedfe0dd0 if
> byte-swap is required. PowerPC is big-endian, so the kernel code
> must already be able to support native-endian DTB.)

Please don't do that. The format specifies big-endian for all
platforms, so stay with that. A few CPU cycles for byte-swapping
are not a valid reason for introducing an incompatibility.

> [419] The memory map lists reservations rather than available regions. How
> does the kernel get the actual memory map? Does it scan the entire tree
> before initializing the allocator?

The physical memory layout is described by the /memory node. So
yes, to get the memory map, the tree needs to be parsed, but that
is not different from the way it works in the original RISC-V
config string format.
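
For reference, a /memory node has this general shape (the values here
are illustrative, and the cell counts follow the parent's
#address-cells and #size-cells):

memory@80000000 {
        device_type = "memory";
        reg = <0x80000000 0x40000000>;  /* base, size: 1 GiB at 2 GiB */
};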

> [438] Do I correctly gather that we would need to use at least version 17,
> since we must splice device trees from multiple sources?

To my understanding: yes.

> [600] Along the lines of my "bad input" concerns, what happens if a node
> references a phandle, but no node actually has that phandle? In the
> example, what if a node references phandle <5> when the highest phandle
> actually defined is <4>? (A range check on phandles is not sufficient,
> since phandles can be sparse.)

The same as with the original RISC-V config string format or dts
when containing a reference to a non-existent path - the
reference is unresolvable and that is an error condition.

> [703] Could we use "riscv," instead of "linux," as a prefix on quasi-OF
> property names in ROM rather than hardwiring property names that refer to a
> specific supervisor? The translation is trivial and I really do not like
> the idea of using Linux's vendor tag in standard RISC-V configuration ROMs.
> (Or have these "linux,*" properties become quasi-standard? Do other
> DTB-using supervisors also recognize them?)

The text we are currently referring to was the first attempt at
specifying device-tree. AFAIK all of the common OF-properties
that were prefixed with a "linux" vendor in the beginning have
become top-level properties without a vendor prefix in the
current devicetree spec (at devicetree.org).

The original text only has the following two properties with a
"linux" vendor prefix, and both have become top-level properties:
- linux,phandle -> phandle
- linux,stdout-path -> stdout-path

The old syntax with the "linux" vendor prefix is still supported
for compatibility reasons, but the standard is just "phandle" and
"stdout-path", so there is actually no need to rename any vendor
prefixes.

> [929] I presume that systems with dynamic CPU clocks simply omit
> "clock-frequency"?

AFAIK dynamic frequency (and voltage) scaling is a matter of the
operating system; on all systems that I know of, the firmware
only does a static clock setting and leaves DVFS to the OS as
only the OS has the necessary runtime information to actually
perform proper DVFS switching. The DVFS OPP parameters are
provided in a separate node; see
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/devicetree/bindings/opp/opp.txt

> [984] Do I correctly assume that "/chosen" will never appear in
> configuration ROM?

Technically it can appear, but it doesn't have to.

Jacob Bachmeyer

unread,
May 18, 2017, 6:52:31 PM5/18/17
to Karsten Merker, Benjamin Herrenschmidt, Olof Johansson, Allen J. Baum, isa...@groups.riscv.org
Then the two sides met, and DTS was somehow suggested as a compromise.

>> ** Chapter II
>>
>
> I am not a specialist for devicetree, but I'll try to comment on
> your questions to the best of my knowledge. Olof and Benjamin
> will hopefully correct me if I get things wrong.
>
>
>> [346] That spec says the DeviceTree block must be in RAM; we are
>> talking about defining a ROM format. While simple ARM
>> bootloaders can copy a DTB from ROM to RAM, we also need the
>> capability to merge multiple "DTB fragments" from different ROMs.
>>
>
> As I understand it, the DTB must be available somewhere in the
> processor address space, but not _necessarily_ in RAM, although
> having it in RAM is recommended. The wording is "in main memory"
> which I understand as "is directly CPU-addressable" in contrast to
> "sits on an SPI flash chip that is not memory-mapped".
>

Line 72: "Rev 0.3 - Precise that DT block has to be in RAM"

> Any merge process of different fragments (be it dtb or dts) needs
> RAM anyway, so this point is somewhat moot, though.
>

Yes, as does translating DTS->DTB. The only real effect of this
constraint is that lazy implementations cannot just have DTB in ROM and
pass a ROM pointer to the kernel.

>> [396] Byte-swap is rather tedious on RISC-V, enough that an
>> instruction for it is expected to be in the "B" extension. Could
>> we use a variant format with little-endian encoding and the same
>> magic number? (That magic number reads as 0xedfe0dd0 if
>> byte-swap is required. PowerPC is big-endian, so the kernel code
>> must already be able to support native-endian DTB.)
>>
>
> Please don't do that. The format specifies big-endian for all
> platforms, so stay with that. A few CPU cycles for byte-swapping
> are not a valid reason for introducing an incompatibility.
>

It is more than "a few CPU cycles" on RISC-V without a byte-swap
instruction; more like "over a dozen instructions per 32-bit word".
(Shift input, mask to extract byte, shift result, OR to insert byte in
result, repeat until all bytes have been transferred.) Reading
big-endian input one byte at a time reduces this to 11 instructions per
word but increases memory traffic. (Load MSB, shift, repeat thrice:
load next byte, OR, shift.)
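
Roughly, the byte-at-a-time strategy in C (a sketch; the helper name is
made up):

#include <stdint.h>

/* Read a big-endian 32-bit word without a byte-swap instruction: load
 * the MSB first, then repeatedly shift and OR in the next byte. */
static uint32_t load_be32(const unsigned char *p)
{
    uint32_t v = p[0];
    v = (v << 8) | p[1];
    v = (v << 8) | p[2];
    v = (v << 8) | p[3];
    return v;
}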

And an incompatibility for a format that exists only ephemerally in RAM
while booting is of no serious concern. (If DTB is not used in
configuration ROMs.)

>> [600] Along the lines of my "bad input" concerns, what happens if a node
>> references a phandle, but no node actually has that phandle? In the
>> example, what if a node references phandle <5> when the highest phandle
>> actually defined is <4>? (A range check on phandles is not sufficient,
>> since phandles can be sparse.)
>>
>
> The same as with the original RISC-V config string format or dts
> when containing a reference to a non-existent path - the
> reference is unresolvable and that is an error condition.
>

How do existing DTB readers handle this error condition? Is it always
handled safely? Can we enumerate all possible errors in DTB and mandate
that conforming implementations handle them in specific, safe, manners?

>> [703] Could we use "riscv," instead of "linux," as a prefix on quasi-OF
>> property names in ROM rather than hardwiring property names that refer to a
>> specific supervisor? The translation is trivial and I really do not like
>> the idea of using Linux's vendor tag in standard RISC-V configuration ROMs.
>> (Or have these "linux,*" properties become quasi-standard? Do other
>> DTB-using supervisors also recognize them?)
>>
>
> The text we are currently referring to was the first attempt at
> specifying device-tree. AFAIK all of the common OF-properties
> that were prefixed with a "linux" vendor in the beginning have
> become top-level properties without a vendor prefix in the
> current devicetree spec (at devicetree.org).
>
> The original text only has the following two properties with a
> "linux" vendor prefix, and both have become top-level properties:
> - linux,phandle -> phandle
> - linux,stdout-path -> stdout-path
>
> The old syntax with the "linux" vendor prefix is still supported
> for compatibility reasons, but the standard is just "phandle" and
> "stdout-path", so there is actually no need to rename any vendor
> prefixes.
>

So the standard does change, then. Are there any risks that ROMs burned
with a DTB dump today would be unreadable due to "standards drift" in
the future? (1 yr, 5 yrs, 10 yrs, 50 yrs, 100 yrs, expected lifespan of
a durable mask ROM?)


-- Jacob

Karsten Merker

unread,
May 18, 2017, 8:23:31 PM5/18/17
to Jacob Bachmeyer, Karsten Merker, Benjamin Herrenschmidt, Olof Johansson, Allen J. Baum, isa...@groups.riscv.org
On Thu, May 18, 2017 at 05:52:28PM -0500, Jacob Bachmeyer wrote:
> Karsten Merker wrote:
> >On Wed, May 17, 2017 at 08:45:19PM -0500, Jacob Bachmeyer wrote:

> >>** Chapter II
> >
> >I am not a specialist for devicetree, but I'll try to comment on
> >your questions to the best of my knowledge. Olof and Benjamin
> >will hopefully correct me if I get things wrong.
> >
> >>[346] That spec says the DeviceTree block must be in RAM; we are
> >>talking about defining a ROM format. While simple ARM
> >>bootloaders can copy a DTB from ROM to RAM, we also need the
> >>capability to merge multiple "DTB fragments" from different ROMs.
> >
> >As I understand it, the DTB must be available somewhere in the
> >processor address space, but not _necessarily_ in RAM, although
> >having it in RAM is recommended. The wording is "in main memory"
> >which I understand as "is directly CPU-addressable" in contrast to
> >"sits on an SPI flash chip that is not memory-mapped".
>
> Line 72: "Rev 0.3 - Precise that DT block has to be in RAM"

Hm, perhaps Benjamin Herrenschmidt can comment on this.

Sorry, but I don't understand why you speak of "only
ephemerally". First, topic of the discussion is to put dtb into
configuration ROMs. Second, the final devicetree is passed down
to the bootloader and the OS and it needs to be in big endian for
that, so it IMHO absolutely doesn't make sense to first have it
in an incompatible little-endian representation and then convert
it back again to its proper big-endian representation. I also
still don't see the byte-swapping overhead while parsing the
tree as in any way critical.

> >>[600] Along the lines of my "bad input" concerns, what
> >>happens if a node references a phandle, but no node actually
> >>has that phandle? In the example, what if a node references
> >>phandle <5> when the highest phandle actually defined is <4>?
> >>(A range check on phandles is not sufficient, since phandles
> >>can be sparse.)
> >
> >The same as with the original RISC-V config string format or dts
> >when containing a reference to a non-existant path - the
> >reference is unresolvable and that is an error condition.
>
> How do existing DTB readers handle this error condition? Is it
> always handled safely? Can we enumerate all possible errors in
> DTB and mandate that conforming implementations handle them in
> specific, safe, manners?

I am not familiar with the specific implementations of the
existing dtb parsers, I'm just a "user" of the devicetree in that
code I wrote uses the existing accessor functions to read nodes
and their properties. Probably David Chisnall and Benjamin
Herrenschmidt, who have worked on such implementations, can
comment further.

Of course the standard evolves, as any standard does. Nonetheless
the devicetree maintainers put great emphasis on keeping
compatibility. Policy for devicetree is that changes must be
backwards-compatible, so that newer implementations still support
devicetrees written with older bindings. There was quite a bit of
churn at the beginning of devicetree, but things have been settled
for quite some time; people have learned from the early mistakes,
and there is a reason why any new bindings must go through public
review: once they are accepted, they constitute a non-revocable ABI.

No standard is perfect on the first take; mistakes can happen.
Even in the all-so-great-and-compatible PC world. ACPI has had
quite a bunch of specification updates during its lifetime, and
don't get me started on real-world problems due to incompatible
ACPI tables...

Jacob Bachmeyer

unread,
May 18, 2017, 9:40:55 PM5/18/17
to Karsten Merker, Benjamin Herrenschmidt, Olof Johansson, Allen J. Baum, isa...@groups.riscv.org
Karsten Merker wrote:
> On Thu, May 18, 2017 at 05:52:28PM -0500, Jacob Bachmeyer wrote:
>
>> Karsten Merker wrote:
>>
>>> On Wed, May 17, 2017 at 08:45:19PM -0500, Jacob Bachmeyer wrote:
>>>
>>>> ** Chapter II
>>>>
>>> I am not a specialist for devicetree, but I'll try to comment on
>>> your questions to the best of my knowledge. Olof and Benjamin
>>> will hopefully correct me if I get things wrong.
>>>
>>>
>>>
>>>> [396] Byte-swap is rather tedious on RISC-V, enough that an
>>>> instruction for it is expected to be in the "B" extension. Could
>>>> we use a variant format with little-endian encoding and the same
>>>> magic number? (That magic number reads as 0xedfe0dd0 if
>>>> byte-swap is required. PowerPC is big-endian, so the kernel code
>>>> must already be able to support native-endian DTB.)
>>>>
>>> Please don't do that. The format specifies big-endian for all
>>> platforms, so stay with that. A few CPU cycles for byte-swapping
>>> are not a valid reason for introducing an incompatibility.
>>>
>> It is more than "a few CPU cycles" on RISC-V without a
>> byte-swap instruction; more like "over a dozen instructions per
>> 32-bit word". (Shift input, mask to extract byte, shift
>> result, OR to insert byte in result, repeat until all bytes
>> have been transferred.) Reading big-endian input one byte at a
>> time reduces this to 11 instructions per word but increases
>> memory traffic. (Load MSB, shift, repeat thrice: load next
>> byte, OR, shift.)
>>
>> And an incompatibility for a format that exists only
>> ephemerally in RAM while booting is of no serious concern. (If
>> DTB is not used in configuration ROMs.)
>>
>
> Sorry, but I don't understand why you speak of "only
> ephemerally". First, topic of the discussion is to put dtb into
> configuration ROMs.

There are a few topics being conflated here: format given to supervisor
(the last draft ISA spec that specified this specified config string;
current practice appears to use DTB, at least for Linux), format
produced by boot firmware (which a boot loader can translate), and data
stored in configuration ROMs (which boot firmware must
splice/translate). These should all use the same data model to ensure
that faithful translation is possible, but there is no requirement that
they all use the same syntax. Indeed they cannot, since the
configuration ROM contents cannot be directly passed to a supervisor due
to the requirement for modular configuration. (Unless, of course,
making the supervisor go hunt for all the pieces is considered
acceptable; I believe otherwise.)

Also, please do not phrase it that way, as that can be interpreted as
presupposing a conclusion that, to my knowledge, has not been reached
and flies in the face of past consensus (as I understand it) from
discussions on this list, which was that configuration ROMs should store
text, preferably ASCII, with a possible future extension to UTF-8. The
purpose of this discussion is not to decide that configuration ROMs
should store DTB; it is to assess the suitability of DTB for that
purpose and possibly produce a well-developed outline for using DTB for
that purpose to prove that claimed suitability. I apologize in advance
if you were unaware of that subtext of "topic of the discussion is to
put dtb into configuration ROMs" when you wrote those words.

> Second, the final devicetree is passed down
> to the bootloader and the OS and it needs to be in big endian for
> that, so it IMHO absolutely doesn't make sense to first have it
> in an incompatible little-endian representation and then convert
> it back again to its proper big-endian representation. I also
> still don't see the amount of overhead for the byteswapping
> during parsing the tree as anyhow critical.
>

The magic number reliably indicates whether byte-swapping is required;
the incompatible representation would be the one passed to a kernel.
(Someone will want that last tiny bit of improved performance and will
do it eventually. Do not pretend that you can prevent that; someone
will do it.)
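
A minimal sketch of that detection (the magic constant is from the FDT
header; the function name is made up):

#include <stdint.h>
#include <string.h>

#define FDT_MAGIC         0xd00dfeedu  /* as read when no swap is needed */
#define FDT_MAGIC_SWAPPED 0xedfe0dd0u  /* as read when byte-swap is required */

/* Returns 1 if the blob is in host byte order, 0 if it must be swapped,
 * -1 if it does not look like a DTB at all. */
static int dtb_endianness(const void *blob)
{
    uint32_t magic;
    memcpy(&magic, blob, sizeof magic);  /* avoid unaligned access */
    if (magic == FDT_MAGIC)
        return 1;
    if (magic == FDT_MAGIC_SWAPPED)
        return 0;
    return -1;
}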

What is stored in configuration ROMs needs to be very well defined, and
ideally should be a single future-proof option, but using ASN.1 would be
begging for trouble. The format given to a boot loader is comparatively
more flexible, since boot firmware can be updated without having to
touch every add-on card in the field that contains a configuration ROM.
The format given to a kernel is the most flexible of all, since boot
loaders can translate whatever the firmware provides to whatever the
kernel expects and can themselves be easily replaced. To be clear, if
DTB is adopted as the configuration ROM format, ROMs would store
big-endian DTB; they must, since they are expected to be potentially
useful on other platforms as well. Native-endian DTB would be a
performance optimization to avoid repeated swaps from and to network
order and would exist only in RAM.

The endian-mismatch issue can be minimized another way: Is
byte-swapping actually necessary to read and splice DTB trees? The DTB
tokens are actually 1 byte with 3 zeros to form a cell; that one byte
can be directly read, instead of loading and byte-swapping the entire
cell. More robustly, the value can be loaded and compared to
"pre-swapped" constants. The unit name is inline ASCII in the structure
stream; only property names and values that the firmware actually itself
needs would need to be byte-swapped--and phandle values are arbitrary
anyway, so no need to swap them. Provided that unit names are
sufficient to locate splice points, byte-swapping might not be such a
high cost after all, at least for the boot firmware, which can simply
copy large chunks of tree structure without examining the values.
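
A minimal sketch of that token-matching idea, assuming a little-endian
host (the names are made up; this is not libfdt, and the bounds checks
a real reader needs against the end of the block are omitted):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compile-time byte swap, so token constants can be stored pre-swapped
 * and whole cells compared without swapping the input stream. */
#define BSWAP32(x) ((((x) & 0x000000ffu) << 24) | (((x) & 0x0000ff00u) << 8) | \
                    (((x) & 0x00ff0000u) >> 8)  | (((x) & 0xff000000u) >> 24))

#define TOK_BEGIN_NODE BSWAP32(0x00000001u)  /* values as seen on a LE host */
#define TOK_END_NODE   BSWAP32(0x00000002u)
#define TOK_PROP       BSWAP32(0x00000003u)
#define TOK_NOP        BSWAP32(0x00000004u)
#define TOK_END        BSWAP32(0x00000009u)

/* Walk the structure block; return the maximum nesting depth, or -1 on a
 * structural error (e.g. OF_DT_END_NODE at top level).  The only per-item
 * swap is the property length, which the walker itself needs. */
static int walk_struct_block(const uint32_t *cells, size_t ncells)
{
    int depth = 0, max_depth = 0;
    size_t i = 0;

    while (i < ncells) {
        uint32_t tok = cells[i++];
        if (tok == TOK_BEGIN_NODE) {
            /* unit name is inline ASCII, NUL-terminated, cell-padded */
            size_t len = strlen((const char *)&cells[i]) + 1;
            i += (len + 3) / 4;
            if (++depth > max_depth)
                max_depth = depth;
        } else if (tok == TOK_END_NODE) {
            if (--depth < 0)
                return -1;                       /* close without an open */
        } else if (tok == TOK_PROP) {
            uint32_t len = BSWAP32(cells[i]);    /* one swap per property */
            i += 2 + (len + 3) / 4;              /* skip len, nameoff, value */
        } else if (tok == TOK_END) {
            return depth == 0 ? max_depth : -1;
        } else if (tok != TOK_NOP) {
            return -1;                           /* unknown token */
        }
    }
    return -1;               /* ran off the end without seeing OF_DT_END */
}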

>>>> [600] Along the lines of my "bad input" concerns, what
>>>> happens if a node references a phandle, but no node actually
>>>> has that phandle? In the example, what if a node references
>>>> phandle <5> when the highest phandle actually defined is <4>?
>>>> (A range check on phandles is not sufficient, since phandles
>>>> can be sparse.)
>>>>
>>> The same as with the original RISC-V config string format or dts
>>> when containing a reference to a non-existent path - the
>>> reference is unresolvable and that is an error condition.
>>>
>> How do existing DTB readers handle this error condition? Is it
>> always handled safely? Can we enumerate all possible errors in
>> DTB and mandate that conforming implementations handle them in
>> specific, safe, manners?
>>
>
> I am not familiar with the specific implementations of the
> existing dtb parsers, I'm just a "user" of the devicetree in that
> code I wrote uses the existing accessor functions to read nodes
> and their properties. Probably David Chisnall and Benjamin
> Herrenschmidt, who have worked on such implementations, can
> comment further.
>

That is okay, but this is an important issue. Binary formats in general
tend to be more susceptible to security exploits from malformed inputs
and I advocate caution before potentially burying landmines in the
platform configuration. As I understand it, DTB readers have not yet
had to worry about invalid or maliciously invalid input, but baking DTB
into configuration ROMs on hardware that may not be fully trusted raises
a new risk that may not have been (or may have been!) previously considered.
This is reassuring.

> No standard is perfect on the first take; mistakes can happen.
>

That raises another question: If DTB is adopted, how do we get a
semantics version, not of the binary structure, but of the expected tree
structure?

> Even in the all-so-great-and-compatible PC world. ACPI has had
> quite a bunch of specification updates during its lifetime, and
> don't get me started on real-world problems due to incompatible
> ACPI tables...
>

ACPI is a good example of what not to do, and probably a source for at
least some of the aversion towards committing to a binary format in
ROM. The "all-so-great-and-compatible PC world" is revealed to be quite
different from how it first seems once you scratch the veneer.


-- Jacob

Jacob Bachmeyer

unread,
May 18, 2017, 9:55:38 PM5/18/17
to Benjamin Herrenschmidt, Karsten Merker, Olof Johansson, Allen J. Baum, isa...@groups.riscv.org
Benjamin Herrenschmidt wrote:
> On Thu, 2017-05-18 at 18:07 +0200, Karsten Merker wrote:
>
>> As I understand it, the DTB must be available somewhere in the
>> processor address space, but not _necessarily_ in RAM, although
>> having it in RAM is recommended. The wording is "in main memory"
>> which I understand as "is directly CPU-addressable" in contrast to
>> "sits on an SPI flash chip that is not memory-mapped".
>>
>
> But it's trivial for some system-specific boot firmware to suck it off
> an SPI flash chip if needed.
>

Indeed, and modular configuration will require boot firmware to pull in
configuration fragments from (potentially) many such chips, not
necessarily all SPI.

>> Any merge process of different fragments (be it dtb or dts) needs
>> RAM anyway, so this point is somewhat moot, though.
>>
>>
>>> [396] Byte-swap is rather tedious on RISC-V, enough that an
>>> instruction for it is expected to be in the "B" extension. Could
>>> we use a variant format with little-endian encoding and the same
>>> magic number? (That magic number reads as 0xedfe0dd0 if
>>> byte-swap is required. PowerPC is big-endian, so the kernel code
>>> must already be able to support native-endian DTB.)
>>>
>> Please don't do that. The format specifies big-endian for all
>> platforms, so stay with that. A few CPU cycles for byte-swapping
>> are not a valid reason for introducing an incompatibility.
>>
>
> Absolutely. Changing the endianness of the DTB would be an instant
> trainwreck for no benefit.
>

To be clear, if DTB is used in configuration ROMs, the ROMs would store
big-endian DTB, to support use of peripherals with other platforms. A
native-endian DTB would be ephemeral in RAM and used to avoid repeatedly
byte-swapping a received tree, processing it, and byte-swapping it back.

And I think that the required splice operations can be done on a
"wrong-endian" tree with only a handful of byte-swaps, rather than
swapping the entire tree.

>>> [419] The memory map lists reservations rather than available regions. How
>>> does the kernel get the actual memory map? Does it scan the entire tree
>>> before initializing the allocator?
>>>
>> The physical memory layout is described by the /memory node. So
>> yes, to get the memory map, the tree needs to be parsed, but that
>> is not different from the way it works in the original RISC-V
>> config string format.
>>
>
> There are a few additional things here:
>
> - The calling convention on ePAPR for Power specifies that an initial
> pool of memory can be passed via register that is known to be usable.
> This allows very early boot code to have some play room before it gets
> to parse the DT (for example to setup a stack for C etc...)
>

This is up in the air again for RISC-V; earlier drafts of the privileged
ISA required the SEE to load the supervisor from an ELF binary, so the
early boot code would have whatever .data segment the ELF headers
demand, mapped at the supervisor's choice of virtual address, but that
is back to not-yet-defined now.

> - The binary reserve map is slowly being replaced by a new in-tree
> representation which is much more flexible and powerful, though to
> simplify early boot code, it's required that the reserve-map contains
> the entries as well.
>

Have you considered a simple "early region map", listing
otherwise-unused contiguous blocks of physical memory? (In other words,
at least "some" memory other than that which holds the kernel image, the
DTB buffer, an initrd, etc.)


-- Jacob

Jacob Bachmeyer

unread,
May 18, 2017, 10:48:13 PM5/18/17
to Benjamin Herrenschmidt, Karsten Merker, Olof Johansson, Allen J. Baum, isa...@groups.riscv.org
Benjamin Herrenschmidt wrote:
> On Thu, 2017-05-18 at 17:52 -0500, Jacob Bachmeyer wrote:
>
>> Then the two sides met, and DTS was somehow suggested as a compromise.
>>
>
> DTS is not a good compromise for all the reasons already exposed by
> Olof and I.
>

The full DTS certainly is not, but a restricted subset has been
suggested. Calling it "DTT" (DeviceTree Text) would probably be the
most accurate description.

>>> As I understand it, the DTB must be available somewhere in the
>>> processor address space, but not _necessarily_ in RAM, although
>>> having it in RAM is recommended. The wording is "in main memory"
>>> which I understand as "is directly CPU-addressable" in contrast to
>>> "sits on an SPI flash chip that is not memory-mapped".
>>>
>>>
>> Line 72: "Rev 0.3 - Precise that DT block has to be in RAM"
>>
>
> I don't see a strong reason not to allow it to be in ROM. Of course a
> bootloader or firmware that wishes to update it at runtime would have
> to copy it into RAM first.
>

The kernel does not and will never assume that the DTB buffer is
writable? (Not that it will matter with modular configuration.)

>>>> [396] Byte-swap is rather tedious on RISC-V, enough that an
>>>> instruction for it is expected to be in the "B" extension. Could
>>>> we use a variant format with little-endian encoding and the same
>>>> magic number? (That magic number reads as 0xedfe0dd0 if
>>>> byte-swap is required. PowerPC is big-endian, so the kernel code
>>>> must already be able to support native-endian DTB.)
>>>>
>>>>
>>> Please don't do that. The format specifies big-endian for all
>>> platforms, so stay with that. A few CPU cycles for byte-swapping
>>> are not a valid reason for introducing an incompatibility.
>>>
>>>
>> It is more than "a few CPU cycles" on RISC-V without a byte-swap
>> instruction; more like "over a dozen instructions per 32-bit word".
>> (Shift input, mask to extract byte, shift result, OR to insert byte in
>> result, repeat until all bytes have been transferred.)
>>
>
> val = ((val << 8) & 0xFF00FF00 ) | ((val >> 8) & 0xFF00FF );
> return (val << 16) | (val >> 16);
>
> Is a bit faster (depends how many insn you need to build the mask, I'm
> not yet familiar with the RISC-V ISA).
>

The masks are two instructions each, although you could either shift one
copy back and forth, or build one mask and shift it to build the other,
for three instructions to build both masks. To save one temporary, you
could build the first mask (2 insn), use it, then shift it (1 insn) to
get the second mask. Each shift, AND, and OR is one instruction, for a
total of ... (4 shifts, 2 ANDs, 2 ORs, 3 insns to build the masks) ...
11 instructions the first time; 10 if you can keep the mask in a
register; 8 if you can keep both masks in registers. There is a reason
the "B" extension is expected to include byte-swap, but I doubt that "B"
will be added to RVG (the set of standard extensions for general-purpose
systems).

> In any case, it's peanuts overall at boot time. I don't think it would
> be noticeable.
>

It is, however, that many instructions _per_ _cell_ in the DTB buffer;
repeated swaps on a large tree could become noticeable, although I
suspect that this would be a sign of bad software design. (Fixable if
the firmware is Free, not so much otherwise.)

> On the other hand, changing the format would be a complete trainwreck
> of patches to everything under the sun (dtc, libfdt, linux, u-boot,
> etc...)
>
>
>> Reading
>> big-endian input one byte at a time reduces this to 11 instructions per
>> word but increases memory traffic. (Load MSB, shift, repeat thrice:
>> load next byte, OR, shift.)
>>
>> And an incompatibility for a format that exists only ephemerally in RAM
>> while booting is of no serious concern. (If DTB is not used in
>> configuration ROMs.)
>>
>
> Good luck getting that into Linux...
>

Some vendor will do it, to claim that irrational last bit of "ours is
faster!". And they will maintain their own out-of-tree patch if it is
rejected (or, more likely, make the patch once and *not* maintain it,
rrrrrrrrrr).

Configuration ROMs would need to store big-endian DTB, however, to
support other platforms. A native-endian variant would exist only in RAM.

>>>> [600] Along the lines of my "bad input" concerns, what happens if a node
>>>> references a phandle, but no node actually has that phandle? In the
>>>> example, what if a node references phandle <5> when the highest phandle
>>>> actually defined is <4>? (A range check on phandles is not sufficient,
>>>> since phandles can be sparse.)
>>>>
>>>>
>>> The same as with the original RISC-V config string format or dts
>>> when containing a reference to a non-existent path - the
>>> reference is unresolvable and that is an error condition.
>>>
>>>
>> How do existing DTB readers handle this error condition? Is it always
>> handled safely? Can we enumerate all possible errors in DTB and mandate
>> that conforming implementations handle them in specific, safe, manners?
>>
>
> libfdt has pretty thorough error checking, worst case that's a detail
> if more needs to be added, I wouldn't make this a concern.
>

How thorough? Can we prove that all possible structural errors in DTB
are safely handled? (This seems to me to require an enumeration of all
possible structural errors, otherwise how can you know what fraction of
them are covered?)

This is a fundamental concern if DTB is to be used as the configuration
ROM format. We do not want the possibility of tampered hardware
presenting a maliciously invalid configuration image to act as a
persistence mechanism for malware. (Mallory's special edition would
present the bad DTB once after power up and exploit the firmware's DTB
splicer to insert a rootkit into the monitor. Subsequent reads of the
configuration "ROM" would return a correct image. Preventing low-level
exploits like this is important, since they could go undetected for
years, until the wrong information falls into the wrong hands (or the
wrong hands get to an honest factory's "golden master" and wait) and we
get another global ransomware worm. Worse, this does not necessarily
require modified hardware--at least some vendors are likely to skip
actual configuration ROMs and emulate them with a microcontroller. If
Mallory's malware can reflash that MCU, otherwise untampered hardware
can become "Mallory's special edition", without Mallory ever physically
touching it.)

>>>> [703] Could we use "riscv," instead of "linux," as a prefix on quasi-OF
>>>> property names in ROM rather than hardwiring property names that refer to a
>>>> specific supervisor? The translation is trivial and I really do not like
>>>> the idea of using Linux's vendor tag in standard RISC-V configuration ROMs.
>>>> (Or have these "linux,*" properties become quasi-standard? Do other
>>>> DTB-using supervisors also recognize them?)
>>>>
>>>>
>>> The text we are currently referring to was the first attempt at
>>> specifying device-tree. AFAIK all of the common OF-properties
>>> that were prefixed with a "linux" vendor in the beginning have
>>> become top-level properties without a vendor prefix in the
>>> current devicetree spec (at devicetree.org).
>>>
>>> The original text only has the following two properties with a
>>> "linux" vendor prefix, and both have become top-level properties:
>>> - linux,phandle -> phandle
>>> - linux,stdout-path -> stdout-path
>>>
>>> The old syntax with the "linux" vendor prefix is still supported
>>> for compatibility reasons, but the standard is just "phandle" and
>>> "stdout-path", so there is actually no need to rename any vendor
>>> prefixes.
>>>
>>>
>> So the standard does change, then. Are there any risks that ROMs burned
>> with a DTB dump today would be unreadable due to "standards drift" in
>> the future? (1 yr, 5 yrs, 10 yrs, 50 yrs, 100 yrs, expected lifespan of
>> a durable mask ROM?)
>>
>
> Not really, we tend to always support old trees. The above examples
> come from birth-time struggle when we transitioned from something based
> on Open Firmware to something that stands on its own. Such changes are
> rather seldom.

That is good.

Overall, this leaves two major concerns that need to be addressed before
DTB can be viable for RISC-V configuration ROMs: the aforementioned
security risks (generally assumed, rightly or wrongly, to be greater
with binary formats) and how to represent modular configuration.
Various minor sub-issues also exist, like how to represent the CPU
hierarchy that larger RISC-V systems will have (modules contain
processors which contain hardware threads ("harts")) in the tree--I
would really like to see RISC-V systems with some standard interconnect
that can run with a heterogeneous mix of processor modules. :)


-- Jacob

Bruce Hoult

unread,
May 19, 2017, 7:30:08 AM5/19/17
to Jacob Bachmeyer, Karsten Merker, Benjamin Herrenschmidt, Olof Johansson, Allen J. Baum, RISC-V ISA Dev
On Fri, May 19, 2017 at 1:52 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
Karsten Merker wrote:
[396] Byte-swap is rather tedious on RISC-V, enough that an
instruction for it is expected to be in the "B" extension.  Could
we use a variant format with little-endian encoding and the same
magic number?  (That magic number reads as 0xedfe0dd0 if
byte-swap is required.  PowerPC is big-endian, so the kernel code
must already be able to support native-endian DTB.)
   

Please don't do that. The format specifies big-endian for all
platforms, so stay with that. A few CPU cycles for byte-swapping
are not a valid reason for introducing an incompatibility.
 

It is more than "a few CPU cycles" on RISC-V without a byte-swap instruction; more like "over a dozen instructions per 32-bit word".  (Shift input, mask to extract byte, shift result, OR to insert byte in result, repeat until all bytes have been transferred.)  Reading big-endian input one byte at a time reduces this to 11 instructions per word but increases memory traffic.  (Load MSB, shift, repeat thrice:  load next byte, OR, shift.)

There may well be better ways, but the following is pretty obvious...

unsigned int swap(unsigned int i){
    return (i<<24) | (i>>24) | (((i>>16)<<24)>>16) | (((i<<16)>>24)<<16);
}

... and gcc compiles it to ...

00010164 <_Z4swapj>:
   10164:       01055713                srli    a4,a0,0x10
   10168:       01855693                srli    a3,a0,0x18
   1016c:       01851793                slli    a5,a0,0x18

   10170:       0762                    slli    a4,a4,0x18
   10172:       0542                    slli    a0,a0,0x10
   10174:       8fd5                    or      a5,a5,a3

   10176:       8341                    srli    a4,a4,0x10
   10178:       8161                    srli    a0,a0,0x18

   1017a:       8fd9                    or      a5,a5,a4
   1017c:       0542                    slli    a0,a0,0x10

   1017e:       8d5d                    or      a0,a0,a5
   10180:       8082                    ret

.. which is 11 instructions (excluding the return), which is "less than a dozen" rather than "more than a dozen", and can run in five cycles on a 3-wide machine, or six cycles on a 2-wide. Or, obviously, 11 cycles on a single-issue machine.

That's 580 MB/s on a 1.6 GHz single-issue machine. That's slower than RAM speeds, but it's a lot faster than gigE. (I don't count the load, because you have to do that anyway -- this is the extra work.)

I really don't think this is going to be the speed-limiting factor in parsing a DTB.

Bruce Hoult

unread,
May 19, 2017, 8:38:29 AM5/19/17
to Jacob Bachmeyer, Karsten Merker, Benjamin Herrenschmidt, Olof Johansson, Allen J. Baum, RISC-V ISA Dev
And furthermore, byte swapping is completely trivial compared to parsing a text version.