Part of this is an example of path-dependence: RISC-V began with its
own custom "configuration string" format, and pushing for a change as
far as text->binary will meet significant resistance simply due to the
extent of the change.
>>> * DTS doesn't have a header that specifies how large the text is;
>>> instead you need to process the whole stream (while keeping context)
>> Adding a "/length:X/" marker would not be a problem.
>>
> Again, now you're no longer talking DTS, but a brand new standard.
>
To say that adding a header makes a DTS utterance somehow no longer a
DTS utterance makes no sense. It is like saying that an HTTP
Content-Length header preceding a ZIP archive somehow makes the response
entity no longer a ZIP archive. Read the header, then call your DTS parser.
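To make the analogy concrete, here is a minimal sketch of consuming a
hypothetical "/length:X/" prefix before handing the body to an ordinary
DTS parser. The marker syntax is the proposal under discussion, not
part of any existing DTS spec:

```python
import re

def read_dts_with_length(stream: bytes) -> bytes:
    """Strip a hypothetical /length:X/ marker and return exactly the
    declared number of bytes, which an unmodified DTS parser can then
    consume.  The marker itself never reaches the parser."""
    m = re.match(rb'/length:(\d+)/\s*', stream)
    if m is None:
        raise ValueError("missing /length:X/ marker")
    length = int(m.group(1))
    body = stream[m.end():m.end() + length]
    if len(body) != length:
        raise ValueError("stream shorter than declared length")
    return body  # pass this buffer to the DTS parser

body = read_dts_with_length(b'/length:25/ /dts-v1/; / { x = <1>; };')
# body is now b'/dts-v1/; / { x = <1>; };'
```

The point is that the reader never needs to scan past the declared
length, which answers the "process the whole stream" objection.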
>>> * Lack of versioning support in the data format (DTB has a version field in the header)
>> Previously, a "/dts-v1/" marker was proposed and I do not recall any
>> complaint.
>>
> Right, you'd need a marker to specify that this is _not_ dts-v1.
>
It would be a restricted subset of dts-v1, which any dts-v1 parser can
read, so yes, it would be dts-v1.
>>> * Overhead of parsing text format to build the data structure and value representations
>> How much overhead is this and how does it compare to merging DTB blobs?
>>
> I don't know -- would you mind measuring or estimating it?
>
I do not know how to estimate that and do not have time right now to
write a DTB merge tool that would run in an early-boot environment.
(Which means linear processing into output buffers, rather than
unflattening the trees.)
>>> * Having to build the binary representation anyway, since that is the format the OS will need (at the least Linux)
>> That is what bootloaders are for: bridging gaps between firmware and
>> operating systems.
>>
>
> But why add the gap in the first place just for the sake of it? It
> makes little sense to me.
>
To preserve the distinction between configuration data burned into ROM
(possibly mask ROM) and boot protocols that can change with a kernel and
bootloader update.
>>> * Locking into a language format that nobody considers to be truly
>>> standardized in the same way as the binary representation is
>>>
>> Then, at worst, we have to standardize our own subset.
>>
>
> We are already at worst, given the above additional restrictions. It
> really boils down to needless differentiation.
>
And path-dependence--RISC-V started with an ISC-like platform
configuration format.
>>> * (Possibly requiring custom extensions to handle connecting external
>>> information)
>> How does DTB handle connecting external information without custom
>> extensions?
>>
>
> Check out how capemgr handles it -- it's not an extension to the base
> description, but instead it has information in the snippet that is
> grafted in.
>
Are BeagleBone capes hot-pluggable? It appears not, so the two cases
correspond reasonably well. However, according to <URL:
http://elinux.org/Capemgr>
capemgr loads a DTB object from the filesystem at runtime, choosing the
file based on information in an EEPROM. The approach envisioned for
RISC-V is to skip the intermediate step and simply store that
information in the EEPROM and merge it into the main device tree at boot
instead of reading a file after the system is up.
Of course, locating that outboard configuration EEPROM is not as easy as
BeagleBone has it, either--BeagleBone has a single "cape" port,
standardized as part of the hardware, while the RISC-V ISA must support
a wide variety of different hardware. Consider a RISC-V "BeagleBone",
not unlike the ARM-based "Arduino" boards: the board ROM must describe
the "cape" port, including the buses that are connected to it, and how
to identify a module installed on that port, including indicating that
the config EEPROM stores a "foreign" configuration descriptor in
BeagleBone cape format.
Ideally, the same model should apply to DRAM SPD ROMs on RISC-V: the
board configuration ROM maps the appropriate bus ports to the memory
slots and indicates configuration ROMs in SPD format at the relevant
addresses.
>>> Benefits:
>>> * I can read it without running dtc -I dtb -O dts on the data first.
>>> * ...
>>>
>> I recall that at least one person on this list is developing a kernel that
>> really does prefer its system description be passed in as text. (I think it
>> is a Plan 9-alike.)
>>
>
> Hmm, seems apt to quote you above about bootloaders and bridging gaps. :-)
>
Fair enough, but the point is that either way we slice this, *someone*
will have a gap to bridge. (On the other hand, there could be an
argument here for using DTB. Is DTB->DTS simpler than DTS->DTB?)
>> There is also the question of "how bad can bad get?": is it possible to
>> construct a DTB that "dtc -I dtb -O dts" will mis-parse, either by accident
>> or malice?
>>
>
> I'm not sure what you're getting at here. If a provided DTB is
> corrupted, parsing it will at some point fail or produce invalid
> results.
>
What I am getting at is the possibility of a vendor attempting to
"cheat" and obfuscate the DTB blob. Is such a thing possible or will
dtc correctly decode any DTB acceptable to the Linux kernel?
> The string references are all offsets into the string block, for
> example. So the worst thing that would happen (if you have bounds
> checking for offsets that reference past the end of the string block)
> is that a "bad pointer" into the string block would land in the middle
> of a string and thus be interpreted as something that it isn't -- it'd
> still be properly \0-terminated, etc.
>
So the Linux kernel and all other common DTB readers have that bounds
checking? What do they do if given invalid input? Fail the entire
parse? Skip the node with the bad item? Something else? If we adopt
DTB as the RISC-V configuration format, can we mandate that all readers
perform this bounds checking with specific behavior on failure or would
we then be adopting a new format that is not quite DTB?
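The bounds check under discussion is small; here is a sketch of a
checked lookup into a DTB strings block. This is an illustration, not
the kernel's actual code path:

```python
def string_at(strings_block: bytes, offset: int) -> str:
    """Fetch a NUL-terminated name from a DTB strings block, with the
    bounds checking discussed above: reject offsets outside the block
    and strings that run off its end."""
    if offset < 0 or offset >= len(strings_block):
        raise ValueError("string offset outside strings block")
    end = strings_block.find(b'\0', offset)
    if end < 0:
        raise ValueError("unterminated string in strings block")
    return strings_block[offset:end].decode('ascii')

blk = b'compatible\0reg\0'
# string_at(blk, 0) -> 'compatible'; string_at(blk, 11) -> 'reg'
# string_at(blk, 3) -> 'patible' -- the "bad pointer lands mid-string"
# case: still NUL-terminated, but not a name anyone defined.
```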
Reading that, I now have more questions and some comments. (These
follow the order of the spec you provided; line numbers are in [square
brackets]. I am
reading blob 280d283304bb82d8b6b210beb97fb954d25c756d from
<URL:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/booting-without-of.txt>.)
How stable is this spec? Would we need to transcribe it into our own
standards or could we simply reference it? Would we need to effectively
write at least partial OpenFirmware bindings for RISC-V in order to use
this? Some of those "ToDo" items make me nervous, like "Add some
definitions of interrupt tree" and "Add some definitions for PCI host
bridges". Those are big gaps.
** Chapter I
Are these existing boot protocols the reason that people wanted the
supervisor to be able to turn off paging?
** Chapter II
[346] That spec says the DeviceTree block must be in RAM; we are talking
about defining a ROM format. While simple ARM bootloaders can copy a
DTB from ROM to RAM, we also need the capability to merge multiple "DTB
fragments" from different ROMs.
[370] I like the "boot_cpuid_phys" field; it looks like a good place to
put a hart ID on entry to a supervisor. Can we use hart IDs as
"physical CPU numbers"?
[396] Byte-swap is rather tedious on RISC-V, enough that an instruction
for it is expected to be in the "B" extension. Could we use a variant
format with little-endian encoding and the same magic number? (That
magic number reads as 0xedfe0dd0 if byte-swap is required. PowerPC is
big-endian, so the kernel code must already be able to support
native-endian DTB.)
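To illustrate the magic-number trick: a standard DTB stores 0xd00dfeed
big-endian, so a little-endian reader sees 0xedfe0dd0, and a
hypothetical little-endian variant reusing the same magic value would be
distinguishable from the header alone. A sketch; no such variant is
standardized today:

```python
import struct

FDT_MAGIC = 0xd00dfeed  # stored big-endian in a standard DTB header

def header_byte_order(blob: bytes) -> str:
    """Classify a DTB-like header by reading its magic as a
    little-endian word.  A byte-swapped match means the blob is the
    standard big-endian format."""
    (word,) = struct.unpack_from('<I', blob, 0)
    if word == FDT_MAGIC:
        return 'little-endian'   # the hypothetical native-LE variant
    if word == 0xedfe0dd0:       # FDT_MAGIC seen through a byte-swap
        return 'big-endian'      # standard DTB
    raise ValueError('not a DTB header')
```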
[419] The memory map lists reservations rather than available regions.
How does the kernel get the actual memory map? Does it scan the entire
tree before initializing the allocator?
[438] Do I correctly gather that we would need to use at least version
17, since we must splice device trees from multiple sources?
[600] Along the lines of my "bad input" concerns, what happens if a node
references a phandle, but no node actually has that phandle? In the
example, what if a node references phandle <5> when the highest phandle
actually defined is <4>? (A range check on phandles is not sufficient,
since phandles can be sparse.)
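The sparse-phandle point can be made concrete: in the sketch below, a
range check against the highest defined phandle would pass <3> even
though no node defines it, while a membership check catches it:

```python
def check_phandles(defined, referenced):
    """Return referenced phandles that no node defines.  Membership in
    the defined set is the real test; a range check alone misses holes
    in a sparse numbering."""
    return [p for p in referenced if p not in defined]

# Sparse definitions: <3> is missing even though <4> exists.
dangling = check_phandles({1, 2, 4}, [2, 3, 4, 5])
# dangling == [3, 5]; a range check (<= 4) would only have caught 5.
```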
[682] Do I correctly understand that inserting a subtree read from
another source is essentially text insertion? That would address one of
my concerns. What edits to the inserted block could be required?
Renumbering phandles? (That can be done linearly by renumbering them
all while copying nodes to RAM.) Remapping strings? (Could get
interesting if merging duplicates, but the string block only needs to be
linear at the end of the process and multiple string blocks could simply
be concatenated and string references adjusted.)
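Concatenating string blocks and rebasing offsets is indeed a linear
pass; a sketch, with no attempt at duplicate merging:

```python
def merge_string_blocks(blocks, refs):
    """Concatenate per-fragment string blocks and rebase each
    fragment's string offsets.  `refs` is a list of
    (fragment_index, offset) pairs naming strings in their original
    blocks; the result offsets point into the merged block."""
    bases = []
    merged = b''
    for blk in blocks:
        bases.append(len(merged))
        merged += blk
    adjusted = [bases[i] + off for i, off in refs]
    return merged, adjusted

merged, adjusted = merge_string_blocks(
    [b'compatible\0', b'reg\0'],
    [(0, 0), (1, 0)])
# merged == b'compatible\0reg\0', adjusted == [0, 11]
```

Phandle renumbering during the copy-to-RAM pass would follow the same
shape: one base value per fragment, added while copying.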
** Chapter III
[703] Could we use "riscv," instead of "linux," as a prefix on quasi-OF
property names in ROM rather than hardwiring property names that refer
to a specific supervisor? The translation is trivial and I really do
not like the idea of using Linux's vendor tag in standard RISC-V
configuration ROMs. (Or have these "linux,*" properties become
quasi-standard? Do other DTB-using supervisors also recognize them?)
[731] Since the base word size in RISC-V is 32 bits, 32-bit cells are a
good fit, I admit. I would also suggest retaining big-endian cell order
even with native-endian cells, both to simplify translation and because
big-endian is easier to read into a register piecemeal (load one
element; shift running value left; OR new element into running value) on
RV64 and RV128. In other words, the format is cell-based, except for
unit names and property data, and those are padded to the next cell
boundary. The only problem with native-endian cells is that the
equivalence between cells and byte arrays changes, but a property SHOULD
be defined as either a cell-list or byte-array anyway, not both. Using
native-endian cells would change that SHOULD to MUST.
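The piecemeal load described above, sketched with a Python integer
standing in for the running register value:

```python
def read_u64_cells(cells: bytes) -> int:
    """Accumulate a wide value from big-endian 32-bit cells the way a
    register-at-a-time loop would: load a cell, shift the running
    value left, OR the cell in."""
    value = 0
    for i in range(0, len(cells), 4):
        cell = int.from_bytes(cells[i:i + 4], 'big')
        value = (value << 32) | cell
    return value

# Two big-endian cells <0x1 0x2> assemble to 0x0000000100000002.
```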
[829] I like restricting the structural text to an ASCII subset; this is
good.
[847] I will just ask: How does the RISC-V PLIC fit into the OF
interrupt tree specification?
[890] How is a hierarchy of modules containing cores containing harts
expressed in the CPU subtree? "/cpus/module@0/core@0/hart@0"? This is
important because the expectation is that CPU modules will have their
own configuration ROMs describing the processors, while the board ROM
describes the onboard peripherals and bus ports and outboard ROMs on
otherwise non-discoverable buses describe expansion hardware. The boot
firmware will splice it all together.
[929] I presume that systems with dynamic CPU clocks simply omit
"clock-frequency"?
[947] Can memory nodes additionally be placed underneath CPU nodes to
represent module-local, core-local, or hart-local memory?
[984] Do I correctly assume that "/chosen" will never appear in
configuration ROM?
[1168] Do I correctly understand that there is no indication in DTB
whether a given property is a cell-list or a byte-array that happens to
have a length divisible by 4? That a DTB->DTS translation must either
*know* that a given property is actually a byte-array or be able to
infer it from the length? And that every property value is either null,
a string, a cell-list, or a byte-array? Are strings stored as
byte-arrays? (And yes, storing cells as big-endian values does reduce
the cell-list/byte-array distinction to whitespace and extra "base
markers".)
>>>> And I do mean quantify. Yes, I understand that the representation will be
>>>> larger, and the parser might take a bit more code, but how much more of
>>>> each, and what is the cost of that increase? ROM is pretty cheap and small,
>>>> onchip - unless we're talking really substantial (>64KB range) sizes.
>>>>
>>> For some of the larger devicetrees in the kernel sources, decompiled
>>> DTBs are ~40% larger, growing from about 88kB to 120kB (these are
>>> mostly for TI's DRA7 platforms).
>>>
>>> Small platforms (such as some of the stm32 platforms) grow by about
>>> 20% depending on the platform.
>>>
>>> These are decompiled binaries, not the original sources which might be
>>> more verbose, have comments, etc.
>>>
>>> In an out-of-tree product tree I am working with, there's one DTB that
>>> grows from 185kB to 251kB when decompiled.
>>>
>>> None of the above covers the code and runtime cost, of course, just
>>> the size differences of the data.
>>>
>> I will admit that size difference may be a concern, although I must ask (at
>> the risk of opening another can of worms) how well generic data compression
>> like deflate and LZO compress that text.
>>
>
> Then you're back to binary format, so previous arguments would be null
> and void. We'd be going in circles.
>
There is a difference between a binary format (compressed text) that is
processed with generic tools and a binary format (DTB) that is processed
with special tools, and I did say that introducing compression would
open another can of worms. :)
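For what it is worth, the compression question is easy to measure on any
decompiled tree; a sketch using deflate via zlib. The sample text below
is a made-up stand-in, so the ratio only illustrates the method, not a
real tree's behavior:

```python
import zlib

def text_vs_deflate(dts_text: bytes):
    """Compare raw DTS text size against its deflate-compressed size.
    Repetitive property names make real trees compress well."""
    packed = zlib.compress(dts_text, level=9)
    return len(dts_text), len(packed)

# Stand-in for a decompiled tree: one repeated node pattern.
sample = (b'/dts-v1/;\n/ {\n'
          + b'\tnode { compatible = "vendor,dev"; '
            b'reg = <0x0 0x1000>; };\n' * 40
          + b'};\n')
raw, packed = text_vs_deflate(sample)
```

Running this against the ~120kB decompiled trees mentioned above would
answer whether compressed text lands back near DTB's size.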
>> There is also another issue here: For your analysis, you are using DTBs
>> from the Linux kernel sources, where quality control is strict, and from a
>> project you are working on, where the people are competent and trustworthy.
>> In both cases, I presume that the DTBs are being produced using a standard
>> "dtc" tool. Please consider the potential for mischief with non-standard
>> "dtc-like" tools. I believe that similar issues have been observed "in the
>> wild" with ACPI, another format that uses binary blobs.
>>
>
> The kernel sources include DTS (actually, DTS that is passed through
> cpp, then to dtc). So far we've done quite well on ARM with it, with
> all the range of quality of systems delivered there. So far, there's
> been little need for people to innovate in DTB-producing tooling; most
> vendors work with the tools included in the kernel (or distributed
> separately).
>
While the availability of a free reference dtc implementation will
probably help, I expect that the bar for RISC-V vendors will be somewhat
lower than it is for ARM vendors, unless there are "pirate" ARM chips
already out there. As I understand it, one of the goals is that anyone
can download Rocket, tweak it, synthesize it, and start cranking out chips.
A lack of an actual need to innovate will not stop mischievous vendors
from doing so anyway. NIH is out there. (You could even argue that the
original RISC-V config string is itself an example of NIH.) :)
> Areas where I've seen vendors do crazy stuff is when they don't really
> understand the spirit of how to describe things in DT, but that can be
> taught. Once there are good examples to base off of things tend to
> stabilize.
>
So we would then need to write at least partial OpenFirmware bindings
for RISC-V?
>> Whatever is adopted to describe RISC-V configuration, there will be
>> bargain-basement boards with $DEITY-knows-what in the configuration ROM. I
>> argue that keeping that "lowest passable bar" as sane as possible is
>> important. I generally believe that producing
>> syntactically-valid-but-semantically-bogus text is more difficult than
>> producing (possibly deliberately obfuscated) binary blobs.
>>
>
> Take a look at the data format at the link above, it's really simple
> and it's not really something where you can go all that far off the
> tracks.
>
How robust are the existing DTB readers? What about blatant structural
errors: OF_DT_END_NODE at top-level, OF_DT_END with nodes still open,
very deeply nested nodes (stack overrun possible?), and such? I agree
that DTB is fairly simple, but when I hear "binary format" I still get
nervous thinking of the monstrosities that have come from Redmond.
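The structural checks I have in mind are cheap if the reader tracks
depth iteratively instead of recursing, so malformed input cannot
overrun a stack. A sketch over the structure-block token stream;
MAX_DEPTH is an arbitrary cap chosen for illustration:

```python
FDT_BEGIN_NODE, FDT_END_NODE, FDT_END = 0x1, 0x2, 0x9
MAX_DEPTH = 64  # arbitrary cap for this sketch

def check_nesting(tokens):
    """Validate node nesting without recursion: reject END_NODE at top
    level, END with nodes still open, runaway depth, and a stream
    that ends without END."""
    depth = 0
    for tok in tokens:
        if tok == FDT_BEGIN_NODE:
            depth += 1
            if depth > MAX_DEPTH:
                return False   # absurdly deep tree
        elif tok == FDT_END_NODE:
            if depth == 0:
                return False   # END_NODE at top level
            depth -= 1
        elif tok == FDT_END:
            return depth == 0  # fails if nodes are still open
    return False               # stream ended without FDT_END
```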
> As mentioned already, on ARM we've seen quite the range of vendors
> already, and it hasn't been an issue.
>
This is somewhat reassuring, but I still expect RISC-V vendors to face a
lower bar. That lowered bar is a good thing in general, since it lowers
the cost of using RISC-V in larger systems where the processor is not
itself the product, but it also opens the door to new lows for processor
vendors.
-- Jacob