Sorry for top posting. I’m still thinking about this...
If we store ELF in ROM on RV32E systems, we can solve these problems with a very tiny loader. Perhaps we could define a subset of ELF called TELF that carries only the bare minimum of information, e.g. .bss occupies no space in the file, and the headers are not page aligned (unlike the Linux/binutils layout, which is designed for big systems). The ROM footprint would then be roughly:
sizeof(ELF32Header) + sizeof(ELF32ProgramHeader) * 4 +
sizeof(.text) +
sizeof(.rodata) +
sizeof(.data) +
0 /* .bss has size and NO_BITS */
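To put a number on the header part, here is a rough sketch of that sum in C. The struct layouts follow the standard ELF32 definitions (52-byte file header, 32-byte program header); telf_rom_size is a made-up name for illustration, not an existing tool or API.

```c
#include <stdint.h>

/* ELF32 file header, laid out as in the ELF specification (52 bytes). */
typedef struct {
    uint8_t  e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} Elf32_Ehdr;

/* ELF32 program header (32 bytes). */
typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Elf32_Phdr;

/* Hypothetical helper: ROM bytes needed for a TELF image.
 * .bss is passed only to make the point that it costs nothing in ROM. */
uint32_t telf_rom_size(uint32_t nphdr, uint32_t text, uint32_t rodata,
                       uint32_t data, uint32_t bss)
{
    (void)bss; /* NOBITS: has a size in memory, no bytes in the file */
    return (uint32_t)sizeof(Elf32_Ehdr)
         + nphdr * (uint32_t)sizeof(Elf32_Phdr)
         + text + rodata + data;
}
```

With four program headers the fixed overhead is 52 + 4 * 32 = 180 bytes, and every PT_LOAD entry we manage to drop saves another 32.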
We can make some assumption like .sdata is at the beginning of .data and so forth to reduce the number of PT_LOAD entries. We also don’t need any section entries or symbols for ROM, but they should be saved for debugging with an external debugger.
The loader can be smart, i.e. if the address of the text segment equals the address of the text inside the ELF image in ROM, then it’s XIP (execute in place), so we only interpret the PT_LOAD entries needed to copy .data to RAM and initialise .bss to zero. Of course we can just omit redundant PT_LOAD entries.
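A minimal sketch of such a loader, under some assumptions: telf_load and its parameters are hypothetical names, the image in ROM is trusted (no bounds checks against the image size), and RAM is passed in as a buffer plus its base address so the same code can be exercised on a host. On a real target, ram would simply be (uint8_t *)ram_base.

```c
#include <stdint.h>
#include <string.h>

#define PT_LOAD 1u

typedef struct {
    uint8_t  e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} Elf32_Ehdr;

typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Elf32_Phdr;

/* image:      pointer to the ELF image in ROM
 * image_addr: address of that image in the target's address space
 * Returns the number of segments copied to RAM, or -1 on a bad image. */
int telf_load(const uint8_t *image, uint32_t image_addr,
              uint8_t *ram, uint32_t ram_base, uint32_t ram_size)
{
    Elf32_Ehdr eh;
    int copied = 0;

    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, "\177ELF", 4) != 0 ||
        eh.e_phentsize != sizeof(Elf32_Phdr))
        return -1;

    for (uint32_t i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        memcpy(&ph, image + eh.e_phoff + i * sizeof ph, sizeof ph);

        if (ph.p_type != PT_LOAD)
            continue;
        /* Segment is linked to run where it already sits in ROM: XIP. */
        if (ph.p_vaddr == image_addr + ph.p_offset)
            continue;
        /* Everything else must fit inside RAM. */
        if (ph.p_filesz > ph.p_memsz || ph.p_vaddr < ram_base ||
            ph.p_memsz > ram_size ||
            ph.p_vaddr - ram_base > ram_size - ph.p_memsz)
            return -1;

        uint8_t *dst = ram + (ph.p_vaddr - ram_base);
        memcpy(dst, image + ph.p_offset, ph.p_filesz);            /* .data */
        memset(dst + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);   /* .bss  */
        copied++;
    }
    return copied;
}
```

The .bss case falls out for free: it is just the tail of a PT_LOAD entry where p_memsz is larger than p_filesz.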
binutils executables are quite fat. We could use binutils and a regular linker script but then run them through a telf-util to slim them down and pack them into the TELF in-ROM format.
There are some problems with doing this post-link: without PIC/PIE, we’d need the text addresses to be relative to a start address after the ELF headers.
XIP is one use case. Relocation of text into RAM is another use case (scratchpad may run faster on some chips). I think PIE is very important.
We actually need relocs so that we can rewrite pointers to functions and data when the text and data addresses are not known at compile time. This is needed for static PIE as well as dynamic modules. Fortunately this is all relatively easy.
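To show how little is involved, here is a sketch of applying RELATIVE relocs from a rela table. R_RISCV_RELATIVE (type 3) and the Elf32_Rela layout come from the RISC-V psABI and the ELF spec; apply_relative and its parameters are hypothetical, and it assumes an image linked at address 0. mem is a pointer to the loaded image, so on the target it would be (uint8_t *)load_base.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_RISCV_RELATIVE 3u
#define ELF32_R_TYPE(info) ((info) & 0xffu)

typedef struct {
    uint32_t r_offset;  /* where to patch, relative to the link base */
    uint32_t r_info;    /* symbol index and relocation type          */
    int32_t  r_addend;  /* link-time address of the target           */
} Elf32_Rela;

/* Patch every RELATIVE entry: word at r_offset = load_base + r_addend.
 * Other relocation types are skipped. Returns the number applied. */
size_t apply_relative(uint8_t *mem, uint32_t load_base,
                      const Elf32_Rela *rela, size_t count)
{
    size_t applied = 0;

    for (size_t i = 0; i < count; i++) {
        if (ELF32_R_TYPE(rela[i].r_info) != R_RISCV_RELATIVE)
            continue;
        uint32_t value = load_base + (uint32_t)rela[i].r_addend;
        memcpy(mem + rela[i].r_offset, &value, sizeof value);
        applied++;
    }
    return applied;
}
```

No symbol lookup is needed for RELATIVE entries, which is why static PIE can get away with a loop this small.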
BTW I would really like shared to be re-enabled on the ELF toolchain. We just need to remove all of the .gnu and Glibc hacks from the resulting ELFs. ELF shared objects and runtime relocs for PIE are a very well understood and relatively simple problem.
The code to perform RELATIVE relocs at runtime is tiny and would not take too much ROM or nvram space. I’ll measure...
It looks like it would just be several bytes, which could run directly out of XIP even if XIP is slow, i.e. on an architecture with no instruction or data caches and some fast (zero-wait-state) SRAM attached to the main memory port.
Here static PIE makes a lot of sense. Assuming the XIP size limit is not too strict (e.g. more than 128KiB), having the relocation code run slowly out of XIP at reset would not be too costly. This also assumes XIP non-volatile memories are cheaper than SRAMs, which seems to be the case based on data on sizes and pricing.
I will measure musl’s static_pie init code.
musl has some nice properties. It doesn’t strictly need to be Linux specific because it has a very modular design with low coupling and uses weak linkage strategically. If I were making a femto-libc (vs newlib-nano) I would start with musl and do the following:
- remove POSIX and Linux syscalls
- keep the C stdlib stuff like <string.h>
- keep stdio - musl’s files have callbacks which don’t need to be POSIX. They could be linked to UART read/write functions
- keep all the libm stuff (optional)
- make printf lighter e.g. make float support optional. One could even do a runtime check on a weak symbol to see if any math is linked in, so that linking printf doesn’t link in all of the math support.
- make it work on RV32E in M mode with a tiny XIP loader that relocates an ELF payload, optionally .text (if XIP is slow and there is no cache)
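The weak-symbol check for optional float support could look something like the sketch below. All the femto_* names are invented for illustration and are not musl symbols. It assumes GCC or Clang targeting ELF, where an undefined weak function resolves to address 0 at link time.

```c
#include <stddef.h>

/* Hypothetical hook: the real float formatter would live in its own object
 * file and only gets linked in if something else pulls in the math code. */
__attribute__((weak)) int femto_fmt_double(char *buf, size_t n, double v);

/* Runtime check: non-zero only if the float formatter was linked in. */
int femto_has_float(void)
{
    return femto_fmt_double != 0;
}

/* Formats v into buf if float support is present, otherwise writes a
 * short placeholder. Returns the number of characters written. */
size_t femto_put_double(char *buf, size_t n, double v)
{
    static const char stub[] = "(nofloat)";
    size_t i = 0;

    if (femto_fmt_double)
        return (size_t)femto_fmt_double(buf, n, v);
    if (n == 0)
        return 0;
    while (i + 1 < n && stub[i] != '\0') {
        buf[i] = stub[i];
        i++;
    }
    buf[i] = '\0';
    return i;
}
```

Because the reference is weak, the printf core never forces the float formatter (and everything it depends on) into the link; a program that wants %f just has to reference the formatter from somewhere.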