Q1
Are there any resources or examples on embedding LLVM into an ARM-based
bare-metal application? Searching in this area only turns up
information on how to use LLVM to target bare metal, whereas I want to
compile LLVM itself and link it into a bare-metal application.
Q2
Are there any memory usage benchmarks for LLVM across the common tasks
(especially loading bytecode, doing the optimization passes and finally
emitting machine code)? My target (embedded) system has only 1GB of RAM.
Background:
I'm about to embark on an effort to integrate LLVM into my bare-metal
application (for the AM335x, a Cortex-A8 SoC, as used in the BeagleBone Black).
The application area is sound synthesis and the reason for embedding
LLVM is to allow users to develop their own "plugins" on the desktop
(using a live coding approach) and then load them (as LLVM bytecode) on
the embedded device. LLVM would be responsible for generating
(optimized, and especially vectorized for NEON) machine code directly on
the embedded device and it would take care of the relocation and
run-time linking duties. This last task is very important because the
RTOS (Texas Instruments' SYS/BIOS) that I'm using does not have any
dynamic linking facilities. Sharing code in the form of LLVM bytecode
also seems to sidestep the complex task of setting up a cross-compiling
toolchain which is something that I would prefer not to have to force my
users to do. In fact, my goal is to have a live coding environment
provided as a desktop application (which might also embed Clang as well
as LLVM) that allows the user to rapidly and playfully build their sound
synthesis idea (in simple C/C++ at first, Faust later maybe) and then
save the algorithm as bytecode to be copied over to the AM335x-based
device.
Thank you in advance for any help or pointers to resources that you can
provide!
Kind regards
Brian
--
Orthogonal Devices
Tokyo, Japan
www.orthogonaldevices.com
---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
I'm afraid I can't answer your actual questions, but do have a couple
of comments on the background...
On Thu, 27 Jun 2019 at 09:50, Brian Clarkson via llvm-dev
<llvm...@lists.llvm.org> wrote:
> LLVM would be responsible for generating
> (optimized, and especially vectorized for NEON) machine code directly on
> the embedded device and it would take care of the relocation and
> run-time linking duties.
That's a much smaller task than what you'd get from embedding all of
LLVM. lldb is probably an example of a program with a similar problem
to you, and it gets by with just a pretty small stub of a
"debugserver" on a device. It does all CodeGen and even prelinking on
the host side, and then transfers binary data across.
The concept is called "remote JIT" in the LLVM codebase if you want to
research it more.
I think the main advantage you'd get from embedding LLVM itself over a
scheme like that would be a certain resilience to updating the RTOS on
the device (it would cope with a function sliding around in memory
even if the host is no longer available to recompile), but I bet there
are simpler ways to do that. The API surface you need to control is
probably pretty small.
> Sharing code in the form of LLVM bytecode
> also seems to sidestep the complex task of setting up a cross-compiling
> toolchain which is something that I would prefer not to have to force my
> users to do.
If you can produce bitcode on the host, you can produce an ARM binary
without forcing the users to install extra stuff. The work involved
would be pretty comparable to what you'd have to do on the RTOS side
anyway (you're unlikely to be running GNU ld against system libraries
on the RTOS), and made slightly easier by the host being more of a
"normal" LLVM environment.
Cheers.
Tim.
Thank you for taking the time to comment on the background!
I will definitely study lldb and remote JIT for ideas. I worry that I
will not be able to pre-link on the host side because the host cannot(?)
know the final memory layout of code on the client side, especially when
there are multiple plugins being loaded in different combinations on the
host and client. Is that an unfounded worry?
I suppose it is also possible to share relocatable machine code (ELF?)
and only use client-side embedded LLVM for linking duties? Does that
simplify things appreciably? I was under the impression that if I can
compile and embed the LLVM linker then embedding LLVM's codegen
libraries would not be much extra work. Then I can allow users to use
Faust (or any other frontend) to generate bytecode in addition to my
"live coding" desktop application. So many variables to consider... :-)
Kind regards
Brian Clarkson
I'm not aware of any examples, unfortunately. I suspect that this could
be quite challenging, depending on how rich an environment your RTOS
offers. It is possible that LLVM depends on POSIX or POSIX-like OS
calls for things like mmap and other file abstractions. I've not
looked at this in any detail; it may be possible to strip these out
with the right configuration options (thread support, for example, can
be disabled). One possible approach would be to build LLVM for a Linux
target and look at the dependencies. That might give you an idea of
what you are up against.
> Q2
> Are there any memory usage benchmarks for LLVM across the common tasks
> (especially loading bytecode, doing the optimization passes and finally
> emitting machine code)? My target (embedded) system has only 1GB of RAM.
>
I don't have anything specific, unfortunately. It is possible (or at
least it was a couple of years ago) for Clang to compile Clang on a 1GB
Raspberry Pi. I'm assuming the plugins will be smaller than the IR
generated by the largest Clang C++ file, but my Raspberry Pi wasn't
doing anything else but compiling Clang.
It is possible to build position-independent code on the host and
run it on the device without needing the full complexity of a SysV
dynamic linker. As you say, there are many different options depending
on how much your plugins need to communicate with the main program, or
each other, and how sophisticated a plugin loader you are comfortable
writing. There is probably much more information available online
about how to do that than about embedding LLVM.
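At the simplest end of that spectrum, a plugin can avoid importing symbols altogether if the main program hands it a table of function pointers at load time. The C sketch below illustrates that idea under stated assumptions: every name in it (`kernel_api`, `plugin_api`, `plugin_init`, `make_kernel_api`) is hypothetical, and both halves live in one file only so it can be compiled and run on a host. On the device, the plugin half would be built separately as position-independent code, and `plugin_init` would be the entry point the loader branches to.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical services the main program ("kernel") hands to a plugin.
 * The plugin reaches the kernel only through these pointers, so it needs
 * no symbol imports at all. */
typedef struct kernel_api {
    uint32_t version;               /* lets a plugin reject an incompatible kernel */
    float  (*sample_rate)(void);    /* current audio sample rate in Hz */
    void   (*log)(const char *msg); /* debug output */
} kernel_api;

/* Hypothetical callbacks the plugin hands back to the kernel. */
typedef struct plugin_api {
    void (*process)(const float *in, float *out, uint32_t frames);
} plugin_api;

/* ---- plugin side: on the device, built separately as PIC ---- */

static const kernel_api *g_kernel;

static void gain_process(const float *in, float *out, uint32_t frames) {
    for (uint32_t i = 0; i < frames; ++i)
        out[i] = 0.5f * in[i];      /* halve the amplitude */
}

/* The only entry point the loader needs to find and branch to. */
int plugin_init(const kernel_api *kernel, plugin_api *out) {
    if (kernel->version != 1)
        return -1;                  /* unknown ABI version: refuse to load */
    g_kernel = kernel;
    g_kernel->log("gain plugin loaded");
    out->process = gain_process;
    return 0;
}

/* ---- kernel side ---- */

static float k_sample_rate(void) { return 48000.0f; }
static void  k_log(const char *msg) { puts(msg); }

kernel_api make_kernel_api(void) {
    kernel_api k = { 1, k_sample_rate, k_log };
    return k;
}
```

A version field in the table gives the kernel a way to extend the API later while still detecting plugins built against an older or newer layout.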
One possible approach is to build your plugins on the host as some kind
of position-independent ELF executable. Your program on the device
could extract the loadable parts of the ELF, copy them to memory,
resolve any fixups (relocations in ELF terms) and branch to the
entry point. In general, ELF isn't compact enough for embedded systems,
and it is common to post-process it first into a form that is easier
to load.
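The first of those steps (extracting the loadable parts and copying them to memory) might look roughly like the minimal C sketch below. It assumes a little-endian ELF32 file linked at address 0, so each segment's `p_vaddr` is already an offset into the destination block, and a little-endian CPU running the loader. The structs mirror the `Elf32_Ehdr` and `Elf32_Phdr` layouts from the ELF specification but are redeclared under hypothetical names, since a bare-metal toolchain may not ship `<elf.h>`. `load_segments` is also a hypothetical name, and relocation and cache maintenance are not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as Elf32_Ehdr / Elf32_Phdr (no padding on usual ABIs). */
typedef struct {
    uint8_t  e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} elf32_ehdr;

typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} elf32_phdr;

enum { ELF_PT_LOAD = 1, ELF_CLASS32 = 1, ELF_DATA2LSB = 1 };

/* Copy every PT_LOAD segment of `file` into `mem`, the block the plugin
 * will occupy, zero-filling the .bss tail of each segment. On success
 * stores the entry point (an offset into `mem`) and returns 0. */
int load_segments(const uint8_t *file, size_t file_size,
                  uint8_t *mem, size_t mem_size, uint32_t *entry_off)
{
    elf32_ehdr eh;
    if (file_size < sizeof eh) return -1;
    memcpy(&eh, file, sizeof eh);           /* avoids unaligned access */
    if (memcmp(eh.e_ident, "\x7f" "ELF", 4) != 0) return -1;
    if (eh.e_ident[4] != ELF_CLASS32 || eh.e_ident[5] != ELF_DATA2LSB) return -1;
    if (eh.e_phentsize != sizeof(elf32_phdr)) return -1;

    for (uint32_t i = 0; i < eh.e_phnum; ++i) {
        elf32_phdr ph;
        uint64_t off = (uint64_t)eh.e_phoff + (uint64_t)i * sizeof ph;
        if (off + sizeof ph > file_size) return -1;
        memcpy(&ph, file + off, sizeof ph);
        if (ph.p_type != ELF_PT_LOAD) continue;

        /* Bounds checks in 64 bits so they cannot wrap on a 32-bit CPU. */
        if (ph.p_filesz > ph.p_memsz) return -1;
        if ((uint64_t)ph.p_offset + ph.p_filesz > file_size) return -1;
        if ((uint64_t)ph.p_vaddr + ph.p_memsz > mem_size) return -1;

        memcpy(mem + ph.p_vaddr, file + ph.p_offset, ph.p_filesz);
        memset(mem + ph.p_vaddr + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }
    if (eh.e_entry >= mem_size) return -1;
    *entry_off = eh.e_entry;
    return 0;
}
```

A post-processed format of the kind mentioned above would typically reduce this step to a single copy, since the host can lay the segments out contiguously in advance.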
If I've understood you correctly, probably not. The LLVM linker (LLD)
is a static linker; it doesn't have any image-loading functionality.
It also isn't really suited to running on top of an RTOS, in the same
way that Clang isn't. There is something called llvm-link, but that is
a linker that combines multiple bitcode files into a single bitcode file,
which I'm guessing isn't what you want either.
I think there is a dynamic linker in one of the JITs, but I can't
remember where it is off the top of my head.
If I get some time this afternoon I'll try and find some links on
either how to write a simple dynamic loader or some examples.
Peter
So I guess I can tentatively identify my list of functional requirements as:
- load relocatable (but highly optimized) machine code
- relocate the machine code
- export symbols from the loaded machine code (available exports are not known at compile-time)
- import symbols into the loaded machine code (required imports are not known at compile-time)
- finally, actually execute functions exported from the loaded machine code
I latched on to LLVM because I nearly lost my mind trying to read the Linux
source code for libdl.
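For the export and import requirements in that list, one minimal approach is a run-time symbol table keyed by name: the kernel and each loaded plugin publish their exports into it, and the loader consults it whenever it meets a reference to an undefined symbol. The C sketch below is hypothetical throughout (`sym_export`, `sym_lookup` and the fixed capacity are made up for illustration). It stores name pointers rather than copying the strings, so they must outlive the table, and it does not handle unloading a plugin.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One name -> address binding. For Thumb functions the address carries
 * bit 0 set, as usual for ARM function pointers. */
typedef struct {
    const char *name;   /* not copied: must stay valid while registered */
    uintptr_t   addr;
} symbol;

#define MAX_SYMBOLS 256

static symbol g_symbols[MAX_SYMBOLS];
static size_t g_nsymbols;

/* Export: publish a binding. Returns 0, or -1 if the table is full or
 * the name is already taken (first definition wins). */
int sym_export(const char *name, uintptr_t addr) {
    if (g_nsymbols == MAX_SYMBOLS)
        return -1;
    for (size_t i = 0; i < g_nsymbols; ++i)
        if (strcmp(g_symbols[i].name, name) == 0)
            return -1;
    g_symbols[g_nsymbols].name = name;
    g_symbols[g_nsymbols].addr = addr;
    ++g_nsymbols;
    return 0;
}

/* Import: resolve a name, e.g. while patching a relocation that refers
 * to an undefined symbol. Returns 0 if the name is unknown. */
uintptr_t sym_lookup(const char *name) {
    for (size_t i = 0; i < g_nsymbols; ++i)
        if (strcmp(g_symbols[i].name, name) == 0)
            return g_symbols[i].addr;
    return 0;
}
```

A linear search is adequate for a few hundred symbols; a hash table would be the natural next step if plugins export many more.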
Kind regards
Brian Clarkson
Thank you for your helpful comments, especially on the RPi. Since my
use case is a lot simpler than compiling all of Clang, I can hopefully
take your experience as a good sign.
The RTOS that TI provides for the AM335x actually has a fairly complete
POSIX layer and other standard libraries. However, I am working without
any virtual memory subsystem, so no mmap. That said, I was under the
impression that LLVM (ORC specifically) should be able to relocate code
to any memory location, so the lack of mmap shouldn't be a problem?
Kind regards
Brian
Apologies, I don't know a lot about ORC; most of my knowledge is on the
static linker side. I don't think mmap is a requirement, just that a
lot of the code may have been written assuming it was present.
Hopefully there are some other people on the list with more experience
of JITs that can help.
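On a system without virtual memory, a loader can take executable memory from a statically allocated pool rather than from mmap. The C sketch below illustrates that, assuming a C11 compiler (for `_Alignas`) that also provides the GCC/Clang builtin `__builtin___clear_cache`, and assuming the RAM holding the pool is not marked execute-never. `code_alloc` and `code_commit` are hypothetical names. One thing that does not go away without mmap is cache maintenance: on ARM, code written through the data cache must be cleaned to memory and the instruction cache invalidated before it is executed. The builtin handles this on hosted ARM Linux and is a no-op on x86, but on a bare-metal toolchain it may not perform the maintenance at all, so on SYS/BIOS `code_commit` would need to call the RTOS's own cache routines instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Statically allocated pool standing in for what mmap would provide on a
 * hosted OS. Must sit in RAM the CPU is allowed to execute from. */
#define CODE_POOL_SIZE  (256u * 1024u)
#define CODE_POOL_ALIGN 64u

static _Alignas(CODE_POOL_ALIGN) uint8_t code_pool[CODE_POOL_SIZE];
static size_t code_pool_used;

/* Bump allocator: returns `size` bytes aligned to `align`, which must be a
 * power of two no larger than CODE_POOL_ALIGN. Nothing is ever freed. */
void *code_alloc(size_t size, size_t align) {
    if (align == 0 || (align & (align - 1)) != 0 || align > CODE_POOL_ALIGN)
        return NULL;
    size_t start = (code_pool_used + (align - 1)) & ~(align - 1);
    if (start > CODE_POOL_SIZE || size > CODE_POOL_SIZE - start)
        return NULL;
    code_pool_used = start + size;
    return code_pool + start;
}

/* Hypothetical hook: call after an image has been loaded and relocated in
 * place at `p`, and before branching into it. Replace the builtin with
 * the RTOS's cache-maintenance calls on the real device. */
void code_commit(void *p, size_t n) {
    __builtin___clear_cache((char *)p, (char *)p + n);
}
```

The intended order is allocate first (so the final address is known), then copy and relocate directly into the returned block, then call `code_commit`.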
Thinking about the requirements in your earlier mail:
> - load relocatable (but highly optimized) machine code
> - relocate the machine code
> - export symbols from the loaded machine code (available exports are not known at compile-time)
> - import symbols into the loaded machine code (required imports are not known at compile-time)
> - finally, actually execute functions exported from the loaded machine code
It sounds like you would need some kind of dynamic loader to handle
the symbol resolution and perform relocation. If the communication is
just Kernel (for want of a better name for the main program) to
Module, and not Module to Module, then something like a PIE executable
for each module, with its symbols exported via --export-dynamic, would
do. This would result in only a small number of relocation types that
you would need to handle, the majority being R_ARM_RELATIVE, which is
just the displacement from the static link address (usually 0), and
R_ARM_ABS32 for those requiring the address of a symbol. The major
restriction of PIE is that there is a fixed offset between code and
data.
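Handling those two relocation types could look something like the C sketch below. It assumes the image was linked at address 0, that the relocation table has already been located (via the `DT_REL`/`DT_RELSZ` entries of the dynamic segment, not shown), and that ARM's REL format is used, so each addend is stored in the word being patched. `apply_relocs`, `resolve_from_table` and the other names are hypothetical; the type numbers 2 and 23 for R_ARM_ABS32 and R_ARM_RELATIVE come from the ELF for the Arm Architecture ABI. The image pointer and the run-time base address are kept separate so the sketch can be exercised on a 64-bit host; on the device they refer to the same memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation type numbers from the ELF for the Arm Architecture ABI. */
enum { RELOC_ARM_ABS32 = 2, RELOC_ARM_RELATIVE = 23 };

/* One REL entry; same layout as Elf32_Rel. */
typedef struct {
    uint32_t r_offset;  /* link-time address of the word to patch */
    uint32_t r_info;    /* (symbol index << 8) | relocation type */
} rel32;

/* Hypothetical resolver: run-time address of dynamic symbol `sym_index`
 * (Thumb functions with bit 0 already set), or 0 if unresolved. */
typedef uint32_t (*resolve_fn)(uint32_t sym_index, void *ctx);

/* Simple resolver for illustration: ctx is a table, indexed by dynamic
 * symbol number, of addresses the loader has already looked up by name. */
typedef struct { const uint32_t *addrs; uint32_t count; } sym_table;

uint32_t resolve_from_table(uint32_t sym_index, void *ctx) {
    const sym_table *t = ctx;
    return sym_index < t->count ? t->addrs[sym_index] : 0;
}

/* Apply REL relocations to a PIE image linked at address 0.
 *   image     - host-visible copy of the loaded segments
 *   load_base - address the image will run at on the device
 * Assumes a little-endian target. Returns 0, or -1 on an out-of-range
 * offset, an unsupported type, or an unresolved symbol. */
int apply_relocs(uint8_t *image, size_t image_size, uint32_t load_base,
                 const rel32 *rels, size_t count,
                 resolve_fn resolve, void *ctx)
{
    for (size_t i = 0; i < count; ++i) {
        uint32_t off  = rels[i].r_offset;  /* link base 0: address == offset */
        uint32_t type = rels[i].r_info & 0xffu;
        uint32_t sym  = rels[i].r_info >> 8;
        uint32_t word;

        if ((uint64_t)off + 4 > image_size) return -1;
        memcpy(&word, image + off, 4);     /* implicit addend A */

        switch (type) {
        case RELOC_ARM_RELATIVE:           /* B + A */
            word += load_base;
            break;
        case RELOC_ARM_ABS32: {            /* S + A, Thumb bit carried in S */
            uint32_t s = resolve(sym, ctx);
            if (s == 0) return -1;
            word += s;
            break;
        }
        default:
            return -1;                     /* not expected in this PIE model */
        }
        memcpy(image + off, &word, 4);
    }
    return 0;
}
```

In a real loader the `sym_table` would be filled by walking the module's dynamic symbol table and looking each undefined name up in the kernel's export table before this pass runs.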
As I understand it, the Linux kernel uses something like ld -r for a
relocatable link, which essentially combines many relocatable
objects into a single one, and loads that. That means that a lot of
awkward-to-handle relocations, especially in Thumb code, could be exposed.
Apologies, I couldn't easily find many examples in open source projects
or guides on how to write a dynamic linker. I have had some experience
with ARM's proprietary linker which had several dynamic linking models
for more bare-metal systems
(http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0242a/index.html),
however I'm guessing you would prefer to stick to open source
components.
Peter
-Chris