_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
On 21 May 2018 at 13:31, Eric Gorr via llvm-dev <llvm...@lists.llvm.org> wrote:
> I am working in an embedded environment with somewhat restrictive memory
> requirements where the page alignment requirements of an ADRP instruction
> cannot be guaranteed.
It sounds like you're relying on the linker optimization hints that
Clang emits. As you've seen they're designed to allow the linker to
convert adrp/add pairs into simpler nop/ldr sequences. If it works for
your purposes, great; but bear in mind it was designed as a
microarchitectural optimization so it's not guaranteed to trigger or
be able to remove all adrps if it does.
> As near as I can determine, ld.lld does not have this same feature. I am
> wondering if I am missing something, if such a feature is being planned,
MachO support in lld is pretty immature compared to ELF and it
certainly doesn't look like it's supported yet. I'm afraid I'm not
sure about the longer-term plans.
> or if there is an alternative I have not considered yet.
Ideally this would probably be handled by implementing proper
-mcmodel=tiny support in LLVM so that only ADR instructions are
emitted in the first place (instead of leaving you with a bunch of
NOPs). In ELF-land that probably wouldn't be too hard (there are
already relocations for it in the spec), but MachO is chronically
starved of free relocations so that might get very nasty very quickly.
Cheers.
Tim.
On 21 May 2018 at 13:57, Bruce Hoult via llvm-dev
<llvm...@lists.llvm.org> wrote:
> "ADRL produces position-independent code, because the address is calculated
> relative to PC."
>
> From this, I'd expect ADRP to simply do Xd <- PC + n*4096, where n is a 20-bit
> number, just like AUIPC in RISC-V (also a 20-bit literal multiplied by 4096)
> or AUIPC in MIPS (16 bits multiplied by 65536 there).
Afraid not. It really is (PC & ~0xfff) + n * 0x1000. So it does
require 4k (12-bit) alignment of any code section.
Now that you mention the MIPS & RISC-V alternatives, I'm not sure why
ARM actually made that choice. It obviously saves you a handful of
transistors but I can't quite believe that's all there is to it.
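To make the difference concrete, here is a minimal Python sketch of the
two address calculations being compared. The function names are made up
for illustration, and n is treated as a plain signed page count rather
than an encoded immediate.

```python
PAGE = 0x1000  # 4 KiB page granule used by ADRP


def adrp(pc: int, n: int) -> int:
    """ADRP as described above: clear the low 12 bits of PC, then add n pages."""
    return (pc & ~(PAGE - 1)) + n * PAGE


def pc_plus_pages(pc: int, n: int) -> int:
    """The AUIPC-style alternative from the quoted message: PC + n * 4096."""
    return pc + n * PAGE


def survives_move(pc: int, n: int, delta: int) -> bool:
    """True if moving the code by delta moves the ADRP result by exactly delta."""
    return adrp(pc + delta, n) == adrp(pc, n) + delta
```

With these definitions, survives_move(0x1234, 3, 0x2000) holds but
survives_move(0x1234, 3, 0x800) does not, which is one way to see why
the code has to keep its position relative to 4k boundaries. The
PC + n * 4096 form has no such restriction.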
My understanding is that the ADRP instruction isn't supposed to be
used on its own. The result of the ADRP provides a 4k aligned address,
the following instruction such as an LDR has an immediate offset that
can reach any address within the 4k page. For example to get the
address of a global variable var with -fpic in ELF:
adrp x0, :got:var // relocation R_AARCH64_ADR_GOT_PAGE var
ldr x0, [x0, :got_lo12:var] // relocation R_AARCH64_LD64_GOT_LO12_NC
The resulting code section is 4 byte aligned. I'm not sure where the
requirement for 4k aligned sections comes from, unless you are planning
to use ADRP alone? Do you need just one instruction for the purposes
of reducing code size? Another possibility if you don't care about
code-size but mustn't use ADRP is (range permitting) to have the
linker turn an ADRP to ADR and replace the following instruction with
a NOP. I think that is something you'd need to maintain downstream
though.
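As a rough illustration of the split described above, the following
Python sketch computes the two values a linker would place into an
ADRP plus ADD/LDR pair, and checks whether a single ADR could reach the
target instead. It is a simplification under stated assumptions: the
helper names are hypothetical, the target is an ordinary address rather
than a GOT entry, and instruction encoding and access-size scaling of
the LDR offset are ignored.

```python
def page(addr: int) -> int:
    """Address of the 4 KiB page containing addr."""
    return addr & ~0xFFF


def adrp_pair_fields(target: int, place: int) -> tuple[int, int]:
    """Return (page_delta, lo12) for an ADRP located at `place`.

    page_delta is the number of 4 KiB pages between the page of the
    ADRP and the page of the target; lo12 is the offset within that
    page, supplied by the following ADD or LDR.
    """
    page_delta = (page(target) - page(place)) >> 12
    lo12 = target & 0xFFF
    return page_delta, lo12


def adr_reaches(target: int, place: int) -> bool:
    """ADR holds a signed 21-bit byte offset, so its range is about +/-1 MiB."""
    offset = target - place
    return -(1 << 20) <= offset < (1 << 20)
```

For example, adrp_pair_fields(0x412345, 0x400010) gives a page delta of
0x12 and a low part of 0x345, and adr_reaches(0x412345, 0x400010) is
true, so in that case the ADRP to ADR rewrite would be in range.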
If you can use gcc then that supports -mcmodel=tiny. How long it would
take to implement it in LLVM would depend on how familiar you are
with LLVM and how much you know of the specification of -mcmodel=tiny;
on the assumption you aren't that familiar I'd guess at an order of
weeks.
Peter
Ah, there's definitely no linker-optimization hints for ELF. The
compiler doesn't even emit the data that the linker would need.
> As an educated opinion, how difficult might something like this be? minutes? hours? days? weeks? months?
Probably a few hours on the compiler side for me (~1 hour plumbing "tiny"
through as a valid option, ~1-2 hours implementing it in AArch64, + time
compiling etc). It's actually a pretty simple change to make as these
things go; thread-local storage is likely to be the trickiest bit.
That's assuming the linker can cope with the new relocations, which
looks plausible from a quick grep but not a foregone conclusion.
> With this explanation in hand, one other alternative I was looking at was
> using a linkerscript to essentially rebase the code and have ADRP
> instructions that would address the correct location as a result.
You mean provide the explicit (misaligned) address you intend to load
the binary at and get the linker to fix things up? Theoretically it
would have sufficient information, but I don't know how you'd convince
it not to align pages.
I think it's the segments that need to be 4K aligned (i.e. after
linking). Normally this isn't really an extra constraint because
you're just going to map them in with the MMU anyway, but in strange
embedded situations I could see it being a problem.
Consider the fully linked sequence:
adrp x0, #0
add x0, x0, #8
Starting at 0x1000 this would result in x0 == 0x1008 == pc, at 0x1ffc
it would result in x0 == 0x1008 != pc. Not good for
position-independence (or static positioning, but for different
reasons not illustrated by that example).
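The two cases can be replayed with a small Python sketch. The function
names are illustrative only, and the ADRP immediate is given in pages
(zero here, so the distinction does not matter).

```python
def adrp(pc: int, page_imm: int) -> int:
    """ADRP: page of the current PC plus page_imm 4 KiB pages."""
    return (pc & ~0xFFF) + page_imm * 0x1000


def run_pair(start: int, page_imm: int = 0, lo12: int = 8) -> int:
    """Execute `adrp x0, #page_imm; add x0, x0, #lo12` with the adrp at `start`."""
    x0 = adrp(start, page_imm)  # adrp x0, #0
    x0 += lo12                  # add  x0, x0, #8
    return x0


for start in (0x1000, 0x1FFC):
    print(hex(start), "->", hex(run_pair(start)))
```

Both starting addresses produce 0x1008, even though the code has moved
by 0xffc bytes: the result follows the 4k page, not the instruction.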
Peter
If you do decide to investigate the linker script route, the ALIGN
builtin function might be useful. I think the simplest way is to do
something like:
.text ALIGN(0x1000) : { *(.text) }
.my_next_section ALIGN (0x1000) : { *(my_next_section) }
Both .text and .my_next_section would start at 4k boundaries.
Link to docs: https://sourceware.org/binutils/docs/ld/Builtin-Functions.html#Builtin-Functions
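For reference, the rounding that ALIGN performs on the location counter
can be sketched in Python as below. This is only the arithmetic, under
the assumption that the boundary is a power of two; align_up is a
made-up name, not part of any linker.

```python
def align_up(location: int, boundary: int) -> int:
    """Round location up to the next multiple of boundary (a power of two)."""
    return (location + boundary - 1) & ~(boundary - 1)


# A section that would otherwise start at 0x1234 is pushed to 0x2000;
# an address that is already aligned stays where it is.
print(hex(align_up(0x1234, 0x1000)))
print(hex(align_up(0x2000, 0x1000)))
```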
Peter