In the last couple of weeks I have been working on various issues related to the aarch64 port. I have made good progress and will be sending new patches soon.
Two issues had to do with making Java run on aarch64 -
https://github.com/cloudius-systems/osv/issues/1145 and
https://github.com/cloudius-systems/osv/issues/1157. After exchanging some emails on the OpenJDK mailing list and researching the problem, I finally discovered that it only happens when the JIT is enabled. It is caused by the JIT compiler generating machine code that accesses arbitrary memory addresses in a way that assumes all addresses are 48 bits wide, meaning the top 16 bits are 0. Here are the details:
"Once I got hold of the JDK debuginfo files and identified the patching code - MacroAssembler::pd_patch_instruction(), I was able to put a breakpoint in it and see something very revealing:
#0 MacroAssembler::pd_patch_instruction_size (branch=0x20000879465c "\351\377\237\322\351\377\277\362\351\377\337\362\n\243\352\227\037",
target=0xffffa00042c862e0 "\020zB") at src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:75
#1 0x0000100000bc13cc in MacroAssembler::pd_patch_instruction (file=0x0, line=0, target=0xffffa00042c862e0 "\020zB", branch=<optimized out>)
at src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp:626
#2 NativeMovConstReg::set_data (this=0x20000879465c, x=-105551995837728) at src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp:262
#3 0x0000100000850bd0 in CompiledIC::set_ic_destination_and_value (value=0xffffa00042c862e0,
entry_point=0x20000823d290 "(\b@\271\b\001]\322*\005@\371\037\001\n\353,\001", <incomplete sequence \371\200>, this=<optimized out>)
at src/hotspot/share/code/compiledIC.hpp:193
#4 ICStub::finalize (this=<optimized out>) at src/hotspot/share/code/icBuffer.cpp:91
#5 ICStubInterface::finalize (this=<optimized out>, self=<optimized out>) at src/hotspot/share/code/icBuffer.cpp:43
#6 0x0000100000e30958 in StubQueue::stub_finalize (this=0xffffa00041555300, s=<optimized out>) at src/hotspot/share/code/stubs.hpp:168
#7 StubQueue::remove_first (this=0xffffa00041555300) at src/hotspot/share/code/stubs.cpp:175
....
The corresponding crash value of X9 was this:
0x0000a00042c862e0
vs the target argument of pd_patch_instruction() (see above in the backtrace):
0xffffa00042c862e0
Now given this comment:
// Move a constant pointer into r. In AArch64 mode the virtual
// address space is 48 bits in size, so we only need three
// instructions to create a patchable instruction sequence that can
// reach anywhere.
and this fragment of pd_patch_instruction() -
https://github.com/openjdk/jdk17u/blob/6f0f42630eac1febf562062afc523fdf3d2a920a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L152-L161 - it seems that the code to load the x8 register with an address gets patched with 0x0000a00042c862e0 instead of 0xffffa00042c862e0. It is interesting that this assert -
https://github.com/openjdk/jdk17u/blob/6f0f42630eac1febf562062afc523fdf3d2a920a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L77 - does not get hit.
The bottom line is that the valid address 0xffffa00042c862e0 gets truncated to 0x0000a00042c862e0, I guess based on the assumption that on Linux all userspace addresses are 48 bits long (see
https://www.kernel.org/doc/html/latest/arm64/memory.html). In the OSv unikernel there is no separation between user space and kernel space, and it happens that addresses returned by malloc fall into this range:
0xffffa00000000000 - 0xffffafffffffffff
So I guess the only way to fix it on the OSv side would be to tweak its virtual memory mapping for mallocs and make sure it never uses virtual addresses wider than 48 bits."
Currently OSv maps this part of virtual memory like so:
------ 0x ffff 8000 0000 0000 phys_mem --\
| | |- Main Area - 16T
------ 0x ffff 9000 0000 0000 --X
| | |- Page Area - 16T
------ 0x ffff a000 0000 0000 --X
| | |- Mempool Area - 16T
------ 0x ffff b000 0000 0000 --X
| | |- Debug Area - 80T
------ 0x ffff ffff ffff ffff --/
I wonder if this was an arbitrary choice made early in the OSv design or whether there was some good reason for it.
Could this be changed to this:
------ 0x 0000 8000 0000 0000 phys_mem --\
| | |- Main Area - 16T
------ 0x 0000 9000 0000 0000 --X
| | |- Page Area - 16T
------ 0x 0000 a000 0000 0000 --X
| | |- Mempool Area - 16T
------ 0x 0000 b000 0000 0000 --X
| | |- Debug Area - 80T
------ 0x 0000 ffff ffff ffff --/
I did manage to hack the code for aarch64 and it seems to be working.
Going forward, I think Linux will eventually extend userspace addresses from 48 bits to 57 bits (see
https://en.wikipedia.org/wiki/Intel_5-level_paging) or more, and dotnet actually made a fix to disable this high-16-bits hack. But given that there are Linux apps that may assume addresses are 48 bits wide and take advantage of it, it might be wise to change the OSv virtual memory layout to use the lower part only (<=
0x 0000 ffff ffff ffff).
What do you think?
Waldek