On 2023-05-24, Michael Matz wrote:
>Hello,
>
>On Wed, 24 May 2023, Florian Weimer wrote:
>
>> > Isn't it an inherent assumption that all object files contributing to
>> > a single final binary use the same model?
>>
>> It would certainly be nice if the larger models would be usable without
>> having to rebuild GCC and glibc to get the appropriate crt*.o files.
>> That's only possible with some level of ABI compatibility.
>
>It's downward, but not upward compatible. So it's enough if the crt*.o
>files would be large-model aware i.e. simply would use 64-bit relocs for
>everything they need to do, instead of PC-relative ones. There's no harm
>(except for a couple bytes code-size waste) vis the small/medium models.
>
>Otherwise, in principle I agree with Jan, all .o files have to be compiled
>with the same (or a larger) model. It simply can't be made to work 100%
>reliable otherwise. E.g. if the text size will be > 2GB for whatever
>reason, and you link in some small/medium model .o (e.g. from static
>archives) you will run into problems. _Even_ if we were to try to specify
>a way where some use-cases would happen to work.
>
>Like Fangruis suggestion of still trying to keep .data small (by moving
>large data items to .ldata) in the large model, even though that's not
>necessary. It will make some more mixed examples happen to work, without
>being a reliable solution for the mixed-model problem.

I have analyzed the AArch64 code models at
https://maskray.me/blog/2023-05-14-relocation-overflow-and-code-models#aarch64-code-models
Given that AArch64 and x86-64
executables are comparable in size, and R_AARCH64_ADR_PREL_PG_HI21 has a
doubled range, AArch64's small code model is good enough for many
programs that only slightly exceed the normal limit.
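
To make the "doubled range" point concrete, here is a minimal Python sketch of the two range checks. The function names are made up for illustration; the limits are the usual ones (a signed 32-bit displacement for R_X86_64_PC32, and a signed 21-bit count of 4 KiB pages for R_AARCH64_ADR_PREL_PG_HI21, i.e. roughly +-4GiB). Addends are folded into `target` for simplicity.

```python
PAGE_SIZE = 4096  # ADRP works on 4 KiB pages


def fits_x86_64_pc32(place, target):
    """R_X86_64_PC32: S + A - P must fit in a signed 32-bit field."""
    disp = target - place
    return -(1 << 31) <= disp < (1 << 31)


def fits_aarch64_adrp(place, target):
    """R_AARCH64_ADR_PREL_PG_HI21: Page(S + A) - Page(P) must be in [-2^32, 2^32)."""
    page_mask = ~(PAGE_SIZE - 1)
    page_delta = (target & page_mask) - (place & page_mask)
    return -(1 << 32) <= page_delta < (1 << 32)
```

With a reference at address 0 and a symbol 3 GiB away, `fits_x86_64_pc32(0, 3 << 30)` is false while `fits_aarch64_adrp(0, 3 << 30)` is true, which is the "slightly exceeds the normal limit" case.
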
For these programs on x86-64, compiling the majority of source files
with -mcmodel= with a suitable -mlarge-data-threshold=N interacts well
with the -mcmodel=small object files (glibc, libstdc++, prebuilt .a
files, etc.). I find it odd that changing some -mcmodel=medium object
files to use -mcmodel=large may exert relocation overflow pressure on
-mcmodel=small object files, so I created a GCC patch and was
asked to start a discussion in the x86-64 ABI group.
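
As a rough illustration of what the threshold does, the following Python sketch models the section choice for a data object. `pick_data_section` is a hypothetical helper, not a GCC function; the default of 65535 is the value GCC documents for -mlarge-data-threshold, and the "strictly larger than the threshold" comparison follows that documentation's wording. Read-only data, TLS and other section kinds are ignored here.

```python
DEFAULT_LARGE_DATA_THRESHOLD = 65535  # GCC's documented default


def pick_data_section(size, initialized, threshold=DEFAULT_LARGE_DATA_THRESHOLD):
    """Return the section name a writable data object of `size` bytes would get.

    Simplified model: objects larger than the threshold go to the large
    data sections, everything else stays in the regular ones.
    """
    is_large = size > threshold
    if initialized:
        return ".ldata" if is_large else ".data"
    return ".lbss" if is_large else ".bss"
```

Keeping small objects in .data/.bss is what lets -mcmodel=small object files continue to reach them with 32-bit PC-relative relocations.
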
I agree that mixing object files doesn't unleash the full power of
-mcmodel=medium and -mcmodel=large, but I don't think that matters.
Practicality should weigh a lot when we are discussing the code models,
otherwise we wouldn't need the regular/large data section distinction.
>So, I'm fine with putting a _suggestion_ of such split even in the large
>model into the psABI, as a hint. But I'm not fine with putting in
>language that implies that .ldata (and friends) have to be used in the
>large model.
If I change
+ Like the medium code model, the data sections are split into two
+ parts.
to
+ Like the medium code model, the data sections can be split into two
+ parts.
for
https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/42 ,
would that look good?
>FWIW, I consider the large code models to be a theoretic excercise in
>trying to reach feature completion. At the time we were adding it to the
>psABI there were only hear-say inhouse requirements for actually having
>ELF executables or shared libs that had > 2GB text, and I have never seen
>such in public. The large model code sequences are _so_ bad that really
>noone should use it, but rather split the code into different shared libs,
>if must be. The only case where the large model is necessary is if you
>have an individual function that's larger than 2GB, and if your
>project/code structure is bad enough to result in _that_ then you have
>other problems and obviously aren't interested in performance.
>
>
>Ciao,
>Michael.
If we use range extension thunks instead of
_GLOBAL_OFFSET_TABLE_+R_X86_64_PLTOFF64+indirect jump, the large code
model code sequence will not be that bad. I think global data symbols
are not a bottleneck for a significant portion of programs.
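
A minimal sketch of the thunk idea, in Python, assuming a hypothetical linker pass (`plan_call` is an invented name, and this is not how any existing x86-64 linker behaves): the compiler emits an ordinary 5-byte `call rel32`, and only calls whose target is out of range get redirected to a nearby thunk that performs the long jump.

```python
CALL_REL32_LEN = 5  # opcode E8 plus a 4-byte displacement


def plan_call(call_site, target):
    """Decide how a `call rel32` at `call_site` can reach `target`.

    The displacement is relative to the end of the call instruction.
    Returns "direct" when it fits in a signed 32-bit field, otherwise
    "thunk", meaning the linker would route the call through a
    range extension thunk placed within reach of the call site.
    """
    disp = target - (call_site + CALL_REL32_LEN)
    if -(1 << 31) <= disp < (1 << 31):
        return "direct"
    return "thunk"
```

In this scheme only the out-of-range calls pay for the indirection; in-range calls keep the same sequence as in the small code model.
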
(Large code models may be used by some JIT engines. I have heard that
LLVM MCJIT (now superseded by Orc) uses (used?) the x86-64(?) large code
model.)