
IAR ARM Cortex-M compiler does not align stack on 8-byte boundary

StateMachineCOM

Sep 18, 2022, 16:26:15
ARM ABI says that the stack should be 8-byte aligned, but I see cases where the stack is aligned only to 4-byte boundary.

For example, I have the following simple busy-delay function:

<pre>
void delay(int iter) {
    int volatile counter = 0;
    while (counter < iter) { // delay loop
        ++counter;
    }
}
</pre>

This compiles with IAR EWARM 9.10.2 on ARM Cortex-M to the following disassembly:

<pre>
SUB SP, SP, #0x4
...
ADD SP, SP, #0x4
BX LR
</pre>

The problem is that, assuming SP was 8-byte aligned on entry, after SUB SP, SP, #0x4 the stack is misaligned (it is aligned only to a 4-byte boundary).
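To make the arithmetic concrete, here is a small host-side C sketch. The helper names are hypothetical, not part of any toolchain. It assumes SP was 8-byte aligned on entry to delay() and checks SP after the frame is allocated: a 4-byte frame leaves SP mod 8 == 4, while rounding the frame up to a multiple of 8 keeps the double-word alignment.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: round a frame size up to a multiple of align
   (align must be a power of two). */
static uint32_t round_up_frame(uint32_t bytes, uint32_t align)
{
    return (bytes + align - 1u) & ~(align - 1u);
}

/* True if sp satisfies the 8-byte (double-word) alignment rule. */
static bool is_8_aligned(uint32_t sp)
{
    return (sp % 8u) == 0u;
}

/* SP after allocating frame_bytes below entry_sp
   (the stack grows downwards, as on Cortex-M). */
static uint32_t sp_after_alloc(uint32_t entry_sp, uint32_t frame_bytes)
{
    return entry_sp - frame_bytes;
}
```

With an entry SP of 0x20001000, a 4-byte frame gives 0x20000FFC (only word-aligned), while a frame rounded up to 8 bytes gives 0x20000FF8.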

Why is this happening? Is this compliant with the ARM ABI? Are there any compiler options to control that?

Richard Damon

Sep 18, 2022, 16:47:02
I think that as long as the function doesn't call another function, it
doesn't need to respect that ABI, since it knows it isn't going to do
the operations that need the 8-byte alignment.

If it isn't *I*nterfacing with anything, the ABI doesn't apply.

StateMachineCOM

Sep 18, 2022, 20:10:51
Yes, the simple delay() function does not call anything. But still, interrupts can preempt it, which is quite likely because a function like this runs for a long time by design (and consumes a significant percentage of the CPU time).

In fact, I've checked it, and an interrupt preempting delay() must re-align the stack by using the "stack aligner". So the simple (no FPU) Cortex-M exception stack frame of 8 registers (32 bytes) becomes a bigger frame of 9 words (36 bytes): the 8 registers plus a padding word. Please note that the Cortex-M CPU deals with this just fine and the program runs. But an RTOS or other assembly code dealing with interrupts could break the system by making assumptions about the stack alignment. I thought that compatibility with interrupts was the primary reason why the ARM ABI stipulates 8-byte stack alignment.
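The "stack aligner" behaviour can be modelled in a few lines of C. This is a simplified sketch, assuming the basic (non-FPU) frame of eight words and CCR.STKALIGN = 1; the struct and function names are hypothetical. On entry, if SP is only word-aligned, the hardware drops SP by one extra padding word and records that in bit 9 of the stacked xPSR; on return it uses that bit to restore the original SP.

```c
#include <stdint.h>
#include <stdbool.h>

/* Basic (non-FPU) exception frame: R0-R3, R12, LR, PC, xPSR. */
#define BASIC_FRAME_BYTES (8u * 4u)

typedef struct {
    uint32_t frame_sp; /* SP after the frame has been stacked */
    bool padded;       /* what the hardware records in bit 9 of stacked xPSR */
} exc_frame_t;

/* Simplified model of exception entry, assuming CCR.STKALIGN = 1. */
static exc_frame_t model_exception_entry(uint32_t sp)
{
    exc_frame_t f;
    f.padded = (sp & 4u) != 0u; /* SP only word-aligned on entry? */
    /* Push 8 words, then clear bit 2 if needed: one extra padding word. */
    f.frame_sp = (sp - BASIC_FRAME_BYTES) & ~(f.padded ? 4u : 0u);
    return f;
}

/* Simplified model of exception return: pop the frame, undo the padding. */
static uint32_t model_exception_return(exc_frame_t f)
{
    uint32_t sp = f.frame_sp + BASIC_FRAME_BYTES;
    if (f.padded) {
        sp |= 4u; /* back to the original, word-aligned SP */
    }
    return sp;
}
```

In this model, an interrupt arriving while delay() runs with SP = 0x20000FFC consumes 36 bytes of stack instead of 32, which matches the frame growth described above.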

Also, I've just checked ARM/KEIL Compiler 6 (based on LLVM), and that compiler generated 8-byte aligned code for delay():

<pre>
SUB SP, SP, #0x8
...
ADD SP, SP, #0x8
BX LR
</pre>

Now, I don't have the time to investigate all compilers and various optimization levels. I thought that standards, like the ARM ABI, are supposed to settle things like that. I'm just a bit perplexed and couldn't find much information about that.

David Brown

Sep 19, 2022, 04:08:02
(Please get a real newsreader and a real newsserver, rather than using
the google groups crapware. Google groups is fine for searching old
posts, but makes a mess of posts - it ruins line endings, code
formatting, attributions, and generally breaks every Usenet posting
convention it can. If you /must/ use google groups, please make the
effort to get attributions right and to quote appropriate parts of the
earlier posts. And if you are including code snippets, fix the line
endings of your post. news.eternal-september.org is a free newsserver,
and Thunderbird is one of many free newsreaders.)

On 19/09/2022 02:10, StateMachineCOM wrote:
> Yes, the simple delay() function does not call anything. But still, interrupts can preempt it, which is quite likely because a function like this runs for a long time by design (and consumes a significant percentage of the CPU time).
>
> In fact, I've checked it, and an interrupt preempting delay() must re-align the stack by using the "stack aligner". So the simple (no FPU) Cortex-M exception stack frame of 8 registers (32 bytes) becomes a bigger frame of 9 words (36 bytes): the 8 registers plus a padding word. Please note that the Cortex-M CPU deals with this just fine and the program runs. But an RTOS or other assembly code dealing with interrupts could break the system by making assumptions about the stack alignment. I thought that compatibility with interrupts was the primary reason why the ARM ABI stipulates 8-byte stack alignment.
>

The hardware has to be able to cope with interrupts occurring while
stacks are not 8-byte aligned. It's possible that it is marginally
slower or results in a bigger stack frame, but it has to work.

The key reason for stack alignment is efficiency. It makes a bigger
difference when you have caches and big internal buses, and an even
bigger difference when this is combined with multiple cores. It's also
possible that some vector and SIMD units require higher alignments. For
embedded Cortex-M devices, it would not have made much difference (I
believe the old EABI required 4 byte alignment), but requiring 8 byte
alignment is a very minor cost that makes future compatibility much
simpler. Getting it right early on avoids the kind of dog's dinner you
see in the x86 world where the 64-bit Windows stack alignment is too
small for the needs of SIMD instructions.

> Also, I've just checked ARM/KEIL Compiler 6 (based on LLVM), and that compiler generated 8-byte aligned code for delay():
>
> <pre>
> SUB SP, SP, #0x8
> ...
> ADD SP, SP, #0x8
> BX LR
> </pre>
>
> Now, I don't have the time to investigate all compilers and various optimization levels. I thought that standards, like the ARM ABI, are supposed to settle things like that. I'm just a bit perplexed and couldn't find much information about that.

A leaf function can be fine with 4 byte stack alignment. A quick test
shows gcc aligns on 8 bytes, while clang aligns at 4 bytes for a leaf
function.

An extremely useful tool for investigating this kind of thing is the
online compiler at <https://godbolt.org>. It does not include many
commercial compilers (though it has MSVC), but supports C, C++, and lots
of languages on a very wide range of compilers and targets. Here you
can see your code compiled with gcc and clang for Cortex-M4:

<https://godbolt.org/z/cc6bf6oGe>


StateMachineCOM

Sep 19, 2022, 12:09:42
Hi David,
Thanks for your help.

> Please get a real newsreader and a real newsserver...

I'd like to do this, but I use this newsgroup so infrequently that I don't want to buy and install anything special. Is there some online tool you'd recommend?

> An extremely useful tool for investigating this kind of thing is the online compiler

Yes, thank you. It does indeed seem a useful tool for a quick look at the generated assembly.

But regarding the stack alignment requirements, the "ARM Procedure Call Standard for the ARM Architecture" (ARM IHI 0042E) says in Section 5.2.1.1 "Universal stack constraints" that "SP mod 4 = 0, The stack must at all times be aligned at word boundary". The next section, 5.2.1.2 "Stack constraints at a public interface", then strengthens the requirement to: "SP mod 8 = 0. The stack must be double-word aligned".
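The two rules quoted above can be written as simple predicates on the SP value. This is an illustrative sketch only: the function names are hypothetical, and on a real target SP would have to be read with a compiler intrinsic or a line of assembly, which is not shown here.

```c
#include <stdint.h>
#include <stdbool.h>

/* AAPCS 5.2.1.1, universal constraint: SP mod 4 == 0 at all times. */
static bool sp_meets_universal_constraint(uint32_t sp)
{
    return (sp % 4u) == 0u;
}

/* AAPCS 5.2.1.2, constraint at a public interface: SP mod 8 == 0. */
static bool sp_meets_public_interface_constraint(uint32_t sp)
{
    return (sp % 8u) == 0u;
}
```

For example, SP = 0x20000FFC satisfies the first predicate but not the second. That is the state delay() is in after IAR's SUB SP, SP, #0x4, assuming SP was 8-byte aligned on entry.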

So the question now is: what do they mean by "public interface"?

David Brown

Sep 19, 2022, 14:16:12
On 19/09/2022 18:09, StateMachineCOM wrote:
> Hi David, Thanks for your help.
>
>> Please get a real newsreader and a real newsserver...
>
> I'd like to do this, but I use this newsgroup so infrequently that I
> don't want to buy and install anything special. Is there some online
> tool you'd recommend?
>

Thunderbird is free - as are any of a dozen different newsreaders,
depending on preferences and OS. Many other email programs also support
Usenet. There are several free Usenet servers, at least for non-binary
groups like those in comp.*; news.eternal-september.org is a popular
one. Your ISP might also provide the service, as it used to be a
standard part of any internet access package.

I don't know of any free online interfaces other than google groups,
which is barely worth the price (although as always with google, it's
good for searching). There are several paid-for services, mostly
targeting binary groups (which used to be a popular way to spread
pirated software and media, before bittorrent).

Technical groups are all text posts, and most have relatively few posts.
Even if you start your newsreader once a month, it will take no more
than a few seconds to download all posts in comp.arch.embedded to bring
it up to date.

>> An extremely useful tool for investigating this kind of thing is
>> the online compiler
>
> Yes, thank you. It does indeed seem a useful tool for a quick look at
> the generated assembly.
>

I use it all the time, for looking at code on different targets,
comparing different options, checking complicated syntax (such as
testing C++ features in the latest standards, newer than the compilers I
have online), comparing the output of different compilers, sharing code
with others via links, checking if the code I write gives exactly the
assembly I want, amongst other things.

> But regarding the stack alignment requirements, the "ARM Procedure
> Call Standard for the ARM Architecture" (ARM IHI 0042E) says in
> Section 5.2.1.1 "Universal stack constraints" that "SP mod 4 = 0, The
> stack must at all times be aligned at word boundary". Later in the
> next Section 5.2.1.2 "Stack constraints at a public interface" it
> strengthens the requirements to: "SP mod 8 = 0. The stack must be
> double-word aligned".
>
> So the question now is: what do they mean by "public interface"?

I guess that means when calling code, or being called from code, that is
independently compiled. When it is within the same compiled code, you
don't have to follow the standard ABI at all - you (meaning "the
compiler") can make your own rules regarding parameter passing, volatile
/ non-volatile registers, etc.
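As an illustration of that distinction (the function names are hypothetical): a function with external linkage is a potential public interface, because separately compiled code may call it, whereas a `static` function can only be called from within its own translation unit, so the compiler sees every call site. Whether a given compiler actually exploits this, for example by inlining or by using a non-standard convention, depends on the compiler and its optimization options.

```c
/* Internal linkage: callable only from this translation unit, so in
   principle the compiler may inline it or use its own calling rules. */
static int scale(int x)
{
    return 3 * x;
}

/* External linkage: may be called from separately compiled code, so its
   entry point is expected to follow the procedure call standard (AAPCS). */
int public_entry(int x)
{
    return scale(x) + 1;
}
```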

Richard Damon

Sep 19, 2022, 21:48:55
Yes, the standard ABI defines what functions are allowed to presume when
they are called by "unknown" code. That is what applies at a "public
interface": since it is public, anyone can call it.

Since routines are allowed to assume they are entered with a stack
pointer aligned to a multiple of 8, the caller needs to assure that (at
least if their entry at a public API also had the stack pointer properly
aligned).

The purpose of this is that some common instructions require their
source/destination to be 8-byte aligned, and it is a bit awkward for a
subroutine that might be entered with a misaligned stack pointer to
realign it (that typically costs a register to hold the old SP), so the
ABI requires the stack to be aligned at every public interface.
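The cost described here can be seen in a small model: a routine that cannot assume an 8-byte aligned SP has to keep the incoming SP somewhere before rounding SP down, and must put it back on exit. This is a host-side C sketch with hypothetical names, not code that any particular compiler emits.

```c
#include <stdint.h>

/* Allocate frame_bytes below sp and force 8-byte alignment.
   The original SP is handed back through saved_sp: on a real CPU this is
   the value that would occupy a spare register until the routine exits,
   when it is simply moved back into SP. */
static uint32_t realign_on_entry(uint32_t sp, uint32_t frame_bytes,
                                 uint32_t *saved_sp)
{
    *saved_sp = sp;
    return (sp - frame_bytes) & ~7u; /* round down: the stack grows downwards */
}
```

With an incoming SP of 0x20001004 and an 8-byte frame, SP becomes 0x20000FF8, and 0x20001004 has to be remembered so it can be restored on exit.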

If a piece of code doesn't call any outside routines, then this isn't a
problem, so the ABI doesn't restrict the stack pointer at those times.
This is important, as it isn't uncommon to want to temporarily push a
single word onto the stack for a bit, and if the stack pointer needed to
be kept at an alignment of 8, that operation would need to use up extra
stack memory.