(Please get a real newsreader and a real newsserver, rather than using
the Google Groups crapware. Google Groups is fine for searching old
posts, but it makes a mess of new ones - it ruins line endings, code
formatting, attributions, and generally breaks every Usenet posting
convention it can. If you /must/ use Google Groups, please make the
effort to get attributions right and to quote appropriate parts of the
earlier posts. And if you are including code snippets, fix the line
endings of your post. news.eternal-september.org is a free newsserver,
and Thunderbird is one of many free newsreaders.)

On 19/09/2022 02:10, StateMachineCOM wrote:
> Yes, the simple delay() function does not call anything. But still, interrupts can preempt it, which is quite likely because a function like this runs for a long time by design (and consumes a significant percentage of the CPU time).
>
> In fact, I've checked it, and an interrupt preempting delay() must re-align the stack by using the "stack aligner". So the simple (no FPU) Cortex-M exception stack frame of 8 registers (32 bytes) becomes the bigger stack frame of 9 registers (36 bytes). Please note that the Cortex-M CPU deals with it just fine and the program runs. But in the case of RTOS or some other assembly code dealing with interrupts could break the system by making assumptions about the stack alignment. I thought that the compatibility with interrupts is the primary reason why the ARM ABI stipulates 8-byte stack alignment.
>

The hardware has to be able to cope with interrupts occurring while
the stack is not 8-byte aligned. It's possible that this is marginally
slower or results in a bigger stack frame, but it has to work.

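To make the 32 versus 36 byte arithmetic concrete, here is a small host-side model of what Cortex-M exception entry does to the stack pointer. It is only a sketch: the names (model_exception_entry, entry_result and so on) are made up for illustration, it covers just the basic frame without FPU state, and it assumes the stack pointer is already 4-byte aligned.

```c
#include <stdint.h>
#include <stdbool.h>

/* Basic (no FPU) frame: R0-R3, R12, LR, PC, xPSR = 8 words. */
#define BASIC_FRAME_BYTES 32u
/* Bit 9 of the stacked xPSR records that a padding word was inserted. */
#define XPSR_PADDING_BIT  (1u << 9)

/* Hypothetical result type for this model, not a hardware structure. */
typedef struct {
    uint32_t sp_after;      /* SP as seen by the handler */
    uint32_t stacked_xpsr;  /* xPSR value as stored in the frame */
    uint32_t bytes_used;    /* stack consumed by exception entry */
} entry_result;

/* Model of exception entry. "stkalign" stands for the STKALIGN
 * configuration bit; "sp" is assumed to be a multiple of 4. */
static entry_result model_exception_entry(uint32_t sp, uint32_t xpsr,
                                          bool stkalign)
{
    entry_result r;
    /* Pad only if alignment is requested and SP is not 8-byte aligned. */
    bool pad = stkalign && ((sp & 4u) != 0u);
    uint32_t frame_top = pad ? (sp - 4u) : sp;

    r.sp_after = frame_top - BASIC_FRAME_BYTES;
    r.stacked_xpsr = pad ? (xpsr | XPSR_PADDING_BIT)
                         : (xpsr & ~XPSR_PADDING_BIT);
    r.bytes_used = sp - r.sp_after;
    return r;
}

/* Model of exception return: undo the frame, and the padding if any. */
static uint32_t model_exception_return(uint32_t sp_handler,
                                       uint32_t stacked_xpsr)
{
    uint32_t sp = sp_handler + BASIC_FRAME_BYTES;
    if ((stacked_xpsr & XPSR_PADDING_BIT) != 0u) {
        sp += 4u;
    }
    return sp;
}
```

The point for RTOS or hand-written assembly is that the interrupted code's stack pointer cannot be assumed to sit exactly 32 bytes above the handler's; code that needs it has to check the padding bit in the stacked xPSR.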
The key reason for stack alignment is efficiency. It makes a bigger
difference when you have caches and big internal buses, and an even
bigger difference when this is combined with multiple cores. It's also
possible that some vector and SIMD units require higher alignments. For
embedded Cortex-M devices, it would not have made much difference (I
believe the old EABI required 4-byte alignment), but requiring 8-byte
alignment is a very minor cost that makes future compatibility much
simpler. Getting it right early on avoids the kind of dog's dinner you
see in the x86 world where the 64-bit Windows stack alignment is too
small for the needs of SIMD instructions.

> Also, I've just checked ARM/KEIL Compiler 6 (based on LLVM), and that compiler generated 8-byte aligned code for delay():
>
> <pre>
> SUB SP, SP, #0x8
> ...
> ADD SP, SP, #0x8
> BX LR
> </pre>
>
> Now, I don't have the time to investigate all compilers and various optimization levels. I thought that standards, like the ARM ABI, are supposed to settle things like that. I'm just a bit perplexed and couldn't find much information about that.

A leaf function can be fine with 4-byte stack alignment. A quick test
shows gcc aligns on 8 bytes, while clang aligns on 4 bytes for a leaf
function.

An extremely useful tool for investigating this kind of thing is the
online compiler at <https://godbolt.org>. It does not include many
commercial compilers (though it has MSVC), but supports C, C++, and lots
of other languages on a very wide range of compilers and targets. Here
you can see your code compiled by gcc and clang for a Cortex-M4:

<https://godbolt.org/z/cc6bf6oGe>
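
For anyone who wants to experiment there with other compilers or optimisation levels, a leaf function along the following lines is enough to see the effect. This is a hypothetical stand-in, not necessarily the delay() from the earlier post, and the parameter name is my own choice.

```c
#include <stdint.h>

/* Hypothetical stand-in for the delay() under discussion. */
void delay(uint32_t iter)
{
    /* volatile means every access to the counter must really happen,
     * so in practice the compiler keeps it in a stack slot rather
     * than purely in a register */
    volatile uint32_t ctr = iter;

    while (ctr > 0u) {
        ctr--;
    }
}
```

Compile it for a Cortex-M4 (something like -O2 -mcpu=cortex-m4 -mthumb) and look at how much the function subtracts from SP on entry; that adjustment is where the 4-byte versus 8-byte difference shows up.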