From reading and rereading this thread multiple times, my question was:
"Is the stack pointer 16 byte aligned (even 8 byte multiple) prior to
the call instruction, or is it 8 bytes off a 16 byte boundary (odd 8
byte multiple) prior to the call instruction?"
Agner answered that (below) in the cited document. See the last
sentence of his quote. I'm assuming he is correct. (But, is he? ... )
"The 64-bit systems keep the stack aligned by 16. The stack word size
is 8 bytes, but the stack must be aligned by 16 before any call
instruction. Consequently, the value of the stack pointer is always 8
modulo 16 at the entry of a procedure. A procedure must subtract an
odd multiple of 8 from the stack pointer before any call instruction."
- Agner Fog
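To check the arithmetic in that quote for myself, here is a minimal
sketch in Python that models rsp as a plain integer. The helper names
and the sample address are my own inventions, purely for illustration;
nothing here touches a real stack.

```python
# Toy model of the 64-bit stack pointer as a plain integer.
# All names and the sample address are hypothetical, for illustration only.

WORD = 8  # stack word size in bytes


def call(rsp):
    """Model 'call': push the 8 byte return address, so rsp drops by 8."""
    return rsp - WORD


def sub(rsp, nbytes):
    """Model 'sub rsp, nbytes' in a procedure's prologue."""
    return rsp - nbytes


rsp_before_call = 0x7FFF_FFFF_E000        # assumed 16 byte aligned
rsp_at_entry = call(rsp_before_call)      # value seen at procedure entry

print(rsp_before_call % 16)               # 0
print(rsp_at_entry % 16)                  # 8

# Subtracting an odd multiple of 8 restores 16 byte alignment ...
print(sub(rsp_at_entry, 1 * WORD) % 16)   # 0
print(sub(rsp_at_entry, 5 * WORD) % 16)   # 0
# ... while an even multiple of 8 leaves rsp at 8 modulo 16.
print(sub(rsp_at_entry, 2 * WORD) % 16)   # 8
```

Read this way, the "odd multiple of 8" is subtracted inside the
procedure, after entry, so that the stack is back on a 16 byte
boundary before that procedure makes a call of its own.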
The "System V Application Binary Interface" for AMD64 confirms this. It
states the alignment at function entry, i.e., "when control is
transferred to the function entry point", which is after the call
instruction has pushed the return address. At that point "%rsp + 8",
not rsp itself, is a multiple of 16, so rsp is on an odd 8 byte
multiple (section 3.2.2 "The Stack Frame"):
"The end of the input argument area shall be aligned on 16 (32, if
__m256 is passed on stack) byte boundary. In other words, the value
(%rsp + 8) is always a multiple of 16 (32) when control is transferred
to the function entry point."
https://web.archive.org/web/20120323195628/http://www.x86-64.org/documentation/abi.pdf
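The ABI sentence can be written as a one-line check. This is only a
sketch with rsp modelled as an integer; `entry_ok` and its `align`
parameter are names I made up, with align=32 standing in for the
"__m256 is passed on stack" case.

```python
def entry_ok(rsp, align=16):
    """True if (rsp + 8) is a multiple of align, per the quoted ABI text."""
    return (rsp + 8) % align == 0


aligned = 0x1000  # hypothetical value that is both 16 and 32 byte aligned

print(entry_ok(aligned - 8))       # True: rsp right after a call from an aligned stack
print(entry_ok(aligned))           # False: (rsp + 8) is 8 modulo 16
print(entry_ok(aligned - 8, 32))   # True
print(entry_ok(aligned - 24, 32))  # False: satisfies 16 but not 32
```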
So, yes, Agner is correct, according to the ABI. And, it also means
that, because the call instruction itself pushes 8 bytes for the
return address, the stack must be 16 byte aligned (an even multiple of
8) immediately prior to the call instruction, and is therefore on an
odd multiple of 8 (8 modulo 16) upon entering the procedure.
If I'm not mistaken, the routines I've seen in the thread are 16 byte
aligned (even 8 byte multiple) for the address prior to the call
instruction. So, the stack will be on an odd multiple of 8 upon entry
into the procedure. If I understood Agner and the ABI correctly, this
is exactly what is required. The caller does not compensate for the
return address pushed by the call instruction. It is the called
procedure that compensates, by subtracting an odd multiple of 8 from
the stack pointer before it makes any call of its own.
Rod Pemberton
--
Isn't anti-hate just hate by another name? Isn't
anti-protesting just protesting by another name?
Peace is a choice that both sides rejected.