64 bit stack alignment

Alex

Aug 29, 2017, 11:14:38 AM
On 64 bit Windows, stack alignment on a 16 byte boundary is required
before calling all except a leaf function. In the called function, the
stack is 8 mod 16.

Now, I'm struggling to come up with a way of doing it beyond this code
(which I didn't invent, but I can't for the life of me remember where I
found it.)

push rsp
push [rsp]
and spl $F0
call funkychicken
pop rsp

It seems to be the only way of doing this without branches, flags or
other expensive nonsense. But, as ever, there may be a better way. Any
suggestions?


--
Alex

Rick C. Hodgin

Aug 29, 2017, 12:14:43 PM
I don't think the above solution will work. If I read it correctly
(as it appears to be mixing ISAs), you'll lose your relative stack
pointer position with the AND, and since it's an unknown (it will
either change the value of rsp or not), then it won't be reliable.

The only thing I can think of that doesn't involve branching is an
algorithm like this (completely untested, and I'm using r15 as an
arbitrarily chosen register, it can be changed to any other reg):

xor r15,r15
test rsp,0Fh ; See if we're already aligned
setnz r15b ; Set to 0 or 1 based on alignment
shl r15,3 ; Multiply by 8
sub rsp,r15 ; Adjust rsp by 0 or 8
call funkychicken ; Perform the normal call
add rsp,r15 ; Un-adjust rsp by 0 or 8

Note also that this has the effect of pushing your parameters up
on the stack an additional 8-bytes, so you may need to do this
first ... I don't know. I've always kept the stack 8-byte aligned
and everything worked, so I'm not really sure what you're doing
here.
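
The arithmetic of that sequence can be checked outside the assembler. This is a minimal Python sketch, assuming rsp is always a multiple of 8 before the sequence runs and that r15 survives the call; the function name `setnz_adjust` is made up for illustration.

```python
def setnz_adjust(rsp):
    """Model of: xor r15,r15 / test rsp,0Fh / setnz r15b / shl r15,3 / sub rsp,r15."""
    r15 = 1 if (rsp & 0x0F) != 0 else 0   # test + setnz: 1 when not 16 byte aligned
    r15 <<= 3                             # shl r15,3: 0 or 8
    return rsp - r15, r15                 # sub rsp,r15


for entry_rsp in (0x7FF0, 0x7FF8):
    aligned, r15 = setnz_adjust(entry_rsp)
    restored = aligned + r15              # add rsp,r15 after the call
    print(hex(entry_rsp), hex(aligned), r15, restored == entry_rsp)
```

Under those assumptions both starting values reach a 16 byte boundary before the call and return to their original value afterwards.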

Thank you,
Rick C. Hodgin

Anton Ertl

Aug 29, 2017, 12:29:45 PM
The usual way (when you are in an ABI-compliant function yourself) is
that your function adjusts the stack pointer by 8 mod 16 in a
statically known way.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

Alex

Aug 29, 2017, 1:14:49 PM
On 29-Aug-17 17:07, Rick C. Hodgin wrote:
> On Tuesday, August 29, 2017 at 11:14:38 AM UTC-4, Alex wrote:
>> On 64 bit Windows, stack alignment on a 16 byte boundary is required
>> before calling all except a leaf function. In the called function, the
>> stack is 8 mod 16.
>>
>> Now, I'm struggling to come up with a way of doing it beyond this code
>> (which I didn't invent, but I can't for the life of me remember where I
>> found it.)
>>
>> push rsp
>> push [rsp]
>> and spl $F0
>> call funkychicken
>> pop rsp
>>
>> It seems to be the only way of doing this without branches, flags or
>> other expensive nonsense. But, as ever, there may be a better way. Any
>> suggestions?
>
> I don't think the above solution will work. If I read it correctly
> (as it appears to be mixing ISAs), you'll lose your relative stack
> pointer position with the AND, and since it's an unknown (it will
> either change the value of rsp or not), then it won't be reliable.

It does. Either the AND doesn't change RSP and we pop the second copy,
or it does and we pop the first.

If I'm passing parameters beyond 4, then that needs parameters pushed on
the stack. An even number can use AND SPL $F0 and an odd number OR SPL
$08 as in

push rsp
push [rsp]
and spl $F0 ; align to 16
push r14 ; two extra parameters = 16
push r15
call funkychicken
add rsp $10 ; remove the two pushed parameters (16 bytes)
pop rsp

or

push rsp
push [rsp]
or spl $08 ; align to 8 byte
push r15 ; one extra parameter
call funkychicken
add rsp $8 ; remove the one pushed parameter (8 bytes)
pop rsp

>
> The only thing I can think of that doesn't involve branching is an
> algorithm like this (completely untested, and I'm using r15 as an
> arbitrarily chosen register, it can be changed to any other reg):
>
> xor r15,r15
> test rsp,0Fh ; See if we're already aligned
> setnz r15b ; Set to 0 or 1 based on alignment
> shl r15,3 ; Multiply by 8
> sub rsp,r15 ; Adjust rsp by 0 or 8
> call funkychicken ; Perform the normal call
> add rsp,r15 ; Un-adjust rsp by 0 or 8
>
> Note also that this has the effect of pushing your parameters up
> on the stack an additional 8-bytes, so you may need to do this
> first ... I don't know. I've always kept the stack 8-byte aligned
> and everything worked, so I'm not really sure what you're doing
> here.

The MS ABI needs 16 byte alignment; mainly to support XMM aligned on the
stack. http://www.agner.org/optimize/calling_conventions.pdf

>
> Thank you,
> Rick C. Hodgin
>


--
Alex

Alex

Aug 29, 2017, 1:29:51 PM
On 29-Aug-17 17:23, Anton Ertl wrote:
> Alex <al...@nospicedham.rivadpm.com> writes:
>> On 64 bit Windows, stack alignment on a 16 byte boundary is required
>> before calling all except a leaf function. In the called function, the
>> stack is 8 mod 16.
>>
>> Now, I'm struggling to come up with a way of doing it beyond this code
>> (which I didn't invent, but I can't for the life of me remember where I
>> found it.)
>>
>> push rsp
>> push [rsp]
>> and spl $F0
>> call funkychicken
>> pop rsp
>>
>> It seems to be the only way of doing this without branches, flags or
>> other expensive nonsense. But, as ever, there may be a better way. Any
>> suggestions?
>
> The usual way (when you are in an ABI-compliant function yourself) is
> that your function adjusts the stack pointer by 8 mod 16 in a
> statically known way.
>
> - anton
>

I'm using a two stack model (as you will know, this is Forth) where the
return stack is RSP based and the data stack is based on another
register, say RBP.

On my 32 bit system (ESP and EBP respectively) calling into Windows is
most easily achieved by switching the stacks, doing the call, and
switching back again, since all the parameters are already on the data
stack pointed by EBP, and there is no special alignment required.

For 64 bit Windows, that technique is not as easy since Windows passes 4
parameters in registers, the rest on the stack, and has this oddball 16
byte alignment; so there is no static way of ensuring either of the
stacks is 16 byte aligned, regardless of whether I switch stacks or not.

--
Alex

Rick C. Hodgin

Aug 29, 2017, 1:59:57 PM
#1 -- That will change the value of the stack pointer, but you
don't know if it changed or not. If it was already aligned,
it would've done nothing, but if it wasn't aligned, it
would've adjusted it. And when you return, you'll be at a
point of not knowing whether or not it should be adjusted.
Your subsequent pop rsp will be off by 8 bytes potentially.

#2 -- What is "spl"?

> > The only thing I can think of that doesn't involve branching is an
> > algorithm like this (completely untested, and I'm using r15 as an
> > arbitrarily chosen register, it can be changed to any other reg):
> >
> > xor r15,r15
> > test rsp,0Fh ; See if we're already aligned
> > setnz r15b ; Set to 0 or 1 based on alignment
> > shl r15,3 ; Multiply by 8
> > sub rsp,r15 ; Adjust rsp by 0 or 8
> > call funkychicken ; Perform the normal call
> > add rsp,r15 ; Un-adjust rsp by 0 or 8
> >
> > Note also that this has the effect of pushing your parameters up
> > on the stack an additional 8-bytes, so you may need to do this
> > first ... I don't know. I've always kept the stack 8-byte aligned
> > and everything worked, so I'm not really sure what you're doing
> > here.
>
> The MS ABI needs 16 byte alignment; mainly to support XMM aligned on the
> stack. http://www.agner.org/optimize/calling_conventions.pdf

You are correct. I have that noted in my code as well as the need
for creating a 32-byte shadow area above all parameters, but I've
never explicitly checked the stack in my code before calling the
target function. I have populated registers, or pushed everything
as an 8-byte push and then just called the target.

I'm thinking I've been lucky in that all of the code I've used to
date has had only a few parameters which went in registers, or an
even number of parameters which worked out.

Pretty sure it's a bug in my code, and I'll thank you for teaching
me something today. :-)

Alex

Aug 29, 2017, 2:14:59 PM
On 29-Aug-17 18:06, Alex wrote:
> The only thing I can think of that doesn't involve branching is an
> algorithm like this (completely untested, and I'm using r15 as an
> arbitrarily chosen register, it can be changed to any other reg):
>
>      xor     r15,r15
>      test    rsp,0Fh         ; See if we're already aligned
>      setnz   r15b            ; Set to 0 or 1 based on alignment
>      shl     r15,3           ; Multiply by 8
>      sub     rsp,r15         ; Adjust rsp by 0 or 8
>      call    funkychicken    ; Perform the normal call
>      add     rsp,r15         ; Un-adjust rsp by 0 or 8
>
> Note also that this has the effect of pushing your parameters up
> on the stack an additional 8-bytes, so you may need to do this
> first ... I don't know.  I've always kept the stack 8-byte aligned
> and everything worked, so I'm not really sure what you're doing
> here.

OK, at the cost of a register (preferably one that doesn't get
scratched, which makes R15 OK) we could do

mov r15 rsp
sub rsp $10 ; 16 decimal
and spl $F0
call dotheconga
mov rsp r15
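
A small Python model of the register version, assuming the full 64-bit rsp is used throughout and that the called function preserves r15. The name `register_save_align` is illustrative only.

```python
def register_save_align(rsp):
    """Model of: mov r15,rsp / sub rsp,16 / and spl,$F0 / call / mov rsp,r15."""
    r15 = rsp                 # keep the original stack pointer in a register
    rsp -= 0x10               # step down 16 bytes
    rsp &= ~0x0F              # clear the low four bits (what and spl,$F0 does)
    at_call = rsp             # value of rsp when the call executes
    rsp = r15                 # after the call: restore from the saved copy
    return at_call, rsp


for start in (0x7FF0, 0x7FF8):
    at_call, restored = register_save_align(start)
    print(hex(start), hex(at_call), at_call % 16, restored == start)
```

Because the restore comes from a register and not from the stack, it does not matter whether the AND changed rsp.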

--
Alex

Rick C. Hodgin

Aug 29, 2017, 2:30:01 PM
Nice. That would work.

I love teamwork. :-) It's always much easier to edit than it is
to create.

Alex

Aug 29, 2017, 2:45:04 PM
That's why there are two identical values on the stack provided by the
pushes of RSP and [RSP] (which is RSP at the point of the first push).

>
> #2 -- What is "spl"?

The low order byte of RSP. It's key to understanding why it works.
Anyhow, I'm looking for an alternative, and might use the register version.


--
Alex

Rick C. Hodgin

Aug 29, 2017, 3:30:08 PM
You alter rsp after that push, so the value you'll be popping back
will either be correct (it was already aligned) or wrong, it was not
aligned and needed the bits lopped off, which now has rsp pointing
to some new value when you finally execute your pop rsp.
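
The disagreement can be tested with a toy model of the stack. The Python sketch below simulates push rsp / push [rsp] / and spl,$F0 / call and then two ways of restoring rsp. The `mov rsp,[rsp+8]` form is not taken from this thread and is included only for comparison; the function name and the dictionary used as memory are illustrative assumptions, and nothing the called function might write is modelled.

```python
def simulate(start_rsp, restore):
    """Model push rsp / push [rsp] / and spl,$F0 / call / restore."""
    mem = {}                       # stack slots written by this sequence only
    rsp = start_rsp

    rsp -= 8
    mem[rsp] = start_rsp           # push rsp stores the value from before the push
    rsp -= 8
    mem[rsp] = mem[rsp + 8]        # push [rsp] copies the slot just written
    rsp &= ~0x0F                   # and spl,$F0 rounds down to a multiple of 16
    at_call = rsp                  # a call/ret pair returns with rsp unchanged

    if restore == "pop":
        restored = mem.get(rsp)        # pop rsp loads rsp from [rsp]
    else:
        restored = mem.get(rsp + 8)    # mov rsp,[rsp+8]
    return at_call, restored


for start in (0x8000, 0x8008):
    for restore in ("pop", "mov"):
        at_call, restored = simulate(start, restore)
        ok = restored == start
        print(hex(start), restore, hex(at_call), "restored" if ok else "NOT restored")
```

In this model the pop form recovers the original value only when rsp was already a multiple of 16 after the two pushes. When the AND moves rsp down by 8, [rsp] is a slot the sequence never wrote, while [rsp+8] still holds one of the two saved copies.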

>> #2 -- What is "spl"?
>
> The low order byte of RSP.

I haven't seen that before.

> It's key to understanding why it works. Anyhow, I'm looking for an
> alternative, and might use the register version.

I still don't think your version will work reliably.

Alex

Aug 29, 2017, 6:15:18 PM
On 29-Aug-17 20:19, Rick C. Hodgin wrote:
> I still don't think your version will work reliably.

There's a good article I've found here;
http://www.masmforum.com/board/index.php?PHPSESSID=786dd40408172108b65a5a36b09c88c0&topic=4752.0


--
Alex

Andrew Cooper

Aug 29, 2017, 7:45:25 PM
Why is the stack 8 mod 16? That is the bug in this scenario.

If every function sets up an ABI compatible stack for its callees, all a
callee needs to do is ensure it pushes/adjusts the stack pointer by an
even number of words.

I see from other replies that you are doing this in some Forth situation
with multiple stacks, but at any point that you have the above scenario,
a higher caller has screwed up. Things will definitely go wrong when
you call into a C library, but also with any signal handler which
intends to use the red zone.

Your above code will function correctly, but has a performance hit,
because a direct write to the stack pointer interrupts stack-engine
optimisations in the pipeline for adjacent pushes/pops/calls/rets.
Also, writes to 8-bit registers suffer a merge penalty back into the
register file. `and $~0xf, %rsp` would be more efficient; it encodes in
the same number of bytes, but doesn't suffer from merging.

As a minor note, you should use `leave` rather than pop %rsp, as it
takes less instruction bandwidth to execute.

~Andrew

Rick C. Hodgin

Aug 29, 2017, 7:45:25 PM
That link cites the first version as an error.

I think the r15 solution you came up with is the safest, assuming
r15 is disposable like that. You should wrap it for nested calls:

push r15 ; <== save r15 prior to use
mov r15,rsp
sub rsp,$10 ; 16 decimal
and spl,$F0
call dotheconga
mov rsp,r15
pop r15 ; <== restore original r15

Bernhard Schornak

Aug 30, 2017, 5:16:06 AM
If we assume the stack was properly aligned by the calling function,
there is only one way to align the stack while your function code is
running:


sub $0x?8, %rsp
...
function code
...
add $0x?8, %rsp
ret


The subtracted space must be at least 32 (0x20) bytes for Microsoft's
'red zone' (shadow space, in Microsoft's own terms) plus 8 bytes for the
return address (pushed onto the stack by the calling function!), so a
'leaf function' should subtract 0x28 to work properly in a multithreaded
environment. This subtraction of 40 (0x28) automatically aligns your
stack if it was properly aligned before. Add as many paragraphs
(0x28 + n*16 bytes) as required as the local storage for your function.
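
The frame-size rule can be checked numerically. This is a minimal Python sketch, assuming rsp is 8 mod 16 on entry (an aligned caller plus the 8 byte return address); the helper names are illustrative.

```python
def frame_size(n_paragraphs=0):
    """0x28 plus n 16-byte paragraphs of local storage."""
    return 0x28 + 16 * n_paragraphs


def rsp_after_prologue(entry_rsp, n_paragraphs=0):
    """Stack pointer after 'sub $frame, %rsp' in the function entry code."""
    return entry_rsp - frame_size(n_paragraphs)


entry = 0x7FF8   # 8 mod 16, as seen at the first instruction of a function
for n in range(3):
    print(hex(frame_size(n)), rsp_after_prologue(entry, n) % 16)
```

Every frame size of this form is itself 8 mod 16, so the remainder printed for each n is 0 whenever the entry value is 8 mod 16.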

*After* the subtraction, addresses 0x00(%rsp) through 0x1F(%rsp) are
reserved for called functions. Some qwords above 0x20(%rsp) might be
required as well to pass parameters to called functions which do not
fit into the first four registers (rcx, rdx, r8, r9).


Greetings from Augsburg

Bernhard Schornak

Bartc

Aug 30, 2017, 6:46:12 AM
On 30/08/2017 10:10, Bernhard Schornak wrote:
> Alex wrote:

>> It seems to be the only way of doing this without branches, flags or
>> other expensive nonsense. But,
>> as ever, there may be a better way. Any suggestions?
>
>
> If we assume the stack was properly aligned by the calling function,
> there is only one way to align the stack while your function code is
> running:
>
>
> sub $0x?8, %rsp
> ...
> function code
> ...
> add $0x?8, %rsp
> ret
>
>
> The subtracted space must be at least 32 (0x20) byte for microsoft's
> 'red zone' plus 8 byte for the return address (pushed onto the stack
> by the calling function!), so a 'leaf function' should subtract 0x28
> to work properly in a multithreaded environment. This subtraction of
> 40 (0x28) automatically aligns your stack if it was properly aligned
> before. Add as many paragraphs (0x28 + n*16 byte) as required as the
> local storage for your function.

This assumes the code doesn't use the stack for any other purposes
between function entry, and a call to a function that expects the stack
to be aligned (at the call).

For example, things may be put on the stack while evaluating a complex
expression and one of the terms requires a call. Or you are pushing
arguments 5, 6 or 7 of a complex call, and one of those expressions
itself involves a function call.

It is also possible, if not calling external functions, that calls to
internal functions in your code, which do not require alignment, do not
bother with keeping the stack aligned, or use a simple argument-passing
convention (and have pushed an even or odd number of parameters), or
don't need a stack frame.

For whatever reason, when it is necessary to call an external function,
it won't know the stack alignment.


(Solutions I've used:

* Call a special stub function for calling external functions. There is
a separate one for 4, 5, 6 etc parameters. It uses a check, and branch,
and will rearrange things as needed. It expects such calls to be rare.
Local calls use a private call convention.

* Keep track of how many things have been pushed onto the stack at any
point in an instruction sequence. Then it will know if the stack is
aligned or not and generate the correct code.

* Avoid using the stack for any purpose than for calling functions. And
in the latter case, it avoids nested calls (by pre-evaluating any such
terms). Both the last two I've used in generated code.)
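
The second approach in the list (tracking pushes at code-generation time) might look roughly like the Python sketch below. It is only an illustration under stated assumptions: the class `PushTracker`, its methods and the emitted instruction text are hypothetical, rsp is taken to be 8 mod 16 at function entry, only 8 byte pushes are tracked, and calls with more than four parameters are not handled.

```python
class PushTracker:
    """Tracks 8-byte pushes so an external call can be padded statically."""

    def __init__(self):
        self.pushed = 0      # qwords currently pushed since function entry
        self.out = []        # generated instruction text

    def push(self, reg):
        self.out.append(f"push {reg}")
        self.pushed += 1

    def pop(self, reg):
        self.out.append(f"pop {reg}")
        self.pushed -= 1

    def call_external(self, name):
        # Assumed: rsp was 8 mod 16 at entry, so an even push count leaves it
        # 8 mod 16 and needs 8 bytes of padding; an odd count needs none.
        pad = 8 if self.pushed % 2 == 0 else 0
        frame = 32 + pad     # 32 bytes reserved for the callee plus padding
        self.out += [f"sub rsp, {frame}", f"call {name}", f"add rsp, {frame}"]


gen = PushTracker()
gen.call_external("ExternalA")   # nothing pushed yet
gen.push("rbx")
gen.call_external("ExternalB")   # one qword pushed
gen.pop("rbx")
print("\n".join(gen.out))
```

The padding is decided while generating code, so nothing is tested or branched on at run time.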

--
bartc

Bernhard Schornak

Aug 30, 2017, 12:01:35 PM
What I wrote is working code following the rules MS defined for 64 bit
Windoze. Whatever your code does: the stack bottom (content of RSP) is
the border between the currently executed code and other code snippets
belonging to the same process. 'New' code will only access stack above
RSP if it belongs to a passed structure (which should return some data
requested by the caller) or if more than 4 parameters were passed (the
5th up to the last parameter are passed in the qwords starting at
0x28[RSP] + size of stack frame, e.g. 0x50[RSP] if you subtracted 0x28
in your function entry sequence; the extra 8 bytes are the return
address). By definition, the so called 'red zone' (what Microsoft calls
shadow or home space) just above the return address (here
0x30...0x4F[RSP] after subtracting 0x28 from RSP) belongs to the called
function and can be used as local storage by the callee.

If no function violates the convention (and working MS code definitely
obeys all rules), the stack cannot be misaligned. Only faulty function
entry code is capable of moving RSP off paragraph boundaries.


> (Solutions I've used:
>
> * Call a special stub function for calling external functions. There is a separate one for 4, 5, 6
> etc parameters. It uses a check, and branch, and will rearrange things as needed. It expects such
> calls to be rare. Local calls use a private call convention.
>
> * Keep track of how many things have been pushed onto the stack at any point in an instruction
> sequence. Then it will know if the stack is aligned or not and generate the correct code.
>
> * Avoid using the stack for any purpose than for calling functions. And in the latter case, it
> avoids nested calls (by pre-evaluating any such terms). Both the last two I've used in generated code.)


On modern machines, it is a bad idea to push and pop data onto or from
the stack, because this needs an additional register (generally RBP is
used as so called base pointer). None of my programs uses PUSH/POP. It
only is useful for very low level stuff (like retrieving the content of
RIP). Using wrappers (your 'stubs'):

http://st-intelligentdesign.blogspot.de/2010/11/st-opens-wrappers-for-64-bit-windoze.html

The linked file provides wrappers for the most used functions provided
by the common Windoze libraries. The advantage of wrappers is a 'clean
environment', where registers don't lose their content while calling a
'dirty' C/C++/whatever-function clobbering registers without restoring
them on exit.

It shouldn't be a problem to translate AT&T-style assembler to iNTEL's
gobbledygook... ;)

Anton Ertl

Sep 2, 2017, 12:21:47 PM
Alex <al...@nospicedham.rivadpm.com> writes:
>I'm using a two stack model (as you will know, this is Forth) where the
>return stack is RSP based and the data stack is based on another
>register, say RBP.
>
>On my 32 bit system (ESP and EBP respectively) calling into Windows is
>most easily achieved by switching the stacks, doing the call, and
>switching back again, since all the parameters are already on the data
>stack pointed by EBP, and there is no special alignment required.

This is already wrong, because the parameters are in the wrong order
and it breaks down when you need to pass an FP parameter. Of course
there is the lure of letting the programmer do the parameter
reversing, and many Forth system implementors fell into this trap, but
the significant ones have seen the light, and do it mostly properly
now.

>For 64 bit Windows, that technique is not as easy since Windows passes 4
>parameters in registers, the rest on the stack, and has this oddball 16
>byte alignment; so there is no static way of ensuring either of the
>stacks is 16 byte aligned, regardless of whether I switch stacks or not.

You could have the C stack separate from the data, return, and FP
stacks, with the C stack always complying with the ABI. You could
store the C stack pointer in memory while executing Forth code to avoid
wasting a register on a value that is not used for a long time.

AFAIK you are planning to eventually have an analytical compiler; in
that case letting the compiler put the arguments of a C call in
registers or on the C stack should not be particularly difficult.

Alex

Sep 2, 2017, 5:07:19 PM
Followups set to comp.lang.forth

On 02-Sep-17 16:58, Anton Ertl wrote:
> Alex <al...@nospicedham.rivadpm.com> writes:
>> I'm using a two stack model (as you will know, this is Forth) where the
>> return stack is RSP based and the data stack is based on another
>> register, say RBP.
>>
>> On my 32 bit system (ESP and EBP respectively) calling into Windows is
>> most easily achieved by switching the stacks, doing the call, and
>> switching back again, since all the parameters are already on the data
>> stack pointed by EBP, and there is no special alignment required.
>
> This is already wrong, because the parameters are in the wrong order

FunctionName(a,b,c,d) becomes d c b a FunctionName. It hasn't proved to
be a problem so far. a b c d FunctionName requires FunctionName to
reverse 4 parameters; that's both inefficient and requires the count.
Varargs become significantly more difficult.

> and it breaks down when you need to pass an FP parameter. Of course

How?

> there is the lure of letting the programmer do the parameter
> reversing, and many Forth system implementors fell into this trap, but
> the significant ones have seen the light, and do it mostly properly
> now.

I have to disagree; it's more a matter of taste and style.

>
>> For 64 bit Windows, that technique is not as easy since Windows passes 4
>> parameters in registers, the rest on the stack, and has this oddball 16
>> byte alignment; so there is no static way of ensuring either of the
>> stacks is 16 byte aligned, regardless of whether I switch stacks or not.
>
> You could have the C stack separate from the data, return, and FP
> stacks, with the C stack always complying with the ABI. You could
> store the C stack pointer in memory while executing Forth code to avoid
> wasting a register on a value that is not used for a long time.

Yes, I am proposing using a non-volatile XMM register for that purpose.

\ Register usage
\
\ XMM15 save of Windows RSP
\ RAX cached top of stack
\ RSP return stack RP
\ RBP data stack SP
\ RBX per process user area
\ R12 locals stack LP
\ R13 float stack FP

>
> AFAIK you are planning to eventually have an analytical compiler; in

Eventually. It's hard getting the needed hours together.

> that case letting the compiler put the arguments of a C call in
> registers or on the C stack should not be particularly difficult.
>
> - anton
>


--
Alex

Rod Pemberton

Sep 4, 2017, 7:35:11 AM
From reading and rereading this thread multiple times, my question was:

"Is the stack pointer 16 byte aligned (even 8 byte multiple) prior to
the call instruction, or 16 byte aligned (odd 8 byte multiple) prior to
the call instruction?"

Agner answered that (below) in the cited document. See the last
sentence of his quote. I'm assuming he is correct. (But, is he? ... )

"The 64-bit systems keep the stack aligned by 16. The stack word size
is 8 bytes, but the stack must be aligned by 16 before any call
instruction. Consequently, the value of the stack pointer is always 8
modulo 16 at the entry of a procedure. A procedure must subtract an
odd multiple of 8 from the stack pointer before any call instruction."
- Agner Fog

The "System V Application Binary Interface" for AMD64 confirms that
the 16 byte alignment is on an odd 8 byte multiple from rsp, i.e.,
"%rsp + 8", prior to the call instruction, i.e., "when control is
transferred to the function entry point" (section 3.2.2 "The Stack
Frame"):

"The end of the input argument area shall be aligned on 16 (32, if
__m256 is passed on stack) byte boundary. In other words, the value
(%rsp + 8) is always a multiple of 16 (32) when control is transferred
to the function entry point."

https://web.archive.org/web/20120323195628/http://www.x86-64.org/documentation/abi.pdf

So, yes, Agner is correct, according to the ABI. And, it also means
that due to the call instruction, - which apparently pushes 8 bytes
itself for the return address - the stack must be aligned on an odd
multiple of 8 prior to the call instruction to be aligned on an even
multiple of 8 (or 16 byte aligned) upon entering the procedure.

If I'm not mistaken, the routines I've seen in the thread don't account
for the 8 byte return value pushed onto the stack by the call
instruction. They're 16 byte aligned (even 8 byte multiple) for the
address prior to the call instruction. So, the stack is aligned on an
even multiple of 8 (or 16 byte aligned) prior to the call instruction,
and will be on an odd multiple of 8 upon entry into the procedure. If
I understood Agner and the ABI correctly, this would be incorrect, as
the stack should be aligned to an even multiple of 8 (or 16 byte
aligned) upon entry into the procedure. I.e., the alignment should
compensate for the return value pushed by the call instruction.
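
The two quoted rules can be checked against each other with a few lines of arithmetic. This is a minimal Python sketch, assuming only that the call instruction pushes an 8 byte return address; the function name is illustrative.

```python
def rsp_at_entry(rsp_before_call):
    """The call instruction pushes an 8-byte return address."""
    return rsp_before_call - 8


for before in (0x7FF0, 0x7FF8):
    entry = rsp_at_entry(before)
    rule_holds = (entry + 8) % 16 == 0     # the quoted "(%rsp + 8)" condition
    print(hex(before), before % 16, entry % 16, rule_holds)
```

Taken literally, the quoted condition on (%rsp + 8) holds exactly when rsp is a multiple of 16 immediately before the call, which leaves rsp at 8 mod 16 on entry. Read that way, routines that align rsp to 16 just before the call seem to satisfy both quotations.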


Rod Pemberton
--
Isn't anti-hate just hate by another name? Isn't
anti-protesting just protesting by another name?
Peace is a choice that both sides rejected.

Frank Kotler

Sep 4, 2017, 4:24:42 PM
> Hey Rod...
>
> I may have dropped a message on the floor. I got one from you with a new
> and improved address, so "not in whitelist". It should be no problem to
> fix this - 'a' for "approve" and 'y' to add user to whitelist. For some
> reason it didn't go (?) and the server thought it was a duplicate(?). It
> was very late so I went to bed. When I awoke, I found "no messages". I
> don't know what I did wrong. I cut and pasted it off "clax.log" and here
> it is.
>
> This is from Rod:
>
> Best,
> Frank
> [attempted moderator]
>
> Take two: I guess this was supposed to go to the forth group. Dunno why I saw it at all in that case. Anyway here it is - we need the traffic. :)
>
> On Tue, 29 Aug 2017 16:07:13 +0100
> Alex <al...@nospicedham.rivadpm.com> wrote:
>
>> On 64 bit Windows, stack alignment on a 16 byte boundary is required
>> before calling all except a leaf function. In the called function,
>> the stack is 8 mod 16.
>>
>> Now, I'm struggling to come up with a way of doing it beyond this
>> code (which I didn't invent, but I can't for the life of me remember
>> where I found it.)
>>
>> push rsp
>> push [rsp]
>> and spl $F0
>> call funkychicken
>> pop rsp
>>
>> It seems to be the only way of doing this without branches, flags or
>> other expensive nonsense. But, as ever, there may be a better way.
>> Any suggestions?
>>
>
> From reading and rereading this thread multiple times, my question was:
>
> "Is the stack pointer 16 byte aligned (even 8 byte multiple) prior to
> the call instruction, or 16 byte aligned (odd 8 byte multiple) prior to
> the call instruction?"
>
> Agner answered that (below) in the cited document. See the last
> sentence of his quote. I'm assuming he is correct. (But, is he? ... )
>
> "The 64-bit systems keep the stack aligned by 16. The stack word size
> is 8 bytes, but the stack must be aligned by 16 before any call
> instruction. Consequently, the value of the stack pointer is always 8
> modulo 16 at the entry of a procedure. A procedure must subtract an
> odd multiple of 8 from the stack pointer before any call instruction."
> - Agner Fog
>
> The "System V Application Binary Interface" for AMD64 confirms that
> the 16 byte alignment is on an odd 8 byte multiple from rsp, i.e.,
> "%rsp + 8", prior to the call instruction, i.e., "when control is
> transferred to the function entry point" (section 3.2.2 "The Stack
> Frame"):
>
> "The end of the input argument area shall be aligned on 16 (32, if
> __m256 is passed on stack) byte boundary. In other words, the value
> (%rsp + 8) is always a multiple of 16 (32) when control is transferred
> to the function entry point."
>
> https://web.archive.org/web/20120323195628/http://www.x86-64.org/documentation/abi.pdf
>
> So, yes, Agner is correct, according to the ABI. And, it also means
> that due to the call instruction, - which apparently pushes 8 bytes
> itself for the return address - the stack must be aligned on an odd
> multiple of 8 prior to the call instruction to be aligned on an even
> multiple of 8 (or 16 byte aligned) upon entering the procedure.
>
> If I'm not mistaken, the routines I've seen in the thread don't account
> for the 8 byte return value pushed onto the stack by the call
> instruction. They're 16 byte aligned (even 8 byte multiple) for the
> address prior to the call instruction. So, the stack is aligned on an
> even multiple of 8 (or 16 byte aligned) prior to the call instruction,
> and will be on an odd multiple of 8 upon entry into the procedure. If
> I understood Agner and the ABI correctly, this would be incorrect, as
> the stack should be aligned to an even multiple of 8 (or 16 byte
> aligned) upon entry into the procedure. I.e., the alignment should
> compensate for the return value pushed by the call instruction.
>
>
> Rod Pemberton
> --

wolfgang kern

Sep 6, 2017, 12:19:51 AM
Frank Kotler said:

>> Hey Rod...
>>
>> I may have dropped a message on the floor...

I'm not Rod, but I saw his post here in CLAX anyway:

news:oohq7k$15lt$1...@gioia.aioe.org...
__
wolfgang

Rod Pemberton

Sep 6, 2017, 12:19:52 AM
On Mon, 04 Sep 2017 16:24:51 -0400
Frank Kotler <fbko...@nospicedham.myfairpoint.net> wrote:

> > Hey Rod...
> >
> > I may have dropped a message on the floor. I got one from you with
> > a new and improved address, so "not in whitelist". It should be no
> > problem to fix this - 'a' for "approve" and 'y' to add user to
> > whitelist. For some reason it didn't go (?) and the server thought
> > it was a duplicate(?). It was very late so I went to bed. When I
> > awoke, I found "no messages". I don't know what I did wrong. I cut
> > and pasted it off "clax.log" and here it is.
> >

I see my original on c.l.a.x. and your re-posts to c.l.a.x. and c.l.f.
(It was a reply to Alex's first post, not the later one you replied to.)

The original post was in the thread on the server where I read. It's
also on AIOE.org. It also made it to Google Groups. Maybe, someone
needs to check Eternal-September? The Usenet message ID to look up the
post and a link to Google Groups to view it:

msg-id: oohq7k$15lt$1...@gioia.aioe.org
https://groups.google.com/d/msg/comp.lang.asm.x86/JV13RMJsaS8/fEtCMEIZBAAJ


Rod Pemberton
--

Frank Kotler

Sep 6, 2017, 1:00:54 AM
Rod Pemberton wrote:

...
> Maybe, someone
> needs to check Eternal-September?

Yeah, they may have had some trouble there. I definitely had some
trouble here. I hope it's fixed. Thanks for your patience, guys.

Best,
Frank

Bernhard Schornak

Sep 7, 2017, 12:28:43 AM
Rod Pemberton schrieb:


> Maybe, someone needs to check Eternal-September?


Eternal-September was offline due to a hardware problem. It now is
working properly, again.

James Van Buskirk

Sep 8, 2017, 5:29:16 AM
"Bernhard Schornak" wrote in message news:ooqhek$heh$1...@dont-email.me...

> Rod Pemberton schrieb:

> > Maybe, someone needs to check Eternal-September?

> Eternal-September was offline due to a hardware problem. It now is
> working properly, again.

Actually they got new hardware and their new server is called
reader.eternal-september.org . They said it would take an
extra day for their content to be visible at the old address of
news.eternal-september.org , but now that it is visible I think
both should be equivalent at this point. I haven't yet tried the
reader.* address however.

The news.* post got stuck in my outbox; trying again from
reader.* . Sorry for any duplicates.

Bernhard Schornak

Sep 8, 2017, 8:29:30 AM
My previous post was sent via reader.eternal-september.org. It
worked as expected. Full statement from eternal-september:

https://www.eternal-september.org/


Have a nice weekend!

Bernhard Schornak