mov rax,00112233'44556677h
and eax,eax
...
resulting in rax = 00000000'44556677h !!!
that is simple, nearly obvious but cool P) ,
Cheers
--
.:hopcode[marc:rainer:kranz]:.
x64 Assembly Lab
http://sites.google.com/site/x64lab
The "and x,x" combination sets the sign, zero and parity flags according to x, clears the overflow and carry flags, and always leaves x with the same value it had before.
- Rick C. Hodgin
except, IIRC, it will either sign or zero extend EAX into RAX (I forget
which), rather than leaving the high-bits unchanged (as would make more
sense IMO...).
Hmmm... according to Intel's manual:
"Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is set to 1 if both corresponding bits of the first and second operands are 1; otherwise, it is set to 0. This instruction can be used with a LOCK prefix to allow the it to be executed atomically.
"In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits."
One of the encodings shows "and r64/m64,imm8" and "and r64/m64,imm32" which will both sign-extend the immediate up to 64-bits, but not when a 32-bit r32/m32 is used in combination with another r32.
If it's encoded as "and eax,eax" it should not touch the upper bits. Does anyone know if it really does? I don't have a 64-bit CPU right now to test it.
- Rick C. Hodgin
It makes more sense to zero the high bits out (that's what it does, not sign
extend) otherwise xor eax,eax and mov eax,something etc would have a
dependency on rax. That saves some REX.W prefixes, and this way no MOVZX
r64,r32 instruction is necessary.
> If it's encoded as "and eax,eax" it should not touch the upper bits. Does anyone know if it really does? I don't have a 64-bit CPU right now to test it.
If you write the lower 32 bits of a register in 64-bit
mode, the upper 32 bits of that register are cleared
automatically. This can be used to save (redundant)
REX prefixes, e.g.: XOR EAX,EAX does the same thing
as XOR RAX,RAX, but is one byte shorter.
Greetings from Augsburg
Bernhard Schornak
It seems the answer is: yes, at least for the cpu and code I used. The test
cpu is an AMD X2 5600+ 2.8Ghz.
I used some code to get to 64-bit mode from DOS. I searched for it, and I
believe it's from here:
http://www.japheth.de/JWasm/Dos64.html
It's apparently written by Japheth with some code he got from others. I've
not used it before. I can only assume it works correctly. I.e., I've got
no hard proof it gets to 64-bit mode correctly. Although, it seemed to
work.
I added this just after the "Hello 64Bit":
mov rax, 2233445566778899h
and rax, rax
call WriteQW
and eax, eax
call WriteQW
It emitted (on one line actually...):
2233445566778899
0000000066778839
Which has AL corrupted... So, I changed it to this:
mov rax, 2233445566778899h
and rax, rax
call WriteQW
mov rax, 2233445566778899h
and eax, eax
call WriteQW
It emitted (on one line actually...):
2233445566778899
0000000066778899
The code is compiled with JWasm. JWasm is Japheth's extended version of
OpenWatcom's Wasm assembler. JWasm does not work in DOS. It requires
Windows. The compiled byte sequence for "and rax, rax" and "and eax, eax"
is the same as a current version of NASM.
I would've preferred to use NASM code, but I don't have any 64-bit startup
code in NASM. Alternately, I would've used GAS on a 64-bit Linux, but I
recently wiped that.
Rod Pemberton
well, it was one of them, but I couldn't remember which, and didn't want
to go look it up...
> If it's encoded as "and eax,eax" it should not touch the upper bits.
> Does anyone know if it really does?
It *should* touch, and zero, the upper bits.
Generally, all 32-bit operations (those without REX.W[1]) in 64-bit mode
that write to a register will clear the top 32 bits of the 64-bit register.
It allows you to do 32-bit operations just as in 32-bit mode, without
creating spurious dependencies on the previous values of the top bits.
So,
xor eax, eax
is a way to zero the entire 64-bit register, and
and eax, eax
will zero the top bits and set flags depending on the low 32-bits.
If you don't need the flags,
mov eax, eax
will do the same.
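A small model of the rule above may help. It is only a sketch in Python, not hardware: registers are plain integers, and the helper names (write32, logic_flags) are made up for illustration. A 32-bit write keeps the low 32 bits of the result and zeroes bits 63..32; the flags are computed the way a logical instruction such as AND or XOR defines ZF, SF and PF.

```python
MASK32 = 0xFFFFFFFF

def write32(result):
    """Model a 32-bit register write in 64-bit mode: the stored value is
    zero-extended, so bits 63..32 of the full register end up as 0."""
    return result & MASK32

def logic_flags(result, width):
    """ZF, SF and PF as AND/XOR define them for an operand of `width` bits."""
    result &= (1 << width) - 1
    low_byte_ones = bin(result & 0xFF).count("1")
    return {
        "ZF": int(result == 0),
        "SF": (result >> (width - 1)) & 1,
        "PF": int(low_byte_ones % 2 == 0),  # PF=1 means even parity of the low byte
    }

rax = 0x0011223344556677
eax = rax & MASK32                  # a 32-bit instruction only sees the low half

after_and = write32(eax & eax)      # and eax, eax
after_mov = write32(eax)            # mov eax, eax
after_xor = write32(eax ^ eax)      # xor eax, eax

print(f"and eax,eax -> rax = {after_and:016X}, flags = {logic_flags(after_and, 32)}")
print(f"mov eax,eax -> rax = {after_mov:016X}")
print(f"xor eax,eax -> rax = {after_xor:016X}")
```

The model treats the old upper half as irrelevant to the result, which is the reason given above for the design: a 32-bit write carries no dependency on the previous contents of the full register.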
Also, if you have a 32-bit immediate in the instruction, 32-bit instructions
will just use it (and still zero the top bits), and 64-bit instructions will
sign extend it. I.e.,
mov eax, 0x87654321
will make rax equal 0x0000000087654321, and
mov rax, 0x87654321 ; when encoded as REX.W C7 /0 with an imm32
will make rax equal 0xffffffff87654321.
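The two immediate cases can be modelled as follows. This is a hedged Python sketch with made-up helper names; it assumes the 64-bit move is encoded in the REX.W C7 /0 form with a 32-bit immediate. An assembler may instead pick the 10-byte imm64 form for a value like this, in which case no sign extension takes place.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def sign_extend(value, from_bits, to_bits=64):
    """Two's-complement sign extension of the low `from_bits` bits."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return ((value ^ sign_bit) - sign_bit) & ((1 << to_bits) - 1)

def mov_r32_imm32(imm32):
    """mov eax, imm32 (no REX.W): the immediate is used as is and the
    32-bit write zero-extends into the full register."""
    return imm32 & 0xFFFFFFFF

def mov_r64_imm32(imm32):
    """mov rax, imm32 in the REX.W C7 /0 form: imm32 is sign-extended."""
    return sign_extend(imm32, 32)

print(f"{mov_r32_imm32(0x87654321):016X}")
print(f"{mov_r64_imm32(0x87654321):016X}")
print(f"{mov_r64_imm32(0x12345678):016X}")
```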
If you want to set the low bits without changing the high bits,
you can do a trick like:
// sets eax to ebx without changing rax[32..64]
xor ebx, eax
xor rax, rbx
Not perfect - it changes ebx (it ends up holding old eax xor old ebx; a
mov ebx,eax afterwards restores it, since eax now holds the old ebx) and it
uses two dependent operations instead of a single move, but it gets the
job done :)
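The xor trick can be checked with a short model. This is a Python sketch with a hypothetical helper name, treating registers as integers and applying the zero-extension rule to the 32-bit xor.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def set_low32_keep_high(rax, rbx):
    """Return (rax, rbx) after:  xor ebx, eax  /  xor rax, rbx"""
    # xor ebx, eax : a 32-bit write, so the upper half of rbx becomes 0
    rbx = (rbx ^ rax) & MASK32
    # xor rax, rbx : a 64-bit operation; the zero upper half of rbx leaves
    # rax[63..32] alone, while the low half becomes eax ^ (ebx ^ eax) = ebx
    rax = (rax ^ rbx) & MASK64
    return rax, rbx

rax, rbx = set_low32_keep_high(0x1122334455667788, 0xAABBCCDD99887766)
print(f"rax = {rax:016X}")
print(f"rbx = {rbx:016X}")
```

Note that the upper half of rbx is lost as a side effect of the 32-bit xor, so the trick only suits cases where rbx holds a 32-bit value to begin with.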
/L
[1] That doesn't have an implicit operand size of 64-bits, like pop.
--
Lasse Reichstein Holst Nielsen
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
Hi Lasse, You are right !
and the assembler i use *cannot* encode it.
It cannot encode
MOV qword[RAX],80000000h i.e. Intel legally
REX.W + C7 /0 MOV r/m64,imm32
Move imm32 sign extended to 64-bits to r/m64.
Here is my way to obtain it !!!
db 0100'1000b
db 0C7h
db 11000000b ;ModRM: mod=11, reg=000 (/0), rm=000 (rax)
dd 80000000h ;48 C7 C0 00000080 --> mov rax,FFFFFFFF80000000
db 48h
db 0C7h
db 00000000b ;ModRM: mod=00, reg=000 (/0), rm=000 ([rax])
dd 80000000h ;48 C7 00 00000080 --> mov qword [rax],FFFFFFFF80000000
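For checking hand-encoded bytes like these, a tiny encoder sketch can be handy. It is written in Python with hypothetical helper names and covers only the REX.W C7 /0 id form with rax or [rax] as the operand (no SIB byte, no displacement). The /0 in the opcode notation means the reg field of the ModRM byte must be 000.

```python
import struct

def modrm(mod, reg, rm):
    """Pack the three ModRM fields: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm

def mov_rm64_imm32(mod, rm, imm32):
    """Bytes for REX.W + C7 /0 id. The /0 means the ModRM reg field is 0."""
    rex_w = 0x48
    return bytes([rex_w, 0xC7, modrm(mod, 0, rm)]) + struct.pack("<I", imm32 & 0xFFFFFFFF)

reg_form = mov_rm64_imm32(0b11, 0, 0x80000000)   # mov rax, imm32 (sign-extended)
mem_form = mov_rm64_imm32(0b00, 0, 0x80000000)   # mov qword [rax], imm32

print(reg_form.hex(" "))
print(mem_form.hex(" "))
```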
Now, other details
mov rax,11223344FFFFFFFFh
and rax,07FFFFFFh ;--- ops : 4825 FFFFFF07
;--- RAX -> 00000000'07FFFFFFh ZF = SF = 0, PF = 1
nop
;--- because the assembler i use does not allow
;--- AND RAX,80000000h i.e. Intel legally
;--- REX.W + 25 id AND RAX, imm32
;--- RAX AND imm32 signextended to 64-bits,
;--- i will hardcode it
mov rax,11223344FFFFFFFFh
db 48h,25h ;--- ops: 4825 FFFFFF7F ( AND RAX,7FFFFFFFh)
dd 7FFFFFFFh ;--- RAX -> 00000000'7FFFFFFFh, ZF = SF = 0, PF = 1
nop
mov rax,11223344FFFFFFFFh
db 48h,25h ;--- AND RAX,80000000h
dd 80000000h ;--- RAX -> 1122334480000000h !!!!!!! ZF = SF = 0, PF = 1
nop
mov rax,11223344FFFFFFFFh
db 48h,25h ;--- ops: 4825 FFFFFFFFh ( AND RAX,0FFFFFFFFh)
dd 0FFFFFFFFh ;--- RAX -> 11223344FFFFFFFFh ZF = SF = 0, PF = 1
nop
mov rax,11223344FFFFFFFFh
db 48h,25h ;--- AND RAX,011111111h
dd 011111111h ;--- sign ext RAX -> 0000000011111111h ZF = SF = 0,PF = 1
;--- but, note the following 2 items
nop
mov rax,11223344FFFFFFFFh
db 48h,25h ;--- AND RAX,81234567h
dd 81234567h ;--- RAX -> 1122334481234567h !!!!!!! ZF = SF = PF = 0
nop
mov rax,11223344FFFFFFFFh
and eax,eax ;--- RAX -> 00000000FFFFFFFFh ZF = 0, SF = PF = 1
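The results listed above can be cross-checked with a small model of REX.W + 25 id (AND RAX, imm32). It is a Python sketch with made-up helper names, assuming the documented behaviour: the immediate is sign-extended to 64 bits, SF is bit 63 of the result, and PF reflects even parity of the low byte.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def sign_extend32(imm32):
    """Sign-extend a 32-bit immediate to 64 bits."""
    imm32 &= 0xFFFFFFFF
    return (imm32 | 0xFFFFFFFF00000000) if imm32 & 0x80000000 else imm32

def and_rax_imm32(rax, imm32):
    """Model of REX.W + 25 id: returns (result, ZF, SF, PF)."""
    result = rax & sign_extend32(imm32) & MASK64
    zf = int(result == 0)
    sf = result >> 63
    pf = int(bin(result & 0xFF).count("1") % 2 == 0)
    return result, zf, sf, pf

rax = 0x11223344FFFFFFFF
for imm in (0x07FFFFFF, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x11111111, 0x81234567):
    result, zf, sf, pf = and_rax_imm32(rax, imm)
    print(f"and rax,{imm:08X}h -> {result:016X}  ZF={zf} SF={sf} PF={pf}")
```

An immediate with bit 31 set keeps the upper half of RAX intact, because the sign-extended mask has all ones there; an immediate with bit 31 clear wipes it.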
Cheers,
On Apr 2, 4:50 am, "Rod Pemberton"
<do_not_h...@nospicedham.noavailemail.cmm> wrote:
> "Rick C. Hodgin" <foxmuldrs...@nospicedham.gmail.com> wrote in messagenews:61ef64ee-2d09-4c3b...@glegroupsg2000goo.googlegroups.com...
>
> I used some code to get to 64-bit mode from DOS. I searched for it, and I
> believe it's from here: http://www.japheth.de/JWasm/Dos64.html
>
> The code is compiled with JWasm. JWasm is Japheth's extended version of
> OpenWatcom's Wasm assembler. JWasm does not work in DOS. It requires
> Windows.
Hate to correct you, but I don't know why you think it won't run in
DOS. It has both real mode and 32-bit DPMI binaries (as well as Linux
and Win32). So no, you don't need Windows (thankfully!). ;-)
Oops! I clearly missed that... The version I have (v2.00pre) also has a
DOS version: Jwasmd.exe. I'm not sure why I didn't extract that. But,
that's good. So, thanks for mentioning that. I probably need a newer
version too... Apparently, the current version is v2.05. I guess I should
start checking how out of date my software is.
Rod Pemberton
(resend)
> mov rax,00112233'44556677h
> and eax,eax
> ...
> resulting eax in 44556677h !!!
>
> that is simple, nearly obvious but cool P) ,
Well spotted. x86-64 code has implied zero-extension (to 64 bits)
on all 32-bit writes to registers.
And this has been known since 64-bit (AMD) arrived, yet it is often
misinterpreted by tools and programmers.
I took this strange behaviour as a mnemonic addition (comment)
into my disassembler as an optional (purely informal) ZxD prefix.
But once we become more familiar with 64-bit coding we may
not need this 'ZxD comment' anymore, so I have it as just
an output option in my disassembler :)
As I see it, the how/why was an M$ dictate, and AMD followed it like a
good girl and swallowed this toad (I'd have gone the logical way).
__
wolfgang
> Well spotted. x86-64 code has implied zero-extension (to 64 bits)
> on all 32-bit writes to registers.
> And this has been known since 64-bit (AMD) arrived, yet it is often
> misinterpreted by tools and programmers.
> I took this strange behaviour as a mnemonic addition (comment)
> into my disassembler as an optional (purely informal) ZxD prefix.
> But once we become more familiar with 64-bit coding we may
> not need this 'ZxD comment' anymore, so I have it as just
> an output option in my disassembler :)
> As I see it, the how/why was an M$ dictate, and AMD followed it like a
> good girl and swallowed this toad (I'd have gone the logical way).
The applicable mnemonic as I see it is that AMD acquired some DEC
engineers at the demise of Alpha. Thus you see Alpha characteristics
in AMD designs since then: Alpha had equal latencies for floating
point multiply and add, so Athlon did too. Alpha never had partial
register stalls because it had no instructions that could write a
partial register, so X86-64 was designed such that the instructions
that wrote a 32-bit register wrote the whole register by zeroing
the high bits. If you write only AH, R11W, or SIL, there is no
zeroing so partial registers can still be written, painfully with
RFLAGS as well.
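The difference between the write sizes can be sketched as a model. This is Python with hypothetical helper names and registers as integers: 8-bit and 16-bit writes merge into the old value, while only 32-bit writes zero the upper half.

```python
MASK32 = 0xFFFFFFFF

def write8_low(old, value):
    """Write a low-byte register (AL, SIL, ...): bits 63..8 are preserved."""
    return (old & ~0xFF) | (value & 0xFF)

def write8_high(old, value):
    """Write a high-byte register (AH, BH, ...): only bits 15..8 change."""
    return (old & ~0xFF00) | ((value & 0xFF) << 8)

def write16(old, value):
    """Write a 16-bit register (AX, R11W, ...): bits 63..16 are preserved."""
    return (old & ~0xFFFF) | (value & 0xFFFF)

def write32(old, value):
    """Write a 32-bit register (EAX, R11D, ...): bits 63..32 become 0,
    so the old value does not contribute to the result at all."""
    return value & MASK32

old = 0x1122334455667788
print(f"{write8_low(old, 0xAB):016X}")
print(f"{write8_high(old, 0xAB):016X}")
print(f"{write16(old, 0xABCD):016X}")
print(f"{write32(old, 0xABCD):016X}")
```

Only write32 ignores its `old` argument, which is the modelled counterpart of having no dependency on the previous register contents.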
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
Hi all,
thank you for your valuable comments on this subject.
64-bit programming is something really new, imho. I think
that, one way or another, an "implicit 64-bit mental habit" is needed
(preferably a SIMD habit too) when the goal is a clean implementation of
something; i.e., sign-extension and zero-extension should be considered
routinely (but carefully).
If you say "strange behaviour", I think I can imagine correctly what you
mean. I wrote a complete 32-bit LDE engine in 800 bytes and 2 days;
the funniest thing is that after 6 months of nothing but studying
the Intel/AMD opcode manuals, I felt like an "archeologist" :-)
because I discovered so many different "engineering layouts", at least 5
concretely traceable, and so many weirdnesses that are now
not only something I am used to living with, but surely part of their
success too!
Also, I understand what James says about the latency of zeroing
a register. Both your comments on the subject confirm that I
was right (I sensed it correctly, so to speak)!
One more thing: Intel/AMD have won not on speed
or on the many funny patented microcode algos; they *win* on thermal
dissipation and revolutionary energy-saving ideas. Isn't it now time
for an emulation challenge?
>> But once we become more familiar with 64-bit coding we may
>> not need this 'ZxD comment' anymore, so I have it as just
>> an output option in my disassembler :)
What I really need is a solid symbolic representation of what the CPU
manuals specify. If, for example, the rules for the same MOV instruction
read 8B /r, REX.W 8B /r, and finally REX.W C7 /0 for automatic
sign-extension of imm32, well, one should express that in the best way.
That is not so simple. By using for example notation like
0x3 or 0x3L (no extension) or 0x3Z (zero extend, use REX.W C7/0).
That is what i think to be the challenge by building a new assembler:
Encoding symbols as flexible as the opcode generation.
In conclusion, i would be very glad to hear that there is an assembler
available out there, capable of so flexible encoding on the
64bit MOV instruction, just like i described in my post following that
of Lasse.
>
>> As I see it, the how/why was an M$ dictate, and AMD followed it like a
>> good girl and swallowed this toad (I'd have gone the logical way).
>
> The applicable mnemonic as I see it is that AMD acquired some DEC
> engineers at the demise of Alpha. Thus you see Alpha characteristics
> in AMD designs since then: Alpha had equal latencies for floating
> point multiply and add, so Athlon did too. Alpha never had partial
> register stalls because it had no instructions that could write a
> partial register, so X86-64 was designed such that the instructions
> that wrote a 32-bit register wrote the whole register by zeroing
> the high bits. If you write only AH, R11W, or SIL, there is no
> zeroing so partial registers can still be written, painfully with
> RFLAGS as well.
>
Cheers,
There are multiple encodings for certain instructions, e.g., selecting the
special RAX/EAX/AX/AL form instead of the normal form for arithmetic
instructions. Typically, assemblers choose the accumulator short form first
over the normal form. I'll use the NASM assembler as an example.
I think you're asking how to select between B8+r and C7/0 when either
instruction is a valid encoding, e.g., "mov rax, 3" would be valid for
both. C7/0 supports memory addressing, so it would be chosen in that
situation. It seems NASM chooses B8+r first for the register form. I'm not
sure if there is a method in NASM to select C7/0 encoding for the register
form in 16-bit and 32-bit code.
For 64-bit mode, the B8+r and C7/0 instructions have different size
operands. One has imm64 and the other has imm32. So, when using NASM, you
can select C7, if the immediate will fit an imm32. For the register form,
it seems NASM will select B8+r first. For the memory form, C7/0 should be
selected. REX.W is inserted automatically by NASM:
(AMD)    MOV reg64, imm64        B8 +rq iq
(Intel)  MOV r64, imm64          REX.W + B8 + rd
(NASM)   mov rax, 16
         48 B8 10 00 00 00 00 00 00 00

(AMD)    MOV reg/mem64, imm32    C7 /0 id
(Intel)  MOV r/m64, imm32        REX.W + C7 /0
(NASM)   mov rax, dword 16
         48 C7 C0 10 00 00 00
The NASM keyword: "dword", forces C7/0 form in 64-bit code. The AMD ('07)
manual doesn't indicate on the instruction page that REX.W is required for
64-bit code. It does elsewhere in the manuals. The Intel manual ('08)
doesn't indicate the size of the immediate for the opcode, unlike AMD which
has 'iq' and 'id'. The Intel manual shows 'rd', instead of 'rq', for the
instruction register for B8+r, which is incorrect. So, you might want to
make sure you've got up to date manuals from both of them.
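To make the choice between the forms concrete, here is a hedged Python sketch that lists which encodings can load a given 64-bit value into rax. The function name and the dictionary keys are made up for illustration; it is not how NASM decides, only a statement of which forms are able to represent the value.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def encodings_for_mov_rax(value):
    """Return {form: bytes} for every MOV form that leaves `value` in rax."""
    value &= MASK64
    out = {}
    # REX.W B8+r with a full 64-bit immediate: always possible, 10 bytes.
    out["REX.W B8+r io"] = bytes([0x48, 0xB8]) + struct.pack("<Q", value)
    # REX.W C7 /0 with imm32: only if the value survives sign extension.
    signed = value - (1 << 64) if value >> 63 else value
    if -(1 << 31) <= signed < (1 << 31):
        out["REX.W C7 /0 id"] = bytes([0x48, 0xC7, 0xC0]) + struct.pack("<i", signed)
    # B8+r without REX.W (mov eax, imm32): only if the upper half is zero,
    # relying on the zero extension of 32-bit writes.
    if value <= 0xFFFFFFFF:
        out["B8+r id"] = bytes([0xB8]) + struct.pack("<I", value)
    return out

for v in (16, 0x87654321, 0xFFFFFFFF87654321):
    print(f"{v:016X}")
    for form, code in encodings_for_mov_rax(v).items():
        print(f"  {form:16} {code.hex(' ')}")
```

For the value 16 all three forms qualify, and the first two match the byte sequences shown above for "mov rax, 16" and "mov rax, dword 16".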
> That is not so simple. By using for example notation like
> 0x3 or 0x3L (no extension) or 0x3Z (zero extend, use REX.W C7/0).
> That is what i think to be the challenge by building a new assembler:
> Encoding symbols as flexible as the opcode generation.
>
Is this taken care of by the instruction(s) chosen?
Personally, I expect an immediate to be encoded with the native integer
size, i.e., 16-bit for 16-bit code, 32-bit for 32-bit code, 64-bit for
64-bit code. Most assemblers "optimize" to the smallest size instruction
encoding, which I don't like. E.g., in 64-bit code, '3' will not be encoded
using the normal 64-bit offset form of the instruction, but will be encoded
using an 8-bit offset form of the instruction, if available. IMO, it should
be possible to select all possible instruction encodings using syntax, not
assembler flags and not command line options. IMO, the assembler shouldn't
provoke this response: "It does stuff I didn't code!", like optimize...
To zero-extend or sign-extend, (currently) one should use MOVSX, MOVSXD, or
MOVZX. That means if an immediate is involved, an instruction(s), like MOV,
must load the immediate into a register first. Alternately, to zero extend
32-bit offsets in 64-bit mode, you can use the encoding without REX.W.
Also, most arithmetic and bitwise instructions (ADD, ADC, SUB, SBB, OR, AND,
XOR) sign-extend their immediate for you. To zero-extend, you could "xor
reg,reg" or "mov reg,0" to clear, then load a smaller size immediate. To
sign-extend the accumulator, you can use the CBW/CWDE/CDQE or CWD/CDQ/CQO
instructions.
Rod Pemberton
>> Well spotted. x86-64 code has implied zero-extension (to 64 bits)
>> on all 32-bit writes to registers.
>> And this has been known since 64-bit (AMD) arrived, yet it is often
>> misinterpreted by tools and programmers.
>> I took this strange behaviour as a mnemonic addition (comment)
>> into my disassembler as an optional (purely informal) ZxD prefix.
>> But once we become more familiar with 64-bit coding we may
>> not need this 'ZxD comment' anymore, so I have it as just
>> an output option in my disassembler :)
>> As I see it, the how/why was an M$ dictate, and AMD followed it like a
>> good girl and swallowed this toad (I'd have gone the logical way).
> The applicable mnemonic as I see it is that AMD acquired some DEC
> engineers at the demise of Alpha. Thus you see Alpha characteristics
> in AMD designs since then: Alpha had equal latencies for floating
> point multiply and add, so Athlon did too. Alpha never had partial
> register stalls because it had no instructions that could write a
> partial register, so X86-64 was designed such that the instructions
> that wrote a 32-bit register wrote the whole register by zeroing
> the high bits. If you write only AH, R11W, or SIL, there is no
> zeroing so partial registers can still be written, painfully with
> RFLAGS as well.
yeah, x86-64 behaves like a completely different CPU, but what I have figured
out so far is that we still get partial-register stalls, especially on byte
access to either RL or RH if within one code fetch.
I haven't checked on Alpha nor on other (by now exotic) CPUs, so
Sparc and other clones may have other issues with this.
But I also see that layout decisions aren't made by engineers;
the final word is still in the hands of merchants (I again curse them all).
__
wolfgang
Hi Rod,
thank You for your comment. Your language is plain and
straightforward.
>
>> That is not so simple. By using for example notation like
>> 0x3 or 0x3L (no extension) or 0x3Z (zero extend, use REX.W C7/0).
>> That is what i think to be the challenge by building a new assembler:
>> Encoding symbols as flexible as the opcode generation.
>>
>
> Is this taken care of by the instruction(s) chosen?
Yes, but only for a possible optimization purpose
(not exactly mine), i.e., working on sets of instructions.
Ok, please forget that; I will try to explain the
concept in a better way:
Let us consider 2 things on 64bit:
a) cpu has most time the zeroing "mood"
b) cpu has *sometimes* the signing "mood"
if mode (64bit) is priority on datasize
if keyword dword is for 48C7 encoding
in such cases (MOV,AND etc) we could have the following
pattern,/for example/,
a) mov rax,16 ; mode64 use 48B8
b) mov rax,-16 ; neg64 (mode64) use 48B8
c) mov rax,dword 16 ; resize32 (mode64) use 48C7
d) mov rax,dword -16 ; resize32 (neg64(mode64)) use 48C7
e) mov rax,-dword 16 ; neg32 (resize32(mode64)) use 48B8
f) mov rax,-(dword 16) ; neg64 (resize32(mode64)) use *48C7* here
Case f) packs a CDQE inside, or a MOVSX on imm32,
because the "minus" sign implicitly tells us of a
"will" to take care of the signedness in mode64.
> IMO, it should
> be possible to select all possible instruction encodings using syntax, not
> assembler flags and not command line options. IMO, the assembler shouldn't
> provoke this response: "It does stuff I didn't code!", like optimize...
exactly... I agree. Syntax is a priority; switches/flags/cmd options make
the whole thing harder on both sides, for the developer and for the user.
>
> To zero-extend or sign-extend, (currently) one should use MOVSX, MOVSXD, or
> MOVZX. That means if an immediate is involved, an instruction(s), like MOV,
> must load the immediate into a register first. Alternately, to zero extend
> 32-bit offsets in 64-bit mode, you can use the encoding without REX.W.
> Also, most arithmetic and bitwise instructions (ADD, ADC, SUB, SBB, OR, AND,
> XOR) sign-extend their immediate for you. To zero-extend, you could "xor
> reg,reg" or "mov reg,0" to clear, then load a smaller size immediate. To
> sign-extend the accumulator, you can use the CBW/CWDE/CDQE or CWD/CDQ/CQO
> instructions.
>
> Rod Pemberton
as i explained above in the pattern, that is the reason
i find not only AND EAX,EAX cool, but also the fact that we
*don't need* a supposed MOVZX R64,R32.