Where do register numbers come from when compiling to machine code?

Martin Petkov

May 7, 2021, 2:03:10 PM
to pwn-coll...@googlegroups.com
I was in the process of doing babyshell level 4, and while I solved it, there's a gap in my understanding. I don't understand how registers get translated to machine code, and I have yet to find the correct Google Search incantation to help with this ("x86_64 assembly register numbers" doesn't yield anything that looks right).

For example, to prepare to call execve:
0: 48 c7 c0 3b 00 00 00    mov rax, 0x3b

With eax:
0: b8 3b 00 00 00    mov eax, 0x3b

This makes no sense to me. Where do the bytes "48 c7 c0" come from? Why is the second form "b8" entirely different, despite eax just being the lower 4 bytes of rax? And why is "88" (or 0x58) not anywhere, even though http://ref.x86asm.net/coder64.html says that the opcode for mov is 88?

Thank you in advance, if anyone can shed some light!

Cindy Xiao

May 7, 2021, 3:45:03 PM
to Martin Petkov, pwn-coll...@googlegroups.com
Hi,

You might find this article useful; it contains some walkthrough examples of x86_64 instruction encoding: https://pyokagan.name/blog/2019-09-20-x86encoding/.

You might also want to refer to this page, which is a more detailed general reference on instruction encoding and register encoding: https://wiki.osdev.org/X86-64_Instruction_Encoding#Registers

For your example, note that there are lots of different instructions with the mnemonic MOV, not just the one whose opcode is 0x88. These instructions differ depending on where and what they are moving, and what size of values they are moving. If you search on http://ref.x86asm.net/coder64.html for "MOV", you can see lots of entries in that table with the mnemonic MOV, all with different opcodes.
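As a rough illustration of one of those other MOV entries: the "b8" in the eax example is the "MOV r32, imm32" form, whose opcode is 0xB8 plus the register number (EAX is register 0, so the opcode stays 0xB8). Here is a minimal Python sketch of that one form only. The helper name and the REG32 table are made up for this example, and it does not handle r8d-r15d, which need a REX prefix.

```python
import struct

# Register numbers for the eight legacy 32-bit registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_mov_r32_imm32(reg, imm):
    """Encode 'mov r32, imm32' with the B8+r opcode form (hypothetical helper)."""
    opcode = 0xB8 + REG32[reg]  # the register number lives in the low 3 bits of the opcode
    return bytes([opcode]) + struct.pack("<I", imm & 0xFFFFFFFF)  # little-endian imm32

print(encode_mov_r32_imm32("eax", 0x3b).hex(" "))  # b8 3b 00 00 00
print(encode_mov_r32_imm32("edi", 0x3b).hex(" "))  # bf 3b 00 00 00
```

So in this form there is no separate byte for the register at all; it is folded into the opcode, which is why the bytes look so different from the rax version.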

In the following:

0: 48 c7 c0 3b 00 00 00    mov rax, 0x3b

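0x48 would be the REX prefix byte with its W bit set (REX.W), which indicates a 64-bit operand size.
0xc7 would be the opcode for "MOV r/m16/32/64, imm16/32", which is one of the many different MOV instructions that appear in that table.
0xc0 is the ModR/M byte, which encodes RAX in this situation, addressed directly: the top two bits (11) select register-direct addressing, and the low three bits (000) are the register number of RAX. See the information about the ModR/M byte in the links for more explanation on this!
3b 00 00 00 is the 32-bit immediate value you're moving in, stored in little-endian byte order (so 0x0000003b); it gets sign-extended to 64 bits.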
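
To make the breakdown above concrete, here is a small Python sketch that picks those seven bytes apart field by field. It only understands this one instruction form (REX prefix, opcode 0xc7, register-direct ModR/M, 32-bit immediate); the function name and the dictionary keys are made up for illustration, not taken from any real disassembler.

```python
def decode_mov_rm64_imm32(code):
    """Split a 'REX C7 /0 imm32' MOV with a register destination into its fields."""
    rex, opcode, modrm = code[0], code[1], code[2]

    assert rex & 0xF0 == 0x40, "not a REX prefix"
    rex_w = (rex >> 3) & 1   # 1 means 64-bit operand size
    rex_b = rex & 1          # extra high bit for the r/m register number

    assert opcode == 0xC7, "not the C7 form of MOV"

    mod = modrm >> 6         # 0b11 means the operand is a register, not memory
    reg = (modrm >> 3) & 7   # opcode extension; 0 for MOV in this form
    rm = modrm & 7           # low 3 bits of the register number
    assert mod == 0b11 and reg == 0, "only register-direct MOV is handled"

    # The immediate is 4 bytes, little-endian, and treated as signed.
    imm = int.from_bytes(code[3:7], "little", signed=True)

    return {"rex_w": rex_w, "reg_number": (rex_b << 3) | rm, "imm": imm}

fields = decode_mov_rm64_imm32(bytes.fromhex("48c7c03b000000"))
print(fields)  # {'rex_w': 1, 'reg_number': 0, 'imm': 59}
```

Register number 0 is RAX and 59 is 0x3b, the execve syscall number from the original example.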

I'm not super familiar with x86_64 instruction encoding so this might not be 100% correct, but it should give a general idea. Hope that helps!

Cindy

Martin Petkov

May 10, 2021, 8:19:36 AM
to Cindy Xiao, pwn-coll...@googlegroups.com
Thank you! That really helps, especially knowing that "H" (0x48) indicates a 64-bit instruction.