It has been a while, but I had to write some 32-bit x86 assembly. This was a function thunk in Windows / Win32.
I was looking for a simple 5-byte instruction to JMP to an absolute (i.e. non-relative) or immediate (imm32) address. Essentially very simple code generation.
I could not find such an instruction. All the JMPs were relative, indirect, or specified some segment index/TSS switch. In fact, all the Windows thunking code branches relative, which requires a tiny amount of extra work when generating the code. I ended up using the EAX register and more than one instruction to accomplish this.
spamt...@crayne.org wrote: > I was looking for a simple 5 byte instruction to JMP to an absolute > (ie. non-relative) > or immediate (imm32) address. Essentially very simple code generation.
> I could not find such an instruction. All the JMPs were relative, > indirect, or specifying some segment index/TSS switch. In fact, in all > the windows thunking code, they branch relative which requires a tiny > amount of extra work when generating the code. I ended up using the EAX > register and more than one instruction to accomplish this.
> Why was this choice made by Intel??
Relative jumps don't care where your code is located in memory, which is a big bonus, particularly if you aren't running a virtual memory system. It can be annoying sometimes when patching code up. Relative jumps work just fine though; you just need to calculate the offset yourself (for an E9 jmp, or an E8 call):
newtargetaddress - (jmpbyteaddress + 5)
Where jmpbyteaddress is the address of the E9 (or E8) byte you inserted. The displacement is counted from the end of the 5-byte instruction, hence the + 5 inside the parentheses.
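A minimal C sketch of that offset calculation for the 5-byte near jump (opcode E9; E8 is the matching near CALL). The displacement is measured from the byte after the instruction, i.e. target - (address + 5). The function name and the idea of writing into a caller-supplied buffer are assumptions for illustration, not something from the posts above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes a 5-byte "jmp rel32" (opcode E9) into buf.
 * jmp_addr: 32-bit address at which the E9 byte will execute.
 * target:   32-bit absolute address to jump to.
 * (Hypothetical helper; names are illustrative.) */
void emit_jmp_rel32(uint8_t *buf, uint32_t jmp_addr, uint32_t target)
{
    assert(buf != NULL);

    /* Displacement is relative to the END of the instruction.
     * Unsigned arithmetic wraps modulo 2^32, matching 32-bit EIP. */
    uint32_t rel = target - (jmp_addr + 5u);

    buf[0] = 0xE9;
    /* Store little-endian explicitly so the host byte order does not matter. */
    buf[1] = (uint8_t)(rel);
    buf[2] = (uint8_t)(rel >> 8);
    buf[3] = (uint8_t)(rel >> 16);
    buf[4] = (uint8_t)(rel >> 24);
}
```

For a jump placed at 0x00401000 that should land on 0x00402000, the displacement is 0x00402000 - 0x00401005 = 0x00000FFB.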
Or you can do an absolute jump through this far slower construct:
spamt...@crayne.org wrote: > It has been a while, but I had to write some 32 bit x86 assembly. This > was a function > thunk in Windows / Win32.
> I was looking for a simple 5 byte instruction to JMP to an absolute > (ie. non-relative) > or immediate (imm32) address. Essentially very simple code generation.
> I could not find such an instruction. All the JMPs were relative, > indirect, or specifying some segment index/TSS switch. In fact, in all > the windows thunking code, they branch relative which requires a tiny > amount of extra work when generating the code. I ended up using the EAX > register and more than one instruction to accomplish this.
> Why was this choice made by Intel??
> thx in advance
JMP FAR does what you want. Of course, in flat-model OSes (like Windows or Linux) you don't really get to use it. That's the real problem.
Then again, I would argue that a tiny bit of extra work at code generation time is a whole lot better than executing two instructions at run time. How hard is it to compute the distance between the JMP and target if you already know the target address? Cheers, Randy Hyde
spamt...@crayne.org wrote: > Why was this choice made by Intel??
Someone please correct me if I'm wrong, but I think it's to do with relocation - if your code uses relative JMPs, it can be placed anywhere in memory without breaking it, as opposed to if you'd used absolute addresses. After all, on Linux for example, your program may be moved about in memory, or swapped in and out to disk, WHILE it's running (the same goes for any other OS, probably Windows as well).
Can someone tell me what this JMP FAR mentioned above is? Will it take absolute i.e. numerical address? "JUMP to address 10"?
Hi,
You can use absolute jumps like this:

    mov eax,abs_address ; yes it can be zero or 10
    JMP eax

or more directly

    Call [my_jump_table]

or

    Call eax

but then it is not a jump anymore
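For the code generation case, the mov/jmp pair above comes out as seven bytes: B8 imm32 (mov eax, imm32) followed by FF E0 (jmp eax). A small C sketch that emits it; the function name is made up for illustration, and it clobbers EAX just as the assembly does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emits "mov eax, target" / "jmp eax" into buf and returns the byte count (7).
 * No displacement calculation is needed: the absolute address is stored as-is.
 * (Hypothetical helper; the caller must provide at least 7 bytes.) */
size_t emit_jmp_abs_via_eax(uint8_t *buf, uint32_t target)
{
    assert(buf != NULL);

    buf[0] = 0xB8;                      /* mov eax, imm32 */
    buf[1] = (uint8_t)(target);         /* imm32, little-endian */
    buf[2] = (uint8_t)(target >> 8);
    buf[3] = (uint8_t)(target >> 16);
    buf[4] = (uint8_t)(target >> 24);
    buf[5] = 0xFF;                      /* jmp r/m32 ...            */
    buf[6] = 0xE0;                      /* ... ModRM selecting eax  */
    return 7;
}
```

The trade-off is two extra bytes and a scratch register in exchange for a thunk that can be copied to any address without being patched.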
I guess relocation did play a role in this choice. Note that the indirect CALL forms above take absolute addresses, so they do hinder relocation; the direct near CALL (E8) is relative, just like the JMP.
Another assumption is that JMPs appear very often in code because of IF/THEN/ELSE-like constructs in HLLs, so if the operand is smaller for "close" relative targets, the code might be smaller and faster. Maybe adding to IP is also faster than simply loading it (I doubt this)...
> Someone please correct me if I'm wrong, but I think it's to do with > relocation - if your code uses relative JMPs, it can be placed anywhere > in memory without breaking it, as opposed to if you'd used absolute > addresses. Since, for example in Linux, your program may be being moved > about in memory, or swapped in and out to disk WHILE it's running (or > any other OS, probably Windows as well).
I have no problem with relative JMP, but why exclude a JMP immediate. Given Intel's choice of instructions in the past, it would seem odd that they would exclude that.
> JMP FAR does what you want. Of course, in flat-model OSes (like Windows > or Linux) you don't really get to use it. That's the real problem.
Interesting stuff. Is that instruction just 5 bytes? (which OS's don't use a flat model?)
> Then again, I would argue that a tiny bit of extra work at code > generation time is a whole lot better than executing two instructions > at run time. How hard is it to compute the distance between the JMP and > target if you already know the target address? > Cheers, > Randy Hyde
Yep, I don't mind... and in fact this question is kind of ironic for me since I'm a big fan of PIC. I just cannot think of a reason Intel would not put something like that in. It seems strange that something so simple would be missing.
> Someone please correct me if I'm wrong, but I think it's to do with > relocation - if your code uses relative JMPs, it can be placed anywhere > in memory without breaking it, as opposed to if you'd used absolute > addresses. Since, for example in Linux, your program may be being moved > about in memory,
Only in physical memory. The linear addresses remain constant. Think about it.
> or swapped in and out to disk WHILE it's running
This is completely transparent to the program.
> (or any other OS, probably Windows as well).
Insert pun on "as well".
> Can someone tell me what this JMP FAR mentioned above is? Will it take > absolute i.e. numerical address? "JUMP to address 10"?
A far jump is a jump that takes a segment:offset pair as target. A near jump only takes an offset inside the current segment. The IA-32 manual lists all address modes for each instruction. Get it from Intel.
> Someone please correct me if I'm wrong, but I think it's to do with > relocation - if your code uses relative JMPs, it can be placed anywhere > in memory without breaking it, as opposed to if you'd used absolute > addresses.
That sounds logical. Disassembling the object file looks like these addresses *are* being relocated though, so... ???
> Since, for example in Linux, your program may be being moved > about in memory, or swapped in and out to disk WHILE it's running (or > any other OS, probably Windows as well).
I *think* we'd be at the same virtual address when we're swapped in and running. I think a dynamic library can be moved to a different virtual address, and so needs to be "position independent". I'm pretty fuzzy on this...
> Can someone tell me what this JMP FAR mentioned above is? Will it take > absolute i.e. numerical address? "JUMP to address 10"?
Well... "jmp far" and "call far" involve loading cs, as well as (e)ip. In the case of "call far", cs is on the stack - altering the expected position of parameters, relative to (e)sp/(e)bp (if you're doing that). More useful in 16-bit code, where the segments are only 64k-1 (usually). Not very useful in "flat model" OSen - not from "user code", anyway. However, to my absolute astonishment, it *does* work (in Linux - can't speak for Windows) if I use the same cs as I've already got (determined by inspection - I suppose it's "standard"... ??? Different number for Windows, probably).
You can *write* "jmp 10" or "call 10", in any case. What gets emitted for code is not "10", but the distance (+/-) from "here" to "10". In the case of "jmp", there's a signed-byte form, if the displacement fits in a signed byte.
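What the assembler does with "jmp 10" can be sketched in C: compute the distance from the end of the instruction to the target, and use the two-byte short form (EB rel8) when it fits in a signed byte, otherwise the five-byte near form (E9 rel32). The function name is hypothetical, and a real assembler makes this choice over several passes; this only shows the encoding decision for one jump whose address is already known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emits the shortest relative jmp from jmp_addr to target into buf.
 * Returns the number of bytes written: 2 (EB rel8) or 5 (E9 rel32).
 * (Hypothetical helper; the caller must provide at least 5 bytes.) */
size_t emit_jmp_short_or_near(uint8_t *buf, uint32_t jmp_addr, uint32_t target)
{
    assert(buf != NULL);

    /* Distance measured from the end of a 2-byte short jump. */
    int64_t d8 = (int64_t)target - ((int64_t)jmp_addr + 2);
    if (d8 >= -128 && d8 <= 127) {
        buf[0] = 0xEB;                 /* jmp rel8 */
        buf[1] = (uint8_t)d8;          /* two's-complement signed byte */
        return 2;
    }

    /* Otherwise fall back to the 5-byte form; wraps modulo 2^32. */
    uint32_t rel = target - (jmp_addr + 5u);
    buf[0] = 0xE9;                     /* jmp rel32 */
    buf[1] = (uint8_t)(rel);
    buf[2] = (uint8_t)(rel >> 8);
    buf[3] = (uint8_t)(rel >> 16);
    buf[4] = (uint8_t)(rel >> 24);
    return 5;
}
```

Either way, the bytes emitted hold a distance and never the literal target address.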
The "far" versions emit the address (and the segment/selector), not a "relative" value. For example:
global _start

section .text
_start:

    call far [target]
    call 23h:subbie

;------------------------------------
; these next two lines will crash, if run
; included strictly to observe the disassembly

    call subbie

    jmp 10
;------------------------------------

    jmp 23h:exit
    jmp exit
    jmp short exit

    nop

exit:

    mov eax, 1
    int 80h

;------------------
subbie:
    ; say hi, just to prove we did something
    mov eax, 4
    mov ebx, 1
    mov ecx, msg
    mov edx, msg_len
    int 80h
    retf
;-----------------

section .data
target dd subbie
       dw 23h

msg db 'tada!', 10
msg_len equ $ - msg
;--------------------------
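On the size question raised earlier in the thread: in 32-bit code the direct far jump is opcode EA followed by a 32-bit offset and a 16-bit selector, so it is seven bytes, not five. A C sketch of the encoding follows; the function name is hypothetical, and the selector value is whatever your OS happens to use for the current code segment (23h in the listing above was found by inspection on one Linux system, so treat it as an assumption).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emits "jmp selector:offset" (EA ptr16:32) into buf; returns 7.
 * The offset is stored as an absolute value, not a displacement.
 * (Hypothetical helper; the caller must provide at least 7 bytes.) */
size_t emit_jmp_far(uint8_t *buf, uint16_t selector, uint32_t offset)
{
    assert(buf != NULL);

    buf[0] = 0xEA;                      /* jmp ptr16:32 */
    buf[1] = (uint8_t)(offset);         /* 32-bit offset, little-endian */
    buf[2] = (uint8_t)(offset >> 8);
    buf[3] = (uint8_t)(offset >> 16);
    buf[4] = (uint8_t)(offset >> 24);
    buf[5] = (uint8_t)(selector);       /* 16-bit selector, little-endian */
    buf[6] = (uint8_t)(selector >> 8);
    return 7;
}
```

Because the selector is baked into the instruction, generated code like this only works on systems where that selector is valid for the running process.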
Frank Kotler wrote: > spamt...@crayne.org wrote: >> spamt...@crayne.org wrote:
>>> Why was this choice made by Intel??
>> Someone please correct me if I'm wrong, but I think it's to do with >> relocation - if your code uses relative JMPs, it can be placed >> anywhere in memory without breaking it, as opposed to if you'd used >> absolute addresses.
> That sounds logical. Disassembling the object file looks like these > addresses *are* being relocated though, so... ???
Depends on object format and the kind of call/jmp. Obviously, external references will have to be fixed up by a linker, or at loadtime.
> I *think* we'd be at the same virtual address when we're swapped in > and running. I think a dynamic library can be moved to a different > virtual address, and so needs to be "position independent". I'm > pretty fuzzy on this...
I can't think of why anybody would put you at a different VA when swapped in, unless perhaps using an OS that doesn't support hardware paging. Segment-based swapping? Eek :)
> Well... "jmp far" and "call far" involve loading cs, as well as (e)ip. > In the case of "call far", cs is on the stack - altering the expected > position of parameters, relative to (e)sp/(e)bp (if you're doing > that). More useful in 16-bit code, where the segments are only 64k-1 > (usually). Not very useful in "flat model" OSen - not from "user > code", anyway. However, to my absolute astonishment, it *does* work > (in Linux - can't speak for Windows) if I use the same cs as I've > already got (determined by inspection - I suppose it's "standard"... > ??? Different number for Windows, probably).
It works on windows too, but the CS value changes between operating systems (at least between 9x/NT, haven't really bothered to look at the different NT versions). The FAR call/jmp opcodes are longer than the relative calls though, and might also be slower?
As for relocation, there are different ways to do it. On Windows (in reality only used for DLLs, and only if the preferred imagebase isn't available), all absolute references are fixed up ("add [ref], delta").
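The "add [ref], delta" fixup can be sketched in C as a loop over the places in the image that hold absolute addresses. This is a deliberately simplified model: the flat array of offsets and all the names are assumptions for illustration, and the real PE relocation data is organised differently (per-page blocks of typed entries).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value at p (no alignment assumed). */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit little-endian value at p. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Adds delta (actual load address minus preferred base) to every 32-bit
 * absolute reference listed in reloc_offsets.
 * (Hypothetical simplified format: one image offset per reference.) */
void apply_relocs(uint8_t *image, size_t image_size,
                  const uint32_t *reloc_offsets, size_t count, uint32_t delta)
{
    assert(image != NULL);
    for (size_t i = 0; i < count; i++) {
        size_t off = reloc_offsets[i];
        assert(off + 4 <= image_size);   /* reference must lie inside the image */
        store_le32(image + off, load_le32(image + off) + delta);
    }
}
```

Relative jumps and calls inside the image need no entry in such a list, since the distance between instruction and target does not change when the whole image moves.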
Another way is to keep a global register that keeps the imagebase, and reference everything from that - you can 100% avoid relocations that way. I think that's what happens with GCC -fPIC.