writing a jump table

nicolasbock

Mar 15, 2011, 7:21:44 PM

Hello list,

I am trying to write a jump table, but unfortunately with limited
success. When I compile the code and disassemble it, the offset of
"table" is 0, which I guess means that something didn't work out. Any
help would be gratefully appreciated.

The assembly code, jump_table.S:

.text
.global jump_table
.type jump_table, @function

jump_table:
push %rax

mov $0x02, %rax # Move index into rax; 2 is supposed to end up at label_02.
jmp *table(, %rax, 4) # Jump into the table.

.align 8
table:
.long label_00
.long label_01
.long label_02

label_00:
nop

label_01:
nop

label_02:
nop

done:
pop %rax
ret

.size jump_table, .-jump_table

compiled with: gcc -c -g -o jump_table.o jump_table.S

disassembled code:

(gdb) disassemble jump_table
Dump of assembler code for function jump_table:
0x0000000000000000 <+0>: push %rax
0x0000000000000001 <+1>: mov $0x2,%rax
0x0000000000000008 <+8>: jmpq *0x0(,%rax,4)
0x000000000000000f <+15>: nop
0x0000000000000010 <+16>: add %al,(%rax)
0x0000000000000012 <+18>: add %al,(%rax)
0x0000000000000014 <+20>: add %al,(%rax)
0x0000000000000016 <+22>: add %al,(%rax)
0x0000000000000018 <+24>: add %al,(%rax)
0x000000000000001a <+26>: add %al,(%rax)
0x000000000000001c <+0>: nop
0x000000000000001d <+0>: nop
0x000000000000001e <+0>: nop
0x000000000000001f <+0>: pop %rax
0x0000000000000020 <+1>: retq
End of assembler dump.

Thanks already, nick

Bjarni Juliusson

Mar 15, 2011, 9:10:00 PM

On 03/16/2011 12:21 AM, nicolasbock wrote:
> Hello list,
>
> I am trying to write a jump table, but unfortunately with limited
> success. When I compile the code and disassemble it, the offset of
> "table" is 0, which I guess means that something didn't work out. Any
> help would be gratefully appreciated.

> compiled with: gcc -c -g -o jump_table.o jump_table.S

What happens to the 0 if you link the program?

Bjarni
--

INFORMATION WANTS TO BE FREE

puppi

Mar 16, 2011, 8:03:09 AM

On Mar 15, 8:21 pm, nicolasbock <nicolasb...@nospicedham.gmail.com>
wrote:

Don't worry, you're dumping the object-code file. Its address
references often appear wrong, but that's because .o is a relocatable
file format: the addresses, including those imbued in opcodes, will
only be truly defined when you link it. If that happens in a final,
executable file, then that's a real problem, and quite likely a bug in
the assembler or linker (on my PC it worked fine, for instance).
There's just a minor issue: your code as it is doesn't work, it causes
a segmentation fault, because you placed "table" in the .text section,
when you should have done it in the .data or .bss sections. The jump
to the address specified in table(, %rax, 4) causes a segfault not
because of the jump address per se, but because of the mere attempt to
fetch that address: you're trying to read data from the code segment
(which is read-protected). Enclosing the table label and its contents
between .data and .text directives solves this minor issue.
Of course, you could also have disabled the read-protection in .text
with a 0x7D syscall, but that's a dirty trick best left for self-
modifying programs and similar beauties ;]

Frank Kotler

Mar 16, 2011, 10:23:55 AM

puppi wrote:

...

>> When I compile the code and disassemble it, the offset of
>> "table" is 0, which I guess means that something didn't work out. Any
>> help would be gratefully appreciated.

...

> Don't worry, you're dumping the object-code file. Its address
> references often appear wrong, but that's because .o is a relocatable
> file format: the addresses, including those imbued in opcodes, will
> only be truly defined when you link it.

That's correct, I think.

...

> There's just a minor issue: your code as it is doesn't work, it causes
> a segmentation fault, because you placed "table" in the .text section,
> when you should have done it in the .data or .bss sections. The jump
> to the address specified in table(, %rax, 4) causes a segfault not
> because of the jump address per se, but because of the mere attempt to
> fetch that address: you're trying to read data from the code segment
> (which is read-protected).

I don't think that's correct. Read-protected? How does the CPU read it?
Have you verified that it works with the table in .data?
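For what it's worth, on a typical x86-64 Linux system the pages holding .text are mapped readable and executable, so data placed there can be read. A minimal C sketch of that check follows. It assumes a POSIX-like platform where a function pointer can be converted to a data pointer, and sample_function and read_first_code_byte are made-up names used only for illustration.

```c
#include <stdint.h>

/* Any function will do; its machine code lives in .text. */
static int sample_function(int x) { return x + 1; }

/* Reads one byte of machine code from sample_function.
 * Converting a function pointer to a data pointer is not strict ISO C,
 * but POSIX-style platforms support it (assumption of this sketch). */
static unsigned read_first_code_byte(void)
{
    const volatile unsigned char *code =
        (const volatile unsigned char *)(uintptr_t)&sample_function;
    return code[0]; /* would fault here if .text were read-protected */
}
```

If the code pages really were read-protected, calling read_first_code_byte() would end in a segfault instead of returning a byte. The value of the byte depends on the compiler and its options, so the sketch makes no claim about it.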

I'm not familiar with 64-bit code, but I suspect that the contents of
the jump-table need to be 64-bit addresses, no? Qwords, or...
".longlong", (G)as may call 'em? So should be "table(, %rax, 8)"? That's
just a guess.

Best,
Frank

Bernhard Schornak

Mar 16, 2011, 3:44:11 PM

nicolasbock wrote:

> 0x000000000000000f <+15>: nop

This nop is your '.align 8'.

> 0x0000000000000010 <+16>: add %al,(%rax)
> 0x0000000000000012 <+18>: add %al,(%rax)

This is long #1,

> 0x0000000000000014 <+20>: add %al,(%rax)
> 0x0000000000000016 <+22>: add %al,(%rax)

long #2 and

> 0x0000000000000018 <+24>: add %al,(%rax)
> 0x000000000000001a <+26>: add %al,(%rax)

long #3.

> 0x000000000000001c <+0>: nop
> 0x000000000000001d <+0>: nop
> 0x000000000000001e <+0>: nop

These are the three nops in your labels

> 0x000000000000001f <+0>: pop %rax
> 0x0000000000000020 <+1>: retq

and the final label 'done'.

Try

.data
.p2align 4,,15
table: .quad label_00
.quad label_01
.quad label_02

.text
.p2align 4,,15
pushq %rax
movl $0x02, %eax # saves a byte (no prefix)
jmp *table(, %rax, 8) # Jump to label 2

.p2align 4,,15
label_00:
nop
jmp done
.p2align 4,,15
label_01:
nop
jmp done
.p2align 4,,15
label_02:
nop
.p2align 4,,15
done: pop %rax
ret

Addresses, including labels, are 64 bit in 64 bit code,
hence .long was replaced by .quad (GCC itself uses jump
tables with offsets to %rip, reducing the table size to
one half...).

Greetings from Augsburg

Bernhard Schornak

puppi

Mar 16, 2011, 3:59:19 PM

On Mar 16, 11:23 am, Frank Kotler

Yes, read-protected. Every memory page has an associated triplet of
bits defining its protections/permissions. Each of these 3 bits defines
read, write or execute protection. That's a property imposed by
the operating system and enforced at the hardware level (that is,
the OS defines the protections for each page and the CPU applies
them). The CPU can read it because these restrictions are virtual,
imposed on applications only. Both the processor and the OS's kernel
have free access to memory. It's a safety feature that has been
hardware-supported since the i286 and imposed by the operating system
since the first version of Linux and (at least fully) since Windows ME
(which accounted for the reduction of the famous blue error screen in
Windows). For more info, see
http://en.wikipedia.org/wiki/Protected_mode (there are better sources
for it than Wikipedia, and many assembly books cover memory models and
modes). The .text section defaults to executable-only, the .data
section defaults to readable-only and the .bss section defaults to
non-executable (readable and writable). Some OSs give write permission
to any readable page: that's why it's platform-dependent whether or not
you can use regions in .data as buffers.

These protections can, however, be changed under some limitations.
That's what the 0x7D syscall does. More info on this syscall can be
found in the manpages for the C function mprotect(), which is simply a
C interface to the syscall (and whose parameters are exactly those of
the syscall, in the same order).
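A minimal C sketch of changing page permissions through the mprotect() wrapper, assuming Linux with glibc (protect_demo is a made-up name). Note that 0x7D (125) is mprotect's number in the 32-bit x86 Linux syscall table; the x86-64 table uses 10, so going through the C wrapper avoids hard-coding either value.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one writable page, stores a byte in it, then makes the page
 * read-only with mprotect(). Returns the byte read back, or -1 on error. */
static int protect_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 0x5a; /* allowed: the page is still writable */

    /* Drop write permission; a later store to p[0] would now fault. */
    if (mprotect(p, (size_t)page, PROT_READ) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    int value = p[0]; /* reads are still allowed */
    munmap(p, (size_t)page);
    return value;
}
```

The sketch works on a freshly mapped anonymous page rather than on .text, so it demonstrates only the call itself and says nothing about what a particular OS chooses as default permissions for each section.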

But you're absolutely right about the faultiness of the .long
declaration (.quad is the right one under GAS) and the need for
"table(, %rax, 8)" instead of "table(, %rax, 4)". I'm so much more used
to 32-bit assembly that I overlooked it completely. It had, however,
worked on my PC. I checked to see how that was possible, and found
that luckily the non-zero bytes of the labels' addresses were all in
their first 4 bytes. Also, the next bytes in memory were zeroed, which
indirectly gave the right address (since the jmp was using the last
label, label_02). But changing the value in %rax from 2 to 1 or 0 gave
a plain segfault of course, because then the lower halves of two
labels' addresses were "concatenated" into a single, horribly wrong
address.
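The effect described above can be modelled in a few lines of C. This is only a sketch: the label addresses are invented for illustration, the four zero bytes after the table stand in for whatever happened to follow it in memory, and a little-endian machine such as x86-64 is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit label addresses, chosen only for illustration. */
enum { LABEL_00 = 0x40001c, LABEL_01 = 0x40001d, LABEL_02 = 0x40001e };

/* Three .long entries followed by four zero bytes, mimicking the
 * lucky memory layout described in the post. */
static const uint32_t table_image[4] = { LABEL_00, LABEL_01, LABEL_02, 0 };

/* Models the 8-byte load performed by `jmp *table(, %rax, 4)`:
 * a full quadword is read starting at table + index * 4. */
static uint64_t fetch_jump_target(unsigned index)
{
    uint64_t target;
    memcpy(&target, (const unsigned char *)table_image + index * 4u,
           sizeof target);
    return target;
}
```

With these made-up values, fetch_jump_target(2) yields 0x40001e (the intended label_02), while fetch_jump_target(0) yields 0x0040001d0040001c, the two neighbouring entries glued together. Indices above 2 would read past the end of the array and are outside what the sketch models.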

wolfgang kern

Mar 16, 2011, 4:49:03 PM

nicolasbock posted:

> I am trying to write a jump table, but unfortunately with limited
> success. When I compile the code and disassemble it, the offset of
> "table" is 0, which I guess means that something didn't work out. Any
> help would be gratefully appreciated.

Methinks a jump table should reside in a data section if the entries
are subject to change (by link/make/create or the like).

OTOH I see nothing wrong if this table is part of a bootloader
and uses code-referenced data.

My way would be (assuming 64-bit mode):
;entry in rax and all entries contain a valid address
;a limit check here may not hurt much.

jmp [table+rax*8] ;if in the data-seg or
jmp cs:[table+rax*8] ;if the table is part of the code section
;also possible if your compiler is able to:
jmp [rip+table_offset+rax*8] ;RIP access uses CS by default
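The same idea, including the limit check mentioned in the comments, can be sketched in C with a table of function pointers. The handler names and dispatch_checked are made up for this example; each table entry is a full pointer, which is 8 bytes on x86-64 and so corresponds to the *8 scale factor.

```c
#include <stddef.h>

static int handler_00(void) { return 0; }
static int handler_01(void) { return 1; }
static int handler_02(void) { return 2; }

/* Each entry is a full pointer (8 bytes on x86-64), the C analogue
 * of a table of 64-bit label addresses. */
static int (*const table[])(void) = { handler_00, handler_01, handler_02 };

/* Dispatches through the table; returns -1 for an out-of-range index
 * instead of jumping through whatever lies behind the table. */
static int dispatch_checked(size_t index)
{
    if (index >= sizeof table / sizeof table[0])
        return -1;
    return table[index]();
}
```

Here dispatch_checked(2) calls handler_02, and dispatch_checked(3) is rejected by the limit check rather than reading past the end of the table.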

__
wolfgang

robert...@nospicedham.yahoo.com

Mar 16, 2011, 5:33:36 PM

On Mar 16, 2:59 pm, puppi <fabricio.pu...@nospicedham.gmail.com>
wrote:

> reduction of the famous blue error screen in Windows). For more info,
> http://en.wikipedia.org/wiki/Protected_mode (there are better sources

> for it than wikipedia, and many assembly books cover memory models and
> modes). The .text section defaults to executable-only, the .data
> section default to readable-only and the .bss section defaults to non-
> executable (readable and writable). Some OSs give write permissions to
> any readable page: that's why it's platform dependant if you can or
> not use regions in .data as buffers.
> These protections can, however, be changed under some limitations.
> That's what the 0x7D syscall does. More info on this syscall can be
> found on the manpages for the C function mprotect(), which is simply a
> C interface to the syscall (and whose parameters are exactly those of
> the syscall, in the same order).

Your timing is more than a bit off. For x86, the NX (no execute) page
attribute was originally implemented by AMD for their first 64-bit
CPUs, and was back-ported to most later 32-bit x86 CPUs. The first
use in Windows ("Data Execution Prevention" - DEP) was in a service
pack to Windows XP and WS2003, and happened (~August 2004) not too
long after CPUs with the feature started shipping (~April 2003).
Windows ME (the last of the Win9x line) never supported any such
thing.

But the NX bit was a rather late addition to x86 (given that the 32-bit
80386, which also implemented the first paging hardware for x86,
shipped in 1985 - nearly twenty years earlier). Of course the
equivalent has been around on some other architectures for much longer
(although usually in the reverse sense - "allow execute" instead of
"don't allow execute").

When running 16 bit code, all protected mode versions of Windows
(starting with the first, v3.0*) did "enforce" unwriteable code
segments, but mainly because you cannot write through CS on any
protected mode x86. So it was more by default, and didn't apply to
anything but 16-bit protected mode. And Windows happily let you
create a data segment alias** for a code segment, so you could write
into it.

*Ignoring that you could actually run Win3.0 in real mode for
compatibility with Win2.x applications

**AllocDStoCSAlias() was a one step method of creating an executable
segment from a data segment - if you wanted to go the other way there
were about three steps...

Frank Kotler

Mar 16, 2011, 9:52:58 PM

puppi wrote:

...
>>> There's just a minor issue: your code as it is doesn't work, it causes
>>> a segmentation fault, because you placed "table" in the .text section,
>>> when you should have done it in the .data or .bss sections. The jump
>>> to the address specified in table(, %rax, 4) causes a segfault not
>>> because of the jump address per se, but because of the mere attempt to
>>> fetch that address: you're trying to read data from the code segment
>>> (which is read-protected).
>> I don't think that's correct. Read-protected? How does the CPU read it?
>> Have you verified that it works with the table in .data?

...

> Yes, read-protected. Every memory page has an associated triplet of
> bits defining its protections/permissions. Each of these 3 bits define
> either read, write or execute protection. That's a property imposed by
> the operating system and reinforced at the hardware level (that is,
> the OS defines the protections for each page and the CPU applies
> them). The CPU can read it because these restrictions are virtual,
> imposed to applications only. Both the processor and the OS's kernel
> have free access to memory. It's a safety feature hardware-supported
> since the i286 and operating system imposed since the first version of
> Linux and (at least fully) since Windows ME (which accounted for the
> reduction of the famous blue error screen in Windows). For more info,
> http://en.wikipedia.org/wiki/Protected_mode (there are better sources
> for it than wikipedia, and many assembly books cover memory models and
> modes). The .text section defaults to executable-only, the .data
> section default to readable-only and the .bss section defaults to non-
> executable (readable and writable). Some OSs give write permissions to
> any readable page: that's why it's platform dependant if you can or
> not use regions in .data as buffers.
> These protections can, however, be changed under some limitations.
> That's what the 0x7D syscall does. More info on this syscall can be
> found on the manpages for the C function mprotect(), which is simply a
> C interface to the syscall (and whose parameters are exactly those of
> the syscall, in the same order).

Okay, thanks for the heads-up on that. I'll be putting off learning
64-bit until I absolutely "have" to, I think...

Maybe I should let Nick explain this, but he's mentioned on the
linux-assembly list that he needs this for a dynamic library. In 32-bit
mode, this requires "PIC", Position Independent Code (Windows .dll's
have a relocation section, I believe - have a "preferred" load address,
but can be relocated if need be - Linux .so's lack the relocation table,
so really have to be PIC). I (mis?)understand that 64-bit code allows
"RIP-relative" addressing. I think this should solve that problem(?).
Maybe someone can advise Nick how to get (G)as to generate RIP-relative
code? (if you think that'll help him) The error message apparently
mentions "-fPIC"(?)...

Best,
Frank

Dick Wesseling

Mar 17, 2011, 4:41:43 AM

In article <ilrpud$bun$1...@speranza.aioe.org>,

Frank Kotler <fbko...@nospicedham.myfairpoint.net> writes:
> ...
> Maybe I should let Nick explain this, but he's mentioned on the
> linux-assembly list that he needs this for a dynamic library. In 32-bit
> mode, this requires "PIC", Position Independent Code (Windows .dll's
> have a relocation section, I believe - have a "preferred" load address,
> but can be relocated if need be - Linux .so's lack the relocation table,
> so really have to be PIC).

ELF shared objects can have a relocation table just fine. But relocation
requires making a private copy of the patched page(s) which defeats the
purpose of shared objects.

Bob Masta

Mar 17, 2011, 9:18:19 AM

Dunno about 64-bit or Linux, but in 32-bit Windows the
permissions are controlled at link time. You only need
.data? (BSS) and .text sections, with everything that would
normally have been in .data placed instead into .text. Then
in the linker use

/SECTION:.text,ERW

Best regards,

Bob Masta

DAQARTA v6.00
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
Frequency Counter, FREE Signal Generator
Pitch Track, Pitch-to-MIDI
Science with your sound card!

nicolasbock

Mar 17, 2011, 3:51:15 PM

On Mar 17, 1:52 am, Frank Kotler

<fbkot...@nospicedham.myfairpoint.net> wrote:
> puppi wrote:
>
> ...
>
> >>> There's just a minor issue: your code as it is doesn't work, it causes
> >>> a segmentation fault, because you placed "table" in the .text section,
> >>> when you should have done it in the .data or .bss sections. The jump
> >>> to the address specified in table(, %rax, 4) causes a segfault not
> >>> because of the jump address per se, but because of the mere attempt to
> >>> fetch that address: you're trying to read data from the code segment
> >>> (which is read-protected).
> >> I don't think that's correct. Read-protected? How does the CPU read it?
> >> Have you verified that it works with the table in .data?
>
> > Yes, read-protected. Every memory page has an associated triplet of
> > bits defining its protections/permissions. Each of these 3 bits define
> > either read, write or execute protection. That's a property imposed by
> > the operating system and reinforced at the hardware level (that is,
> > the OS defines the protections for each page and the CPU applies
> > them). The CPU can read it because these restrictions are virtual,
> > imposed to applications only. Both the processor and the OS's kernel
> > have free access to memory. It's a safety feature hardware-supported
> > since the i286 and operating system imposed since the first version of
> > Linux and (at least fully) since Windows ME (which accounted for the
> > reduction of the famous blue error screen in Windows). For more info,

> > http://en.wikipedia.org/wiki/Protected_mode (there are better sources

The code I posted here first was a stand-alone test case that
I constructed so I could learn how to write jump tables. In the end
though I am looking to include an assembly function into a library
which is built in a static and a dynamic version using libtool and
autoconf/automake. As I mentioned on linux-assembly, the linker was
complaining about my jump table telling me that it can't relocate some
symbols. I used a C program with a large switch statement and gcc with
-fPIC to see how to deal with PIC and am using the version below now.
Just as gcc does, I placed the jump table in .section .rodata, but I
am not sure if that's equivalent to placing it in the .data section or
not.

.text
.global jump_table
.type jump_table, @function

jump_table:
# Save %rax on the stack.
push %rax

mov $0x02, %rax # Move index into rax; 2 is supposed to end up at label_02.

lea 0(,%rax, 4), %rdx
lea table(%rip), %rax
mov (%rdx, %rax), %edx
movslq %edx, %rdx
lea table(%rip), %rax
lea (%rdx, %rax), %rax
jmp *%rax

.section .rodata
.align 4
table:
.long label_00-table
.long label_01-table
.long label_02-table
.long label_03-table

.text
label_00:
jmp done

label_01:
jmp done

label_02:
jmp done

label_03:
jmp done

nicolasbock

Mar 17, 2011, 3:53:16 PM

On Mar 16, 7:44 pm, Bernhard Schornak <schor...@nospicedham.web.de>
wrote:

Bernhard,

if the jumps are shortish, which is to mean within my assembly
function, wouldn't .long labels suffice?

Greetings from Los Alamos

nicolasbock

Mar 17, 2011, 4:02:22 PM

On Mar 17, 8:41 am, f...@nospicedham.securityaudit.val.newsbank.net
(Dick Wesseling) wrote:
> In article <ilrpud$bu...@speranza.aioe.org>,

That's a subtle point I didn't appreciate before. I didn't understand
why the linker was complaining about the relocation. After all, as
someone pointed out before here, the compiler places the assembly
instructions at address 0 and the linker relocates the instructions to
their final location, fixing up addresses in the process. As I
understand now based on your post, the problem with dynamic libraries
really is that the OS wants to share the library and it will therefore
potentially load it into different virtual address spaces. Since
memory access is done by using an absolute address as opposed to an
offset for jumps, sharing of my original function can not work since
it uses absolute addresses. Is that about correct?

Dick Wesseling

Mar 18, 2011, 12:19:53 AM

In article <45dbec16-ee13-48a8...@a11g2000pro.googlegroups.com>,

nicolasbock <nicol...@nospicedham.gmail.com> writes:
> On Mar 17, 8:41 am, f...@nospicedham.securityaudit.val.newsbank.net
> (Dick Wesseling) wrote:
>> In article <ilrpud$bu...@speranza.aioe.org>,
>> Frank Kotler <fbkot...@nospicedham.myfairpoint.net> writes:
>>
>> > ...
>> > Maybe I should let Nick explain this, but he's mentioned on the
>> > linux-assembly list that he needs this for a dynamic library. In 32-bit
>> > mode, this requires "PIC", Position Independent Code (Windows .dll's
>> > have a relocation section, I believe - have a "preferred" load address,
>> > but can be relocated if need be - Linux .so's lack the relocation table,
>> > so really have to be PIC).
>>
>> ELF shared objects can have a relocation table just fine. But relocation
>> requires making a private copy of the patched page(s) which defeats the
>> purpose of shared objects.
> That's a subtle point I didn't appreciate before. I didn't understand
> why the linker was complaining about the relocation.

What complaint? Maybe I missed a message, but both versions posted of
your jump_table.S compiled and ran without complaint (after changing
.long to .quad and changing the scale factor to 8).

> After all, as
> someone pointed out before here, the compiler places the assembly
> instructions at address 0 and the linker relocates the instructions to
> their final location, fixing up addresses in the process. As I

Not "the linker", but "the linkers". There are two linkers involved:

* The compile time linker "ld" combines .o files into disk images.

* The runtime linker, which is typically called ld-something, loads disk
images, either main programs or shared objects, into memory.
See "man rtld" or "man ld.so".

The *runtime* linker relocates instructions to their final locations and
fixes up addresses. In order to do so it must change part of the code,
this will trigger the copy-on-write machinery in the OS which makes a
private copy of the modified pages. The remaining pages can still be
shared between processes.

ELF uses two tricks to minimize the number of pages that must be COWed:

1) collect all external *code* references of an object into a jump table
and create a stub that jumps through the table.
The external references are resolved with the address of the stub,
not the address of the final destination. Calling the stub is PIC
because x86 uses relative addresses for call and jmp.

This is done by ld.

2) That leaves the problem of external *data* references. x86-64 can use
rip relative for internal data references, but not for data
references that are external to a shared object, the latter still
need to be relocated at runtime.

This is where the -fPIC option of the C compiler comes into play. All
external data references are also collected into a table and an
extra indirection level is generated by the compiler:

2.1) use relative addressing - rip for x86-64, ebx+offset for x86-32
to load the address of the data from the table.

2.2) use the address loaded in the previous step to access the data.

So, after linking a program or shared object most of the runtime
relocations will operate on these tables, not on the rest of the
object. Private pages will be allocated for these tables only, and the
remaining pages can be shared.
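The extra level of indirection from steps 2.1 and 2.2 can be modelled in plain C. This is a conceptual sketch only: external_counter and got_slot_external_counter are invented stand-ins, and in a real shared object the compiler and the runtime linker set up the table, not the program itself.

```c
/* Stand-in for a variable defined in another shared object. */
static int external_counter = 42;

/* Stand-in for one slot of the address table: the runtime linker would
 * patch this pointer once at load time; here it is simply initialized. */
static int *got_slot_external_counter = &external_counter;

/* What -fPIC code does conceptually for an external data reference. */
static int read_external_counter(void)
{
    int *address = got_slot_external_counter; /* 2.1: load address from table */
    return *address;                          /* 2.2: access data through it */
}
```

Only the slot holding the pointer needs patching at load time; the code in read_external_counter() stays identical in every process, which is why its pages can remain shared.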

> As I
> understand now based on your post, the problem with dynamic libraries
> really is that the OS wants to share the library and it will therefore
> potentially load it into different virtual address spaces. Since
> memory access is done by using an absolute address as opposed to an
> offset for jumps, sharing of my original function can not work since
> it uses absolute addresses. Is that about correct?

That is about correct.

If an object such as your jump_table.o contains relocations in the
code or data sections then these will be patched by the runtime
linker. Everything will work as expected, but the penalty is that
extra pages must be COWed just because one or two bytes must be
patched.
There's another memory penalty: if your program never executes the
COWed pages they will still contribute to its memory footprint.

If you have lots of jump tables then collecting all of your jump tables
into a section of their own alleviates the penalty: now only the
pages containing the jump table section need be COWed.

nicolasbock

Mar 18, 2011, 1:08:13 PM

On Friday, March 18, 2011 4:19:53 AM UTC, Dick Wesseling wrote:
> What complaint? Maybe I missed a message, but both versions posted of
> your jump_table.S compile and ran without complaint (after changing
> .long to .quad and changing the scale factor to 8).

I tried that again and found that

(1) When I compile the assembly version (both of them) and make a shared library using automake/autoconf/libtool, I don't get any errors and can use that library with a main() to produce a working executable.
(2) When I add the first version of the jump table to my existing library, I get the linker error:

/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.5/../../../../x86_64-pc-linux-gnu/bin/ld: ./.libs/libspamm_kernel.a(jump_table_assembly.o): relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
./.libs/libspamm_kernel.a(jump_table_assembly.o): could not read symbols: Bad value

The second version with the relative addressing using %rip works fine. I suppose this means there is some sort of conflict between the jump table and the other functions in the library? How do I find that conflict? The error message doesn't sound terribly informative to me.

Here again for definiteness the assembly jump_table.S version I am using now. When I define SHARED, this compiles fine standalone and within my existing library. When I undefine SHARED it only compiles standalone, but not as part of my existing library.

jump_table.S:

.text
.global jump_table_asm
.type jump_table_asm, @function

jump_table_asm:

# Save %rax on the stack; it is restored at done.
push %rax

mov $0x02, %rax # Move index into rax; 2 is supposed to end up at label_02.

//#define SHARED
#ifdef SHARED

lea 0(,%rax, 4), %rdx
lea table(%rip), %rax
mov (%rdx, %rax), %edx
movslq %edx, %rdx
lea table(%rip), %rax
lea (%rdx, %rax), %rax
jmp *%rax

.section .rodata
.balign 4

table:
.long label_00-table
.long label_01-table
.long label_02-table
.long label_03-table

#else
jmp *table(, %rax, 4)

.section .rodata
.balign 4

table:
.long label_00
.long label_01
.long label_02

.long label_03
#endif

.text
label_00:
jmp done

label_01:
jmp done

label_02:
jmp done

label_03:
jmp done

done:
pop %rax
ret

.size jump_table_asm, .-jump_table_asm

> Not "the linker", but "the linkers". There are two linkers involved:

Thanks for the detailed explanation. That clarifies it a bit.

nick

puppi

Mar 18, 2011, 1:01:10 PM

On Mar 16, 5:33 pm, "robertwess...@yahoo.com"

> > reduction of the famous blue error screen in Windows). For more info,
> > http://en.wikipedia.org/wiki/Protected_mode (there are better sources

You're absolutely right, my bad. I remembered that protected mode
started to be fully supported under Windows in the first version of
the NT series (except for the NT itself), which I mistakenly took to
be ME, instead of XP. Windows history (Windows, actually) is far from
being an area of expertise of mine.

Bernhard Schornak

Mar 18, 2011, 5:31:25 PM

nicolasbock wrote:

Your jump offsets might fit into a single byte, but
the jump itself uses a memory reference. Any 64 bit
memory reference must be a 64 bit quantity!

This is how GCC creates jump tables for Win64:

.p2align 4,,10
L25: leaq L19(%rip), %rdx
movslq (%rdx,%rax,4), %rax
leaq (%rax,%rdx), %rdx
jmp *%rdx

.section .rdata,"dr"
.align 4
L19: .long L11-L19
.long L12-L19
.long L13-L19
.long L14-L19
.long L15-L19
.long L16-L19
.long L10-L19
.long L17-L19
.long L18-L19

.text
L18: case code
...
...
L11: case code
...

The addresses are stored RIP relative as offsets to
label L19, reducing the table to 32 bit quantities.
As you can see, offsets are expanded to 64 bit with
leaq (%rax, %rdx), %rdx - load RDX (address of L19)
plus RAX (offset from L19) into RDX - to generate a
valid memory reference.
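The same offset-table scheme can be written by hand in C, as a sketch that relies on the GNU "labels as values" extension (&&label and computed goto), so it needs GCC or a compatible compiler and is not ISO C. The function name dispatch and the return values are made up for illustration.

```c
/* Jumps to one of four cases through a table of 32-bit offsets.
 * Returns a value identifying the case, or -1 for a bad index. */
static int dispatch(unsigned index)
{
    /* Offsets relative to the first label, analogous to the
     * `.long L11-L19` entries: no absolute addresses are stored. */
    static const int offsets[] = {
        &&case_00 - &&case_00,
        &&case_01 - &&case_00,
        &&case_02 - &&case_00,
        &&case_03 - &&case_00,
    };

    if (index >= sizeof offsets / sizeof offsets[0])
        return -1;

    /* Base address plus sign-extended offset gives the jump target. */
    goto *(&&case_00 + offsets[index]);

case_00: return 0;
case_01: return 10;
case_02: return 20;
case_03: return 30;
}
```

Because the table holds only differences between labels, it contains nothing that depends on where the code ends up being loaded, at the cost of the extra add before the indirect jump.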

IMHO, a few bytes more don't really matter on recent
machines with 4 GB RAM and up if the extra size reduces
execution time. JMP x(, RAX, 8) is one single
instruction (DP, 4 clocks). GCC's sequence executes
in 5 clocks (due to dependencies). One clock slower
is a lot of time if this jump table sits at the top
of a message loop. As long as it is not called very
often, size might be an issue.

> Greetings from Los Alamos

And back!

A nice weekend to all!

Bernhard Schornak

nicolasbock

Mar 18, 2011, 7:13:31 PM

Ok, the latency argument is a good point. Since I care most about speed here right now, I should change the jump table to .quad

Thanks!

Dick Wesseling

Mar 19, 2011, 12:44:45 AM

In article <3652af36-71a5-42a6...@glegroupsg2000goo.googlegroups.com>,

nicolasbock <nicol...@nospicedham.gmail.com> writes:
> On Friday, March 18, 2011 4:19:53 AM UTC, Dick Wesseling wrote:
> > What complaint? Maybe I missed a message, but both versions posted of
> > your jump_table.S compile and ran without complaint (after changing
> > .long to .quad and changing the scale factor to 8).
>
> I tried that again and found that
>
> (1) When I compile the assembly version (both of them) and make a shared
> library using automake/autoconf/libtool, I don't get any errors and
> can use that library with a main() to produce a working executable.
>
> (2) When I add the first version of the jump table to my existing
> library, I get the linker error:

> (...)

> relocation R_X86_64_32S against `.rodata' can not be used when
> making a shared object; recompile with -fPIC

The error message says that you cannot store a 64-bit address in a 32-bit
variable; that's the "32" in "R_X86_64_32S". That will only work if
your code is loaded into the first 4G, but shared objects must be
loadable everywhere.

Changing .long into .quad and using a scale factor of 8 takes care of
that problem.
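A rough way to see which addresses such a 32-bit field can hold: R_X86_64_32S, as the x86-64 psABI describes it, stores the value truncated to 32 bits and requires that sign-extending those 32 bits gives back the original 64-bit value. A small C sketch of that check follows; fits_in_32s is a made-up helper and the sample addresses in the note below are only illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `address` survives truncation to 32 bits followed by sign
 * extension back to 64 bits, which is what a 32S relocation requires. */
static bool fits_in_32s(uint64_t address)
{
    return address <= (uint64_t)INT32_MAX ||
           address >= (uint64_t)(int64_t)INT32_MIN;
}
```

An address such as 0x400000, typical of a traditionally linked executable, passes the check, while something like 0x7f0000000000, in the range where shared objects commonly end up, does not. Under this reading the usable range is slightly narrower than the full first 4G.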

Nicolas Bock

Mar 21, 2011, 12:24:21 PM

On Saturday, March 19, 2011 4:44:45 AM UTC, Dick Wesseling wrote:
> The error message says that you cannot store a 64bit address in a 32bit
> variable, thats the "32" in "R_X86_64_32S". That will only work if
> your code is loaded into the first 4G, but shared objects must be
> loadable everywhere.
>
> Changing .long into .quad and using a scale factor of 8 takes care of
> that problem.

Thanks!