I need help to build a code generator


Robe

Apr 4, 2007, 1:52:49 PM
to gener...@googlegroups.com
Hi,

I'm writing a code generator for a compiler. It will generate code for Linux and FreeBSD on the i386 architecture.

I'm using the Red Dragon book and the Tool Interface Standard document as my guides.

The goal of my compiler is to create relocatable object files from scratch, and then build a binary executable with those object files using the GNU linker "ld".

As you can imagine, the "Red Dragon" book and the document "Tool Interface Standard (TIS) - Executable and Linking Format (ELF) Specification v1.2" are not enough to accomplish this task.

Because of that, I was looking for a starting point, and I had an idea:

I used the Netwide Assembler (nasm) to create a relocatable object file with the minimum code possible. Here it is:

------------
min.asm
------------
; Minimum assembler file

;;;;;;;
;  Compile with the following command line
;     nasm -f elf min.asm

section .text

    global _start   ; must be declared for the linker's entry point (ld)

_start:

    mov    eax, 1    ; system call number (sys_exit)
    int    0x80

As you can see, the code above just returns control to the operating system. This is the minimum code possible for a program.
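For a code generator, the useful view of this program is its machine code. The sketch below hand-encodes the two instructions using the standard i386 encodings `B8+r imm32` (mov r32, imm32) and `CD imm8` (int). The helper names `mov_r32_imm32` and `int_imm8` are invented for illustration; this only shows what the .text contents should look like, not how nasm works internally.

```python
import struct

# i386 register numbers as used by the "B8+r" form of MOV r32, imm32.
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}

def mov_r32_imm32(reg, imm):
    """Encode `mov reg, imm32`: opcode 0xB8+reg, then a little-endian imm32."""
    return bytes([0xB8 + REG[reg]]) + struct.pack("<I", imm)

def int_imm8(vector):
    """Encode `int imm8`: opcode 0xCD followed by the vector number."""
    return bytes([0xCD, vector])

# mov eax, 1 ; int 0x80
text = mov_r32_imm32("eax", 1) + int_imm8(0x80)
print(" ".join(f"{b:02x}" for b in text))  # b8 01 00 00 00 cd 80
print(len(text))                           # 7
```

The 7 bytes agree with the Size of 000007 that readelf reports for the .text section of min.o.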

Using min.o as a guide, together with the libelf library, I started to write the code generator. I'm also using ghex2 (a hexadecimal viewer/editor for GNOME).

After assembling the code, I inspected the relocatable object file generated by nasm with the following readelf command line:

readelf -a min.o

Here's the important information:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000130 000007 00  AX  0   0 16
  [ 2] .comment          PROGBITS        00000000 000140 00001f 00      0   0  1
  [ 3] .shstrtab         STRTAB          00000000 000160 00002a 00      0   0  1
  [ 4] .symtab           SYMTAB          00000000 000190 000050 10      5   4  4
  [ 5] .strtab           STRTAB          00000000 0001e0 000010 00      0   0  1

Symbol table '.symtab' contains 5 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS min.asm
     2: 00000000     0 SECTION LOCAL  DEFAULT  ABS
     3: 00000000     0 SECTION LOCAL  DEFAULT    1
     4: 00000000     0 NOTYPE  GLOBAL DEFAULT    1 _start


In my code generator I put only entries 0 and 4 in my .symtab section.

Here is my first question:

What are the other entries for, especially number 2?
Why does the code produced by my code generator run perfectly even though I've omitted these entries?

Up to here everything was fine. The problem arises when I try to add a ".data" section to the object file with this code:

------------
min.asm
------------

; Minimum assembler file with .data section

;;;;;;;
;  Compile with the following command line
;     nasm -f elf min.asm

section .data        ; data section
    msg:    db "Hello world", 10    ; the string to print, 10 = LF (newline)

section .text

    global _start   ; must be declared for the linker's entry point (ld)

_start:
    mov    ecx,msg        ; arg2, pointer to string
    mov    edx,12        ; arg3, length of string to print (11 chars + newline)
    mov    ebx,1            ; arg1, file descriptor (1 = stdout)
    mov    eax,4            ; system call number (sys_write)
    int    0x80            ; interrupt 80 hex, call kernel

    mov    eax, 1    ; system call number (sys_exit)
    int    0x80

After assembling the file, the same readelf command gives the following results:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .data             PROGBITS        00000000 000180 00000c 00  WA  0   0  4
  [ 2] .text             PROGBITS        00000000 000190 00001d 00  AX  0   0 16
  [ 3] .comment          PROGBITS        00000000 0001b0 00001f 00      0   0  1
  [ 4] .shstrtab         STRTAB          00000000 0001d0 00003a 00      0   0  1
  [ 5] .symtab           SYMTAB          00000000 000210 000070 10      6   6  4
  [ 6] .strtab           STRTAB          00000000 000280 000014 00      0   0  1
  [ 7] .rel.text         REL             00000000 0002a0 000008 08      5   2  4

Relocation section '.rel.text' at offset 0x2a0 contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000001  00000301 R_386_32          00000000   .data

Symbol table '.symtab' contains 7 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS min.asm
     2: 00000000     0 SECTION LOCAL  DEFAULT  ABS
     3: 00000000     0 SECTION LOCAL  DEFAULT    1
     4: 00000000     0 SECTION LOCAL  DEFAULT    2
     5: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 msg
     6: 00000000     0 NOTYPE  GLOBAL DEFAULT    2 _start

As you can see, a new section ".rel.text" now appears.

There is also a difference between the two object files in the Info field of the .symtab section: in the first example the value is 4, and in the second it is 6.

More questions:

What is the meaning of the "Info" field of the ".symtab" section?
What is the meaning of entry 1 in the ".symtab" section?

At this point I need to create a relocation section in my object file, but it no longer works: when I try to link it, "ld" crashes with

Segmentation fault (core dumped)

I need the answers to all these questions in order to go on.

I hope you can help me.

Best regards,

--
Robe.

We rarely think people have good sense unless they agree with us

Michael Matz

Apr 4, 2007, 3:03:21 PM
to gener...@googlegroups.com
Hi,

[when pasting tables, try to use a non-proportional font, it's very hard
to read otherwise]

On Wed, 4 Apr 2007, Robe wrote:

> Here's the important information
>
> Section Headers:
>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
>   [ 1] .text             PROGBITS        00000000 000130 000007 00  AX  0   0 16
>   [ 2] .comment          PROGBITS        00000000 000140 00001f 00      0   0  1
>   [ 3] .shstrtab         STRTAB          00000000 000160 00002a 00      0   0  1
>   [ 4] .symtab           SYMTAB          00000000 000190 000050 10      5   4  4
>   [ 5] .strtab           STRTAB          00000000 0001e0 000010 00      0   0  1


>
> Symbol table '.symtab' contains 5 entries:
>    Num:    Value  Size Type    Bind   Vis      Ndx Name
>      0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
>      1: 00000000     0 FILE    LOCAL  DEFAULT  ABS min.asm
>      2: 00000000     0 SECTION LOCAL  DEFAULT  ABS
>      3: 00000000     0 SECTION LOCAL  DEFAULT    1
>      4: 00000000     0 NOTYPE  GLOBAL DEFAULT    1 _start
>
>
> In my code generator I just put the items 0 and 4 in my .symtab section.

> ...

> Here is the first doubt
>
> What does the rest of the entries in special the number 2?

The other entries are section and file symtab entries, described in TIS:

<cite>
A symbol's type provides a general classification for the associated
entity.
...
* STT_SECTION

The symbol is associated with a section. Symbol table entries of
this type exist primarily for relocation and normally have STB_LOCAL
binding.

* STT_FILE

Conventionally, the symbol's name gives the name of the source file
associated with the object file. A file symbol has STB_LOCAL
binding, its section index is SHN_ABS, and it precedes the other
STB_LOCAL symbols for the file, if it is present.
</cite>

They theoretically are optional if no section based relocation exists.

> Why the code generated by my code generator run perfectly even when I've
> omitted these entries?

By chance, I believe. It might be that theoretically the section symbols
are not necessary, but they happen to be generated by e.g. GNU as, so it's
conceivable that GNU ld has by now come to depend on their existence. You
might want to generate the section symbols for all sections except the
symbol and string table sections (just because GNU as does the same).
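As a rough sketch of what emitting those entries involves, the following packs the five symbol table entries from the first readelf listing as Elf32_Sym records (layout and constants as given in TIS section 1). The helper name `elf32_sym` and the exact string table layout are assumptions for illustration; entry 2 simply mirrors what the listing shows.

```python
import struct

# Constants from the TIS ELF specification.
STB_LOCAL, STB_GLOBAL = 0, 1
STT_NOTYPE, STT_SECTION, STT_FILE = 0, 3, 4
SHN_UNDEF, SHN_ABS = 0, 0xFFF1

def elf32_sym(name, value, size, bind, typ, shndx):
    """Pack one little-endian Elf32_Sym (16 bytes).

    Field order: st_name, st_value, st_size, st_info, st_other, st_shndx.
    st_info combines binding and type as (bind << 4) + (type & 0xf).
    """
    st_info = (bind << 4) | (typ & 0xF)
    return struct.pack("<IIIBBH", name, value, size, st_info, 0, shndx)

# Assumed .strtab layout; st_name is a byte offset into this table.
strtab = b"\0min.asm\0_start\0"

symtab = b"".join([
    elf32_sym(0, 0, 0, STB_LOCAL, STT_NOTYPE, SHN_UNDEF),   # 0: null entry
    elf32_sym(strtab.index(b"min.asm"), 0, 0,
              STB_LOCAL, STT_FILE, SHN_ABS),                # 1: file symbol
    elf32_sym(0, 0, 0, STB_LOCAL, STT_SECTION, SHN_ABS),    # 2: as listed
    elf32_sym(0, 0, 0, STB_LOCAL, STT_SECTION, 1),          # 3: .text
    elf32_sym(strtab.index(b"_start"), 0, 0,
              STB_GLOBAL, STT_NOTYPE, 1),                   # 4: _start
])

print(hex(len(symtab)), hex(len(strtab)))  # 0x50 0x10
```

Both sizes match the Size column readelf shows for .symtab and .strtab in the first object file, and 16 bytes per entry matches the ES value of 10 (hex).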

> Once the file is compiled using the same readelf command I get the
> following results
>
> Section Headers:
>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
>   [ 1] .data             PROGBITS        00000000 000180 00000c 00  WA  0   0  4
>   [ 2] .text             PROGBITS        00000000 000190 00001d 00  AX  0   0 16
>   [ 3] .comment          PROGBITS        00000000 0001b0 00001f 00      0   0  1
>   [ 4] .shstrtab         STRTAB          00000000 0001d0 00003a 00      0   0  1
>   [ 5] .symtab           SYMTAB          00000000 000210 000070 10      6   6  4
>   [ 6] .strtab           STRTAB          00000000 000280 000014 00      0   0  1
>   [ 7] .rel.text         REL             00000000 0002a0 000008 08      5   2  4
>
> Relocation section '.rel.text' at offset 0x2a0 contains 1 entries:
>  Offset     Info    Type            Sym.Value  Sym. Name
> 00000001  00000301 R_386_32          00000000   .data

See? Here you have a relocation against the .data section. If you look
at the actual relocation entry (you might need a hexdumper, as readelf and
objdump interpret all entries), you will see that the r_info member of it
contains the number of the section symbol for the .data section ...

> Symbol table '.symtab' contains 7 entries:
>    Num:    Value  Size Type    Bind   Vis      Ndx Name
>      0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
>      1: 00000000     0 FILE    LOCAL  DEFAULT  ABS min.asm
>      2: 00000000     0 SECTION LOCAL  DEFAULT  ABS
>      3: 00000000     0 SECTION LOCAL  DEFAULT    1
>      4: 00000000     0 SECTION LOCAL  DEFAULT    2
>      5: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 msg
>      6: 00000000     0 NOTYPE  GLOBAL DEFAULT    2 _start

... which is 3 in your case: symbol 3 is the STT_SECTION symbol whose
st_shndx is the index of the section you are looking for (1, i.e. .data,
in your file). See the TIS again:

<cite>
* st_shndx

Every symbol table entry is ``defined'' in relation to some section;
this member holds the relevant section header table index. As Figure
1-8 {*} and the related text describe, some section indexes indicate
special meanings.
</cite>
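To make the relocation entry concrete, here is a small sketch that packs and unpacks one Elf32_Rel record using the ELF32_R_SYM / ELF32_R_TYPE split from the TIS (symbol index in the upper 24 bits of r_info, type in the low 8 bits). The function names `pack_rel` and `unpack_rel` are made up for this example.

```python
import struct

R_386_32 = 1  # relocation type from the i386 supplement of the ELF spec

def pack_rel(offset, sym, typ):
    """Pack one little-endian Elf32_Rel: r_offset, then r_info."""
    r_info = (sym << 8) | (typ & 0xFF)   # ELF32_R_INFO(sym, type)
    return struct.pack("<II", offset, r_info)

def unpack_rel(raw):
    """Return (r_offset, symbol index, relocation type)."""
    offset, info = struct.unpack("<II", raw)
    return offset, info >> 8, info & 0xFF  # ELF32_R_SYM, ELF32_R_TYPE

# The single entry from the .rel.text listing: Offset 1, Info 0x00000301.
raw = pack_rel(1, 3, R_386_32)
print(raw.hex())        # 0100000001030000
print(unpack_rel(raw))  # (1, 3, 1)
```

The record is 8 bytes, which matches both the Size and the ES column of .rel.text. Offset 1 is the 32-bit immediate of `mov ecx, msg`, which follows the one-byte opcode at the start of .text.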

> As you can see now appear a new section ".rel.text"
>
> Now there is a difference between the two compiled files in the section
> .symtab in the field Info. In the first example the value is 4 and in the
> second is 6.

From TIS again:

<cite>
+ Figure 1-13: sh_link and sh_info Interpretation

sh_type      sh_link                       sh_info
=======      =======                       =======
...
SHT_SYMTAB,  The section header index of   One greater than the symbol
SHT_DYNSYM   the associated string table.  table index of the last local
                                           symbol (binding STB_LOCAL).
</cite>
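A minimal sketch of that rule, assuming the symbol table is represented as a plain list of bindings in table order (the function name `symtab_sh_info` is hypothetical). It also rejects tables where a local symbol follows a non-local one, on the assumption that the generator keeps all STB_LOCAL symbols first.

```python
STB_LOCAL, STB_GLOBAL = 0, 1

def symtab_sh_info(bindings):
    """Return sh_info for a symbol table: one greater than the index of
    the last STB_LOCAL symbol, i.e. the index of the first non-local one."""
    first_nonlocal = next(
        (i for i, b in enumerate(bindings) if b != STB_LOCAL),
        len(bindings),
    )
    if any(b == STB_LOCAL for b in bindings[first_nonlocal:]):
        raise ValueError("local symbol found after a non-local symbol")
    return first_nonlocal

first = [STB_LOCAL] * 4 + [STB_GLOBAL]    # entries 0..3 local, 4 = _start
second = [STB_LOCAL] * 6 + [STB_GLOBAL]   # entries 0..5 local, 6 = _start
print(symtab_sh_info(first), symtab_sh_info(second))  # 4 6
```

By the same rule, a table that contains only the null entry and a global `_start` would need an sh_info of 1 rather than 4.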

>
> Another questions
>
> What's the meaning of the field "info" in the section ".symtab"?

See above.

> What's the meaning of the item 1 in the ".symtab" section?

Entry 1, the STT_FILE symbol? See above.

> At this point I need to create a relocation section in my object file
> but it doesn't work anymore because when I try to link it the "ld" issue
> an error
>
> Segmentation fault (core dumped)

Might happen with invalid .o files. ld doesn't verify _all_ values you
put in there, so it possibly does out-of-range accesses.
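Since ld may not catch such mistakes, a generator can run a few range checks of its own before writing the file. The following is only a sketch under assumed data structures (relocations as `(r_offset, symbol index, type)` tuples, and a hypothetical `check_relocations` helper); it is not a diagnosis of the crash above, just the kind of check that would flag a relocation pointing at a symbol that was never emitted.

```python
def check_relocations(relocs, num_symbols, target_size, word_size=4):
    """Return a list of problems found in a list of relocation tuples.

    relocs      -- iterable of (r_offset, symbol_index, r_type)
    num_symbols -- number of entries in the linked symbol table
    target_size -- size in bytes of the section being relocated
    word_size   -- bytes patched per relocation (4 for R_386_32)
    """
    problems = []
    for n, (offset, sym, _typ) in enumerate(relocs):
        if sym >= num_symbols:
            problems.append(
                f"reloc {n}: symbol index {sym} not in table of {num_symbols}")
        if offset + word_size > target_size:
            problems.append(
                f"reloc {n}: offset {offset:#x} outside section of size "
                f"{target_size:#x}")
    return problems

# The nasm-generated file: 7 symbols, .text is 0x1d bytes long.
print(check_relocations([(1, 3, 1)], num_symbols=7, target_size=0x1D))  # []
# Same relocation against a table holding only two symbols.
print(check_relocations([(1, 3, 1)], num_symbols=2, target_size=0x1D))
```

Similar checks could cover sh_link and sh_info of the relocation section (they should name the symbol table and the section being relocated, 5 and 2 in the listing above).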

> I hope you can help me.

You said you have TIS ELF, so please do read section 1 completely from
front to back. All quotes above come from that section, and readelf output
is fairly close to that description (it only strips common prefixes for
brevity and joins some information for easier parsing by humans).


Ciao,
Michael.
