global start, global main or anything I want?

phoenix

Mar 16, 2008, 1:51:20 PM

Hello guys,
I am a complete newbie who is trying to learn assembly. I am using
NASM and working on Linux.
I have written a (very) simple program that calculates ah+al.
Here is the code:

-----------------------------------------------
; this file is called ass.s

segment .data
number1 db 5
number2 db 7

segment .text
global prog
prog:
    push ebp
    mov ebp,esp
    xor eax,eax
    mov al,[number1]
    mov ah,[number2]
    add al,ah
    xor ah,ah
    pop ebp
    ret

-------------------------------------------------

So I type in the terminal: nasm -f elf ass.s -o ass.o
and then I type: gcc ass.o -o assexe

An error message appears:
/usr/lib/gcc/i486-linux-gnu/4.1.2/../../../../lib/crt1.o: In function
`_start':
(.text+0x18): undefined reference to `main'
collect2: ld returned 1 exit status

If the label (in the program) is called main (and not prog), there's
no error message.
If the label is called start, the error message appears again.

My question is: can I choose whatever label names I like in my program
without getting all these error messages? Or am I forced to always
insert a label called "main" in my code?

Thank you, and sorry for my English!

Tim Roberts

Mar 16, 2008, 5:25:30 PM

phoenix <spam...@crayne.org> wrote:
>
>I am a complete newbie who is trying to learn assembly. I am using
>NASM and working on Linux.
>I have written a (very) simple program that calculates ah+al.

>...

>So I type in the terminal: nasm -f elf ass.s -o ass.o
>and then I type: gcc ass.o -o assexe
>
>An error message appears:
>/usr/lib/gcc/i486-linux-gnu/4.1.2/../../../../lib/crt1.o: In function
>`_start':
>(.text+0x18): undefined reference to `main'
>collect2: ld returned 1 exit status
>
>If the label (in the program) is called main (and not prog), there's
>no error message.
>If the label is called start, the error message appears again.
>
>My question is: can I choose whatever label names in my program,
>without having all these error messages? Or am I forced to always
>insert a label called "main" in my code?

When you are writing a function that will be called by other code, you can
name it whatever you want to, as long as both sides agree on the names.

In this case, however, there is another player involved. When you run a
program from a command line, the system loads your program into memory and
starts it by jumping to a special address called the "transfer address".
When you use "gcc" to link your code, "gcc" assumes that you are using the
C run-time library, so it arranges that the "transfer address" points to a
startup function in the C run-time library. That startup function
initializes a bunch of stuff in the library, and then calls a function
called "main". Every C program must include a function called "main",
which receives the command-line arguments and does the main processing.
When "main" returns, the program ends.

So, if you want to use "gcc" to link your code (and that's a reasonable
thing to do when you are learning assembly, because it lets you use
convenient functions like printf), then your main entry point will have to
be called "main".

The code you wrote is not really a "program", because it doesn't do
anything with the result of the addition. It is more of a function that is
designed to be called by someone else. If you were calling this from a C
main program, you could call it "prog", and your "main" would be in the C
file.
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Terence

Mar 16, 2008, 5:19:02 PM

First, you are missing the END statement at the end of the code.
Then, near the beginning, after the .data section, you are missing

.code
PROC (name)

I don't think that is all but only what I see immediately.

Frank Kotler

Mar 16, 2008, 5:58:37 PM

phoenix wrote:
> Hello guys,
> I am a complete newbie who is trying to learn assembly. I am using
> NASM and working on Linux.
> I have written a (very) simple program that calculates ah+al.
> Here it is the code:
>
> -----------------------------------------------
> ; this file is called ass.s

What would you have named it if it divided by two? :)

Nothing special about English - the CPU doesn't speak it! :)

This is a good question (in any language). In general, you can use any
names you want, but certain names are "known" to certain parts of the
toolchain (including Nasm - don't call a procedure "eax", for example).
By default, gcc will link some "startup" code against the modules you
give it. This "startup code" (crt1.o, going by your error message) calls
"main" - so it has to be there, somewhere! You can tell gcc
"-nostartfiles" to eliminate this default behavior, if you want to...

You might want to write an "ass_c.c" to call "prog" (since you've
declared it "global")...

void prog (void);

int main(void)
{
    prog();
    return 0;
}

That would be another way to shut gcc up.

Without the underscore, "start" is "nothing special", but "_start" is
known to ld as the default entrypoint - where the loader will transfer
control to our code. The "startup code" includes this label, so if
you've linked that in, and also had a "_start" label in your code, ld
would complain about finding it twice. But ld expects to find it
*somewhere*, and will complain if it doesn't. Hard to please! :) You can
override "_start" as the "known" entrypoint with "-e otherentry" on the
command line to ld... if you think there's any point...

So "main" and "_start" are the two "special" labels you have to worry
about. If you're *not* using C, you can use "main" freely - I try to
avoid it, unless it's a "real C main", since it may confuse humans, but
Nasm and ld don't care about it...

You *must* have an entrypoint - ld picks a default, if it can't find
"_start", which may or may not be right. Doesn't *need* to be called
"_start", but... what's the point of calling it something else!

So, we could say gcc provides "_start" and expects you to provide
"main". There isn't really much point in involving gcc at all, for
something as simple as this. We could just start with "_start", and exit
with "sys_exit", returning the "answer" in bl. (I'm fond of saying we
return an exitcode in ebx, but it's really just bl!) Like so:

; assemble: nasm -f elf myprog.asm
; link: ld -o myprog myprog.o
; run: ./myprog
; observe: echo $?

global _start

section .text
_start:
    mov bl, 5
    add bl, 7
    mov eax, 1 ; __NR_exit
    int 80h
;----------------

You can easily add variables number1 and number2 back in - just like
you've got it (except return in bl, not al) for "example 2", if you want.
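
To see the "it's really just bl" point for yourself, here is a small sketch in C (C only because it is quick to compile and check; the helper name observed_exit_status is made up for illustration, and it assumes a POSIX system such as Linux). It forks a child that exits with a given code and reports what the parent gets back from waitpid, which is the same value "echo $?" would show:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a child that exits with `code` and return
   the exit status the parent observes, or -1 on failure. */
int observed_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                 /* fork failed */
    if (pid == 0)
        _exit(code);               /* child: same job as sys_exit with ebx = code */

    if (waitpid(pid, &status, 0) < 0)
        return -1;                 /* could not collect the child */
    if (!WIFEXITED(status))
        return -1;                 /* child did not exit normally */
    return WEXITSTATUS(status);    /* only the low 8 bits of code come back */
}
```

On Linux, observed_exit_status(12) should give 12, and observed_exit_status(256 + 12) should give 12 as well, because everything above the low byte is dropped.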

When we get to adding two floating-point numbers... it becomes *really*
handy to be able to use "printf" to display the results! You may want
something a little more flexible than "echo $?" to communicate with the
user sooner than that. We can provide that, with or without C.

Or perhaps you want to get into interfacing with C right from the start.
There are a few "rules", besides "main" being special... caller cleans
up stack... ebx, esi, edi, and (obviously?) ebp have to be preserved
across calls (including "main"!)... and we can expect the same from C
when we call C (this implies that ecx and edx may be trashed on ya!!!).
Result in eax, usually (float???). Printf expects floats to be passed as
doubles, even if ya call 'em "float". I'm probably forgetting some...
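
The float-as-double rule is C's default argument promotion for variadic functions, so it applies to anything declared with "...", not only printf. A minimal sketch (the function name first_vararg is hypothetical) that reads its first variadic argument as a double, which is what the caller really passed even if it started out as a float:

```c
#include <stdarg.h>

/* Hypothetical demo: returns the first variadic argument, read as a double.
   A float passed through "..." is promoted to double by the caller,
   so va_arg must ask for double, never float. */
double first_vararg(int count, ...)
{
    va_list ap;
    double value;

    va_start(ap, count);
    value = va_arg(ap, double);    /* the promoted argument */
    va_end(ap);
    return value;
}
```

Calling first_vararg(1, 1.5f) hands back 1.5 as a double. From 32-bit assembly, that means pushing 8 bytes for each floating-point argument you give to printf's "%f".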

For a start, try replacing "prog" with "main" in your file... can you
see the correct result with "echo $?". Looks to me like it'd work, but I
haven't tried it. Figured I'd leave that for you. :)

Best,
Frank

Robert Redelmeier

Mar 16, 2008, 6:32:37 PM

phoenix <spam...@crayne.org> wrote in part:

> and then I type: gcc ass.o -o assexe
>
> An error message appears:
> /usr/lib/gcc/i486-linux-gnu/4.1.2/../../../../lib/crt1.o: In function
> `_start':
> (.text+0x18): undefined reference to `main'
> collect2: ld returned 1 exit status

> If the label (in the program) is called main (and not prog),
> there's no error message. If the label is called start,
> the error message appears again.

> My question is: can I choose whatever label names in my
> program, without having all these error messages? Or am I
> forced to always insert a label called "main" in my code?

In addition to the other fine responses you've received,
I would add that you should look at the `gcc` manpage, particularly
under the -nostdlib and -nostartfiles options.

Every Linux executable (both a.out and ELF) needs an entry point, and
ld looks for a global label _start by default. gcc's startfiles provide
this, but then need an entrypoint called main.

-- Robert

phoenix

Mar 17, 2008, 6:25:33 AM

Thank you guys for your responses. They've been very useful.

I've made some tests today. The result is that if I leave "prog" as
the label name and use the GCC "-nostdlib" parameter, a warning
message stating that there's no _start symbol appears, but the
program (or function) is correctly linked. With the "-nostdlib"
parameter the executable file is just 744 bytes; without it the
executable file is about 7 kB!

My last question is: what is the best way to write simple (addition,
subtraction,...) STAND-ALONE assembly programs, without C libraries or
external stuff (just pure assembly)? What about writing the code of
the program with a _start label in it, then assembling and linking it
using the "-nostdlib" parameter (or using ld directly)? Or is there a
better (or easier) way?

P.S.: why does the executable stop with a segmentation fault? Am I
forced to use the exit syscall?

Thank you

Frank Kotler

Mar 17, 2008, 5:27:21 PM

phoenix wrote:
> Thank you guys for your responses. They've been very useful.
>
> I've made some tests today. The result is that if I leave "prog" as
> the label name and use the GCC "-nostdlib" parameter, a warning
> message stating that there's no _start symbol appears, but the
> program (or function) is correctly linked. With the "-nostdlib"
> parameter the executable file is just 744 bytes; without it the
> executable file is about 7 kB!
>
> My last question

Bet it isn't! :)

> is: what is the best way to write simple (addition,
> subtraction,...) STAND-ALONE assembly programs, without C libraries or
> external stuff (just pure assembly)? What about writing the code of
> the program with a _start label in it, then assembling and linking it,
> using the "-nostdlib " parameter (or using directly ld)? Or is there a
> better (or easier) way?

I think using ld directly is the easiest way. We can also use Nasm's "-f
bin" mode, and "stuff" an elf executable header on the front...

> P.S.: why the executable stops with a segmentation fault? Am I forced
> to use the syscall exit?

Yes! If you're "call"ed, say "main" being called from the "startup"
code, you can end with "ret". But the "_start" label is not called, it's
"jmp"ed to. There is no return address on the stack, the first thing on
the stack is the argument count, "argc". So a "ret" will attempt to
return to "argc" as an address - probably 1 (our program name is
"argv[0]", so "argc" is at least 1). This is outside "our" address
space, and segfaults.

Addition and subtraction are simple enough, displaying the result is
somewhat less obvious. If we send a number to stdout, it's treated as an
ascii code, and the ascii codes for the "number characters" are not the
same as the number! Fortunately, the decimal digit characters are
contiguous, so we can add '0' (the character '0', *not* the number 0 -
aka 48 decimal or 30h) to "convert" a number to its ascii code. That's
good for *one* digit, if we've got more, we need to extract 'em one at a
time. "div" will do this... there are faster ways. "div" puts the
quotient in eax, and the remainder in edx... if we "div" by ten
repeatedly, we get the digits we want, but "backwards" from the way we
want to print 'em. Simplest way to "demo" this is to use a "static"
buffer. The code below may be a little harder to follow, since it makes a
"temporary" buffer on the stack instead. If ya *can't* follow it, we can
start with something simpler... but you could "just use it"... ya don't
know how "printf" works either, most likely...

Best,
Frank

; nasm -f elf hwint.asm
; ld -o hwint hwint.o

global _start

section .data

hiya db "Hello, World of (real) Assembly Language!", 10
; "$", in this context, means "here", the current offset
; in the file, so this calculation counts characters.
hiya_len equ $ - hiya

ans db "InitDemo's value is "
ans_len equ $ - ans

InitDemo dd 5

section .text
_start:
    mov ecx, hiya
    mov edx, hiya_len
    call write_stdout

    mov ecx, ans
    mov edx, ans_len
    call write_stdout

    mov eax, [InitDemo]
    call showeaxd

    call newline

    xor eax, eax ; claim "no error".
exit:
    mov ebx, eax ; error/return code in ebx (bl, actually)

    mov eax, 1 ; __NR_exit
    int 80h

;-----------------------

;---------------------------------
showeaxd:
    pusha ; save caller's regs

    sub esp, byte 10h ; make buffer on stack
    lea ecx, [esp + 10h] ; start at "end" of buffer
    mov ebx, 10 ; for decimal, divide by 10
    xor esi, esi ; length counter
.top:
    dec ecx ; work towards "front" of buffer
    xor edx, edx ; "div" works with edx:eax!
    div ebx ; quotient in eax, remainder in edx
    add dl, '0' ; convert number to ascii char
    mov [ecx], dl ; store it
    inc esi ; count it
    or eax, eax ; quotient zero?
    jnz .top ; do more

    mov edx, esi ; length in edx
    call write_stdout ; print it

    add esp, byte 10h ; free the buffer

    popa ; restore caller's regs
    ret
;---------------------------------

;-------------------
newline:
    pusha
    push byte 10 ; linefeed
    mov ecx, esp ; stack is the buffer
    mov edx, 1 ; just one
    call write_stdout
    add esp, byte 4 ; free buffer
    popa
    ret
;------------------

;------------------
write_stdout:
    ; expects buffer in ecx, length in edx
    ; trashes ebx!

    mov ebx, 1 ; STDOUT
    mov eax, 4 ; __NR_write
    int 80h
    ret
;-------------------
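
If the divide-by-ten loop in "showeaxd" is easier to read in C, here is a rough equivalent. It is only a sketch: the name u32_to_decimal and its signature are invented for illustration, and it assumes the caller hands in a buffer of at least 10 bytes (enough for any 32-bit value). Like the assembly, it starts at the "end" of the buffer and works towards the front, so the digits come out in printing order:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the decimal digits of `value` into the END
   of buf (buf_len bytes, at least 10), stores the digit count in *len,
   and returns a pointer to the first digit. No terminating NUL is added. */
const char *u32_to_decimal(uint32_t value, char *buf, size_t buf_len, size_t *len)
{
    char *p = buf + buf_len;            /* like: lea ecx, [esp + 10h] */
    size_t count = 0;                   /* like: xor esi, esi */

    do {
        --p;                            /* work towards "front" of buffer */
        *p = (char)('0' + value % 10);  /* remainder -> ascii digit */
        value /= 10;                    /* quotient feeds the next round */
        ++count;
    } while (value != 0);               /* stop when quotient is zero */

    *len = count;
    return p;
}
```

The returned pointer and length are what "showeaxd" passes to write_stdout in ecx and edx. Using do/while rather than while means a value of 0 still produces the single digit "0", same as the assembly loop.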

phoenix

Mar 19, 2008, 4:01:59 AM

On 17 Mar, 22:27, Frank Kotler <spamt...@crayne.org> wrote:

> Yes! If you're "call"ed, say "main" being called from the "startup"
> code, you can end with "ret". But the "_start" label is not called, it's
> "jmp"ed to. There is no return address on the stack, the first thing on
> the stack is the argument count, "argc". So a "ret" will attempt to
> return to "argc" as an address - probably 1 (our program name is
> "argv[0]", so "argc" is at least 1). This is outside "our" address
> space, and segfaults.
>

Wow, this was the reason why the EIP register had the address 1, at
the end of the program!!!
This is an important thing that I couldn't understand before...

> Addition and subtraction are simple enough, displaying the result is
> somewhat less obvious. If we send a number to stdout, it's treated as an
> ascii code, and the ascii codes for the "number characters" are not the
> same as the number! Fortunately, the decimal digit characters are
> contiguous, so we can add '0' (the character '0', *not* the number 0 -
> aka 48 decimal or 30h) to "convert" a number to its ascii code. That's
> good for *one* digit, if we've got more, we need to extract 'em one at a
> time. "div" will do this... there are faster ways. "div" puts the
> quotient in eax, and the remainder in edx... if we "div" by ten
> repeatedly, we get the digits we want, but "backwards" from the way we
> want to print 'em. Simplest way to "demo" this is to use a "static"
> buffer. This may be a little harder to follow, since it makes a
> "temporary" buffer on the stack. If ya *can't* follow it, we can start
> with something simpler... but you could "just use it"... ya don't know
> how "printf" works either, most likely...

The code you've posted was very clear... so (even for me) it hasn't
been very difficult to understand.

Thank you again, Frank