I just read through "Programming from the Ground Up" to acquaint
myself with x86 assembly programming, and I'm interested in writing a
bootloader for myself.
Does anyone know of a tutorial for writing a bootloader with GAS? I
have found only tutorials using NASM, which I find to be a little more
subtle and harder to understand than GAS.
In particular, I don't understand what the following NASM commands
would look like in GAS:
org 0x07c00
This command (somehow) makes the program load at address 0x07c00; I'm
not sure about the details.
jmp codesel:program_start
I'm not sure how segment addressing looks in GAS.
lgdt[gdtr]
Loading the global descriptor table. Is this a macro? I'm not sure how
NASM would know how to do this without it being specified somewhere.
times 510-($-$$) db 0
Filling from the current location to address 510 with 0's.
Thanks a lot for your help and guidance.
-Patrick
"CuppoJava" <patrick...@gmail.com> wrote in message
news:4926123d-2b90-4741...@z30g2000prg.googlegroups.com...
> Hi everyone,
>
> Does anyone know of a tutorial for writing a bootloader with GAS? I
> have found only tutorials using NASM, which I find to be a little more
> subtle and harder to understand than GAS.
Check http://wiki.osdev.org/Bare_Bones
When you use GAS, you need some linker magic to handle some of the stuff...
HTH,
Ned
I think that's set with the linker: LD...
After assembling with GAS, you'll need to link the .o object file:
ld --oformat binary -Ttext 0x7C00 -o boot.bin boot.o
If GAS has a method to set the org in your .S file, I'm not aware of it, but
I'm not as familiar with GAS bootloader code as with NASM.
GAS bootloaders typically start:
.code16
.text
.global _start
_start:
> jmp codesel:program_start
> I'm not sure how segment addressing looks in GAS.
>
I think that's:
ljmp $CODESEL, $program_start
.code32
program_start:
Actually, most will say $start32 and start32:, since that's the long jump
used to flush the CPU's prefetch queue and load CS with a 32-bit code
selector, which completes the switch to 32-bit mode. It comes right after
the code that sets the PE bit in %cr0 to 1.
I'm not completely sure about how to specify CODESEL in GAS at the moment.
It may be an equate, or a computed offset like NASM, i.e., codesel label
minus start of gdt.
After that you usually reload segments registers with 32-bit selectors:
movw $DATASEL, %ax
movw %ax, %ds
movw %ax, %es
# etc...
SS and ESP should be loaded together, either via the LSS instruction or via
back-to-back moves. A move to SS inhibits interrupts until after the next
instruction, so the pair cannot be interrupted midway. It's a special
feature. I'm not sure of the GAS form of LSS, but I can look it up.
> lgdt[gdtr]
> Loading the global descriptor table. Is this a macro? I'm not sure how
> NASM would know how to do this without it being specified somewhere.
>
Looking at some files online, it seems to be something like the following.
FYI, I'm not 100% sure I've got the syntax correct.
lgdt gdt
# elsewhere your gdt is setup
gdt:
# gdt info using .byte, .word, .long assembly pseudo-instructions
.word (gdt_end - gdt - 1)
.
.
.
.
# a bunch of .long or .word or .byte to fill in descriptor data
gdt_end:
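For reference, the operand that lgdt reads is a 6-byte structure: a 16-bit
limit (table size in bytes minus one) followed by a 32-bit linear base
address. Here is a minimal sketch of that layout in Python rather than GAS,
just to make the byte order concrete. The flat three-entry table, the helper
names (make_descriptor, make_gdtr) and the base address 0x7C40 are made up
for illustration and are not taken from any loader mentioned in this thread.

```python
import struct


def make_descriptor(base, limit, access, flags):
    """Pack one 8-byte segment descriptor (32-bit protected-mode layout)."""
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                                 # limit bits 0..15
        base & 0xFFFF,                                  # base bits 0..15
        (base >> 16) & 0xFF,                            # base bits 16..23
        access,                                         # access byte
        ((flags & 0xF) << 4) | ((limit >> 16) & 0xF),   # flags + limit bits 16..19
        (base >> 24) & 0xFF,                            # base bits 24..31
    )


def make_gdtr(table, linear_base):
    """Pack the 6-byte lgdt operand: 16-bit limit, then 32-bit base."""
    return struct.pack("<HI", len(table) - 1, linear_base)


# Hypothetical flat table: null, 4 GiB code, 4 GiB data.
gdt = (
    bytes(8)
    + make_descriptor(0, 0xFFFFF, 0x9A, 0xC)
    + make_descriptor(0, 0xFFFFF, 0x92, 0xC)
)

# 0x7C40 is an assumed load address for the table, not a required one.
gdtr = make_gdtr(gdt, 0x7C40)

print(gdt[8:16].hex())   # ffff0000009acf00
print(gdtr.hex())        # 1700407c0000
```

With this layout the selectors are simply byte offsets into the table: 0x08
for the code descriptor and 0x10 for the data descriptor, which matches the
"codesel label minus start of gdt" idea mentioned earlier.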
> times 510-($-$$) db 0
> Filling from the current location to address 510 with 0's.
>
I think that's something like:
.fill 0x1fe - (. - _start), 1, 0
And the 55 AA signature, which in NASM is
dw 0xaa55
is this in GAS:
.word 0xAA55
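To show what the padding and signature lines produce, here is a small Python
sketch (not assembler) that builds the same 512-byte image: code first, zero
fill up to offset 510, then the word 0xAA55 stored little-endian. The
function name make_boot_sector and the two-byte payload are assumptions for
illustration only.

```python
SECTOR_SIZE = 512
SIGNATURE = 0xAA55


def make_boot_sector(code: bytes) -> bytes:
    """Pad code with zeros to 510 bytes and append the boot signature."""
    room = SECTOR_SIZE - 2
    if len(code) > room:
        raise ValueError("code does not fit in one boot sector")
    padding = bytes(room - len(code))           # like: times 510-($-$$) db 0
    return code + padding + SIGNATURE.to_bytes(2, "little")   # like: dw 0xaa55


# Hypothetical payload: EB FE is a two-byte jump-to-self.
sector = make_boot_sector(b"\xEB\xFE")

print(len(sector))          # 512
print(sector[-2:].hex())    # 55aa
```

Because the word is stored little-endian, the last two bytes on disk are
55 then AA, which is why the signature is usually quoted as "55 AA".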
Examples of small bootloaders in GAS used to be everywhere. I can't find a
good GAS example. I can help you track down the other details of converting
NASM to GAS if you wish. Did you find a good NASM example? E.g., enable
CR0.PE, long jump, lgdt and gdt table, 0xAA55 signature, etc.? Actually,
I'm having problems finding a basic NASM example today that shows everything
too...
HTH,
Rod Pemberton
Topic for AOD regulars, maybe, someone (Did I elect myself?) should
construct a basic bootloader or PM startup identical in NASM and GAS... then
put it on AOD FAQ. I used to mention the "muVinux" loader which I now think
was derived from Chris Giese code... Maybe something by CG would be a
start.
I am following this tutorial: http://www.osdever.net/tutorials/hello_btldr.php?the_id=85
which is clearly written but for NASM.
After reading around a bit more, it seems GAS is not easily capable of
generating real-mode code. Perhaps I'll put in some time to learn
NASM.
As an aside: would you mind explaining exactly what org 0x07c00 does?
This is how I thought it works:
(1) An assembly program is converted to binary by looking up the
opcodes for all the mnemonics in the program.
(2) A new file is created, and bytes 0 to 0x07c00 are padded with
zeroes. Then the program binary is copied into this file starting from
address 0x07c00.
Is that right?
Thanks
-Patrick
Not since 8080 ASM.
Nowadays ORG 0x7C00 means: here starts a block which is expected to be
loaded at segment:offset where offset is 0x7C00.
Due to the approach IBM took when it wrote its rombios code to load a
floppy boot sector for the IBM PC, we still have a destination load
address of 0000:7C00h with us. This means the first physical byte of
the boot code will be loaded to segment 0, offset 0x7C00 by the
rombios bootstrap routine.
The algorithm for (1) is a lot more involved nowadays also.
hth,
Steve
I think I get it now.
So all the "org 0x7c00" directive does, is change how labels are
resolved into addresses. Is that right?
For this program:
_start:
movl %eax, %ebx
_start would be resolved by the linker to point to 0.
But for this program:
org 1
_start:
movl %eax, %ebx
_start would be resolved by the linker to point to 1.
Is that correct?
-Patrick
07C0:0000 can also be used.
--
Maxim S. Shatskih
Windows DDK MVP
ma...@storagecraft.com
http://www.storagecraft.com
I think yes.
I want to say yes. However, it can be more involved than that,
depending on your toolset, memory model, modules combined into a
group, etc. In any case you need to experiment with the toolset of
your choice and learn what _it_ does.
for a long example using NASM..
[MAP ALL ORG0.MAP]
;;--------------------------------------------------------60
;; File: ORG0.NSM
;; Last:
;; Init:
;; Vers: 0.0.0 r0
;; Note: test org 0.
;;--------------------------------------------------------60
;; Test & map ORG 0
;;--------------------------------------------------------60
BITS32
org 0
[SECTION .text]
_start:
mov ebx, eax
mov edx, _start
TIMES 30h db 90h
mov ax, 0
int 16h
int 19h
;;--------------------------------------------------------60
;;--------------------------------------------------------60
;; [SECTION .dseg]
;;--------------------------------------------------------60
;; --== EO .MOD ==--
;;--------------------------------------------------------60
-= VS =-
[MAP ALL ORG1.MAP]
;;--------------------------------------------------------60
;; File: ORG1.NSM
;; Last:
;; Init:
;; Vers: 0.0.0 r0
;; Note: test org 1.
;;--------------------------------------------------------60
;; Test & map ORG 1
;;--------------------------------------------------------60
BITS32
org 1
[SECTION .text]
_start:
mov ebx, eax
mov edx, _start
TIMES 30h db 90h
mov ax, 0
int 16h
int 19h
;;--------------------------------------------------------60
;;--------------------------------------------------------60
;; [SECTION .dseg]
;;--------------------------------------------------------60
;; --== EO .MOD ==--
;;--------------------------------------------------------60
I added a reference to _start (mov edx,_start) to illustrate a few
things..
- NASM Map file ---------------------------------------------------------------
Source file:  ORG0.NSM
Output file:  ORG0.BIN
-- Program origin -------------------------------------------------------------
00000000
-- Sections (summary) ---------------------------------------------------------
Vstart    Start     Stop      Length    Class     Name
00000000  00000000  00000040  00000040  progbits  .text
-- Sections (detailed) --------------------------------------------------------
---- Section .text ------------------------------------------------------------
class:     progbits
length:    00000040
start:     00000000
align:     not defined
follows:   not defined
vstart:    00000000
valign:    not defined
vfollows:  not defined
-- Symbols --------------------------------------------------------------------
---- Section .text ------------------------------------------------------------
Real      Virtual   Name
00000000  00000000  BITS32
00000000  00000000  _start
-=VS=-
- NASM Map file ---------------------------------------------------------------
Source file:  ORG1.NSM
Output file:  ORG1.BIN
-- Program origin -------------------------------------------------------------
00000001
-- Sections (summary) ---------------------------------------------------------
Vstart    Start     Stop      Length    Class     Name
00000001  00000001  00000041  00000040  progbits  .text
-- Sections (detailed) --------------------------------------------------------
---- Section .text ------------------------------------------------------------
class:     progbits
length:    00000040
start:     00000001
align:     not defined
follows:   not defined
vstart:    00000001
valign:    not defined
vfollows:  not defined
-- Symbols --------------------------------------------------------------------
---- Section .text ------------------------------------------------------------
Real      Virtual   Name
00000001  00000001  BITS32
00000001  00000001  _start
The above MAP listing hints at the difference.
The following listings give no clue..
1 [MAP ALL ORG0.MAP]
2 ;;--------------------------------------------------------60
3 ;; File: ORG0.NSM
4 ;; Last:
5 ;; Init:
6 ;; Vers: 0.0.0 r0
7 ;; Note: test org 0.
8 ;;--------------------------------------------------------60
9 ;; Test & map ORG 0
10 ;;--------------------------------------------------------60
11 BITS32
12 org 0
13 [SECTION .text]
14 _start:
15 00000000 6689C3 mov ebx, eax
16
17 00000003 66BA[00000000] mov edx, _start
18
19 00000009 90<rept> TIMES 30h db 90h
20 00000039 B80000 mov ax, 0
21 0000003C CD16 int 16h
22 0000003E CD19 int 19h
23 ;;--------------------------------------------------------60
24 ;;--------------------------------------------------------60
25 ;; [SECTION .dseg]
26 ;;--------------------------------------------------------60
27 ;; --== EO .MOD ==--
28 ;;--------------------------------------------------------60
-=VS=-
1 [MAP ALL ORG1.MAP]
2 ;;--------------------------------------------------------60
3 ;; File: ORG1.NSM
4 ;; Last:
5 ;; Init:
6 ;; Vers: 0.0.0 r0
7 ;; Note: test org 1.
8 ;;--------------------------------------------------------60
9 ;; Test & map ORG 1
10 ;;--------------------------------------------------------60
11 BITS32
12 org 1
13 [SECTION .text]
14 _start:
15 00000000 6689C3 mov ebx, eax
16
17 00000003 66BA[00000000] mov edx, _start
18
19 00000009 90<rept> TIMES 30h db 90h
20 00000039 B80000 mov ax, 0
21 0000003C CD16 int 16h
22 0000003E CD19 int 19h
23 ;;--------------------------------------------------------60
24 ;;--------------------------------------------------------60
25 ;; [SECTION .dseg]
26 ;;--------------------------------------------------------60
27 ;; --== EO .MOD ==--
28 ;;--------------------------------------------------------60
These appear identical except for the ORG line.
Now I load each of these .bin files using a debugger which sees each
load command and loads each to a new segment...
--------------------------------------------------
*** Symbolic Instruction Debugger *** Release 3.2
Copyright (c) 1983,1984,1985,1988,1990,1991
Digital Research, Inc. All Rights Reserved
--------------------------------------------------
#rorg0.bin <---loads to current segment
Start End
142E:0000 142E:003F
#rorg1.bin <---loads to a subsequent segment
Start End
1432:0000 1432:003F
#d142e:0
142E:0000 66 89 C3 66 BA {00 00 00 00} 90 90 90 90 90 90 90 f..f............
142E:0010 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ................
142E:0020 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ................
142E:0030 90 90 90 90 90 90 90 90 90 B8 00 00 CD 16 CD 19 ................
.. I space here to show the end of the first, also to explain that the address
.. of _start is between the brackets I've put in.
142E:0040 66 89 C3 66 BA {01 00 00 00} 90 90 90 90 90 90 90 f..f............
142E:0050 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ................
142E:0060 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ................
142E:0070 90 90 90 90 90 90 90 90 90 B8 00 00 CD 16 CD 19 ................
The dump command dumps from the first .bin thru the second, so note,
142E:0040h == 1432:0000h (this has everything to do with the code
being an exact multiple of 16 bytes, a 'paragraph'.)
If you can pick up on this idea, you will see that of the two nearly
identical programs, one .bin is 'correct' and the other will 'fail', but
not because of the segment it resides in. In the first .bin, the stored
address of _start does indeed point to the start of its segment. The
second .bin is faulty because its stored address of _start (1) is off by
one from where _start actually sits, at offset 0 in the first paragraph
of its segment. The second .bin would have to be loaded at offset 1 in
the first paragraph of its segment to be 'correct', because it is ORG'ed
at 1.
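The same check can be scripted. This is a rough Python sketch that pulls the
stored address of _start out of the first nine bytes of each dump (66 89 C3,
then 66 BA followed by a four-byte little-endian value) and compares it with
the offset the image was actually loaded at. The function names are
hypothetical, and the byte strings are copied from the bracketed dump lines
above.

```python
def embedded_start_address(image: bytes) -> int:
    """Return the 32-bit immediate that follows the 66 BA bytes at offset 3."""
    if image[3:5] != b"\x66\xBA":
        raise ValueError("unexpected bytes where 66 BA was expected")
    return int.from_bytes(image[5:9], "little")


def is_consistent(image: bytes, load_offset: int) -> bool:
    """_start is the first byte of the image, so it lives at load_offset."""
    return embedded_start_address(image) == load_offset


# First nine bytes of each .bin, taken from the debugger dump.
org0 = bytes.fromhex("6689C366BA00000000")
org1 = bytes.fromhex("6689C366BA01000000")

for name, image in (("ORG0.BIN", org0), ("ORG1.BIN", org1)):
    for load_offset in (0, 1):
        print(name, "loaded at offset", load_offset,
              "consistent:", is_consistent(image, load_offset))
```

Only ORG0.BIN at offset 0 and ORG1.BIN at offset 1 come out consistent,
which is the point of the experiment: ORG has to agree with where the loader
really puts the code.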
Whichever toolset you choose, you'll infrequently need to verify an
issue, like the above, but you will be very happy to know how to do
it. So my advice is to do test problems to learn how to check for
them.
hth.
Steve
Do I think NASM is easier? Yes. I prefer it to MASM or GAS. Some people
prefer NASM syntax but use other assemblers, such as FASM or YASM.
> After reading around a bit more, it seems GAS is not easily capable of
> generating real-mode code.
>
Well, I've seen numerous OS projects based on the GNU toolchain, and they
all have 16-bit startups using GAS. I do know that GAS has the .code16 and
.code16gcc directives, plus the data16 and addr16 instruction prefixes, for
16-bit support. I don't yet know what the 16-bit code that GAS emits looks
like.
I haven't converted any of my boot code or 32-bit protected-mode switch code
to GAS. So, I'm not exactly sure of all the limitations, especially 16-bit
limitations. But, I have implemented 32-bit inline assembly in C for GCC
(i.e., DJGPP) which is basically GAS syntax.
> As an aside: would you mind explaining exactly what org 0x07c00 does?
>
Sorry, I don't know _exactly_... It may do some things I'm not aware of.
What I do know is:
1) that it indicates the code is to be loaded at offset 0x7c00 from the
start of a segment, probably CS, maybe DS, or both...
2) that offsets will have 0x7c00 added to correct their addresses for the
load location of 0x7c00
Data is loaded based on DS, so data addresses are probably corrected by ORG.
Code is loaded based on CS, so jump addresses are probably corrected by ORG
too.
I.e., if I say "ORG 0x7c00", load the code to 0x7c00 with CS and DS
segment=0, and run it there, it works.
I.e., if I say "ORG 0x7c00", load the code to segmentX*16+0x7c00 with CS and
DS segment=segmentX, and run it there, it works.
E.g., from some code that compiles for 0x7c00 (BIOS boot) and for 0x100 (DOS
.com):
lss sp, [stk]
; elsewhere
stk:
dw 07c00h
dw 0
When compiled for "ORG 0x100", the (dis)assembled instruction is:
lss sp, [0x1b3]
When compiled for "ORG 0x7c00", the (dis)assembled instruction is:
lss sp, [0x7cb3]
You can see that 0x100 or 0x7c00 is added to the offset for stk: label -
apparently 0xb3. That would be a data address based on an offset from DS
segment.
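As a quick check of that arithmetic, here is a tiny Python sketch of what ORG
appears to do to a label: the label's offset from the start of the code is
added to the ORG value. The name resolve is made up for illustration, and
the 16-bit wrap is my assumption for real-mode offsets; 0xb3 is the offset
of stk: inferred from the disassembly above.

```python
def resolve(label_offset: int, org: int) -> int:
    """Address the assembler emits for a label, given the ORG value."""
    return (org + label_offset) & 0xFFFF    # assumed: 16-bit real-mode offsets


STK_OFFSET = 0xB3   # offset of stk: within the binary, per the disassembly

for org in (0x100, 0x7C00):
    print(hex(org), "->", hex(resolve(STK_OFFSET, org)))
# 0x100 -> 0x1b3
# 0x7c00 -> 0x7cb3
```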
Rod Pemberton
yeah. I use my own assembler (BGBASM), but it also still uses (more or less)
NASM syntax.
however, the main difference between mine and NASM is that mine is generally
used for assembling things at runtime, whereas NASM is generally used as a
static assembler.
YASM can apparently be used as both a static and runtime assembler, but I
haven't really much investigated the specifics (I wrote mine well before I
found out about YASM, and generally stuck with my own assembler).
also different:
mine supports multiple opcodes per line ("add eax, 15; shl eax, 4", note ';'
uses whitespace to decide whether to merge lines or be a comment);
mine supports C-style comments;
mine doesn't support assembly-time expressions;
the preprocessor works differently;
...
>> After reading around a bit more, it seems GAS is not easily capable of
>> generating real-mode code.
>>
>
> Well, I've seen numerous OS projects based on the GNU toolchain, and they
> all have 16-bit startups using GAS. I do know that GAS has the .code16 and
> .code16gcc directives, plus the data16 and addr16 instruction prefixes, for
> 16-bit support. I don't yet know what the 16-bit code that GAS emits looks
> like.
>
> I haven't converted any of my boot code or 32-bit protected-mode switch
> code to GAS. So, I'm not exactly sure of all the limitations, especially
> 16-bit limitations. But, I have implemented 32-bit inline assembly in C for
> GCC (i.e., DJGPP) which is basically GAS syntax.
>
AFAIK GAS didn't originally generate real-mode code, and instead as86 was
typically used (or NASM or others).
eventually they added 16-bit support into GAS, and as86 I guess started to
fade away.
I may be wrong here though...
no real comment here...
<--
07C0:0000 can also be used.
-->
technically, yes, as this is the same address...
however, the BIOS likely can't jump to this address, as doing so would
risk breaking many/most bootloaders. a bootloader could itself switch to
this address (probably with a far jump to a label), for example to avoid
needing "org" or similar.
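for anyone wanting to check the "same address" claim: in real mode the
physical address is segment * 16 + offset. a minimal Python sketch of that
rule follows; the function name linear is just for illustration, and the
second pair of values is taken from the debugger session earlier in the
thread.

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 16 + offset."""
    return (segment << 4) + offset


# The BIOS load address written two ways.
print(hex(linear(0x0000, 0x7C00)))   # 0x7c00
print(hex(linear(0x07C0, 0x0000)))   # 0x7c00

# The two views of the second .bin in the debugger dump.
print(hex(linear(0x142E, 0x0040)))   # 0x14320
print(hex(linear(0x1432, 0x0000)))   # 0x14320
```

the physical byte is the same either way; what changes is the offset part,
and that is the part "org" has to match.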
While looking for some NASM info, I found this comparison of GAS and NASM
syntax:
http://www.ibm.com/developerworks/linux/library/l-gas-nasm.html?S_TACT=105AGX52&S_CMP=cn-a-l
RP
Thank you everyone for all your help, and especially to Steve's mini-
tutorial on how I can go about finding out some of these answers for
myself. This will keep me busy for a while to come.
-Patrick
Compaq Presario did exactly this.