144 dd foo - 0xc0000000 + 0x3
145
146 foo:
147 times 1024 dd 0
The warning is produced for line 144. As far as I understand, "dd"
should define a 4-byte data item, so I'm not sure what it is that nasm
doesn't like here.
Looks like a bug to me. We'll look into it. Revert to nasm-2.07 if you
have to (or just ignore the warning).
Thanks for the feedback!
Best,
Frank
That calculation only makes sense at runtime:
1) get the address of 'foo' into a register
2) subtract 0xC0000003 from it
Nathan.
Hmm?? How would the assembler know the address of 'foo' at compile
time? Only the linker itself knows how it will arrange the items from
various object files. Does Nasm 'always' assume that output is to a
flat file??
Nathan.
That's what linkers and loaders are for. The assembler puts the offsets
into the object file, and the linker (and later, the loader) adds the
base address where the section ends up.
nasm here takes the offset of foo from the start of its section (the
data section, I'm guessing), subtracts 0xc0000000, adds 3, stores the
result in the allocated long word, and marks it for relocation. When the
file is linked, the sections get relocated, and the linker adds the base
address of the data section to the offset stored in the long word.
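A rough sketch of that arithmetic, in Python and purely illustrative: the section offset and base address below are made-up assumptions, not values from any real object file, and the helper names are hypothetical. It suggests that, with 32-bit wraparound, folding the constant in before relocation gives the same result as computing it on the final address.

```python
MASK32 = 0xFFFFFFFF  # a dd holds 32 bits, so arithmetic wraps modulo 2**32

def assemble_dd(offset, addend):
    """What the assembler stores: section offset plus constant, wrapped to 32 bits."""
    return (offset + addend) & MASK32

def relocate(stored, section_base):
    """What the linker does: add the section's base address to the stored value."""
    return (stored + section_base) & MASK32

# Hypothetical numbers: foo sits 8 bytes into .data, and .data lands at 0x08049090.
foo_offset = 0x8
data_base = 0x08049090
addend = -0xC0000000 + 0x3

stored = assemble_dd(foo_offset, addend)   # value in the object file
final = relocate(stored, data_base)        # value after linking

print(hex(stored))
print(hex(final))
```

Under these assumptions, `final` matches what you would get by taking the final address of foo and then subtracting 0xc0000000 and adding 3.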
Bjarni
--
INFORMATION WANTS TO BE FREE
Ah right, forgot to add that I'm using nasm 2.08.02, in case that helps.
Thanks.
That's what I thought. The assembler can only do calculations on
offsets -- not the resulting actual address. That's why EBX in the
following code gets loaded with a crazy result:
global _start
section .data
foobar db 54h, 45h, 53h, 54h
barfoo dd foobar
foo dd foobar - 0xC0000000 + 3
section .text
_start:
mov ecx, foobar
mov edx, [barfoo]
mov ebx, [foo]
mov eax, 1
int 80h
Nathan.
...
>> nasm here takes the offset of foo from the start of its section (the
>> data section, I'm guessing), subtracts 0xc0000000, adds 3, stores the
>> result in the allocated long word, and marks it for relocation. When the
>> file is linked, the sections get relocated, and the linker adds the base
>> address of the data section to the offset stored in the long word.
>>
>
> That's what I thought. The assembler can only do calculations on
> offsets -- not the resulting actual address. That's why EBX in the
> following code gets loaded with a crazy result:
>
> global _start
>
> section .data
> foobar db 54h, 45h, 53h, 54h
> barfoo dd foobar
> foo dd foobar - 0xC0000000 + 3
>
> section .text
> _start:
> mov ecx, foobar
> mov edx, [barfoo]
> mov ebx, [foo]
> mov eax, 1
> int 80h
>
> http://imgur dot com/MqCXI
(? can't post with this domain???)
Looks right to me. What do you think it should be?
Bjarni's got it right. Relocation. What Nasm *won't* do is:
dd foo >> 8
In this case, Nasm would not know the "final address" of "foo", and
simply adding a relocation would not give the correct result. This is
what causes Nasm to whine piteously about "scalar value" (defined
nowhere in the manual).
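To see why a plain "add the base" relocation cannot express the shift, here is a small Python sketch. The offset and base are hypothetical values chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF

def relocate(stored, section_base):
    """A simple additive relocation: stored value plus section base."""
    return (stored + section_base) & MASK32

foo_offset = 0x8         # hypothetical offset of foo within its section
data_base = 0x08049090   # hypothetical base chosen by the linker

# What the programmer wants: the final address, shifted right by 8.
wanted = ((foo_offset + data_base) & MASK32) >> 8

# What an additive relocation would give if the assembler shifted the offset first.
naive = relocate(foo_offset >> 8, data_base)

print(hex(wanted), hex(naive))
```

The two values differ because the shift does not commute with adding the base, so the assembler has nothing correct it could store ahead of time.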
"data exceeds bounds" is a bug, I'm quite sure. 2.07 doesn't do it.
Best,
Frank
That is certainly weird.
> Looks right to me. What do you think it should be?
>
Why, Nasm should read one's mind and put 08049098 there, of
course! ;)
> Bjarni's got it right. Relocation. What Nasm *won't* do is:
>
> dd foo >> 8
>
> In this case, Nasm would not know the "final address" of "foo", and
> simply adding a relocation would not give the correct result. This is
> what causes Nasm to whine piteously about "scalar value" (defined
> nowhere in the manual).
>
Wikipedia says 'scalar' means a single value as opposed to multiple
values. Seems to me that Nasm Doc shouldn't use the 'scalar' word
since there is not built-in support for arrays.
> "data exceeds bounds" is a bug, I'm quite sure. 2.07 doesn't do it.
>
First line of my snippet encodes as 'b9 98 90 04'; if 'b9' is the
instruction, then that leaves only 3 bytes to encode the result of
'some type of mathematical operation' that involves 'C0 00 00 00',
right? Pretty clear where the bug originated.
Nathan.
Seems so to me! I don't think I've got the ambition right now to track
it down...
>> Looks right to me. What do you think it should be?
>>
>
> Why, Nasm should read one's mind and put 08049098 there, of
> course! ;)
Actually, if you look at your .o file, I think you'll find that ld does
the dirty work. Or make a list file (something I don't ordinarily do).
See the square brackets around the addresses in the opcode/operand
column? That tells you it's a relocation.
>> Bjarni's got it right. Relocation. What Nasm *won't* do is:
>>
>> dd foo >> 8
>>
>> In this case, Nasm would not know the "final address" of "foo", and
>> simply adding a relocation would not give the correct result. This is
>> what causes Nasm to whine piteously about "scalar value" (defined
>> nowhere in the manual).
>>
>
> Wikipedia says 'scalar' means a single value as opposed to multiple
> values. Seems to me that Nasm Doc shouldn't use the 'scalar' word
> since there is not built-in support for arrays.
I disclaim all responsibility for the use of the word "scalar". It's
what the error message says...
>> "data exceeds bounds" is a bug, I'm quite sure. 2.07 doesn't do it.
>>
>
> First line of my snippet encodes as 'b9 98 90 04'; if 'b9' is the
> instruction, then that leaves only 3 bytes to encode the result of
> 'some type of mathematical operation' that involves 'C0 00 00 00',
> right? Pretty clear where the bug originated.
To be honest, I think that's a bug in your debugger (what is that, anyway?).
Try this one. It pains me to use the C library, but hopefully it'll let
the Windows folks play, too. :)
Best,
Frank
;----------------------
; for Linux
; nasm -f elf32 myprog.asm
; gcc -o myprog myprog.o
; for Windows(?)
; nasm -f win32 --prefix _ myprog.asm
; gcc -o myprog.exe myprog.obj
; "should" work with other compilers
; for Watcom compilers?
; nasm -f win32 --postfix _ myprog.asm
global main
extern printf
section .data
format db `%X %X\n`, 0
; nasm after 2.07 warns about this, but does the right thing
foo dd format - 0xC000_0000 + 0x3
section .text
main:
push ebp
mov ebp, esp
; do it at runtime
mov eax, format
sub eax, 0xC000_0000
add eax, 3
push eax
; and at assemble/link/relocate time
push dword [foo]
push format
call printf
add esp, 12 ; ("leave" covers us, in this case)
leave
ret
;---------------------
Unlikely as it might seem, it does look like a bug in your debugger. I
notice it draws text incorrectly in many places, cutting off the ends.
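For what it's worth, a quick Python sketch of how `mov ecx, imm32` is encoded points the same way: the B9 opcode is followed by a full four-byte little-endian immediate, so a display of 'b9 98 90 04' looks like it is one byte short. The address 0x08049098 is the one mentioned earlier in the thread; the helper name is made up for this example.

```python
import struct

def encode_mov_ecx_imm32(imm):
    """Encode `mov ecx, imm32`: opcode 0xB9, then the immediate in little-endian order."""
    return bytes([0xB9]) + struct.pack("<I", imm & 0xFFFFFFFF)

encoded = encode_mov_ecx_imm32(0x08049098)

# Show the bytes the way a disassembler or debugger would list them.
print(" ".join(f"{b:02x}" for b in encoded))
```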
> That's what I thought. The assembler can only do calculations on
> offsets -- not the resulting actual address. That's why EBX in the
> following code gets loaded with a crazy result:
>
> global _start
>
> section .data
> foobar db 54h, 45h, 53h, 54h
foobar is the string "TEST"
> barfoo dd foobar
barfoo points to the string
> foo dd foobar - 0xC0000000 + 3
foo points 0xBFFFFFFD bytes away from the string. We do not know what is
in that part of the address space.
> section .text
> _start:
> mov ecx, foobar
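Loads the address of the string into ecx (in Nasm syntax, a bare symbol
is its address).
> mov edx, [barfoo]
Loads the long word stored at barfoo, which is the relocated address of
the string, so edx ends up equal to ecx.
> mov ebx, [foo]
Loads the long word stored at foo, which is the address of the string
minus 0xBFFFFFFD (modulo 2^32). Nothing is fetched from that address;
the "crazy result" is simply that relocated value.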
My diagnosis is that you have not grasped how a linker does relocations
and how the necessary data is stored in the object file. Allow me to
explain:
The assembler keeps a number of sections, the most common ones being
data and text. Each section is a contiguous block of bytes which are
defined by the code being assembled. The data bytes after your "section
.data" directive go into the data section in the order they are found in
the source, packed together. The instructions go in the text section.
While the assembler assembles the source, it keeps track of a "current
position" for each section. This position is how far into the section
the next byte will go, after which the current position is incremented.
The current position of each section starts at 0.
When a symbol is introduced in the code, the assembler notes the current
position, and puts the symbol name together with the position, that is
the offset into the section, or in other words the unrelocated address,
in the symbol table.
When a symbol is referenced, the assembler finds its name in the table,
and uses the value found with it.
Now, most assemblers allow addresses to be specified not only by a
simple symbol name, but also by a symbol plus an additional numerical
offset. Thus, you could put a struct in the data section and reference
its members:
.data
some_struct:
.long 1337
.long 4711
.long 0xDEADBEEF
.text
movl some_struct+0, %eax
movl some_struct+4, %ecx
movl some_struct+8, %edx
Say some_struct is the first thing in your data section. It gets
assigned address 0. This is its unrelocated address, or in other words,
its offset into the data section.
The three movl instructions thus reference unrelocated addresses 0, 4,
and 8, respectively. Instead of 4 or 8 you could type any number you
like, such as 0xC0000000 + 0x3. The assembler outputs into the encoding
of the instructions in the object file the resulting offsets from the
start of the data section.
The other thing the assembler does is record in a relocation table the
places in the text section where it put those unrelocated values, and
also to what section they are relative (the data section, that is). That
table, too, goes in the object file, for the benefit of the linker.
When the object file gets linked (and possibly when it gets loaded), the
linker (and loader) goes through the entries in the relocation table,
and adds the actual base address of the appropriate section to the
unrelocated values recorded in the object file. This is what is meant by
"relocation".
Thus, even though the assembler never knew where some_struct would end
up being loaded, it knew the offset of some_struct from the start of the
data section, and it knew what that offset plus 4 was, and what the
offset plus 8 was, and encoded those values in the assembled
instructions in the output file, together with a note for the linker to
fix those values so they point to the right place after the linker
decides where in memory the data section is supposed to go.
And that's how you relocate a computer program.
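The steps above can be sketched as a toy "linker" in Python. This is only an illustration under simplifying assumptions: the "text section" here holds just the three 32-bit operands (no opcodes), the relocation table is a plain list of byte positions, and the base address is invented. Real object formats carry more information per relocation entry.

```python
import struct

def apply_relocations(section_bytes, reloc_offsets, target_base):
    """Add target_base to each 32-bit little-endian word listed in reloc_offsets."""
    out = bytearray(section_bytes)
    for pos in reloc_offsets:
        (value,) = struct.unpack_from("<I", out, pos)
        struct.pack_into("<I", out, pos, (value + target_base) & 0xFFFFFFFF)
    return bytes(out)

# Unrelocated operands for some_struct+0, some_struct+4 and some_struct+8.
text = struct.pack("<III", 0, 4, 8)
relocs = [0, 4, 8]        # positions in the section that need fixing up
data_base = 0x08049090    # hypothetical address the linker picked for .data

linked = apply_relocations(text, relocs, data_base)
print([hex(v) for v in struct.unpack("<III", linked)])
```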
I suspect you knew most of this but hadn't gotten a clear picture of it
yet. I hope this rather longish description helped.
Evan's Debugger:
http://www.codef00.com/projects.php#debugger
Nathan.
Ah! Now I get it. The above paragraph is the part that I did not
know. I had always thought that the relocation table was a vestigial
item from the DOS days and that the loader would simply ignore it.
> When the object file gets linked (and possibly when it gets loaded), the
> linker (and loader) goes through the entries in the relocation table,
> and adds the actual base address of the appropriate section to the
> unrelocated values recorded in the object file. This is what is meant by
> "relocation".
>
> Thus, even though the assembler never knew where some_struct would end
> up being loaded, it knew the offset of some_struct from the start of the
> data section, and it knew what that offset plus 4 was, and what the
> offset plus 8 was, and encoded those values in the assembled
> instructions in the output file, together with a note for the linker to
> fix those values so they point to the right place after the linker
> decides where in memory the data section is supposed to go.
>
> And that's how you relocate a computer program.
>
> I suspect you knew most of this but hadn't gotten a clear picture of it
> yet. I hope this rather longish description helped.
>
Yes it does. Thanks.
Nathan.
Ok, you found the issue, but I think you made some assumptions about what
the "newbie" did...
Your 'format' is located *prior* to its usage. The OP's 'foo' was located
*after* its usage. Calculating backward references is easy, such as for
'format'. Forward references usually require two passes or backtracking,
such as for 'foo'.
AFAICT, the guy didn't say he was using a .data section... ISTM, you
assumed he was using a .text and also a .data section and that the
calculation was in the .data section. He might only be using a .text.
Finally, he said 32-bit code. So, the code is supposed to be 32-bits. It
tells you nothing about whether he used BITS 16 or BITS 32 code size
directive, or if he used a directive... He only posted a tiny snip of his
code.
Hey, don't roll your eyes at me! :-) He claimed to be a "newbie"...
That's when the odd bugs come out of the woodwork. They don't use it
in the same way that everyone else "knows," so they "test" the dark shadows.
Rod Pemberton
> "Frank Kotler" <fbko...@nospicedham.myfairpoint.net> wrote in message
> news:ie3he4$qpv$1...@speranza.aioe.org...
> ...
> > section .data
> > format db `%X %X\n`, 0
> > ; nasm after 2.07 warns about this, but does the right thing
> > foo dd format - 0xC000_0000 + 0x3
> >
>
> Ok, you found the issue, but I think you made some assumptions about what
> the "newbie" did...
>
> Your 'format' is located *prior* to its usage. The OP's 'foo' was
> located *after* its usage. Calculating backward references is easy,
> such as for 'format'. Forward references usually require two passes or
> backtracking, such as for 'foo'.
>
> AFAICT, the guy didn't say he was using a .data section... ISTM, you
> assumed he was using a .text and also a .data section and that the
> calculation was in the .data section. He might only be using a .text.
The snippet I posted was indeed in a .data section.
> Finally, he said 32-bit code. So, the code is supposed to be 32-bits. It
> tells you nothing about whether he used BITS 16 or BITS 32 code size
> directive, or if he used a directive... He only posted a tiny snip of his
> code.
I'm using BITS 32.
Yes and no. I actually devised my own "test case", based only loosely on
what aws reported, and confirmed that this "bug" (I think it's a bug)
exists.
> Your 'format' is located *prior* to its usage. The OP's 'foo' was located
> *after* its usage. Calculating backward references is easy, such as for
> 'format'. Forward references usually require two passes or backtracking,
> such as for 'foo'.
Nasm makes as many passes as it takes, these days. I don't think you
"agree" with this change, and I concede that you've got a point - we've
fundamentally changed the default behavior of Nasm. On the whole, I
think this is a Good Thing, but I sympathize with different opinions...
> AFAICT, the guy didn't say he was using a .data section... ISTM, you
> assumed he was using a .text and also a .data section and that the
> calculation was in the .data section. He might only be using a .text.
Shouldn't matter. (should it?)
> Finally, he said 32-bit code. So, the code is supposed to be 32-bits. It
> tells you nothing about whether he used BITS 16 or BITS 32 code size
> directive, or if he used a directive... He only posted a tiny snip of his
> code.
I used "bits 32" in my initial "-f bin" test. The code I posted doesn't
say "bits 32", but I specified output formats (-f elf32 and -f win32)
which default to bits 32.
FWIW, I typo'ed it on my first try, wrote "db" instead of "dd" - results
in a different (correct) error message (one byte relocation attempted).
Just for "fun", I just tried it with "dw". "Recent" nasm
(2.10rc2-20101108... there's newer) complains "word data exceeds bounds"
(correct), 2.07 truncates it silently...
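A small Python sketch of what silent truncation to a word would throw away. The section offset is a hypothetical value, and this only illustrates the bit arithmetic; it is not a description of how Nasm performs its bounds check.

```python
MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF

offset = 0x8  # hypothetical section offset of the symbol

value32 = (offset - 0xC0000000 + 0x3) & MASK32  # what a dd can hold
value16 = value32 & MASK16                       # what truncation to a dw keeps

print(hex(value32))
print(hex(value16))
print(value32 == value16)  # False means the high bits were discarded
```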
> Hey, don't roll your eyes at me! :-)
Oh jeez, my monitor's watching back! :)
> He claimed to be a "newbie"...
> That's when the odd bugs come out of the woodwork. They don't use it
> in the same way that everyone else "knows," so they "test" the dark shadows.
Yup... and we're glad they do!
Best,
Frank
Not at all, it is entirely central to the processes of linking and
loading. Relocation tables were in use long before DOS saw the light of
day, and continue to be used in all modern operating systems.
As far as I know, DOS has not had any significant influence on the
relocation processes of any other operating systems, save possibly its
own descendants in the Microsoft Windows family.
How the loaders in modern Windows versions handle the DOS EXE relocation
table I know nothing of, and quite possibly it is entirely vestigial in
that file format, since it is not a linkable object file format. Indeed,
I don't think DOS ever came with a linkable format, or a linker, or any
programming tools whatsoever beyond a BASIC interpreter and DEBUG...
>> I suspect you knew most of this but hadn't gotten a clear picture of it
>> yet. I hope this rather longish description helped.
>
> Yes it does. Thanks.
It is a pleasure to be of service. :)
But linkers ARE available that produce DOS executables.
Nathan.
It is a proper word to use, even though it is cryptic. The address of
the object has a 'section location' part and an 'offset within that
section' part. In other words, the location value is a vector: it has
more than one part.
It is a bit much to go into, but Nasm has a lot of flexibility
with regard to the 'section' directive. Once one steps outside the
typical .text and .data sections, the flexibility is more apparent.
Other assemblers use the 'group' directive to combine multiple segments
into a group destined for DS, for example.
Also see WRT in the documentation. (I'm not sure if this is current
as of the present version.)
Steve
The 16-bit real-mode MS linker, link.exe, was distributed with MS-DOS
up to roughly the DOS 4.0 timeframe. And the MS-DOS executable
formats are dead ends; not even Windows 1.0 used them (the whole 16-bit
Windows and OS/2 lines used NE). Of course, OSes that allow the
execution of real-mode DOS programs need to be able to load
COM and MZ. The DOS formats were not particularly unique:
COM/BIN/SYS were trivial image files (which had been done
many times before), and the relocation scheme in MZ was about as
simple as you can imagine, and very similar schemes had been done
before (although MZ obviously was specific to x86).
I did not know that, thanks. What object file format did it use?
Of course, and I have used a couple of them. I didn't mean to deny the
existence of such tools, I just got side-tracked into bashing DOS.
OMF. Which was pretty much the only object format MS used until they
switched to PE (a variant of COFF) for their 32 bit stuff (and it's PE
for both the object and executable formats now). Which is not to say
that OMF couldn't do 32 bit stuff (it could, and was used that way on
occasion), but MS decided to switch. OMF went into a bunch of
executable formats, including COM, MZ, NE, LE (or was that LX?) and
(sometimes) PE. The first versions of the 32 bit linker accepted
either PE or OMF inputs, and had a converter that would convert OMF
files to PE (frankly they still might, but I don’t know). MASM for a
long time was the major MS tool that *didn't* support PE, so any 32-bit
assembler output was generated in OMF by MASM, and converted to PE by the
linker. Current versions of MASM generate PE directly (I think the change
came around the time MASM started generating 64-bit code).
And FWIW, OMF was defined by Intel well before MS started to use it,
and many Intel development tools used OMF. Of course many development
tools from other folks also supported OMF.
Rare historical document:
http://www.bitsavers.org/pdf/intel/8086/
121748-001_8086_Relocatable_Object_Module_Formats_1981.pdf  13-Mar-2007 12:10  29M
Microsoft wandered from this standard a bit, IIRC, in their tool line.
About the time 32-bit tools were coming on the scene, the industry tried
to consolidate the differences with the Tools Interface Standard,
which went into detail on the OMF records and, I think, agreement on the
PE (or was it LE) forms. The TIS is still around the net, but the document
above is the founding one. Reading it gives background sense to
the TIS info.
The other thing to keep in mind is that memory was so
expensive back then that the IBM PC was launched with only 128k of
RAM, and the expensive upgrade was 256k of RAM. Even though the 8086
CPU could address 1 MB, the original PC motherboard was only designed
to address 512k of RAM; that is all it was socketed for.
Those restrictions forced incremental builds and component
sources into object modules to be combined by a linker, as a matter of
necessity.
Steve
The original PC was socketed for 256MB on the motherboard, and
additional memory could be plugged into the ISA slots. Many shipped
with less than 256KB, and could be bought with as little as 16KB (the
motherboard could accept either 16Kx1 or 64Kx1 DRAMs). The 16KB
configuration was not particularly useful, but was fairly common as
the retailers would often sell the customer cheaper non-IBM memory.
> Those restrictions forced incremental builds and component
> sources into object modules to be combined by a linker, as a matter of
> necessity.
Not really. The compile+link cycle has a really long history, and is
fairly fundamental to the development of large software projects.
Many compilers on earlier systems ran in considerably less than 256KB,
so the PC wasn't really that constrained for compilers. And a single
pass system was certainly possible, Turbo Pascal is a good example -
it produced executables directly, and ran in only 64KB. And given the
relative complexities of OMF and COM or MZ, it’s likely the compilers
would have been smaller had they emitted the executable directly.
For archive purposes, let us note that Robert meant 256KB, not 256MB.
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.
Of course I did... Can I claim that it's the keyboard's fault since
the M key is right next to the K key?