A simple question I've never heard a satisfactory answer to:
Why is it that, in assembly language, symbols used in C
always have an underbar, i.e. _, prefixed onto them?
For instance, in nasm I have to call _strcpy, not strcpy.
Thanks.
It is so that an assembler for which, say, eax is a special symbol,
won't choke on a C variable named "eax". It also allows the C library
and C compiler to hide symbols that should not be visible to user
programs, like compiler-generated labels.
Incidentally, not all C ABIs use this convention; the Unix/ELF ABI
doesn't, for example. Instead, it relies on the assembler being able to
escape any special symbols, e.g. in NASM you can write "$eax" to
generate a symbol called "eax", and in gas registers have to be prefixed
with %, so "eax" is a symbol while "%eax" is a register. Unix/ELF
systems tend to use dots in symbols that should be hidden from C.
-hpa
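A minimal sketch of the two points above, assuming GCC or Clang: "eax" is
a perfectly legal C identifier, and the predefined macro
__USER_LABEL_PREFIX__ reports what the toolchain prepends to C symbols
(typically nothing on ELF targets, "_" on targets that keep the
underscore convention). The helper names user_label_prefix, object_symbol
and read_eax are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Two-step stringification so the macro is expanded before it is quoted. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* A legal C identifier that happens to match an x86 register name. */
int eax = 42;

/* Returns the prefix this toolchain prepends to C symbols in object files.
 * Assumption: __USER_LABEL_PREFIX__ is a GCC/Clang predefined macro; other
 * compilers may not define it, in which case "?" is returned. */
const char *user_label_prefix(void)
{
#ifdef __USER_LABEL_PREFIX__
    return STR(__USER_LABEL_PREFIX__);
#else
    return "?";
#endif
}

/* Builds the name an assembly module would use for a C identifier,
 * i.e. the prefix followed by the C name. */
void object_symbol(const char *c_name, char *out, size_t out_size)
{
    const char *prefix = user_label_prefix();

    assert(strlen(prefix) + strlen(c_name) < out_size);
    strcpy(out, prefix);
    strcat(out, c_name);
}

/* Reads the global; the compiler emits a reference to prefix + "eax". */
int read_eax(void)
{
    return eax;
}
```

On an ELF system, object_symbol("eax", buf, sizeof buf) would be expected
to produce "eax"; on a target with the underscore convention, "_eax".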
I would put it slightly differently: the prefix helps readability,
and it also helps when searching for specific things.
Also, it may be that older C compilers modified their functions to
do more or provide something a little different. That might relate
to the fact that a C compiler commonly provided its own libraries to
call functions from. The OS may sometimes have its own library too.
One last thing that I recall specifically about Windows, though:
I believe Microsoft commonly prefixed all of their functions
with an underscore, perhaps to avoid lawsuits and copyright
infringement claims from other companies (Commodore, Motorola,
Atari, Apple). Perhaps they made their compiler to automatically
prefix ALL the functions in that manner when they were stuck in
an object file? Someone else needs to help out as I did not do too
much programming back then, except for playing with debug.exe.
I think the best answer, though, lies in the fact that when people
write a page of code, variable names do not always end up with the
same name, and the point is not only to differentiate them from
registers: the eyes and mind of a person want (need) to know that
they are dealing with a function named Scope() and not with a
DEFINED constant named SCOPE. Hungarian prefixing did not become
popular until later on. It is very easy for the human eye/mind to
confuse variable names. Simple variables like i, j, k commonly get
used only in loops as counters (microprocessors have counter
registers too, but on Intel processors they are named CL, CH, CX,
ECX and now RCX).
Someone who knows whether the 8- and 16-bit registers are available
on the newer Intel processors (64-bit) needs to help out, as I
still run on a 32-bit AMD Athlon.
How your code reads basically depends upon what you like and
what works best for you. One more thing to think about: you
want to be able to FIND a constant with the same name as a
variable, and the constants commonly get listed before even the
function prototypes, which lets someone quickly find what they
are looking for. It ends up easier to read a header file with
the defined constants than to start a search for the material
in the actual file that holds the code.
The underbar is actually called an underscore (sometimes an
underline). I do not think I have heard it called an underbar.
Hope this helps.
--
Jim Carlock
True, but only locally true, I think. I suspect that the above
rationale was exploiting an existing, older convention. The underscore
name mangling has been in UNIX since forever. I found this rationale
at
http://www.iecc.com/linker/linker05.html
and it seems to ring true with my memories:
"In older object formats (before maybe 1970), compilers used names
from the source program directly as the names in the object file,
perhaps truncating long names to a name length limit. This worked
reasonably well, but caused problems due to collisions with names
reserved by compilers and libraries.
[snip]
The approach taken on UNIX systems was to mangle the names of C and
Fortran procedures so they wouldn't inadvertently collide with names
of library and other routines. C procedure names were decorated with a
leading underscore, so that main became _main. Fortran names were
further mangled with both a leading and trailing underscore so that
calc became _calc_. (This particular approach made it possible to call
C routines whose names ended with an underscore from Fortran, which
made it possible to write Fortran libraries in C.) The only
significant disadvantage of this scheme is that it shrank the C name
space from the 8 characters permitted by the object format to 7
characters for C and six characters for Fortran. At the time, the
Fortran-66 standard only required six character names, so it wasn't
much of an imposition."
-- Charles
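The scheme described in the quote can be sketched roughly as follows.
This assumes an 8-character object-format limit as stated there, and
assumes that over-long source names are simply truncated; the names
OBJ_NAME_MAX, decorate, mangle_c and mangle_fortran are invented for
this illustration and do not come from any real toolchain.

```c
#include <assert.h>
#include <string.h>

/* Significant characters per symbol in the object format, per the quote. */
#define OBJ_NAME_MAX 8

/* Writes '_' + name (+ '_' if trailing is nonzero) into out, truncating
 * the source name so the decorated result fits in OBJ_NAME_MAX characters.
 * That leaves 7 characters for C names and 6 for Fortran names. */
void decorate(const char *name, int trailing, char out[OBJ_NAME_MAX + 1])
{
    size_t room = OBJ_NAME_MAX - 1 - (trailing ? 1 : 0);
    size_t n, pos = 0;

    assert(name != NULL && out != NULL);
    n = strlen(name);
    if (n > room)
        n = room;                 /* assumed behaviour: plain truncation */

    out[pos++] = '_';             /* leading underscore for both languages */
    memcpy(out + pos, name, n);
    pos += n;
    if (trailing)
        out[pos++] = '_';         /* trailing underscore for Fortran only */
    out[pos] = '\0';
}

/* C convention from the quote: main becomes _main. */
void mangle_c(const char *name, char out[OBJ_NAME_MAX + 1])
{
    decorate(name, 0, out);
}

/* Fortran convention from the quote: calc becomes _calc_. */
void mangle_fortran(const char *name, char out[OBJ_NAME_MAX + 1])
{
    decorate(name, 1, out);
}
```

Under these assumptions, a C routine named calc_ and a Fortran routine
named calc both come out as _calc_, which is the trick the quote mentions
for writing Fortran libraries in C.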
> [snip] Perhaps they made their compiler to automatically
> prefix ALL the functions in that manner when they were stuck in
> an object file?
Yes; you need to find legacy docs to read about this.
"
Linking Microsoft C with Microsoft Assembler
Naming conventions. In Microsoft C and assembler, the assembly
modules must use a naming convention for segments and variables that
is compatible with that in C. All assembler references to functions
and variables in the C module must begin with an underscore (_).
Further, because C is 'case sensitive', the assembly module should use
the same case (upper or lower) for any variable names in common with
the C module.
"
This doesn't really answer the OP's question of why, other than to say
_they_ did it that way.
As I recall, Watcom used an underscore prefix and suffix, at least on
some versions.
Steve
[snip]
"H. Peter Anvin" <h...@MUNGED.microcosmotalk.com> wrote in message
news:4b17e879$0$4951$9a6e...@unlimited.newshosting.com...
I thought it was to avoid name clashes with certain other non-C languages,
so that, for example, C, ASM, Fortran, ... would not so easily step on
each other...
It is worth noting that Win64 also no longer uses '_' on names, and AFAIK
MASM has not adopted any special mangling.
Now, as for whether or not it is possible to confuse the assembler by
compiling C code which contains globals/functions/... which clash with
register names, I don't know. Oddly, I had not thought of this before...
> -hpa