--David--
"How to write an assembler" is much too much information to even
summarize in a news posting. Even a simple assembler needs some kind
of tokenizer, some kind of name lookup system (even if you don't
support user defined symbols), some representation of the legal
instruction set of the target machine, and a variety of less well
defined functions. You probably also need file I/O and expression
parsing. A real assembler should also have macro processing and ought
to have its own internal memory management system to support the
complex dynamic memory requirements of all those other modules.
Certainly an assembler CAN be written in assembly; however, it
shouldn't be.
It may be a good idea to have a look at the source code of some other
assemblers, just to see how they did it:
NASM: http://www.web-sites.co.uk/nasm/ (C source)
TMA: ftp://ftp.cdrom.com/pub/simtelnet/msdos/asmutl/ta980705.zip (asm
source)
GASM: I don't remember, search for it. (C source)
> and can it be done in assembly itself?
Been there, done that (or 90% of it, I never finished it). I'd say:
DON'T, it's precisely the wrong thing to write in assembly.
Actually I'm doing it again (writing an assembler), in "fairly portable"
C this time, it's a choice I don't regret.
I'd recommend C or C++, or maybe even Pascal (if you can stand it, I
can't).
It may also be possible to use parser generators like Bison/yacc,
although I personally don't like them much (or more precisely, *they* don't
like my assembler's syntax).
Good luck.
--
Leif
*** True freedom is when you don't need to know what time it is. ***
Thanks, I am having a serious look at NASM at the moment...
--David--
Thanks for the reply, I realise it is a big task that I asked. Just a few
more questions, when an instruction is assembled e.g. MOV, it is converted
into a hex equivalent(the opcode) and then this is changed into binary? is
this correct? And I also want to know how the file is created or found by an
OS (Windows)...
Thanks to anyone who can help me out.
--David--
I've been wanting to write an assembler myself for a long time.
There are so many things that I would like to improve in the assemblers that
I know.
In fact the only thing that kept me from doing it was that building a list
of all symbols, opcodes, arguments etc. would just take too much time.
> and can it be done in assembly itself?
Of course. EVERYTHING can be written in assembly (at least with the right
assembler).
Generally (as with most other people asking questions in this group) you
shouldn't write a program if you don't know how to do it. In this case, if
you don't know the structure of program executables, the relations between
asm code and opcodes and know nothing about cpu architecture you should read
about it before coding anything.
If you already know all about these topics you should already know just
about all you need to know to write your own assembler.
So... read, read, read... no pain no gain.
While I agree with the posters who don't think assembly is the best
choice of language, certainly it *can* be done - the original one must
have been :)
It's a debugger, but David Lindauer's GRDB has both assembler and
disassembler functions, written in Tasm syntax, IIRC. It might give you
some ideas.
http://www.ladsoft.com
I think Spasm is written in Spasm, too. It's for Windows, tho, and
probably not as simple as you're looking for.
Best,
Frank
> more questions, when an instruction is assembled e.g. MOV, it is converted
> into a hex equivalent(the opcode) and then this is changed into binary? is
> this correct?
It is usually a mistake to think about numbers in a computer being
represented in "hex" or "binary". You should just think of them as
numbers. When the user puts a number in or gets a number out, THAT
usually happens in decimal or hex. The program must translate the
number between its "just a number" form and the sequence of ascii
characters that represent the number in decimal or hex.
In fact, the internal representation of the numbers in the computer
is based on binary numbering, but for most instructions that occurs
below the level of the instruction. So it is much less confusing to
think of them as just numbers, rather than as binary numbers.
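To make the "just a number" point concrete, here is a minimal Python sketch. The helper names parse_number and show_number are made up for this example; the only point is that text goes in, a plain integer lives inside the program, and text comes back out in whatever notation you ask for.

```python
def parse_number(text):
    """Turn the characters the user typed into a plain integer.

    Accepts decimal ("77") and hex written as "0x4D" or "4Dh".
    """
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if text.lower().endswith("h"):
        return int(text[:-1], 16)
    return int(text, 10)


def show_number(value):
    """Turn a plain integer back into characters, in three notations."""
    return "dec %d  hex %X  bin %s" % (value, value, format(value, "08b"))


# Three different spellings, one and the same internal value.
assert parse_number("77") == parse_number("0x4D") == parse_number("4Dh")
print(show_number(parse_number("4Dh")))  # dec 77  hex 4D  bin 01001101
```

Nothing between parse_number and show_number is "in hex" or "in decimal"; the notation only exists at the edges where text is read or written.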
As for changing MOV to a hex opcode and then changing that into
binary, this is not a reasonable sequence for an assembler.
When building an assembler, you probably build some table of text
that describes the instruction set in a consistent manner. (The NASM
source code has a good example of such a table). In that table, the
opcodes are probably represented in hex. That table will be processed
by one or more of the following tools: a text preprocessor, a
compiler, an assembler, or the initialization code of the new assembler
it is part of. One of those tools will translate the hex into the
"just a number" form used internally.
When your assembler recognizes "MOV" (and the particular variant of
MOV used in a specific instruction) it finds the opcode in its
instruction table. It is not in hex at that point, it is just a
number. The assembler doesn't change it to binary; it just "emits"
it as one or more bytes of its output stream.
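A rough sketch of that table-then-emit idea in Python follows. The table layout, the pattern strings and the names load_table and emit are all invented for this example (NASM's real table looks quite different), and it assumes 16-bit code so that "mov ax,imm16" is opcode B8 followed by a two-byte immediate.

```python
# Hypothetical table text: operand pattern, opcode in hex, immediate size in bytes.
TABLE_TEXT = """
nop            90  0
ret            C3  0
int imm8       CD  1
mov al,imm8    B0  1
mov ax,imm16   B8  2
"""


def load_table(text):
    """Read the text table once; after this the opcodes are plain integers."""
    table = {}
    for line in text.strip().splitlines():
        pattern, opcode_hex, imm_size = line.rsplit(None, 2)
        table[" ".join(pattern.split())] = (int(opcode_hex, 16), int(imm_size))
    return table


TABLE = load_table(TABLE_TEXT)


def emit(pattern, immediate=0):
    """Look up one instruction variant and return its bytes (little-endian immediate)."""
    opcode, imm_size = TABLE[pattern]
    return bytes([opcode]) + immediate.to_bytes(imm_size, "little")


# mov ax, 4C00h / int 21h
program = emit("mov ax,imm16", 0x4C00) + emit("int imm8", 0x21)
print(program.hex())  # b8004ccd21
```

The hex in TABLE_TEXT is only for the convenience of whoever maintains the table. By the time emit runs, the opcode is an ordinary integer that gets written out as a byte.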
MOV is a two-state pattern within the computer's memory.
It can be replaced by another pattern by the assembler.
In the case of the x86 the actual pattern of the MOV
instruction depends on the operands as well.
To 'see' those patterns a program has to be written that
converts them to a form that can be displayed.
They can be displayed however you like.
dec  hex  binary    asc
 77  4D   01001101  M
 79  4F   01001111  O
 86  56   01010110  V
To answer your first question, no, it is not converted to
its hex or binary equivalent but may be displayed as such
if you want. It is converted to another set of two-state
patterns that the CPU will interpret as an instruction.
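A tiny display program of that kind could look like this in Python. The function name describe is made up, and the sketch assumes the bytes are printable ASCII, as the three letters of "MOV" are.

```python
def describe(byte):
    """Show one byte value as decimal, hex, binary and its ASCII character."""
    return "%3d  %02X  %s  %s" % (byte, byte, format(byte, "08b"), chr(byte))


# The text "MOV" as it sits in the source file: three bytes, shown four ways.
for byte in b"MOV":
    print(describe(byte))
```

The same three bytes are shown in four notations; nothing about the bytes themselves changes from one column to the next.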
As regards your second question, this depends on what OS
and what type of program file you are talking about.
When you type the program's name at the DOS prompt (or
click its icon) the OS has to load the file into memory
and perhaps modify it according to the address it decides
to copy the program to, then call XXXX where XXXX is the
start address of the program.
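The "modify it according to the address" step can be simulated in a few lines of Python. This is a toy only: the image layout, the fixup list and the name load are invented for this example and do not match any real executable format (DOS .COM files need no such patching, while DOS .EXE and Windows executables each have their own relocation tables).

```python
import struct


def load(image, fixups, base):
    """Copy the image and add the load address to every listed 32-bit address."""
    memory = bytearray(image)  # stands in for "copy the file into memory"
    for offset in fixups:
        (address,) = struct.unpack_from("<I", memory, offset)
        struct.pack_into("<I", memory, offset, (address + base) & 0xFFFFFFFF)
    return memory


# Assembled as if the program started at address 0:
#   A1 06 00 00 00   mov eax, [6]   (32-bit code; 6 is the offset of the data)
#   C3               ret
code = bytes([0xA1]) + struct.pack("<I", 6) + bytes([0xC3])
data = struct.pack("<I", 1234)
image = code + data
fixups = [1]          # offset of the address field that needs patching
base = 0x00400000     # where the pretend OS decided to put the program

loaded = load(image, fixups, base)
entry_point = base + 0  # the "XXXX" the OS would call
print(hex(struct.unpack_from("<I", loaded, 1)[0]))  # 0x400006
```

Only the address field changes; the opcode bytes and the data are copied as they are.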
As regards a question you asked earlier,
> Hello, could anyone tell me how to write an assembler
> (a very simple one)-
Yes, I could, having written a very simple editor/assembler
for the C64, a much simpler machine (and more fun) than the PC.
Writing a simple editor/assembler for some of the x86
instructions would be a very good learning exercise.
If it's simple, why not do it in assembler?
--
John.
>
>> Hello, could anyone tell me how to write an assembler (a very simple one)-
>
> I've been wanting to write an assembler myself for a long time.
> There are so many things that I would like to improve in the assemblers that
> I know.
> In fact the only thing that kept me from doing it was that building a list
> of all symbols, opcodes, arguments etc. would just take too much time.
Having written several assemblers, I can assure you that these operations
you listed above are a tiny, tiny, part of the assembler.
Handling expressions, allowing higher level data types (e.g., structs),
and the grammar for the hundreds of instructions is where the real work is.
Of course, if you want to add macros and conditional assembly, that's
a real bear, too.
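To give a feel for the expression part, here is a minimal recursive-descent evaluator in Python. It is a sketch under generous assumptions: only + - * / and parentheses, decimal and 0x-style hex numbers, every symbol already defined, and integer (floor) division. The name evaluate and the symbol-table dictionary are invented for this example; a real assembler also has to cope with forward references and relocatable values, which is where it gets hard.

```python
import re

TOKEN = re.compile(r"\s*(0x[0-9A-Fa-f]+|\d+|[A-Za-z_]\w*|[-+*/()])")


def evaluate(text, symbols):
    """Evaluate an operand expression against a table of known symbols."""
    tokens = TOKEN.findall(text)
    if "".join(tokens) != "".join(text.split()):
        raise ValueError("unrecognized character in %r" % text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        token = take()
        if token is None:
            raise ValueError("unexpected end of expression")
        if token == "(":
            value = expr()
            if take() != ")":
                raise ValueError("missing )")
            return value
        if token == "-":
            return -factor()
        if token[0].isdigit():
            return int(token, 16) if token.lower().startswith("0x") else int(token, 10)
        if token in symbols:
            return symbols[token]
        raise ValueError("undefined symbol or stray token %r" % token)

    def term():
        value = factor()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= factor()
            else:
                value //= factor()
        return value

    def expr():
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    result = expr()
    if peek() is not None:
        raise ValueError("unexpected %r" % peek())
    return result


print(evaluate("buffer + 2*(count-1)", {"buffer": 0x100, "count": 4}))  # 262
```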
Randy Hyde
>
> Certainly an assembler CAN be written in assembly; however, it
> shouldn't be.
>
Like any other program, there are parts of an assembler that could benefit
by being written in assembly and parts that probably would not.
Like any language translation system (that doesn't do optimization), the
majority of the time spent in translation occurs in the lexical analyzer.
(I'd suspect that about 75%, or more, of the time is spent in scanning.)
Therefore, one can tremendously improve the performance of the scanner by
writing it in assembly. Even die-hard compiler writers recognize this
(most don't, however, since optimization currently reigns as the greatest
cycle-eater in a compiler; but that doesn't apply to [current] assemblers).
Because the scanner is logically distinct from the rest of the system, with
a very simple interface, it is real easy to code the scanner in assembly,
test it, and get it to work with the system. I don't even recommend writing
it in a HLL first, since the process is so straight-forward (I have written
many scanners in assembler, so this is from experience).
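The interface in question boils down to "give me the next token". Here is a sketch of that shape, written in Python rather than assembly purely for readability. The token kinds, the regular expressions and the name scan are assumptions made for this example, not taken from any particular assembler.

```python
import re

# Order matters: the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("COMMENT", r";[^\n]*"),
    ("NUMBER", r"0[xX][0-9A-Fa-f]+|\d[0-9A-Fa-f]*[hH]|\d+"),
    ("IDENT", r"[A-Za-z_.$][\w.$]*"),
    ("PUNCT", r"[,:\[\]+\-*/()]"),
    ("NEWLINE", r"\n"),
    ("SPACE", r"[ \t\r]+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def scan(source):
    """Yield (kind, text) pairs; the parser only ever asks for the next one."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind in ("SPACE", "COMMENT"):
            continue  # the parser never sees blanks or comments
        if kind == "ERROR":
            raise ValueError("bad character %r" % match.group())
        yield kind, match.group()


for token in scan("start:  mov ax, 4C00h   ; exit code 0\n"):
    print(token)
```

Everything the rest of the assembler needs from the scanner is that one stream of pairs, which is why the scanner can be swapped for a hand-tuned version without touching the parser.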
There is little benefit to writing the parser and code generator in
assembly, since so little time is spent in these phases of assembly.
BTW, using memory-mapped files for input and output *really* improves
performance of the scanner.
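For anyone who has not used memory-mapped files, this is roughly what scanning through one looks like, here with Python's standard mmap module. The function name count_lines_mapped and the throwaway demo file are made up for this example, and the sketch makes no timing claims of its own; it only shows the scanner walking the file as if it were one big byte buffer.

```python
import mmap
import os
import tempfile


def count_lines_mapped(path):
    """Scan a file through a read-only memory map instead of read() calls."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            count = 0
            position = view.find(b"\n")
            while position != -1:
                count += 1
                position = view.find(b"\n", position + 1)
            return count


# Write a small throwaway source file, scan it, then clean up.
with tempfile.NamedTemporaryFile("wb", suffix=".asm", delete=False) as tmp:
    tmp.write(b"mov ax, 4C00h\nint 21h\n")
demo_path = tmp.name

line_count = count_lines_mapped(demo_path)
os.remove(demo_path)
print(line_count)  # 2
```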
Randy Hyde
Hand coded, byte per byte to the ROM/punch card/whatever.
Good Luck
--
Alexei A. Frounze
alexfru [AT] chat [DOT] ru
http://alexfru.chat.ru
http://members.xoom.com/alexfru/
http://welcome.to/pmode/