new 80386 assembler

Paul Edwards

Jan 27, 2022, 12:16:08 PM
There is a C compiler called SubC which now supports,
sort of, a usable subset of C90.

I am interested in modifying it to emit Intel syntax
instead of the current AT&T syntax which is fed
into GNU as.

At the same time I would like to write an assembler
that takes that emitted source and produces a.out
object code, and later a linker to produce a.out
executables. I'm after simplicity at the moment.

I would like the emitted assembly code to also
assemble with Microsoft's assembler.

The example of Microsoft code I have at the moment
looks like this:

.386

.model flat,c

_DATA segment dword 'DATA'
_DATA ends
_BSS segment dword 'BSS'
_BSS ends

_TEXT segment 'CODE'


public __setj
__setj proc env:dword
mov eax, env
push ebx


My expectation is that "proc" would not be
supported by my assembler because the compiler
would never generate it. I would have a label instead.

My understanding is that Microsoft now supports
simplified directives of .code (or is it .text?) and .data.
I assume there is also a .bss. When did Microsoft start
supporting this syntax? I'm wondering whether I should
stick to the "segment" syntax.

The assembler will be public domain.

I have my own build of binutils 2.14a and I might be able
to modify "as" so that it also handles the above assembler
format. E.g. by making the ".intel_syntax" directive a
command line option.

I'm only vaguely aware of what I am trying to do.

What is a sensible goal? All the masm-compatible
assemblers that I am aware of, like wasm, are
copyrighted, and I would like to introduce
something, anything, that is public domain.

Thanks. Paul.

Frank Kotler

Jan 27, 2022, 5:07:16 PM
...
> I'm only vaguely aware of what I am trying to do.

No offense intended, Paul, but this sounds like a bad idea!

Best,
Frank

Paul Edwards

Jan 27, 2022, 6:14:04 PM
On Friday, January 28, 2022 at 9:07:16 AM UTC+11, Frank Kotler wrote:
> ...
> > I'm only vaguely aware of what I am trying to do.

> No offense intended, Paul, but this sounds like a bad idea!

Can you elaborate?

Thanks. Paul.

Frank Kotler

Jan 28, 2022, 12:42:44 AM
Well, yeah. If you're going to try to create another assembler, on top
of all the assemblers out there, I think you ought to have a firm idea
what you want to do. You actually seem to have a pretty good idea. You
want it to be Masm syntax. (why?) And you want it to be Public Domain.
Some people like Public Domain. I guess you do. Some people will tell
you that Public Domain is like freeing your slaves but allowing someone
else to enslave them again. I am not a lawyer.

If that's what you want to do, fine. You were the one who said you
didn't have a clear idea...

For example:
Masm:
mov eax, offset foo ; address
mov eax, foo ; contents
Nasm:
mov eax, foo ; address
mov eax, [foo] ; contents

Is there a reason to prefer one over the other? Perhaps so.
Just random? That's what doesn't seem like a good idea to me. Maybe you
just don't like AT&T? (G)as and Yasm will accept either Intel or AT&T.
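
To make the Masm/Nasm difference above concrete, here is a rough sketch in C of how a code generator might format the two cases for either dialect. The enum and function names are made up for illustration; they are not taken from SubC or from any real assembler.

```c
#include <stdio.h>

/* Hypothetical dialect selector, for illustration only. */
enum dialect { DIALECT_MASM, DIALECT_NASM };

/* Format "load the ADDRESS of sym into reg".
   Masm needs the "offset" keyword; Nasm uses the bare symbol. */
int fmt_load_address(char *out, size_t n, enum dialect d,
                     const char *reg, const char *sym)
{
    if (d == DIALECT_MASM)
        return snprintf(out, n, "mov %s, offset %s", reg, sym);
    return snprintf(out, n, "mov %s, %s", reg, sym);
}

/* Format "load the CONTENTS of sym into reg".
   Masm uses the bare symbol; Nasm requires square brackets. */
int fmt_load_contents(char *out, size_t n, enum dialect d,
                      const char *reg, const char *sym)
{
    if (d == DIALECT_MASM)
        return snprintf(out, n, "mov %s, %s", reg, sym);
    return snprintf(out, n, "mov %s, [%s]", reg, sym);
}
```

Note that the same text, "mov eax, foo", means different things in the two dialects, which is why a generator probably has to pick one dialect up front rather than hope the output assembles everywhere.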

I like Nasm, but I don't have a good reason why. But then, I'm not
writing it...

That's all I meant...

Best,
Frank

Paul Edwards

Jan 28, 2022, 4:07:59 AM
On Friday, January 28, 2022 at 4:42:44 PM UTC+11, Frank Kotler wrote:

> Well, yeah. If you're going to try to create another assembler, on top
> of all the assemblers out there, I think you ought to have a firm idea
> what you want to do. You actually seem to have a pretty good idea. You
> want it to be Masm syntax. (why?) And you want it to be Public Domain.

I was brought up with Microsoft being the industry
standard and everyone looking for compatibility with
them.

For my MSDOS programming I have written my code
such that it assembles with tasm and wasm, with the
belief that it will also assemble on masm, but not
actually testing that.

Now I am doing something new - writing an OS in C.
Actually I've already done that (PDOS/386), but now
I have a new design, PDOS-generic, and I'm on the
verge of having a new compiler, SubC. But SubC
generates AT&T syntax and depends on the rest of
the GNU toolchain.

But this switch of toolchain is just a new idea I had.
It is only in the last week or so that SubC has sort of
become viable, and only in the last few days that I
found out that a simple assembler/archiver/linker
is only 2500 lines of code in total (based on the DOS
one that comes with SubC), so something I may
be able to achieve.

If I combine this with msged's ability to import and
export text files, I should be able to put together an
entirely public domain development environment.
This would allow freeware people to have a base to
work with instead of contributing to copyrighted
projects, and unrestricted commercial derivatives.

Part of competing with Microsoft's monopoly.

BFN. Paul.

Kerr-Mudd, John

Jan 28, 2022, 4:14:54 AM
On Fri, 28 Jan 2022 00:42:53 -0500
Frank Kotler <fbko...@myfairpoint.net> wrote:

> On 01/27/2022 06:14 PM, Paul Edwards wrote:
> > On Friday, January 28, 2022 at 9:07:16 AM UTC+11, Frank Kotler wrote:
> >> ...
> >>> I'm only vaguely aware of what I am trying to do.
> >
> >> No offense intended, Paul, but this sounds like a bad idea!
> >
> > Can you elaborate?
> >
>
> Well, yeah. If you're going to try to create another assembler, on top
> of all the assemblers out there, I think you ought to have a firm idea
[]
>
> Is there a reason to prefer one over the other? Perhaps so.
> Just random? That's what doesn't seem like a good idea to me. Maybe you
> just don't like AT&T? (G)as and Yasm will accept either Intel or AT&T.
>
> I like Nasm, but I don't have a good reason why. But then, I'm not
> writing it...
>
I use Nasm; it took me a little while (mostly learning the macro language) and I really don't like that they force [segreg:reg] e.g.

mov al,[es:di] ; nasm
mov al,es:[di] ; no good under nasm

I'm a lot happier not to have to specify
assume cs:code, ds:code
like you do in Masm *every time*!
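
If someone wanted a tool to accept both spellings of the segment override, the rewrite is mechanical. A minimal sketch in C, assuming lowercase two-letter segment registers and a single bracketed expression; the function name is hypothetical and this is not how Nasm or Masm actually parse operands:

```c
#include <stdio.h>
#include <string.h>

/* Rewrite a Masm-style operand "seg:[expr]" as Nasm-style "[seg:expr]".
   Anything that does not match that shape is copied unchanged.
   Returns 0 on success, -1 if the output buffer is too small. */
int masm_to_nasm_operand(const char *in, char *out, size_t n)
{
    const char *colon = strchr(in, ':');
    size_t len = strlen(in);
    int written;

    /* Match: cs/ds/es/fs/gs/ss, then ":[", then text, then a final "]". */
    if (colon != NULL && colon - in == 2 &&
        strchr("cdefgs", in[0]) != NULL && in[1] == 's' &&
        colon[1] == '[' && len >= 5 && in[len - 1] == ']') {
        /* Length of the text between '[' and the final ']'. */
        int expr_len = (int)(len - 5);
        written = snprintf(out, n, "[%.2s:%.*s]", in, expr_len, colon + 2);
    } else {
        written = snprintf(out, n, "%s", in);
    }
    return (written < 0 || (size_t)written >= n) ? -1 : 0;
}
```

With this, "es:[di]" comes out as "[es:di]", while an operand already in Nasm form passes through untouched.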

--
Bah, and indeed Humbug.

Rod Pemberton

Jan 29, 2022, 2:53:10 PM
On Thu, 27 Jan 2022 09:16:07 -0800 (PST)
Paul Edwards <muta...@gmail.com> wrote:

> There is a C compiler called SubC which is a now,
> sort of, a usable subset of C90.

With regard to using SubC or another C compiler,
see my reply on a.o.d.

Just an FYI, Nils M Holm, the author of SubC, is a
citizen of Germany. Germany's legal system doesn't
have a public domain concept, which is a U.S. legal
system concept used to represent works of the U.S.
government owned by its citizens. The concept of
public domain has never been tested for individuals
in the U.S. and may not be legal. I.e., everything
produced by Nils is automatically copyrighted under
German laws, and the same is probably true for you
as well as an Australian citizen ...

I didn't bring up copyrights again to start an
argument, but just so that you could compare with
the open source licensed software below, which are
"irrelevant" if you're not modifying their code, i.e.,
if you're just using the software.

> I am interested in modifying it to emit intel syntax
> instead of the current AT&T syntax which is fed
> into gnu as.

Ok.

> At the same time I would like to write an assembler
> that takes that emitted source and produces a.out
> object code, and later a linker to produce a.out
> executables. I'm after simplicity at the moment.

You might look at JWasm, Japheth's replacement
version of OpenWatcom's WASM, which is a
MASM-compatible assembler. It's not public domain,
but is open source (Sybase license).

https://www.japheth.de/JWasm.html

Personally, I prefer NASM's syntax to MASM or
AT&T/GAS. I'm the least comfortable with MASM.
YASM also uses NASM syntax, which is supposedly
like TASM's ideal-mode syntax. NASM and YASM
are both open source, under 2- and 3-clause
BSD licenses. BSD and MIT licenses are usually
about as close to public domain as you can get.

https://www.nasm.us/
https://yasm.tortall.net/


--
Biden is proving that Trump was correct all along.

Paul Edwards

Jan 29, 2022, 3:57:55 PM
On Sunday, January 30, 2022 at 6:53:10 AM UTC+11, Rod Pemberton wrote:

> Just an FYI, Nils M Holm, the author of SubC is a
> citizen of Germany. Germany's legal system doesn't
> have a public domain concept, which is a U.S. legal
> system concept used to represent works of the U.S.
> government owned by it's citizens.

It's also what happens to all copyrighted work 70 years
after the death of the author. Not just in the US either.
What did you think the legal status of Shakespeare's
Hamlet was in Germany? GPL v2?

What is different in Germany etc is whether you can
release your work into the public domain with a PD
notice. Nils has CC0 to cover that in jurisdictions
where that is an issue. As do I for PDOS.

> The concept of
> public domain has never been tested for individuals
> in the U.S. and may not be legal.

The fact that no individual has ever released their work
to the public domain and then turned around and attempted
to sue someone for copyright infringement is a good track
record, not something to be concerned about.

> I.e., everything
> produced by Nils is automatically copyrighted under
> German laws, and the same is probably true for you
> as well as an Australian citizen ...

Even if things are copyrighted by default, that doesn't
mean you need to make matters worse by explicitly
including a copyright statement.

Even if a jurisdiction doesn't recognize releasing things
to the public domain as valid, that doesn't mean there is
something wrong with writing those words regardless,
to show what you really want to do, even if some
jurisdictions ignore it.

> I didn't bring up copyrights again to start an
> argument, but just so that you could compare with
> the open source licensed software below, which are
> "irrelevant" if you're not modifying their code, i.e.,
> if you're just using the software.

It's not "irrelevant" to me. I don't trust that a court
won't find that one of the 273 license conditions in
GPL was violated by me using the software. Only a
court can decide that, repeatedly. Until the court
decision is overturned, repeatedly.

But it's irrelevant anyway - if I discover a problem with
software I am using, I want to be able to debug it and
fix it and recoup my costs by distributing a closed-source
version of the software that is better than the original.

> > At the same time I would like to write an assembler
> > that takes that emitted source and produces a.out
> > object code, and later a linker to produce a.out
> > executables. I'm after simplicity at the moment.

> You might look at JWasm, Japheth's replacement

That doesn't produce a.out:

native support for output formats Intel OMF, MS Coff (32/64-bit), Elf (32/64-bit), Binary, Windows PE (32/64-bit) and DOS MZ

But again, I'm after a public domain assembler, and it
sounds like mine is going to be the best one on the
market after I've written 1 line of code.

> NASM and YASM
> are both open source, under 2- and 3- clause
> BSD licenses. BSD and MIT licenses are usually
> about as close to public domain as you can get.

Besides actual public domain/CC0, like mine will be.
Mine might be based on Nils's public domain 8086
assembler though.

BFN. Paul.

Rod Pemberton

Jan 29, 2022, 3:59:20 PM
On Thu, 27 Jan 2022 09:16:07 -0800 (PST)
Paul Edwards <muta...@gmail.com> wrote:

> There is a C compiler called SubC which is a now,
> sort of, a usable subset of C90.
>
> I am interested in modifying it to emit intel syntax
> instead of the current AT&T syntax which is fed
> into gnu as.
>

There appears to be a limited 8086 assembler,
s86, included with SubC.

> At the same time I would like to write an assembler
> that takes that emitted source and produces a.out
> object code, and later a linker to produce a.out
> executables. I'm after simplicity at the moment.
>
> I would like the assembler code above to be able
> to be assembled with Microsoft's assembler.
>

It appears you need to change the code in:

cgsynth() in files in src/targets/cg/
cgsynth() in src/cg.c
ngen() sgen() lgen() etc in src/gen.c

Most of this appears to be simple text strings.
So you just need to replace them with the equivalent Intel syntax.
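
As a rough idea of what "replace with equivalent syntax" involves, here is a sketch in C of formatting the same move in AT&T and Intel syntax. These helpers are hypothetical and are not SubC's actual cgsynth()/gen code, which I have only skimmed; the point is only that the operand order flips and the '%' and '$' prefixes and the 'l' size suffix go away.

```c
#include <stdio.h>

/* Hypothetical output-syntax selector, for illustration only. */
enum syntax { SYNTAX_ATT, SYNTAX_INTEL };

/* Format a 32-bit register-to-register move.
   AT&T: source first, registers prefixed with '%', 'l' size suffix.
   Intel: destination first, bare register names. */
int fmt_mov_reg_reg(char *out, size_t n, enum syntax s,
                    const char *dst, const char *src)
{
    if (s == SYNTAX_ATT)
        return snprintf(out, n, "movl %%%s, %%%s", src, dst);
    return snprintf(out, n, "mov %s, %s", dst, src);
}

/* Format a move of an immediate value into a 32-bit register.
   AT&T prefixes immediates with '$'; Intel writes them bare. */
int fmt_mov_reg_imm(char *out, size_t n, enum syntax s,
                    const char *dst, long imm)
{
    if (s == SYNTAX_ATT)
        return snprintf(out, n, "movl $%ld, %%%s", imm, dst);
    return snprintf(out, n, "mov %s, %ld", dst, imm);
}
```

Memory operands need more care than this (AT&T's disp(base,index,scale) form versus Intel's bracketed expressions), so the real change is probably more than a search and replace.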

Rod Pemberton

Jan 29, 2022, 4:18:08 PM
On Sat, 29 Jan 2022 12:57:55 -0800 (PST)
Paul Edwards <muta...@gmail.com> wrote:

> On Sunday, January 30, 2022 at 6:53:10 AM UTC+11, Rod Pemberton wrote:

> > You might look at JWasm, Japheth's replacement
>
> That doesn't produce a.out:
>

What does?

> native support for output formats Intel OMF, MS Coff (32/64-bit), Elf
> (32/64-bit), Binary, Windows PE (32/64-bit) and DOS MZ

There are object converters. Maybe there is one for a.out?

I know of GNU objcopy and Agner Fog's objconv. On 64-bit Linux,
objcopy lists "a.out-i386-linux". a.out isn't listed for objconv.

Paul Edwards

Jan 29, 2022, 7:55:38 PM
On Sunday, January 30, 2022 at 7:59:20 AM UTC+11, Rod Pemberton wrote:

> > There is a C compiler called SubC which is a now,
> > sort of, a usable subset of C90.
> >
> > I am interested in modifying it to emit intel syntax
> > instead of the current AT&T syntax which is fed
> > into gnu as.
> >
> There appears to be a limited 8086 assembler
> s86 included with Subc.

Yes, that's what made me realize that I should be able
to write an 80386 assembler too. It's only 1400 lines
of code.

>> That doesn't produce a.out:

> What does?

The binutils included with EMX 0.9d does, which is where
I was introduced to a.out in the first place, and saw that it
was simple.

So I built binutils 2.14a myself with the a.out target, and still
use it today for PDOS/386. I also built the PE target which
PDOS/386 now also supports.
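
To show what I mean by simple: the classic a.out header is just eight 32-bit words. Below is a minimal sketch in C of the traditional layout and a little-endian encoder. The field names follow the usual Unix a.out.h; some variants pack a machine type and flags into the upper bits of the first word, and I am not claiming this matches every detail of what EMX or binutils emit. The encoder function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

#define AOUT_OMAGIC      0407u  /* octal magic: object file or impure executable */
#define AOUT_HEADER_SIZE 32     /* eight 32-bit fields */

/* Traditional a.out header layout. */
struct aout_header {
    uint32_t a_info;    /* magic number (low 16 bits) */
    uint32_t a_text;    /* size of text segment in bytes */
    uint32_t a_data;    /* size of initialized data in bytes */
    uint32_t a_bss;     /* size of uninitialized data in bytes */
    uint32_t a_syms;    /* size of symbol table in bytes */
    uint32_t a_entry;   /* entry point address */
    uint32_t a_trsize;  /* size of text relocation info in bytes */
    uint32_t a_drsize;  /* size of data relocation info in bytes */
};

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Serialize the header into 32 bytes, independent of host endianness. */
void aout_header_encode(const struct aout_header *h,
                        unsigned char out[AOUT_HEADER_SIZE])
{
    put_le32(out + 0,  h->a_info);
    put_le32(out + 4,  h->a_text);
    put_le32(out + 8,  h->a_data);
    put_le32(out + 12, h->a_bss);
    put_le32(out + 16, h->a_syms);
    put_le32(out + 20, h->a_entry);
    put_le32(out + 24, h->a_trsize);
    put_le32(out + 28, h->a_drsize);
}
```

After the header, an object file in this layout carries the text, the data, the two relocation tables, the symbol table and the string table, in that order.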

BFN. Paul.

Rod Pemberton

Jan 30, 2022, 1:09:25 AM
On Sat, 29 Jan 2022 16:55:37 -0800 (PST)
Paul Edwards <muta...@gmail.com> wrote:

> On Sunday, January 30, 2022 at 7:59:20 AM UTC+11, Rod Pemberton wrote:

> > > There is a C compiler called SubC which is a now,
> > > sort of, a usable subset of C90.
> > >
> > > I am interested in modifying it to emit intel syntax
> > > instead of the current AT&T syntax which is fed
> > > into gnu as.
> > >
> > There appears to be a limited 8086 assembler
> > s86 included with Subc.
>
> Yes, that's what made me realize that I should be able
> to write an 80386 assembler too. It's only 1400 lines
> of code.
>

Well, I created a disassembler, which very closely
matches NDISASM's output (comes with NASM), but only
for 16-bit code, in about 250 lines of C. It is much
faster than NDISASM, but it can't do 32-bit, 64-bit,
mixed-mode code, etc. I haven't been able to do a full
brute force comparison of the entire 16-bit x86 instruction
set between the two because of the size of the output.

A GAS syntax style disassembler (again, not an assembler)
called 486DIS (486dis_c.zip) is only about 1500 lines
of C code. I worked on updating it for a while, but
never released my fixes. It has a very simple design,
and uses char arrays of strings for the instructions.
It was very easy to modify. It was for DOS, open source.
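
The general table-driven idea looks something like the sketch below, in C. This is not 486DIS's actual table, just an illustration with a handful of one-byte opcodes that take no operands; the struct and function names are made up.

```c
#include <stddef.h>

/* One table row: an opcode byte and the mnemonic to print for it. */
struct opcode_entry {
    unsigned char opcode;
    const char *mnemonic;
};

/* A few single-byte x86 instructions with no operands. */
static const struct opcode_entry one_byte_ops[] = {
    { 0x90, "nop"  },
    { 0xC3, "ret"  },
    { 0xCC, "int3" },
    { 0xF4, "hlt"  },
    { 0xFA, "cli"  },
    { 0xFB, "sti"  },
};

/* Return the mnemonic for an opcode byte, or NULL if it is not in the table. */
const char *lookup_mnemonic(unsigned char opcode)
{
    size_t i;
    for (i = 0; i < sizeof one_byte_ops / sizeof one_byte_ops[0]; i++) {
        if (one_byte_ops[i].opcode == opcode)
            return one_byte_ops[i].mnemonic;
    }
    return NULL;
}
```

An assembler can use the same table in the other direction, searching by mnemonic to find the opcode byte, which is part of why a small one stays small. Most of the real work is in the ModR/M and operand-size handling, which this sketch leaves out entirely.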

486dis_c.zip is on Simtel. Maybe, there is a decently
designed assembler there too, which could provide ideas.

disassemblers
(index)
http://www.lanet.lv/simtel.net/msdos/disasm-pre.html
(files)
https://ftp.sunet.se/mirror/archive/ftp.sunet.se/pub/simtelnet/msdos/disasm/

assemblers
(index)
http://www.lanet.lv/simtel.net/msdos/asmutl-pre.html
(files)
https://ftp.sunet.se/mirror/archive/ftp.sunet.se/pub/simtelnet/msdos/asmutl/

Paul Edwards

Feb 24, 2022, 2:08:54 PM
Someone else (from Slovakia) volunteered to write the
assembler, but wanted AT&T syntax instead of masm,
which is certainly fine for now. It can produce a.out or
coff object code. You can find the code here:

https://sourceforge.net/p/pdos/gitcode/ci/master/tree/pdas/

We're still debugging it so that it can assemble the output
of SubC. It currently assembles it, but there are problems
with the generated code.

BFN. Paul.

Paul Edwards

Feb 24, 2022, 2:10:56 PM
On Friday, January 28, 2022 at 8:14:54 PM UTC+11, Kerr-Mudd, John wrote:

> I'm a lot happier not to have to specify
> assume cs:code, ds:code
> like you do in Masm *every time*!

Are you sure you need to specify that in masm?

masm is happy with this code:

https://sourceforge.net/p/pdos/gitcode/ci/master/tree/pdpclib/winsupa.asm

BFN. Paul.

Kerr-Mudd, John

Feb 24, 2022, 3:37:25 PM
Ah, well I was/am talking about early masm (2.0?)

Paul Edwards

Mar 16, 2022, 3:42:22 AM
On Friday, February 25, 2022 at 6:08:54 AM UTC+11, Paul Edwards wrote:

> https://sourceforge.net/p/pdos/gitcode/ci/master/tree/pdas/
>
> We're still debugging it so that it can assemble the output
> of SubC. It currently assembles it, but there are problems
> with the generated code.

As of a few hours ago, the debugging has been done and
we now have an 80386 assembler capable of handling
SubC recompiling itself, so we have a totally public domain
development suite, limited to the subset of C90 that SubC
supports.

BFN. Paul.