Smaller C

Alexei A. Frounze

Nov 22, 2012, 8:02:48 AM

I've been working on a simple C compiler lately and here's what I've got so far. It's a fun project, I must tell. :)

Steve and Rod may be interested in taking a look.

Code (ugly in some places and ways, but apparently functional):
https://github.com/alexfru/SmallerC

Alex

dos...@googlemail.com

Nov 22, 2012, 1:19:14 PM

Hi Alexei,

at least we learn what you are doing now ;-)

Maybe these links are helpful for your project:
http://www.cs.utexas.edu/users/tbone/c--/
http://c--sphinx.narod.ru/indexe.htm

or Jean-Marc's project:
https://github.com/cod5/cod5#readme

Georg

Alexei A. Frounze

Nov 22, 2012, 5:33:50 PM

Hi Georg!

How do you know what's going to be helpful? I didn't even ask for help, nor did I describe any problems. You must be all-seeing and all-knowing, no less! :)

Alex

Rod Pemberton

Nov 22, 2012, 9:26:05 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message
news:1debc959-af42-490a...@googlegroups.com...

Time becomes scarce this time of year because of the Holidays, but I'll take
a quick look. The intro notes indicate you've implemented quite a bit of
stuff!

On a personal note, it seems that you're alive! :-) We hadn't heard from
you in such a long time, I was wondering if you did something to annoy
micro-dictator Putin, like be in a band, sing songs, be silly, and dance
around in public ... ;-) That's definitely an offense worthy of the
"Gulag"! (sarcasm) Doesn't allowing Russian nationals to work in the US or
emigrate here "undermine the moral foundations" of Russia too? She passed
away so we can no longer ask Stalin's daughter... ;-) Politicians are
idiots! What difference does it make if a nobody makes a fool of themself
in public? What difference does it make if you were educated in the US, EU,
or Russia? What difference does it make if you decide to live in another
country? The will of the people will be done. It's always just a matter of
time. It will happen in China someday too.

Rod Pemberton

Alexei A. Frounze

Nov 23, 2012, 3:03:30 AM

On Thursday, November 22, 2012 6:21:35 PM UTC-8, Rod Pemberton wrote:
> "Alexei A. Frounze" wrote in message
> news:1debc959-af42-490a...@googlegroups.com...
> > I've been working on a simple C compiler lately and here's what I've got
> > so far. It's a fun project, I must tell. :)
> >
> > Steve and Rod may be interested to take a look.
> >
> > Code (ugly in some places and ways, but apparently functional):
> > https://github.com/alexfru/SmallerC
>
> Time becomes scarce this time of year because of the Holidays, but I'll take
> a quick look. The intro notes indicate you've implemented quite a bit of
> stuff!

Yeah, it's a bit surprising even to me how much can be done in 6 KLOC (160 KB) of code in about 3 months, considering this is my first compiler ever. :)

> On a personal note, it seems that you're alive! :-) We hadn't heard from
> you in such a long time, I was wondering if you did something to annoy
> micro-dictator Putin, like be in a band, sing songs, be silly, and dance
> around in public ... ;-) That's definitely an offense worthy of the
> "Gulag"! (sarcasm) Doesn't allowing Russian nationals to work in the US or
> emigrate here "undermine the moral foundations" of Russia too? She passed
> away so we can no longer ask Stalin's daughter... ;-) Politicians are
> idiots! What difference does it make if a nobody makes a fool of themself
> in public? What difference does it make if you were educated in the US, EU,
> or Russia? What difference does it make if you decide to live in another
> country? The will of the people will be done. It's always just a matter of
> time. It will happen in China someday too.

I was pretty busy with work, and at the same time the group seemed to have gone much quieter, and I didn't feel I could contribute much other than perhaps answering the same OS/BIOS/hardware/asm/C questions all over again. Then recently I got some free time on my hands and chose to make a small compiler for fun and share it with you folks.

While in the past 7 years I've had opportunities (if you can call it that) to piss off various people (officials included) in like 5 different countries, somehow I have resisted doing that and am still free, which, of course, means nothing. In Russia there's an old saying that goes something like "no one can be safe from poverty or prison". And the saying is still very relevant, in some respects more than ever. In Russia you don't seek the various freedoms that you take for granted in, say, the US. Not these days, not yet.

As for politicians, look up George Carlin on politics/politicians on youtube. I think he put it quite well. Russia or America, it doesn't make too much difference. Garbage in, garbage out. :)

Alex

Jean-Marc Lienher

Nov 23, 2012, 5:17:19 AM

Hi,

Alexei A. Frounze wrote:

> On Thursday, November 22, 2012 10:19:14 AM UTC-8, dos...@googlemail.com wrote:
>> Maybe these links are helpful for your project:
>>
>> http://www.cs.utexas.edu/users/tbone/c--/
>>
>> http://c--sphinx.narod.ru/indexe.htm

I would add this non-helpful link ;-) :
http://www.t3x.org/subc/current/index.html
It is the only working compiler for a subset of C that
I know that is really in the public domain.

>> or Jean-Marc's project:
>>
>> https://github.com/cod5/cod5#readme

My project is a little bit stalled these days.
I don't have time to work on it, so I think it is really not
yet helpful to anybody.

Jean-Marc

s_dub...@yahoo.com

Nov 23, 2012, 10:13:05 AM

On Thursday, November 22, 2012 7:02:49 AM UTC-6, Alexei A. Frounze wrote:
> I've been working on a simple C compiler lately and here's what I've got so far. It's a fun project, I must tell. :)
>

Do tell!

>
>
> Steve and Rod may be interested to take a look.
>

Yes, I'm certainly interested in your work. It's great that you're still around and have found some time to program for fun!

Steve

BGB

Nov 23, 2012, 12:08:04 PM

On 11/23/2012 4:17 AM, Jean-Marc Lienher wrote:
> Hi,
>
> Alexei A. Frounze wrote:

I wrote a C compiler before, but it hasn't really been maintained, and
even then the thing was very slow/buggy.

for the most part, more recently it has been a matter of "don't have the
time really needed to justify working on all this...".

the C compiler frontend has since then more been serving as a tool for
things like reflection metadata and aiding gluing between my script
language and C. well, and also being a tool to bundle up this metadata,
along with script code, inside PE/COFF and ELF images (using a
hierarchical WAD-based container format).

dos...@googlemail.com

Nov 24, 2012, 3:12:35 AM

Many small C compilers are mentioned and discussed in this thread:

http://www.bttr-software.de/forum/mix_entry.php?id=10980

Georg

Rod Pemberton

Nov 24, 2012, 3:56:13 AM

<dos...@googlemail.com> wrote in message
news:03925182-11d1-4b6c...@googlegroups.com...

> Many small C compilers are mentioned and discussed in this thread:
>
> http://www.bttr-software.de/forum/mix_entry.php?id=10980
>

Yes, there are. Like me, Rugxulo usually does a pretty good job of
remembering or collecting lists of information. Over some years, I've
posted quite a few lists of various C compilers to various newsgroups.
Except, I haven't made a formal list of all the C compilers I've located. I
should probably scrounge all my old posts to do so ... and Rugxulo's too!
Unfortunately, I don't think Rugxulo reads alt.os.development, or he'd
probably have a few more C compilers in his lists ... I've mentioned some
rather obscure and some no longer available. He does read/post to
comp.os.msdos.* and BTTR forums. What's odd is one post by "bocke" links to
an old post of mine where I also happen to list a few C compilers at the end
... He didn't link to it for the list of compilers. He linked to my post
for producing a minimal OpenWatcom. There is also a list for a minimal
DJGPP in that thread too. Hopefully, that information is still valid.

Rod Pemberton

Alexei A. Frounze

Nov 24, 2012, 10:44:21 AM

Looks like I've been able to make the compiler generate 32-bit code as well as 16-bit, piggybacking on gcc's stdlib with DJGPP and MinGW. I'll be updating the code on github in a couple of days if not sooner.

Alex

s_dub...@yahoo.com

Nov 24, 2012, 11:49:27 AM

I made a copy of cmplr.c as smlrc.c to work with..

$ gcc cmplr.c -o cmplr
$ ./cmplr smlrc.c
$ nasm -g -f obj smlrc.nsm -Z errstate.txt    # produces the .obj

$ nasm -g -f elf smlrc.nsm -o smlrcelf -Z errstate.txt    # attempt -f elf

smlrc.nsm:1: warning: Unknown section attribute 'PUBLIC' ignored on declaration of section `_TEXT'
smlrc.nsm:1: warning: Unknown section attribute 'CLASS=CODE' ignored on declaration of section `_TEXT'
smlrc.nsm:1: warning: Unknown section attribute 'USE16' ignored on declaration of section `_TEXT'
smlrc.nsm:2: warning: Unknown section attribute 'PUBLIC' ignored on declaration of section `_DATA'
smlrc.nsm:2: warning: Unknown section attribute 'CLASS=DATA' ignored on declaration of section `_DATA'

(smlrcelf gets produced, but this is 16-bit code after all.)
FYI,

Steve

dos...@googlemail.com

Nov 24, 2012, 3:03:14 PM

Hi Rod,

since you seem to have a good overview of the various C compilers, what C compiler would you recommend to use for writing an operating system?

Georg

Alexei A. Frounze

Nov 24, 2012, 7:28:23 PM

Sorry if I confused you, but the 32-bit capable version exists only on my computer and hasn't been uploaded to github yet, so attempting to make ELFs out of the compiler's 16-bit asm output is a bit pointless at the moment.

And then, even if you had the 32-bit capable version, you'd probably need to change something (a command line option, a compile-time macro, etc) to choose between 16-bit and 32-bit output. Currently there isn't any code to allow that choice to be made in an interactive way and I'm changing a global variable in the source code to make the choice. I should probably write some code for this.
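A minimal sketch of what such a switch could look like, assuming a hypothetical pair of options `-16` and `-32` (not SmallerC's actual option names) that replace the hard-coded global:

```c
#include <string.h>

/* Hypothetical replacement for the hard-coded global: 16 or 32. */
static int output_bits = 16;

/* Scan argv for the (hypothetical) -16 / -32 switches; the last one wins.
   Returns 0 on success, -1 on an unrecognized option starting with '-'. */
static int parse_bitness(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-16") == 0)
            output_bits = 16;
        else if (strcmp(argv[i], "-32") == 0)
            output_bits = 32;
        else if (argv[i][0] == '-')
            return -1;              /* unknown switch */
        /* anything else is treated as an input file name */
    }
    return 0;
}
```

Letting the last switch win means a build script can set a default and a user can still override it further along the command line.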

Wait a bit for the updated code.

Alex

Alexei A. Frounze

Nov 24, 2012, 7:36:32 PM

I'm not Rod, but I have successfully used Turbo C++, Open Watcom and gcc (DJGPP) for building various parts of my OS (boot loader + kernel). Unless you have any specific needs or restrictions, these 3 compilers are good enough for most things 16-bit and 32-bit. That said, you need to get a bit intimate with every compiler toolset to replace the startup and stdlib code with some code of yours and to disable or work around a couple of things here and there. But that's expected, since the C standard doesn't define implementation details like these at all; they are left up to the compiler developers.
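One concrete piece of "replacing the stdlib": GCC's documentation says that even a freestanding environment must provide `memcpy`, `memmove`, `memset` and `memcmp`, because the compiler may emit calls to them on its own (for example for structure assignment). Below is a minimal byte-at-a-time sketch of two of them. The `k_` prefix is a hypothetical naming choice so the sketch doesn't clash with a host C library when tried out; in a kernel they would carry the standard names.

```c
#include <stddef.h>

/* Fill n bytes at dst with the byte value c; returns dst, like memset(). */
void *k_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

/* Copy n bytes from src to dst (regions must not overlap); returns dst,
   like memcpy(). */
void *k_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Some GCC versions recognize loops like these and turn them back into calls to `memset`/`memcpy`, which is why such files are sometimes built with `-fno-tree-loop-distribute-patterns` in addition to `-ffreestanding`.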

Alex

dos...@googlemail.com

Nov 25, 2012, 4:56:53 AM

Thank you for your message Alex.

> Though, you need to get a bit intimate with every compiler toolset to replace the startup and stdlib code with some code of yours and disable or workaround a couple of things here and there.

Since I have not yet written a C compiler, is there a way to save myself these changes by using a compiler written for OS development? Or maybe use an already available package to modify a common C compiler for that?

Georg

Rod Pemberton

Nov 25, 2012, 5:06:19 AM

<dos...@googlemail.com> wrote in message
news:4379c4ae-5022-40d6...@googlegroups.com...

>
> since you seem to have a good overview of the various
> C compilers, what C compiler would you recommend
> to use for writing an operating system?
>

Well, my OS project is stalled. I started using both OpenWatcom (v1.3) and
DJGPP (v2.03) (GCC based) with MS-DOS as my OS. OW can produce 16-bit and
32-bit DPMI applications for DOS. DJGPP can produce 32-bit DPMI
applications for DOS. I was using those two compilers and MS-DOS for
reasons unrelated to my OS project. I also didn't originally plan on coding
an OS. It was the outcome of other projects. So, those compilers and host
OS weren't a planned choice as in: "What's the best compiler and host OS for
developing an OS in C?". That's where you're starting, presumably. If I
had asked that question first, I may have taken a different path.

The use of pre-existing C compilers and easy access to the hardware under
MS-DOS allowed for very quick development in C, at least initially. There
is very little assembly in my OS. What there is of it is mostly as inline
assembly for each C compiler. I also have a couple trivial files in NASM.
OpenWatcom produces faster code and catches a few C errors that GCC doesn't.
However, OW uses a flat memory model which can potentially hinder
multi-tasking, i.e., non-relocatable code. Eventually, I ran into some
bottlenecks or obstructions with code generated by the compilers and their
libraries. Due to that, I decided I needed to control the compiler I used
for my OS. IMO, although OW has some nice features for OS development that
GCC doesn't, OW seemed to have more issues producing "independent" code than
GCC. I'd probably drop OW entirely if I was starting over.

Currently, I'm working on a variety of C and C-like language compilers and
assemblers etc, most by me, but also a few by others. One of them I've
created seems promising for potential future use with my OS, except that it
needs one of my assemblers which I haven't completed yet ... I had hoped it
would be C based, especially a subset of C, or worst case C-like. But, the
syntax is already different ... It will have functionality from C that I
like, even if the C syntax is lost. I decided simplicity of implementation
and ease of parsing is more important than the actual language, as long as
it's not as primitive as assembly, i.e., still a "high-level" language.
Hopefully, someday, I'll be able to recode my OS in it. Of course, there is
still quite some work to do on that compiler before I can restart my OS
project.

As to what I'd recommend, I know that developing C code on DOS, since it
allows almost complete, unhindered access to the PC's hardware, is
definitely a time saver. However, the C compilers available for DOS may not
be the optimal choice for you. But, they did work well for as far as I got.
I probably could've continued much further with them if I hadn't become
aggravated and had problems with my development machine. Unfortunately, I
think I still would've eventually been halted by those issues. Perhaps, I
could've migrated to another, better C compiler at that point ... I.e., you
may wish to use whatever compiler and OS you believe to be the best choice
to develop the non-hardware portions of the OS, but maybe use MS-DOS to
develop code for the hardware since it has almost no hardware restrictions.

Personally, although I'm working on a SmallC variant, I wouldn't use SmallC
to compile my OS. It's too primitive. I'd like to have at least a minimal
subset of C that includes structures. Most versions of SmallC don't have
structures. So, you need a C compiler a bit more powerful than that.
Fortunately, many C compilers are "complete" or at least far more complete
than a C compiler like SmallC. If you're decent in C, you can do without
many language features, and you can do without the C libraries.

Most people using C to develop an OS (around 50%) tend to go the Linux
development route, or, less frequently, the Windows plus emulator route
(Bochs, VirtualBox, Qemu, etc). Everyone else uses assembly.

The problem with emulators is that sometimes they don't accurately capture
the correct operation of the hardware. So, at some point, you need to do
testing on real hardware. E.g., Ben Lunt, who posts here, found problems
with floppy-disk drivers in ... Qemu (?). IIRC, Alexei Frounze found that
emulators sometimes catch errors not detected on real hardware too.

Rod Pemberton

Alexei A. Frounze

Nov 25, 2012, 6:50:04 AM

I'm not aware of any compiler specifically targeting OS development, which doesn't mean there isn't one. In practice, gcc may be a good choice since it's available on so many platforms, which I equate with being flexible enough to adapt to the host and target environments.

If you learn about code linking, object/executable formats, code relocation and enough assembly, you'll be able to figure out most of the problems and come up with solutions to them. Most compiler adaptation/porting problems are in these areas (unless the compiler code is a mess that adds a considerable amount of unwanted and avoidable work).

Alex

Alexei A. Frounze

Nov 25, 2012, 7:08:35 AM

On Sunday, November 25, 2012 2:01:45 AM UTC-8, Rod Pemberton wrote:
...

> Currently, I'm working on a variety of C and C-like language compilers and
> assemblers etc, most by me, but also a few by others. One of them I've
> created seems promising for potential future use with my OS, except that it
> needs one of my assemblers which I haven't completed yet ... I had hoped it
> would be C based, especially a subset of C, or worst case C-like. But, the
> syntax is already different ... It will have functionality from C that I
> like, even if the C syntax is lost. I decided simplicity of implementation
> and ease of parsing is more important than the actual language, as long as
> it's not as primitive as assembly, i.e., still a "high-level" language.
> Hopefully, someday, I'll be able to recode my OS in it. Of course, there is
> still quite some work to do on that compiler before I can restart my OS
> project.

I've recently attended a talk about Google's Go language.
You might be interested in taking a look at its documentation.
About half of the language was designed after C, with a number of improvements:
- sane syntax, e.g. '[10][20]*[30][40]*int' instead of 'int*(*[10][20])[30][40]'
- numeric types of fixed sizes are guaranteed: int8,int32,int64,etc
- numeric constants don't have "ambiguous" types
- break is assumed at the end of every case
- ++ and -- are not operators, they are statements
- fewer punctuators necessary, e.g. the semicolon and parens in if (IIRC)
- type conversions must be done explicitly when you're mixing types in expressions
- etc
All of the above (along with other unmentioned features) makes the code easier to read and write and gives you fewer opportunities to shoot off your feet.
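To make the first bullet concrete, the C declaration can be rebuilt right to left with typedefs so that each step mirrors the Go spelling. A C11 sketch (the typedef and function names are made up for illustration) with a compile-time check that both spellings denote the same type:

```c
/* Built right to left with typedefs; each comment gives the Go spelling. */
typedef int *IntPtr;               /* *int                  */
typedef IntPtr Grid[30][40];       /* [30][40]*int          */
typedef Grid *GridPtr;             /* *[30][40]*int         */
typedef GridPtr Outer[10][20];     /* [10][20]*[30][40]*int */

/* The same type written as one C declarator: the
   'int*(*[10][20])[30][40]' from the list, with a name added. */
int *(*direct[10][20])[30][40];

/* C11 compile-time check that the two spellings name the same type. */
_Static_assert(_Generic(&direct, Outer *: 1, default: 0),
               "typedef chain and declarator disagree");

/* Walk the chain: o[i][j] is a pointer to a 30x40 grid; dereference it,
   index [k][l], and return the int* stored there. */
static int *lookup(Outer o, int i, int j, int k, int l)
{
    return (*o[i][j])[k][l];
}
```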

...

> The problem with emulators is that sometimes they don't accurately capture
> the correct operation of the hardware. So, at some point, you need to do
> testing on real hardware. E.g., Ben Lunt, who posts here, found problems
> with floppy-disk drivers in ... Qemu (?). IIRC, Alexei Frounze found that
> emulators sometimes catch errors not detected on real hardware too.
>
> Rod Pemberton

Virtualization/emulation works both ways. It can hide and reveal code bugs.

Alex

s_dub...@yahoo.com

Nov 25, 2012, 11:13:49 AM

>
> Sorry if I confused you, but the 32-bit capable version exists only on my computer and hasn't been uploaded to github yet, so attempting to make ELFs out of the compiler's 16-bit asm output is a bit pointless at the moment.
>

Oh, I understood that. Sorry for being terse in the previous msg, I was short for time.

I wanted to see if gcc on this debian box would compile your SmallerC.c into an elf executable, it did. So, the result is a cross development tool. I used that tool to self-compile its C source, to look at the nasm syntax output. (I like your idea of placing expression evaluation state in the comments, btw.)

Nasm builds the -f obj ok, but I wasn't sure if nasm was capable of auto translating the section/segment information to -f elf, it doesn't apparently. So, as you already indicated, the nasm backend in smaller-c needs adjustment for -f elf section information as well as for 16 to 32 bit.

When I went from 16 to 32 bit versions in my smallc, CCNA.C -> CCNA32.C, 16 bit for winXP, 32 bit for linux, it was much easier to make separate stand-alone versions instead of using compile-time switches, because of the primitive way smallc handles integers. Also, the -f elf section naming changes for the nasm backend. However, the bulk of the C frontend is the same between the two versions.

>
> And then, even if you had the 32-bit capable version, you'd probably need to change something (a command line option, a compile-time macro, etc) to choose between 16-bit and 32-bit output. Currently there isn't any code to allow that choice to be made in an interactive way and I'm changing a global variable in the source code to make the choice. I should probably write some code for this.
>
> Wait a bit for the updated code.

Sure, no hurry. I have a full plate and not enough time anyway.

Steve

>
> Alex

Alexei A. Frounze

Nov 25, 2012, 11:57:12 AM

On Sunday, November 25, 2012 8:13:49 AM UTC-8, s_dub...@yahoo.com wrote:
...

> (I like your idea of placing expression evaluation state in the comments, btw.)

Thanks! It has saved me some debugging time.

> Nasm builds the -f obj ok, but I wasn't sure if nasm was capable of auto translating the section/segment information to -f elf, it doesn't apparently. So, as you already indicated, the nasm backend in smaller-c needs adjustment for -f elf section information as well as for 16 to 32 bit.

Section-wise, I only needed to use "section .text" and "section .data" for ELF instead of what you're seeing for OMF.
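A sketch of how a backend might pick the section directives per output format. The enum and function names are hypothetical, and the OMF attribute lists are reconstructed from the NASM warnings Steve posted (`PUBLIC`, `CLASS=CODE`, `USE16`), so treat them as an assumption rather than SmallerC's exact output.

```c
#include <string.h>

/* Hypothetical names for the two NASM output formats discussed here. */
enum obj_format { FMT_OMF, FMT_ELF };

/* Map a NASM -f argument ("obj" or "elf") to the format enum. */
static enum obj_format format_from_name(const char *name)
{
    return strcmp(name, "elf") == 0 ? FMT_ELF : FMT_OMF;
}

/* Return the directive that opens the code (is_code != 0) or data section.
   ELF just uses plain section names; the OMF segment attributes are only
   understood by NASM's -f obj output (assumed attribute lists). */
static const char *section_line(enum obj_format fmt, int is_code, int bits)
{
    if (fmt == FMT_ELF)
        return is_code ? "section .text" : "section .data";
    if (is_code)
        return bits == 32 ? "SEGMENT _TEXT PUBLIC CLASS=CODE USE32"
                          : "SEGMENT _TEXT PUBLIC CLASS=CODE USE16";
    return "SEGMENT _DATA PUBLIC CLASS=DATA";
}
```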

> When I went from 16 to 32 bit versions in my smallc, CCNA.C -> CCNA32.C, 16 bit for winXP, 32 bit for linux, it was much easier to make separate stand-alone versions instead of using compile-time switches, because of the primitive way smallc handles integers. Also, the -f elf section naming changes for the nasm backend. However, the bulk of the C frontend is the same between the two versions.

I managed to keep the difference between 16-bit and 32-bit codegens to a minimum. The only practical problem w.r.t. integers is that you can't always generate correct 32-bit asm code from a 16-bit version of the compiler, because 32-bit constants get truncated to 16 bits. I'm thinking of including a workaround for this situation. The only part still missing is 32-bit division implemented with 16-bit ints.
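One possible approach, sketched here as an assumption rather than what SmallerC does: keep each 32-bit value as two 16-bit halves and divide with restoring (shift-and-subtract) division, which needs only 16-bit shifts, comparisons and subtractions. The type and function names are hypothetical.

```c
#include <stdint.h>

/* A 32-bit unsigned value stored as two 16-bit halves, the way a compiler
   whose own int is 16 bits could hold it (hypothetical representation). */
typedef struct { uint16_t hi, lo; } u32pair;

/* Shift v left by one bit; returns the bit shifted out of the top. */
static unsigned shl1(u32pair *v)
{
    unsigned out = v->hi >> 15;
    v->hi = (uint16_t)((v->hi << 1) | (v->lo >> 15));
    v->lo = (uint16_t)(v->lo << 1);
    return out;
}

/* a >= b, comparing the high halves first. */
static int ge(u32pair a, u32pair b)
{
    return a.hi != b.hi ? a.hi > b.hi : a.lo >= b.lo;
}

/* a -= b modulo 2^32, propagating the borrow from the low half. */
static void sub(u32pair *a, u32pair b)
{
    unsigned borrow = a->lo < b.lo;
    a->lo = (uint16_t)(a->lo - b.lo);
    a->hi = (uint16_t)(a->hi - b.hi - borrow);
}

/* Restoring division: *q = n / d, *r = n % d; d must be nonzero.
   The dividend register doubles as the quotient register. */
static void divmod32(u32pair n, u32pair d, u32pair *q, u32pair *r)
{
    u32pair rem = {0, 0};
    for (int i = 0; i < 32; i++) {
        unsigned bit = shl1(&n);          /* next dividend bit, MSB first */
        shl1(&rem);                       /* cannot overflow: rem < 2^i here */
        rem.lo = (uint16_t)(rem.lo | bit);
        if (ge(rem, d)) {
            sub(&rem, d);
            n.lo = (uint16_t)(n.lo | 1u); /* record a 1 quotient bit */
        }
    }
    *q = n;
    *r = rem;
}
```

In 16-bit assembly each helper maps onto a couple of instructions (`shl`/`rcl`, `cmp`, `sub`/`sbb`), which is what makes the approach attractive for a 16-bit code generator.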

Btw, do your C programs compiled with gcc on Linux include an automatically generated call to __main() at the beginning of main()? This is something I'm seeing in gcc 4.6.x from MinGW and something that isn't in gcc 3.x.x from DJGPP.
I think I need to add a command line option to include that call for gcc.
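For reference, MinGW's 32-bit gcc emits that call as `call ___main` in its assembly output (the target prefixes C symbol names with an underscore). A sketch of how a code generator might emit it behind an option; the function name and the `call_gcc_main` flag are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Emit the entry label of a function into buf (NASM syntax, with the
   leading '_' that MinGW's 32-bit C symbols carry). If call_gcc_main is
   set and this is main(), also emit the call that gcc would have inserted:
   the C-level __main() becomes ___main in assembly.
   Returns the snprintf()-style length, or the first error/truncation. */
static int emit_function_entry(char *buf, size_t size, const char *name,
                               int call_gcc_main)
{
    int n = snprintf(buf, size, "_%s:\n", name);
    if (n < 0 || (size_t)n >= size)
        return n;                          /* error or truncated */
    if (call_gcc_main && strcmp(name, "main") == 0) {
        int m = snprintf(buf + n, size - (size_t)n, "\tcall\t___main\n");
        if (m < 0)
            return m;
        n += m;
    }
    return n;
}
```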

Alex

Rod Pemberton

Nov 26, 2012, 4:07:01 AM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:ee29c026-c1c2-4d19...@googlegroups.com...

>
> I've recently attended a talk about the Google's Go language.
> You might be interested in taking a look at its documentation.
> About a half of the language was designed after C with a number
> of improvements:

...

> - sane syntax, e.g. '[10][20]*[30][40]*int' instead of
> 'int*(*[10][20])[30][40]'

Actually, I'm not quite sure what that is ... At first, I thought it was a
function pointer, but it's missing some parens for that. My guess is an
array of an array (10x20) of pointers to an array of an array (30x40) of
pointers to int ... Wrong? That seems to work for the first "sane" syntax
too.

Sometimes, you'll see complicated declarations for function pointers or
procedures, but I've not generally seen them for arrays.

Usually, I've found that if it's too complicated to read in C, it's too
complicated to use. Although many complicated declarations in C can be
created, a function pointer should be the most difficult syntax you need in
a program. Everything else should be standard types or structs.

> - numeric types of fixed sizes are guaranteed: int8,int32,int64,etc

IMO, that's good. I'd much rather be able to specify absolute sizes. C has
added that ability or something close with the C99 stdint.h types. Of
course, I happen to know the cpu mode and integer sizes that my C code is being
written for. I'm only going to compile my code for the correct integer
sizes. The problem is when my code is being compiled for a different mode
or processor that uses integers of different sizes by someone else.

Let's take x86. In 16-bit modes, its native sizes are 8-bits and 16-bits.
In 32-bit modes, its native sizes are 8-bits and 32-bits. Of course, you
can get 32-bits in 16-bit mode and 16-bits in 32-bit mode with overrides
after the 386 (or 486?). So, someone coding in C for x86 is likely to use
8-bit integer and whatever native integer larger than 8-bits is available
whenever they need more bits. I.e., the larger integer is not a fixed size
but is either 16-bits or 32-bits as is available for that cpu mode. The
problem when only absolute sizes are available is that an absolute size may
or may not be needed for the code to work correctly, but requires generating
code for that size.

Let's say a programmer specifies int32 since only absolute sizes are
available. Let's also say they are compiling the code as 32-bits. What if
someone else is now compiling the code as 16-bit? Is a 32-bit integer
actually required for the code to work correctly, or not? In most cases,
16-bits will probably work instead of 32-bits. In rare cases, it won't.
However, since a fixed 32-bit size was specified, the 16-bit code generator
*must* generate 32-bit integers in 16-bit code for all integers ... That's
not good.

I.e., there should be a way to specify exact types when they are needed, but
also allow the compiler to select best fit integers too for "portability"
...
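For what it's worth, C99's `<stdint.h>` already offers both kinds: exact-width types such as `int32_t` (optional, since a platform may have no such type) and "at least N bits" types that leave the width to the compiler, `int_leastN_t` for the smallest such type and `int_fastN_t` for the fastest. A sketch; the typedef and function names are hypothetical, and the `_Static_assert` is C11:

```c
#include <limits.h>
#include <stdint.h>

/* Exactly 32 bits: for when the layout really demands it (an on-disk field,
   say). Exact-width types are optional; a platform without one omits it. */
typedef int32_t disk_field_t;        /* hypothetical name */

/* At least 32 bits, the smallest such type: values that need the range. */
typedef int_least32_t big_count_t;   /* hypothetical name */
_Static_assert(sizeof(big_count_t) * CHAR_BIT >= 32, "need 32 bits");

/* At least 16 bits, whichever width the implementation considers fastest:
   the compiler, not the programmer, picks 16 or 32 bits per target. */
static int_fast16_t count_bits(uint_least32_t v)
{
    int_fast16_t n = 0;
    while (v != 0) {
        n += (int_fast16_t)(v & 1u);
        v >>= 1;
    }
    return n;
}
```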

> - numeric constants don't have "ambiguous" types

That may or may not be good depending on what they did to solve the problem.
It ensures the constant is a specific size, but that can result in other
issues:

1) size mismatch between comparison or assignment of a numeric type and a
numeric constant
2) increased use of casts to ensure types match in size

Having "implicit" sizes for constants, based on the size of the type they
are being assigned to or compared with, seems to be a better solution to me.

> - break is assumed at the end of every case

That's bad. I don't want that!

IMO, fall-through is an important programming concept. Fall-through is
almost a necessity for justifying use of a switch() in the first place.
Without fall-through, you'd only need to use a switch() for readability, or
a large number of case statements ... Without fall-through, a switch() is
just nested if-thens. So, you might as well code it as such.

Yes, I understand this is an attempt to reduce coding errors by novices. So
too were 'void' and 'void *'. They prevented some errors, but they also
cause more problems than they prevent. So, it can be argued that they
were misguided. I suspect automatic breaks are misguided too.

I'm sure they probably eliminated the unstructured switch() that C
supports too, i.e., single-level switch without a section { } which
effectively acts as multiple goto's.
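Both uses described above, in plain C: deliberate fall-through where each case adds to the next, and the "unstructured" switch whose case labels jump into the middle of a loop (Duff's device, here unrolling a byte copy eight times). The function and constant names are illustrative only.

```c
#include <stddef.h>

enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* Deliberate fall-through: each level also gets every lower level's rights. */
static unsigned rights_for(int level)
{
    unsigned r = 0;
    switch (level) {
    case 3: r |= PERM_X;  /* fall through */
    case 2: r |= PERM_W;  /* fall through */
    case 1: r |= PERM_R;  break;
    default: break;       /* level 0 or unknown: no rights */
    }
    return r;
}

/* Duff's device: case labels inside a do-while, so the switch jumps into
   the middle of the unrolled loop to handle the count % 8 leftover bytes. */
static void duff_copy(char *to, const char *from, size_t count)
{
    if (count == 0)
        return;
    size_t n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```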

Did they also eliminate pointers too - like Java?

Once they get done, they'll reinvent Pascal ... ;-) I.e., no power,
ultra-safe, no usefulness, etc.

> - ++ and -- are not operators, they are statements

These are insignificant today, but were once convenient. Unfortunately,
I've been using them by themselves for years due to ANSI C's sequence points
... I.e., their advantage of being used correctly within an expression is
no longer guaranteed since ANSI C went into effect.

They also complicate parsing by simple parsers since they don't follow the
same pattern as other operators, i.e., in between.
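A sketch of where the line actually falls under ANSI C's sequence-point rules: `++` inside an expression is fine as long as no object is modified twice, or modified and also read for another purpose, between sequence points. The hazards are expressions like `a[i] = i++`. The helper names are hypothetical.

```c
#include <stddef.h>

/* Well-defined: each ++ applies to a different object (dst, src), so the
   classic copy loop is fine under the sequence-point rules.
   Returns the number of characters copied, not counting the '\0'. */
static size_t copy_string(char *dst, const char *src)
{
    const char *start = src;
    while ((*dst++ = *src++) != '\0')
        ;
    return (size_t)(src - start - 1);
}

/* Undefined behavior (deliberately not compiled): i is modified and also
   read for another purpose, or modified twice, with no sequence point:
       a[i] = i++;
       i = i++ + 1;
   Keeping the increment as its own statement sidesteps the issue. */
static void fill_squares(int *a, int count)
{
    int i = 0;
    while (i < count) {
        a[i] = i * i;
        i++;               /* a statement on its own, as Go requires */
    }
}
```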

> - fewer punctuators necessary, e.g. the semicolon and parens in if (IIRC)

I'm not sure what parens they could remove from C. Arguments and parameters
are the primary place they are used. They are used for casts and precedence
of operations too ...

I'd think that they'd keep the semicolon since it's primarily used to mark
the end-of-line for the C statement or C declaration. It seems odd that
they'd remove it.

Except for for() and procedure calls, I think that commas can be eliminated
from C's syntax ... I rarely use them otherwise.

> - type conversions must be done explicitly when you're mixing types in
> expressions

More casts ... I don't see how that's a benefit.

More type constraints ... That's probably a problem.

Although many argue that C's type system is weak, there are many situations
where it gets in the way and is difficult to work around.

> All of the above (along with other unmentioned features) makes the code
> easier to read and write and gives you fewer opportunities to shoot off
> your feet.

Pointers are one of the more powerful programming concepts, yet they allow
you to "shoot off your feet". So, what's better, being able to "shoot off
your feet" (too much power), or not being able to put shoes on your feet to
protect them from frostbite in the first place (too little power)?

Unfortunately, a significant part of C isn't actually part of C. It's the
abilities of the C pre-processor ... When was the last time you coded a
constant without using a #define?
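C does have one non-preprocessor way to name integer constants: enumeration constants are integer constant expressions, so they can size arrays and label cases (though they only cover `int`-range integers, not floating-point or string constants). A `const int` object, by contrast, is not a constant expression in C, unlike C++. The names below are hypothetical:

```c
/* Enumeration constants are integer constant expressions: they can size a
   file-scope array and label a case, with no #define involved. */
enum { SECTOR_SIZE = 512, MAX_SECTORS = 8 };

static unsigned char sector_buffer[SECTOR_SIZE * MAX_SECTORS];

/* A const object is NOT a constant expression in C (unlike C++), so it
   could not replace SECTOR_SIZE in the array size above or in the case
   label below. */
static const int sector_size_var = 512;

static int is_sector_sized(int n)
{
    switch (n) {
    case SECTOR_SIZE: return 1;
    default:          return 0;
    }
}
```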

You mentioned that they attempt to ensure better matching of types. What
did they do about if() and switch()? I.e., they are "overloaded" to accept
int's, char's, long's, signed and unsigned ...

(See, there _is_ "overloading" in C. C had it first ... ;-)

Rod Pemberton

Alexei A. Frounze

Nov 26, 2012, 7:15:40 AM

On Monday, November 26, 2012 1:02:25 AM UTC-8, Rod Pemberton wrote:
> "Alexei A. Frounze" <alex...@gmail.com> wrote in message
> news:ee29c026-c1c2-4d19...@googlegroups.com...
> >
> > I've recently attended a talk about the Google's Go language.
> > You might be interested in taking a look at its documentation.
> > About a half of the language was designed after C with a number
> > of improvements:
> ...
>
> > - sane syntax, e.g. '[10][20]*[30][40]*int' instead of
> > 'int*(*[10][20])[30][40]'
>
> Actually, I'm not quite sure what that is ... At first, I thought it was a
> function pointer, but it's missing some parens for that. My guess is an
> array of an array (10x20) of pointers to an array of an array (30x40) of
> pointers to int ... Wrong? That seems to work for the first "sane" syntax
> too.

You parsed it correctly, there wasn't any function involved, just arrays, pointers and int.

In C(++), pointers to functions and arrays are declared very similarly, it's just the parens and brackets that change: '(*something)(params)' or '(*something)[element_count]'.

> Sometimes, you'll see complicated declarations for function pointers or
> procedures, but I've not generally seen them for arrays.

Pointers to arrays typically start becoming useful when you want to pass around pointers to multidimensional arrays. For 1d arrays, it's a waste of code.
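A small C99 sketch of both forms and of the multidimensional case: `sum_matrix` (a hypothetical helper) takes a pointer to rows of `COLS` ints, so `rows[r][c]` indexes exactly as the caller's 2-D array does, plus a function pointer that is applied to each element.

```c
#include <stddef.h>

enum { COLS = 4 };

/* (*something)(params): pointer to a function taking and returning int. */
typedef int (*unary_fn)(int);

/* (*something)[element_count]: rows points to an array of COLS ints, so a
   2-D array int m[N][COLS] can be passed as m and indexed as rows[r][c]. */
static long sum_matrix(int (*rows)[COLS], size_t nrows, unary_fn f)
{
    long total = 0;
    for (size_t r = 0; r < nrows; r++)
        for (size_t c = 0; c < COLS; c++)
            total += f(rows[r][c]);    /* apply f to every element */
    return total;
}

static int identity(int x) { return x; }
static int square(int x) { return x * x; }
```

Only the row length (`COLS`) is part of the parameter type; the number of rows is passed separately, which is what lets one function handle any N x 4 matrix.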

> Usually, I've found that if it's too complicated to read in C, it's too
> complicated to use. Although many complicated declarations in C can be
> created, a function pointer should be the most difficult syntax you need in
> a program. Everything else should be standard types or structs.

I think once you've truly mastered C, the only readability problem that's left is someone else's unfamiliar, large, poor and buggy code. C itself at that point isn't a problem.

Go does support native-machine-word-sized ints. They kept int probably for the reasons you're stating.

> > - numeric constants don't have "ambiguous" types
>
> That may or may not be good depending on what they did to solve the problem.
> It ensures the constant is a specific size, but that can result in other
> issues:
>
> 1) size mismatch between comparison or assignment of a numeric type and a
> numeric constant
> 2) increased use of casts to ensure types match in size
>
> Having "implicit" sizes for constants, based on the size of the type they
> are being assigned to or compared with, seems to be a better solution to me.

The compiler checks whether a constant can be used in the context where it appears. If it's too big for the target type, the code fails to compile.

IMO, that's far better than writing 123456 (just an arbitrary constant, don't be picky) and not knowing whether it's going to be int or long or unsigned long or something else, potentially introducing undesired (and potentially overlooked) sign/zero-extensions leading to bugs, and whether or not it's going to be truncated (because, say, it's C89 without long longs, or vice versa, C89 code would function differently when compiled with a C99 compiler).
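The ambiguity can be observed directly. The sketch below uses C99 and hypothetical function names. It checks that `123456` has type `int` only where `int` is wide enough. It also shows one of the silent conversions the paragraph warns about: comparing a negative `int` with an `unsigned` operand.

```c
#include <limits.h>
#include <stddef.h>

/* An unsuffixed decimal constant gets the first type that can hold it:
   int, long, unsigned long in C89; int, long, long long in C99.
   So 123456 is a long with 16-bit ints and an int with 32-bit ints. */
size_t size_of_123456(void) { return sizeof 123456; }

size_t expected_size_of_123456(void) {
    return (123456 <= INT_MAX) ? sizeof(int) : sizeof(long);
}

/* The usual arithmetic conversions turn -1 into UINT_MAX here,
   so the comparison is false even though -1 < 1 mathematically. */
int minus_one_less_than_one_u(void) { return -1 < 1u; }
```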

> > - break is assumed at the end of every case
>
> That's bad. I don't want that!
>
> IMO, fall-through is an important programming concept. Fall-through is
> almost a necessity for justifying use of a switch() in the first place.
> Without fall-through, you'd only need to use a switch() for readability, or
> a large number of case statements ... Without fall-through, a switch() is
> just nested if-thens. So, you might as well code it as such.

I should've mentioned that there is fall-through. It's just that they reversed the default/implicit behavior:

C: fall-through is implicit, break must be written explicitly
Go: break is implicit, fallthrough must be written explicitly

That's a good change, IMO.
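For comparison, this is what the two styles look like in C. The sketch uses hypothetical functions. It marks each intentional fall-through with a comment, which is roughly what Go's explicit `fallthrough` enforces. It also includes the classic forgotten-`break` bug that Go's default avoids.

```c
/* Steps remaining when starting at step n (1..3).
   Cases 3 and 2 deliberately fall through. */
int remaining_steps(int n) {
    int steps = 0;
    switch (n) {
    case 3:
        steps++;
        /* fall through */
    case 2:
        steps++;
        /* fall through */
    case 1:
        steps++;
        break;          /* explicit break: the common case in C */
    default:
        steps = -1;     /* out of range */
        break;
    }
    return steps;
}

/* A forgotten break: case 'a' also runs the code for case 'b'. */
int classify_buggy(char c) {
    int r = 0;
    switch (c) {
    case 'a':
        r = 1;          /* missing break, so control falls into 'b' */
    case 'b':
        r = 2;
        break;
    }
    return r;
}
```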

> Yes, I understand this is an attempt to reduce coding errors by novices. So
> too were 'void' and 'void *'. They prevented some errors, but they also
> cause more problems than they prevent. So, it can be argued that they
> were misguided. I suspect automatic breaks are misguided too.

They aren't. Most of the time you do intend to break, and you write the break. You rarely stack up more than 2 cases with fall-throughs, but you often have switches with 4+ cases, most of which end with break.

Some of the problems with C are due to the facts that:
- frankly, C is an odd beast
- many people don't know (enough) C but expect it to behave in logical-to-them ways, e.g. they expect C arithmetic to be your regular math class arithmetic (promotions/extensions/truncations or lack thereof come as a surprise). Making a bunch of mistakes and reading good C books cures that, but for one reason or another that cure may take years to happen
- there have been many bad C books touching very lightly on important topics or explaining them poorly. The "Teach Yourself <BLAH> in <X> Days" books are often the kind from which one can get a feel for the language but still be left with a lot of glaring holes in one's knowledge of it. These days it's far easier to find enough information online about language X or to get some opinions on books about it. It wasn't so in the past, especially in countries lagging behind and needing to translate books from foreign languages.
- C compilers, especially the early ones, could provide very little help in terms of useful warnings or static code analysis. Small memories and storage, slow CPUs and the overall (im)maturity of the industry were not very conducive to having great compilers. I'm looking at my C compiler... I can barely fit all the logic and data structures into 64K-128K without using assembly, overlays, multiple compilation stages or disk storage. I'm not saying a C compiler should fit into under 128K; I'm saying the computers of the past didn't have much more than that. The further you dig into the past, the less RAM and storage you find.
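Here are a few of the arithmetic surprises from the list above, as a small sketch. The names are hypothetical. The sketch assumes 8-bit chars and an `int` wider than `char`, which holds on common platforms.

```c
#include <limits.h>

/* Both operands are promoted to int before the addition,
   so 200 + 100 yields 300 rather than wrapping at 256. */
int add_uchars(unsigned char a, unsigned char b) { return a + b; }

/* Converting the sum back to unsigned char reduces it modulo 256. */
unsigned char add_uchars_stored(unsigned char a, unsigned char b) {
    unsigned char r = (unsigned char)(a + b);
    return r;
}

/* Integer division truncates toward zero (guaranteed since C99;
   C89 left the rounding of negative quotients implementation-defined). */
int div_minus7_by_2(void) { return -7 / 2; }

/* Signed overflow is undefined, but unsigned arithmetic wraps. */
unsigned wrap_unsigned(void) { unsigned u = UINT_MAX; return u + 1; }
```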

> I'm sure they probably eliminated the unstructured switch() that C
> supports too, i.e., single-level switch without a section { } which
> effectively acts as multiple goto's.

I view the switch, the way it is, as an artifact of history or just a bug that couldn't be fixed without breaking lots of code.

> Did they also eliminate pointers too - like Java?

Yes and no. There are pointers. There's no pointer arithmetic. There's GC.

> Once they get done, they'll reinvent Pascal ... ;-) I.e., no power,
> ultra-safe, no usefulness, etc.

Not true. I have not mentioned many other things, you can read about them on the web. And the guys are somewhat done, they have frozen the language for the community to capitalize on the stability of the language. No major changes (except bugfixes and further development of libraries) are planned.

> > - ++ and -- are not operators, they are statements
>
> These are insignificant today, but were once convenient. Unfortunately,
> I've been using them by themselves for years due to ANSI C's sequence points
> ... I.e., their advantage of being used correctly within an expression is
> no longer guaranteed since ANSI C went into effect.

As long as sequence points are observed, there's no problem with them. Here you have a tension and a trade-off between leaving implementation details (the order of events/evaluation) unspecified, which benefits compiler writers, and the desire to optimize code as much as practically possible, which mostly benefits compiler users; the two things go hand in hand. Write code that doesn't exhibit undefined behavior or undesired implementation-specific/defined behavior and you will be OK.
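To make the sequence-point rule concrete, consider `i = i++;` and `a[i] = i++;`. Both modify `i` and also read it with no intervening sequence point, which is undefined behavior. The sketch below uses hypothetical helpers. It shows the usual fix, which is to split the side effect into its own statement. It also shows a case where `++` inside an expression is well defined because each object is modified only once.

```c
/* Instead of a[i] = i++; (undefined), do the read and the
   increment in separate full expressions. */
void fill_indices(int *a, int n) {
    int i = 0;
    while (i < n) {
        a[i] = i;   /* read i ... */
        i++;        /* ... then modify it in its own statement */
    }
}

/* Well defined: dst and src are each incremented once per evaluation,
   and nothing else reads them in the same expression. */
int copy_count(char *dst, const char *src) {
    int n = 0;
    while ((*dst++ = *src++) != '\0')
        n++;
    return n;
}
```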

> They also complicate parsing by simple parsers since they don't follow the
> same pattern as other operators, i.e., in between.

The terseness of C syntax was probably important when a byte of RAM or disk cost more than it does today and when displays and printers weren't as good or as widespread as they are today.

It could be relaxed/improved today.

> > - fewer punctuators necessary, e.g. the semicolon and parens in if (IIRC)
>
> I'm not sure what parens they could remove from C. Arguments and parameters
> are the primary place they are used. They are used for casts and precedence
> of operations too ...

IIRC, the if/while/switch condition occupies the whole line of code and as such doesn't need surrounding parens. They may be needed if you write your code more compactly, but I don't know if that's the case.

> I'd think that they'd keep the semicolon since it's primarily used to mark
> the end-of-line for the C statement or C declaration. It seems odd that
> they'd remove it.

You don't really need it after a number of things. You don't need it after break, continue, goto, label, etc. In many cases it's just a convenient (for the parser) thing that bears no function. Whenever the compiler can't accept another token, it could assume the existence of an invisible semicolon there. And ambiguous cases, if any, could be fixed either with an explicit semicolon or something else.
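Go settled this lexically rather than in the parser. Its specification says a newline inserts a semicolon when the line's last token is one of the following:

- an identifier or a literal;
- one of the keywords `break`, `continue`, `fallthrough` or `return`;
- one of `++ -- ) ] }`.

Below is a sketch of that decision in C. The function name is hypothetical. Token classification is simplified: only ASCII identifiers are handled, and literals are recognized by their first character.

```c
#include <ctype.h>
#include <string.h>

/* Go's reserved keywords; identifiers that match these are not "identifiers". */
static const char *const go_keywords[] = {
    "break", "case", "chan", "const", "continue", "default", "defer", "else",
    "fallthrough", "for", "func", "go", "goto", "if", "import", "interface",
    "map", "package", "range", "return", "select", "struct", "switch", "type", "var"
};

static int is_keyword(const char *tok) {
    for (size_t i = 0; i < sizeof go_keywords / sizeof go_keywords[0]; i++)
        if (strcmp(tok, go_keywords[i]) == 0)
            return 1;
    return 0;
}

/* Returns 1 if a lexer following Go's rule would insert ';' after last_token
   at the end of a line. */
int needs_semicolon(const char *last_token) {
    unsigned char c = (unsigned char)last_token[0];
    if (strcmp(last_token, "break") == 0 || strcmp(last_token, "continue") == 0 ||
        strcmp(last_token, "fallthrough") == 0 || strcmp(last_token, "return") == 0)
        return 1;
    if (isalpha(c) || c == '_')                        /* identifier, unless a keyword */
        return !is_keyword(last_token);
    if (isdigit(c) || c == '"' || c == '\'' || c == '`') /* number, string, rune */
        return 1;
    return strcmp(last_token, "++") == 0 || strcmp(last_token, "--") == 0 ||
           strcmp(last_token, ")") == 0 || strcmp(last_token, "]") == 0 ||
           strcmp(last_token, "}") == 0;
}
```

With this rule, `if sts { break }` needs no semicolons. A line ending in `+` or `{` continues onto the next line, so an expression can be split across lines by ending the first line with an operator.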

> > - type conversions must be done explicitly when you're mixing types in
> > expressions
>
> More casts ... I don't see how that's a benefit.

It makes you think or at least it gives you a chance to think of what you're doing and whether it's right. Of course, you can misuse casts just as well as ignore (in all senses of the word) the implicit conversions as they're done in C. Less of a chance to commit a silly mistake needing extra code reviews or debugging.

> Although many argue that C's type system is weak, there are many situations
> where it gets in the way and is difficult to work around.

True. I've written 32-bit +,-,*,/,%,compare using 16-bit signed ints (I'm thinking of fully supporting 32-bit code generation in 16-bit versions of the compiler and I need 32-bit arithmetic for that). It's a PITA to do it. It's a somewhat artificial problem, but it does illustrate some of the issues well.
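For readers who haven't had to do this, here is one way it can look. This sketch uses hypothetical type and function names and is not Smaller C's code. It keeps a 32-bit value as two 16-bit halves in `unsigned` variables and implements add, subtract and compare. It uses unsigned halves rather than signed ints for clarity, and it omits multiply, divide and modulo. Carries and borrows are detected with comparisons, so the code works whether `unsigned` is 16 or 32 bits wide.

```c
/* A 32-bit value as two halves, each kept in the range 0..0xFFFF. */
typedef struct { unsigned lo, hi; } u32pair;

#define HALF_MASK 0xFFFFu

u32pair add32(u32pair a, u32pair b) {
    u32pair r;
    r.lo = (a.lo + b.lo) & HALF_MASK;
    unsigned carry = r.lo < a.lo;   /* the low half wrapped around */
    r.hi = (a.hi + b.hi + carry) & HALF_MASK;
    return r;
}

u32pair sub32(u32pair a, u32pair b) {
    u32pair r;
    unsigned borrow = a.lo < b.lo;  /* the low half must borrow from the high */
    r.lo = (a.lo - b.lo) & HALF_MASK;
    r.hi = (a.hi - b.hi - borrow) & HALF_MASK;
    return r;
}

/* Unsigned compare: the high halves decide unless equal. Returns -1, 0 or 1. */
int cmp32u(u32pair a, u32pair b) {
    if (a.hi != b.hi) return a.hi < b.hi ? -1 : 1;
    if (a.lo != b.lo) return a.lo < b.lo ? -1 : 1;
    return 0;
}

/* Signed compare: flipping the sign bit maps two's complement order
   onto unsigned order. */
int cmp32s(u32pair a, u32pair b) {
    a.hi ^= 0x8000u;
    b.hi ^= 0x8000u;
    return cmp32u(a, b);
}

/* Test helpers for a host whose unsigned long has at least 32 bits. */
u32pair from_ulong(unsigned long v) {
    u32pair r = { (unsigned)(v & HALF_MASK), (unsigned)((v >> 16) & HALF_MASK) };
    return r;
}

unsigned long to_ulong(u32pair p) { return ((unsigned long)p.hi << 16) | p.lo; }
```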

> > All of the above (along with other unmentioned features) makes the code
> > easier to read and write and gives you fewer opportunities to shoot off
> > your feet.
>
> Pointers are one of the more powerful programming concepts, yet they allow
> you to "shoot off your feet". So, what's better, being able to "shoot off
> your feet" (too much power), or not being able to put shoes on your feet to
> protect them from frostbite in the first place (too little power)?

I'd prefer a better balance. But C is too old and too important to be changed quickly and drastically.

> Unfortunately, a significant part of C isn't actually part of C. It's the
> abilities of the C pre-processor ... When was the last time you coded a
> constant without using a #define?

In C, that's about the only thing you can do besides enumerations.

IMO, the C preprocessor is ugly.
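The options mentioned in the two posts can be compared side by side. In the sketch below (hypothetical names), the `#define` and the enumeration constant can both size a file-scope array. A `const int` object cannot, because in C, unlike C++, it is not an integer constant expression.

```c
#define BUF_SIZE 16                 /* preprocessor: text substitution, no type or scope */
enum { TABLE_SIZE = 8 };            /* enumeration constant: an int constant expression */
static const int max_retries = 3;   /* read-only object, not a constant expression in C */

static char buf[BUF_SIZE];
static int table[TABLE_SIZE];
/* static int retries_log[max_retries];   error in C at file scope (C++ accepts it) */

int buf_len(void) { return (int)sizeof buf; }
int table_len(void) { return (int)(sizeof table / sizeof table[0]); }
int retries(void) { return max_retries; }
```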

> You mentioned that they attempt to ensure better matching of types. What
> did they do about if() and switch()? I.e., they are "overloaded" to accept
> int's, char's, long's, signed and unsigned ...
>
> (See, there _is_ "overloading" in C. C had it first ... ;-)

I'm not sure I understand the question or know the answer to it.

Alex

Rod Pemberton

Nov 27, 2012, 7:31:40 AM11/27/12

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:90f256bc-3a39-4127...@googlegroups.com...

> On Monday, November 26, 2012 1:02:25 AM UTC-8, Rod Pemberton wrote:

...

> > Having "implicit" sizes for constants, based on the size of the type
> > they are being assigned to or compared with, seems to be a better
> > solution to me.
>
> The compiler checks whether a constant can be used in the context
> where it appears. If it's too big for the target type, the code fails to
> compile.
>
> IMO, that's far better than writing 123456 (just an arbitrary constant,
> don't be picky) and not knowing whether it's going to be int or long or
> unsigned long or something else, potentially introducing undesired
> (and potentially overlooked) sign/zero-extensions leading to bugs, and
> whether or not it's going to be truncated (because, say, it's C89 without
> long longs, or vice versa, C89 code would function differently when
> compiled with a C99 compiler).

Truncation is an issue with placing a large value into a smaller integer,
but I'd think a warning would suffice for that.

> I should've mentioned that there is fall-through. It's just that they
> reversed the default/implicit behavior:
>
> C: fall-through is implicit, break must be written explicitly
> Go: break is implicit, fallthrough must be written explicitly
>
> That's a good change, IMO.

I'd probably say acceptable.

Did they introduce a new keyword to allow for fall-through, or did they
re-use "continue"?

> You rarely stack up more than 2 cases with fall-throughs,
> but you often have switches with 4+ cases, most of which end with break.

Interesting...

I'm not sure how many fall-throughs I've used at once. I seem to recall one
situation with three, but I can't find it right now to confirm. Most seem
to be two case _sections_ even though some have many case labels...

> I'm looking at my C compiler... I can barely fit all the logic and
> data structures into 64K-128K without using assembly or overlays
> or multiple compilation stages or disk storage. I'm not saying a C
> compiler should fit into under 128K, I'm saying the computers of
> the past didn't have much more than that. The further you dig into
> the past, the less RAM and storage you find there.

A) Well, you just have too much stuff in your C compiler already! :)

B) Yeah, that's why Steve and I were working with SmallC. It's small. ;)

You can make your C smaller, if need be. I'm not going to go through your
list of stuff you implemented, but I seem to recall stuff like variadic
arguments, VLAs, unnamed variables or structs or unions, the :? operator,
etc. You can cut stuff like that out ... You can even remove for() and
do-while, etc, since a while() with break; and continue; can do the same
thing. Or, you can cut switch() since many can be implemented as if-thens.
Etc. You need, what, two integer types? A large integer is needed for
arithmetic and a small integer or a char is needed for ASCII/EBCDIC strings.
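One caveat applies to replacing `for()` with `while()`. The rewrite is not purely mechanical when the body uses `continue`, because in a `for` loop `continue` still executes the increment expression. Here is a sketch with hypothetical names:

```c
/* Sum of the odd numbers below n, written with for(). */
int sum_odd_for(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0)
            continue;       /* still runs i++ before re-testing i < n */
        sum += i;
    }
    return sum;
}

/* The same loop with while(): the step must be repeated before continue,
   or the loop would spin forever on the first even i. */
int sum_odd_while(int n) {
    int sum = 0;
    int i = 0;
    while (i < n) {
        if (i % 2 == 0) {
            i++;
            continue;
        }
        sum += i;
        i++;
    }
    return sum;
}
```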

> > Did they also eliminate pointers too - like Java?
>
> Yes and no. There are pointers. There's no pointer arithmetic. There's GC.

Okay, so no _real_ pointers ... :-)

I.e., "no pointer arithmetic" means you can't point *into* an object, like
to increment through a string or an array. So, they've got Java style
"pointers" or references. GC seems to also imply Java style references.

> > > - fewer punctuators necessary, e.g. the semicolon and parens in if
> >

> > I'm not sure what parens they could remove from C. Arguments and
> > parameters are the primary place they are used. They are used for casts
> > and precedence of operations too ...
>
> IIRC, the if/while/switch condition occupies the whole line of code and as
> such needs not surrounding parens. They may be needed if you write your
> code more compactly, but I don't know if that's the case.

Okay, that sounds like they've embraced a modern IDE, or perhaps a
line-based code editor for coding.

> > I'd think that they'd keep the semicolon since it's primarily used to
> > mark the end-of-line for the C statement or C declaration. It seems
> > odd that they'd remove it.
>
> You don't really need it after a number of things. You don't need it after
> break, continue, goto, label, etc. In many cases it's just a convenient
> (for the parser) thing that bears no function.

You don't?

if(sts) break; if(cnd) ...

if(sts)
break;
if(cnd) ...

If newlines are mostly unimportant to parsing, as in C, I'd think they'd
need it. If they switched to line based parsing, instead of semicolon
based, you wouldn't need it.

What do they do for line continuation? E.g., for string concatenation
across multiple text lines.

> Whenever the compiler can't accept another token, it could assume the
> existence of an invisible semicolon there. And ambiguous cases, if any,
> could be fixed either with an explicit semicolon or something else.

So, I'll take it the language isn't LALR(1)...

C basically has an "implicit" or "invisible" keyword for the *use* of a
typedef. Compare the declaration and usage syntax for the struct and union
to that for typedefs. That missing keyword makes C not be LALR(1) and
causes lots of parsing problems. (I'm sure I've mentioned this previously.)
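The problem described here is easy to reproduce. The same token sequence `T * p` is a declaration when `T` names a type and a multiplication when `T` names a variable, so the parser needs to know which kind of name `T` currently is. In the sketch below (hypothetical names), an inner declaration hides the typedef. This is legal C because typedef names and variables share one name space.

```c
typedef int T;

int declares_pointer(void) {
    int v = 5;
    T * p = &v;         /* T is a typedef name here: declares p as int * */
    return *p;
}

int multiplies(void) {
    int T = 6, p = 7;   /* T now names an int variable, hiding the typedef */
    return T * p;       /* the same tokens are now a multiplication */
}
```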

> > Although many argue that C's type system is weak, there are many
> > situations where it gets in the way and is difficult to work around.
>
> True. I've written 32-bit +,-,*,/,%,compare using 16-bit signed ints
> (I'm thinking of fully supporting 32-bit code generation in 16-bit
> versions of the compiler and I need 32-bit arithmetic for that). It's a
> PITA to do it. It's a somewhat artificial problem, but it does illustrate
> some of the issues well.

On x86, for 16-bit code, you've used 16-bit signed ints for 32-bit integer
code? Why?! Why didn't you just use address-size and operand-size
overrides?

For processors that support mixed-mode code, i.e., post 386 or post 486 ...
(?), I think you can use any 32-bit instruction in 16-bit code or any 16-bit
instruction in 32-bit code, as long as the overrides are correct ...

> > You mentioned that they attempt to ensure better matching of types.
> > What did they do about if() and switch()? I.e., they are "overloaded"
> > to accept int's, char's, long's, signed and unsigned ...
>

> I'm not sure I understand the question or know the answer to it.
>

What size is the "what_size_is_this_integer" integer below?

if("what_size_is_this_integer") break;

Aren't if() and switch() required to accept a variety of integer sizes and
types: ints, chars, longs, signed and unsigned integers? (and
expressions and perhaps floats too...)

I.e., can if() and switch() truncate an ULL such as 0xF000000000000000 to
become a zero, or not?

If if() and switch() are required to accept multiple integer sizes and
types, then they are "overloaded", yes?

Rod Pemberton

Alexei A. Frounze

Nov 27, 2012, 8:36:07 AM11/27/12

On Sunday, November 25, 2012 8:13:49 AM UTC-8, s_dub...@yahoo.com wrote:
> >
>

Version 1.00 supporting 32-bit code generation is up for grabs!

Code:
https://github.com/alexfru/SmallerC/tree/master/v0100

Latest "documentation":
https://github.com/alexfru/SmallerC/wiki/Smaller-C-Wiki

I'm interested to see if it's going to work on Linux all the way: compiling itself and then bootstrapping with gcc's stdlib.

Alex

Alexei A. Frounze

Nov 27, 2012, 10:40:29 AM11/27/12

On Tuesday, November 27, 2012 4:27:02 AM UTC-8, Rod Pemberton wrote:
> "Alexei A. Frounze" <alex...@gmail.com> wrote in message
> news:90f256bc-3a39-4127...@googlegroups.com...
> > On Monday, November 26, 2012 1:02:25 AM UTC-8, Rod Pemberton wrote:
> ...
>
> > > Having "implicit" sizes for constants, based on the size of the type
> > > they are being assigned to or compared with, seems to be a better
> > > solution to me.
> >
> > The compiler checks whether a constant can be used in the context
> > where it appears. If it's too big for the target type, the code fails to
> > compile.
> >
> > IMO, that's far better than writing 123456 (just an arbitrary constant,
> > don't be picky) and not knowing whether it's going to be int or long or
> > unsigned long or something else, potentially introducing undesired
> > (and potentially overlooked) sign/zero-extensions leading to bugs, and
> > whether or not it's going to be truncated (because, say, it's C89 without
> > long longs, or vice versa, C89 code would function differently when
> > compiled with a C99 compiler).
>
> Truncation is an issue with placing a large value into a smaller integer,
> but I'd think a warning would suffice for that.

People routinely ignore warnings. Compilation errors make silent errors loud.

> > I should've mentioned that there is fall-through. It's just that they
> > reversed the default/implicit behavior:
> >
> > C: fall-through is implicit, break must be written explicitly
> > Go: break is implicit, fallthrough must be written explicitly
> >
> > That's a good change, IMO.
>
> I'd probably say acceptable.
>
> Did they introduce a new keyword to allow for fall-through, or did they
> re-use "continue"?

They have "fallthrough".

> > You rarely stack up more than 2 cases with fall-throughs,
> > but you often have switches with 4+ cases, most of which end with break.
>
> Interesting...
>
> I'm not sure how many fall-throughs I've used at once. I seem to recall one
> situation with three, but I can't find it right now to confirm. Most seem
> to be two case _sections_ even though some have many case labels...

That's exactly the point of the change, to optimize the syntax for the most common case and avoid problems with forgotten breaks.

> > I'm looking at my C compiler... I can barely fit all the logic and
> > data structures into 64K-128K without using assembly or overlays
> > or multiple compilation stages or disk storage. I'm not saying a C
> > compiler should fit into under 128K, I'm saying the computers of
> > the past didn't have much more than that. The further you dig into
> > the past, the less RAM and storage you find there.
>
> A) Well, you just have too much stuff in your C compiler already! :)
>
> B) Yeah, that's why Steve and I were working with SmallC. It's small. ;)
>
> You can make your C smaller, if need be. I'm not going to go through your
> list of stuff you implemented, but I seem to recall stuff like variadic
> arguments, VLAs, unnamed variables or structs or unions, the :? operator,
> etc.

I don't have VLAs, structs/unions, ?:.

> You can cut stuff like that out ... You can even remove for() and
> do-while, etc, since a while() with break; and continue; can do the same
> thing. Or, you can cut switch() since many can be implemented as if-thens.
> Etc. You need, what, two integer types? A large integer is needed for
> arithmetic and a small integer or a char is needed for ASCII/EBCDIC strings.

And I currently do have only two integer types, int and char.

> > > > - fewer punctuators necessary, e.g. the semicolon and parens in if
> > >
> > > I'm not sure what parens they could remove from C. Arguments and
> > > parameters are the primary place they are used. They are used for casts
> > > and precedence of operations too ...
> >
> > IIRC, the if/while/switch condition occupies the whole line of code and as
> > such needs not surrounding parens. They may be needed if you write your
> > code more compactly, but I don't know if that's the case.
>
> Okay, that sounds like they've embraced a modern IDE, or perhaps a
> line-based code editor for coding.

I'm not sure what you mean here. I don't think ED/EDLIN and the like from the past have anything to do with the parens.

> > > I'd think that they'd keep the semicolon since it's primarily used to
> > > mark the end-of-line for the C statement or C declaration. It seems
> > > odd that they'd remove it.
> >
> > You don't really need it after a number of things. You don't need it after
> > break, continue, goto, label, etc. In many cases it's just a convenient
> > (for the parser) thing that bears no function.
>
> You don't?
>
> if(sts) break; if(cnd) ...
>
> if(sts)
> break;
> if(cnd) ...
>
> If newlines are mostly unimportant to parsing, as in C, I'd think they'd
> need it. If they switched to line based parsing, instead of semicolon
> based, you wouldn't need it.

Can't say much here; the documentation should have answers to this.

> What do they do for line continuation? E.g., for string concatenation
> across multiple text lines.

Ditto.

> > Whenever the compiler can't accept another token, it could assume the
> > existence of an invisible semicolon there. And ambiguous cases, if any,
> > could be fixed either with an explicit semicolon or something else.
>
> So, I'll take it the language isn't LALR(1)...
>
> C basically has an "implicit" or "invisible" keyword for the *use* of a
> typedef. Compare the declaration and usage syntax for the struct and union
> to that for typedefs. That missing keyword makes C not be LALR(1) and
> causes lots of parsing problems. (I'm sure I've mentioned this previously.)

I'm not sure there's a problem here.

My compiler parses declarations of arbitrary complexity containing void/char/int, [], * and (). It does not look ahead for more than one token. Every new input token drives state transition. There's no reparsing.

Replace void/char/int with typedef and you have the same parsing problem to solve.

What's so special about structs? Your base type changes from

void/char/int (or typedef)

to

struct tag-opt {/*other stuff, which you know how to deal with already, recursive struct included*/}

and you're back to solving the same parsing problem.

Really, unless we're talking about C++, where parsing ambiguities exist and need to be resolved in non-trivial ways, declaration parsing isn't that complex/complicated in C.

This is what C declarations (non-K&R) boil down to:

base-type stars-optional object-optional brackets/parens-optional

In there, base-type is these things:

void, char, int, struct tag-optional {}, typedef, etc

And stars are zero or more of:

*

In there, object is either

nothing

or

an identifier

or the already familiar construct, now parenthesized

(stars-optional object-optional brackets/parens-optional)

Parameter declarations inside parens are the same thing pretty much.

And I fail to see the magic in tools like cdecl. They're helpful, but there's no magic: complex declarations can be parsed and constructed by hand.
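To back that up, here is a sketch of a small declaration explainer in C. It is modeled on the `dcl`/`dirdcl` program in K&R (2nd edition, section 5.12), not on Smaller C's parser. It follows the shape described above: a base type, optional stars, then a name or a parenthesized declarator, then any `[N]` or `()` suffixes. It handles only a one-word base type, named declarators and empty parameter lists. `explain` and the other names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { NAME = 256, PARENS, BRACKETS, END };

static const char *src;     /* unread part of the input */
static int tokentype;       /* type of the last token read */
static char token[64];      /* text of the last NAME or BRACKETS token */
static char name[64];       /* the declared identifier */
static char out[512];       /* English description built so far */
static int failed;

static void append(const char *s) { strncat(out, s, sizeof out - strlen(out) - 1); }

static int gettoken(void) {
    char *p = token;
    while (*src == ' ') src++;
    if (*src == '\0') return tokentype = END;
    if (src[0] == '(' && src[1] == ')') { src += 2; return tokentype = PARENS; }
    if (*src == '[') {                       /* copy "[N]" into token */
        while (*src && *src != ']' && p < token + sizeof token - 2) *p++ = *src++;
        if (*src != ']') { failed = 1; return tokentype = END; }
        *p++ = *src++;
        *p = '\0';
        return tokentype = BRACKETS;
    }
    if (isalpha((unsigned char)*src) || *src == '_') {
        for (; isalnum((unsigned char)*src) || *src == '_'; src++)
            if (p < token + sizeof token - 1) *p++ = *src;
        *p = '\0';
        return tokentype = NAME;
    }
    return tokentype = *src++;               /* single-character token */
}

static void dcl(void);

/* direct declarator: name or ( dcl ), then any number of () or [N] */
static void dirdcl(void) {
    if (tokentype == '(') {
        dcl();
        if (tokentype != ')') failed = 1;
    } else if (tokentype == NAME) {
        strcpy(name, token);
    } else {
        failed = 1;
    }
    while (gettoken() == PARENS || tokentype == BRACKETS) {
        if (tokentype == PARENS) {
            append(" function returning");
        } else {
            append(" array");
            append(token);
            append(" of");
        }
    }
}

/* declarator: stars, then a direct declarator; stars are described last */
static void dcl(void) {
    int ns = 0;
    while (gettoken() == '*') ns++;
    dirdcl();
    while (ns-- > 0) append(" pointer to");
}

/* Writes e.g. "x: pointer to int" into buf. Returns 0, or -1 on a parse error. */
int explain(const char *decl, char *buf, size_t size) {
    char base[64];
    src = decl; failed = 0; out[0] = '\0'; name[0] = '\0';
    if (gettoken() != NAME) return -1;       /* base type: one word such as int */
    strcpy(base, token);
    dcl();
    if (failed || tokentype != END) return -1;
    snprintf(buf, size, "%s:%s %s", name, out, base);
    return 0;
}
```

Fed the declaration from earlier in the thread (with a name added), `explain("int *(*a[10][20])[30][40]", ...)` describes `a` as an array of arrays of pointers to arrays of arrays of pointers to int. This matches Rod's reading.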

> > > Although many argue that C's type system is weak, there are many
> > > situations where it gets in the way and is difficult to work around.
> >
> > True. I've written 32-bit +,-,*,/,%,compare using 16-bit signed ints
> > (I'm thinking of fully supporting 32-bit code generation in 16-bit
> > versions of the compiler and I need 32-bit arithmetic for that). It's a
> > PITA to do it. It's a somewhat artificial problem, but it does illustrate
> > some of the issues well.
>
> On x86, for 16-bit code, you've used 16-bit signed ints for 32-bit integer
> code? Why?! Why didn't you just use address-size and operand-size
> overrides?

Um, because the compiler is written in C and I expect it to:
- be compilable with other (plain/normal) C compilers
- be able to compile itself
?

Asm code in the compiler source code will make the compiler itself less portable.

How about you compile Smaller C to 16-bit, 32-bit and 64-bit (or maybe even 24-bit and 100-bit?) executables with compilers of your choice on platforms of your choice? You should be able to do it now and the binary should be functional (except, perhaps, the error reporting function error(), which has a dirty hack to simulate the va_something macros, and that little piece of code may crash the compiler if there's a compilation error when you compile something else with it).
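For the `error()` hack mentioned above, the portable route is `<stdarg.h>`. This sketch is hypothetical: the name, signature and message format are not Smaller C's. It formats into a buffer with C99 `vsnprintf`. A C89-only compiler could use `vfprintf(stderr, fmt, ap)` in the same pattern.

```c
#include <stdarg.h>
#include <stdio.h>

/* Formats "Error in line N: <message>" into buf.
   Returns the length the full message needs, or a negative value on error. */
int format_error(char *buf, size_t size, int line, const char *fmt, ...) {
    va_list ap;
    int n = snprintf(buf, size, "Error in line %d: ", line);
    if (n < 0 || (size_t)n >= size)
        return n;                   /* prefix alone didn't fit */
    va_start(ap, fmt);
    int m = vsnprintf(buf + n, size - (size_t)n, fmt, ap);
    va_end(ap);
    return m < 0 ? m : n + m;
}
```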

> > > You mentioned that they attempt to ensure better matching of types.
> > > What did they do about if() and switch()? I.e., they are "overloaded"
> > > to accept int's, char's, long's, signed and unsigned ...
> >
> > I'm not sure I understand the question or know the answer to it.
>
> What size is the "what_size_is_this_integer" integer below?
>
> if("what_size_is_this_integer") break;
>
> Isn't if() and switch() required to accept a variety of integer sizes and
> types: int's, char's, long's, signed and unsigned integers? (and
> expressions and perhaps floats too...)
>
> I.e., can if() and switch() truncate an ULL such as 0xF000000000000000 to
> become a zero, or not?
>
> If if() and switch() are required to accept multiple integer sizes and
> types, then they are "overloaded", yes?

I'm not sure. Really, all if, while and for care about is whether the conditional expression evaluates to true (a non-zero value) or false (a zero value).
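That matches the C standard. `if`, `while` and `for` compare the controlling expression with zero in its own type. `switch` applies the integer promotions to its operand and converts each `case` label to the promoted type. Neither narrows an `unsigned long long`. Here is a sketch (C99, hypothetical names):

```c
#include <stdint.h>

/* The test is "compares unequal to 0" in the expression's own type,
   so a value with only high bits set is still true. */
int is_true(unsigned long long v) {
    if (v)
        return 1;
    return 0;
}

/* The case labels are converted to unsigned long long; nothing is truncated. */
int classify(unsigned long long v) {
    switch (v) {
    case 0:                     return 0;
    case 0xF000000000000000ULL: return 1;
    default:                    return 2;
    }
}

/* Truncation only happens when the code asks for a narrower type. */
uint32_t low_32_bits(unsigned long long v) { return (uint32_t)v; }
```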

Alex

Alexei A. Frounze

Nov 27, 2012, 3:03:28 PM11/27/12

OK, I've just got out my testbox with x86 Ubuntu, put Smaller C on it, compiled it with gcc, recompiled it with itself, compiled that with NASM and then after one extra step I linked the thing with gcc's stdlib and it all worked. That bootstrapped executable in turn was able to compile Smaller C!

Turns out, I only need to drop the underscore prefix from things like _printf and _main. And that's the extra step.

OK, one more command line option is needed to deal with the underscores.
And then I need to rename all the internal labels from something like L1234 to something else in order to avoid name collisions with C objects and functions.

Alex

wolfgang kern

Nov 27, 2012, 4:00:16 PM11/27/12

"Alexei A. Frounze" posted:
perhaps due to Google-Goggle, which uses a non-standard news format:
...

|OK, I've just got out my testbox with x86 Ubuntu, put Smaller C on it,
|compiled it with gcc, recompiled it with itself, compiled that with NASM
|and then after one extra step I linked the thing with gcc's stdlib and it
|all worked. That bootstrapped executable in turn was able to compile
|Smaller C!

|Turns out, I only need to drop the underscore prefix from things like
|_printf and _main. And that's the extra step.

|OK, one more command line option is needed to deal with the underscores.
|And then I need to rename all the internal labels from something like L1234
|to something else in order to avoid name collisions with C objects and
|functions.

|Alex

It's been quite a long time since I've seen you posting here ...
Also: Haven't seen any post from Maxim in a while?

Methinks we all learned something from each other in the past,
and I miss Beth and her inspiring ideas along with all the 'useless'
(at first glance) discussions about programming in general ...

I continued on my way with some minor success ... made money at least,
but I'm still not fully satisfied; I'd like to see my ideas become standard :):):)

__
wolfgang (reminiscing about ideas we once shared)

s_dub...@yahoo.com

Nov 27, 2012, 4:34:20 PM11/27/12

On Tuesday, November 27, 2012 2:03:28 PM UTC-6, Alexei A. Frounze wrote:
> On Tuesday, November 27, 2012 5:36:07 AM UTC-8, Alexei A. Frounze wrote:
> > On Sunday, November 25, 2012 8:13:49 AM UTC-8, s_dub...@yahoo.com wrote:
> > > I wanted to see if gcc on this debian box would compile your SmallerC.c into an elf executable, it did. So, the result is a cross development tool. I used that tool to self-compile its C source, to look at the nasm syntax output. (I like your idea of placing expression evaluation state in the comments, btw.)
> > >
> > > Nasm builds the -f obj ok, but I wasn't sure if nasm was capable of auto translating the section/segment information to -f elf, it doesn't apparently. So, as you already indicated, the nasm backend in smaller-c needs adjustment for -f elf section information as well as for 16 to 32 bit.
> >
> > I'm interested to see if it's going to work on Linux all the way: compiling itself and then bootstrapping with gcc's stdlib.
>
> OK, I've just got out my testbox with x86 Ubuntu, put Smaller C on it, compiled it with gcc, recompiled it with itself, compiled that with NASM and then after one extra step I linked the thing with gcc's stdlib and it all worked. That bootstrapped executable in turn was able to compile Smaller C!
>
> Turns out, I only need to drop the underscore prefix from things like _printf and _main. And that's the extra step.
>
> OK, one more command line option is needed to deal with the underscores.
> And then I need to rename all the internal labels from something like L1234 to something else in order to avoid name collisions with C objects and functions.
>
> Alex

Good, you've beat me to it.

(copied smlrc.c as smlrc32.c)

$ ./smlrc -seg32 smlrc32.c >smlrc32.nsm

$ nasm -f elf -o smlrc32.o smlrc32.nsm -Z errfile
(errfile is empty - good!)

$ ld --dynamic-linker /lib/ld-linux.so.2 -lc -o smlrc32a smlrc32.o

ld - The GNU linker also gives (besides trouble with leading underscores)..

ld: warning: cannot find entry symbol _start; defaulting to 0000000008048190

I think you need someone more versed in ld than I am; I think there is a switch for this.

But for CCNA32 I #include'd a prolog file that had:

[SECTION .text]

global _start

_start:
nop ;; no-op for gdb...

mov esp, stacktop

;; -= MAIN =-

call _main ;; internally generated label with leading underscore for
;; main()

Done:
. . .

Steve

Alexei A. Frounze

Nov 27, 2012, 4:46:11 PM11/27/12

On Tuesday, November 27, 2012 1:34:20 PM UTC-8, s_dub...@yahoo.com wrote:
> ...
> $ ld --dynamic-linker /lib/ld-linux.so.2 -lc -o smlrc32a smlrc32.o
>
> ld - The GNU linker also gives (besides trouble with leading underscores)..
>
> ld: warning: cannot find entry symbol _start; defaulting to 0000000008048190
>
> I think you need someone more versed in ld than I, I think there is a switch for this.

I didn't invoke ld directly. I just passed smlrc32.o to gcc as if it were a .c file. That should save some (or a lot of?) unnecessary trouble. Try it too.

Alex

Alexei A. Frounze

Nov 28, 2012, 2:30:22 AM11/28/12

I've updated the code to support ununderscored names for Linux ELF, please get it from the same place: https://github.com/alexfru/SmallerC/tree/master/v0100

This is what works for me on x86 Ubuntu:

gcc -Wall -Wextra -O2 smlrc.c -o smlrc
./smlrc -no-leading-underscore -seg32 smlrc.c >smlrclinux.asm
nasm -f elf smlrclinux.asm -o smlrclinux.o
gcc smlrclinux.o -o smlrclinux
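For anyone curious what such a switch amounts to inside a code generator, here is a minimal sketch in C. The helper asm_symbol_name() and its interface are hypothetical, not taken from Smaller C; it only shows the idea of emitting "_name" or plain "name" for a C identifier depending on an option like -no-leading-underscore.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Smaller C's actual code): build the assembler
   symbol for a C identifier. leading_underscore == 1 gives the prefixed
   names (main -> _main); 0 is what -no-leading-underscore is for, matching
   the plain names of the Linux ELF C library (main -> main).
   Returns 0 on success, -1 if buf is too small. */
static int asm_symbol_name(const char *c_name, int leading_underscore,
                           char *buf, size_t bufsize)
{
    size_t need = strlen(c_name) + (leading_underscore ? 1u : 0u) + 1u;

    if (need > bufsize)
        return -1;
    snprintf(buf, bufsize, "%s%s", leading_underscore ? "_" : "", c_name);
    return 0;
}
```

With the option off, a call to printf would be emitted as "call _printf"; with it on, as "call printf", which is what gcc's libc expects.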

Alex

Rod Pemberton

Nov 28, 2012, 4:09:06 AM11/28/12

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:52ffcced-1d7c-4332...@googlegroups.com...

> On Tuesday, November 27, 2012 4:27:02 AM UTC-8, Rod Pemberton
> wrote:

...

> > You can make your C smaller, if need be. I'm not going to go
> > through your list of stuff you implemented, but I seem to
> > recall stuff like variadic arguments, VLAs, unnamed variables
> > or structs or unions, the :? operator,
>

> I don't have VLAs, structs/unions, ?:.

Did I go to the wrong link ... ?
I was looking at the links posted by "dosusb" too.
Perhaps, I overlooked the "... NOT supported ..." title.

> > > Whenever the compiler can't accept another token, it could
> > > assume the existence of an invisible semicolon there. And
> > > ambiguous cases, if any, could be fixed either with an
> > > explicit semicolon or something else.
>
> > So, I'll take it the language isn't LALR(1)...
>
> > C basically has an "implicit" or "invisible" keyword for the
> > *use* of a typedef. Compare the declaration and usage syntax
> > for the struct and union to that for typedefs. That missing
> > keyword makes C not be LALR(1) and causes lots of parsing
> > problems. (I'm sure I've mentioned this previously.)
>
> I'm not sure there's a problem here.
>
> My compiler parses declarations of arbitrary complexity

> containing [...]

I think I lost you. I was referring to use of a typedef.

E.g.,

struct first {
/* ... */
} ;

Here, "first" is an identifier.

struct first my_struct;

The "struct" keyword is used for both declaration and usage of a
struct. This is true of struct's, union's, and enum's in C.

typedef int second;

Here, "second" is a typedef name or typename.

second my_int;

The "typedef" keyword is _not_ used for the usage of a typedef.
It's only used for the typedef's declaration. I.e., there is an
invisible or implicit keyword for usage of a typedef:

"typedef" second my_int;

The problem for a parser is that when it comes across the token
"second", it isn't able to determine if "second" is an identifier
or a typedef name or an implicit int, without help. That help is
called a symbol table. The problem is that without a keyword for
usage of a "typedef" a symbol table must be constructed
dynamically, at compile-time, to determine what "second" actually
is. The parser does a lookup in the symbol table to determine how
to proceed. Since "struct" precedes "first", you can determine
that "first" is a "struct", i.e., "first" is an identifier for a
struct. But, you cannot determine purely from the code near
where "second" is used, what a token of "second" actually is. It
can be an identifier, a typedef name, or an implicit int. If
using tools like flex/lex or yacc/bison, this requires a "hack" to
pass the relevant information to construct the dynamic symbol
table.
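The "hack" usually meant here is often called the lexer hack: the parser records typedef names in a table as it sees their declarations, and the lexer consults that table to decide which kind of token an identifier is. A minimal sketch in C, with hypothetical names and a fixed-size table chosen for brevity:

```c
#include <string.h>

/* Hypothetical token kinds and table for this sketch. */
enum { TOK_IDENT, TOK_TYPEDEF_NAME };

#define MAX_TYPEDEFS 64

static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count;

/* Parser side: called after a "typedef ... name;" declaration has been
   parsed. The string must stay valid as long as the table is used. */
static void add_typedef_name(const char *name)
{
    if (typedef_count < MAX_TYPEDEFS)
        typedef_names[typedef_count++] = name;
}

/* Lexer side: decide what kind of token an identifier is by looking it
   up, since the source text alone ("second my_int;") can't tell. */
static int classify_identifier(const char *name)
{
    int i;

    for (i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TOK_TYPEDEF_NAME;
    return TOK_IDENT;
}
```

A real compiler would also have to handle scopes, since an inner declaration such as "int second;" hides the typedef name.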

When one adds in implicit int's, the ability to resolve whether a
token is a typedef name or an implicit int becomes _very_
complicated. So, they eliminated implicit int's for C99. (Some
years ago, I attempted to create a table of all possible
combinations. I ended up with 36 states and still hadn't
completely resolved all interactions ...)

In Annex A of ISO C99, the grammar rule for 'typedef-name' is:

(6.7.7) typedef-name:
    identifier

Supposedly, this rule is the second LALR(1) ambiguity in the C
grammar, after the dangling if problem, i.e., terminated if()
versus if()-else(). I'm not real clear on what LALR(1) is other
than it's a type of parsing with lookahead of one token.

If there was a keyword for the use of a typedef, as there is for
struct's, union's, or enum's, the typedef rule would look like
rules for the struct, union, and enum keywords:

E.g., for enum's:
(6.7.2.2) enum-specifier:
    ...
    enum identifier

- where 'enum' is the 'enum' keyword.

E.g., for struct's or union's:
(6.7.2.1) struct-or-union-specifier:
    ...
    struct-or-union identifier

- where 'struct-or-union' is another rule for the 'struct' or
'union' keyword.

In each case, a keyword is followed by an identifier. So, 6.7.7
would need to be changed like so:
(6.7.7) typedef-name:
    'typename' identifier

- where 'typename' is whatever keyword is used to represent usage
of a typedef. Currently, the 'typename' keyword is "invisible" or
"implicit".

> > On x86, for 16-bit code, you've used 16-bit signed ints for
> > 32-bit integer code? Why?! Why didn't you just use
> > address-size and operand-size overrides?
>
> Um, because the compiler is written in C and I expect it to:
> - be compilable with other (plain/normal) C compilers
> - be able to compile itself
> ?
>
> Asm code in the compiler source code will make the
> compiler itself less portable.

It's in C? I thought you emitted NASM. From your Smaller C wiki:

"Currently it generates 16-bit and 32-bit 80386+ assembly code for
NASM ..."

> How about you compile Smaller C to 16-bit, 32-bit and 64-bit
> (or maybe even 24-bit and 100-bit?) executables with compilers
> of your choice on platforms of your choice? You should be able
> to do it now and the binary should be functional (except,
> perhaps, the error reporting function error(), which has a dirty
> hack to simulate the va_something macros, and that little piece
> of code may crash the compiler if there's a compilation error
> when you compile something else with it).

Steve was attempting that. I may attempt it later on.

> > > > You mentioned that they attempt to ensure better matching
> > > > of types. What did they do about if() and switch()?
> > > > I.e., they are "overloaded" to accept int's, char's,
> > > > long's, signed and unsigned ...
> > >
> > > I'm not sure I understand the question or know the answer to
> > > it.
> >
> > What size is the "what_size_is_this_integer" integer below?
> >
> > if("what_size_is_this_integer") break;
> >
> > Isn't if() and switch() required to accept a variety of
> > integer sizes and types: int's, char's, long's, signed and
> > unsigned integers? (and expressions and perhaps floats too...)
> >
> > I.e., can if() and switch() truncate an ULL such as
> > 0xF000000000000000 to become a zero, or not?
> >
> > If if() and switch() are required to accept multiple integer
> > sizes and types, then they are "overloaded", yes?
>
> I'm not sure. Really, all if, while and for care about is
> whether the conditional expression evaluates to true
> (a non-zero value) or false (a zero value).

Well, yes, the expression evaluates to false/zero or not. That
result will be represented as an integer. But, what size integer?
The compiler must emit code appropriate for the size of the
integer representing the result of the expression. If the
expression is true/false, i.e., produced by logical and's and
logical or's, it's not a problem. But, if someone places a large
integer like the example above, what happens? Does the compiler
only accept 16-bit integer or 32-bit integer, i.e., native size?
Does it truncate? If the compiler can generate code for multiple
sizes, then if() and switch() are aware of the type size.
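For what the C standard itself requires, a small sketch may help: C99 says the if body runs when the controlling expression "compares unequal to 0" (6.8.4.1), loops test it the same way (6.8.5), and converting any scalar to _Bool likewise gives 1 for a nonzero value (6.3.1.2). So the test is done in the expression's own type, and 0xF000000000000000ULL counts as true. The second function is hypothetical and models a code generator that only tests a 16-bit register:

```c
/* Per C99 6.8.4.1 and 6.3.1.2: the value is compared with 0 in its own
   type, so every bit of a 64-bit value takes part in the test. */
static int full_width_truth(unsigned long long v)
{
    return v != 0;
}

/* Hypothetical broken code generator: only the low 16 bits are tested
   (e.g. "test ax, ax" on a value that doesn't fit in AX). */
static int low16_truth(unsigned long long v)
{
    return (v & 0xFFFFu) != 0;
}
```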

Rod Pemberton

Alexei A. Frounze

Nov 28, 2012, 4:57:43 AM11/28/12

On Wednesday, November 28, 2012 1:09:06 AM UTC-8, Rod Pemberton wrote:
> "Alexei A. Frounze" <alex...@gmail.com> wrote in message
> news:52ffcced-1d7c-4332...@googlegroups.com...
> > On Tuesday, November 27, 2012 4:27:02 AM UTC-8, Rod Pemberton
> > wrote:
> ...
>
> > > You can make your C smaller, if need be. I'm not going to go
> > > through your list of stuff you implemented, but I seem to
> > > recall stuff like variadic arguments, VLAs, unnamed variables
> > > or structs or unions, the :? operator,
> >
> > I don't have VLAs, structs/unions, ?:.
>
> Did I go to the wrong link ... ?
> I was looking at the links posted by "dosusb" too.
> Perhaps, I overlooked the "... NOT supported ..." title.

You probably did overlook it.

...

> I think I lost you. I was referring to use of a typedef.

...

> But, you cannot determine purely from the code near
> where "second" is used, what a token of "second" actually is. It
> can be an identifier, a typedef name, or an implicit int. If
> using tools like flex/lex or yacc/bison, this requires a "hack" to
> pass the relevant information to construct the dynamic symbol
> table.

Of course, one needs a symbol table of sorts to figure out what an identifier means. And I have one.

> When one adds in implicit int's, the ability to resolve whether a
> token is a typedef name or an implicit int becomes _very_
> complicated. So, they eliminated implicit int's for C99. (Some
> years ago, I attempted to create a table of all possible
> combinations. I ended up with 36 states and still hadn't
> completely resolved all interactions ...)

I do not support implicit ints either. It's too much useless trouble. I have no intention of supporting some of the ugly C89- features. I'd rather have a smaller subset of the cleaner C99.

...

> > > On x86, for 16-bit code, you've used 16-bit signed ints for
> > > 32-bit integer code? Why?! Why didn't you just use
> > > address-size and operand-size overrides?
> >
> > Um, because the compiler is written in C and I expect it to:
> > - be compilable with other (plain/normal) C compilers
> > - be able to compile itself
> > ?
> >
> > Asm code in the compiler source code will make the
> > compiler itself less portable.
>
> It's in C? I thought you emitted NASM. From your Smaller C wiki:
>
> "Currently it generates 16-bit and 32-bit 80386+ assembly code for
> NASM ..."

Indeed, the compiler is in C and it emits NASM-consumable assembly code, 16- or 32-bit.

If I compile my compiler with Turbo C/C++ or Open Watcom C/C++ into a 16-bit DOS .EXE, my compiler will have 16-bit ints in the .EXE. If I use this .EXE to compile some other code into 32-bit assembly code, the resultant 32-bit assembly code will be wrong in the places where 32-bit constants have been truncated because of ints being 16-bit and not 32-bit.

Look, suppose I want to compile this line of code with my compiler:

int x = 32768;

The compiler will parse 32768 using whatever int it's got because that's the biggest integer type there is. If the compiler is a 16-bit binary, 32768 becomes an int equal to -32768. If the compiler is a 32-bit binary, 32768 stays 32768. And then this constant makes it into the 32-bit assembly output as either -32768 (wrong) or 32768 (correct). In most cases the difference between the two values will be noticeable.

Or, say, this is what we're compiling as 32-bit:

printf("%d\n", 32768);

The assembly code for the above will be either this (wrong):

push -32768 ; dword push, -32768 is sign-extended to 32 bits
push L1234
call _printf
sub esp, -8

or this (correct):

push 32768 ; dword push, 32768 is zero-extended to 32 bits
push L1234
call _printf
sub esp, -8

And if the constant is 65536, you'll get 0 or 65536 respectively.
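The two outcomes can be modelled in a few lines of C. This is a sketch assuming two's-complement 16-bit ints in the compiler binary, and the function names are made up for illustration. It keeps the low 16 bits of the parsed constant and then shows the dword that a 32-bit push of the result puts on the stack:

```c
#include <stdint.h>

/* Hypothetical model of a compiler built with 16-bit ints parsing a
   constant: only the low 16 bits survive, read as two's complement.
   Converting an out-of-range value to a signed type is implementation
   defined in C, so the reinterpretation is done explicitly here. */
static int32_t parse_as_int16(uint32_t value)
{
    uint32_t low = value & 0xFFFFu;

    return low >= 0x8000u ? (int32_t)low - 0x10000 : (int32_t)low;
}

/* A 32-bit "push imm" sign-extends its immediate to a dword. */
static uint32_t dword_pushed(int32_t v)
{
    return (uint32_t)v;   /* e.g. -32768 -> 0xFFFF8000 */
}
```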

So if I want 16-bit binaries of my compiler to be able to produce the same assembly output as 32-bit binaries, I need to add into the compiler's C code some 32-bit arithmetic implemented with ints, which can be as small as 16 bits. And that's the whole (or)deal.
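One way to build such 32-bit arithmetic from ints that may be only 16 bits wide is to keep each value as a pair of 16-bit halves and propagate carries by hand. The sketch below assumes that general technique; it is not the separate app mentioned here, and all names are hypothetical. It implements addition and a one-bit shift, then parses a decimal constant with r * 10 computed as (r << 3) + (r << 1), so no step needs an integer wider than 16 bits:

```c
/* Hypothetical 32-bit unsigned value as two 16-bit halves held in plain
   unsigned ints, so results are the same whether int is 16 or 32 bits. */
typedef struct {
    unsigned hi; /* bits 31..16, kept in 0..0xFFFF */
    unsigned lo; /* bits 15..0, kept in 0..0xFFFF */
} U32;

/* r = a + b (mod 2^32) */
static U32 u32_add(U32 a, U32 b)
{
    U32 r;
    unsigned carry;

    r.lo = (a.lo + b.lo) & 0xFFFFu;
    carry = r.lo < a.lo;          /* the low half wrapped around */
    r.hi = (a.hi + b.hi + carry) & 0xFFFFu;
    return r;
}

/* r = a << 1 (mod 2^32) */
static U32 u32_shl1(U32 a)
{
    U32 r;

    r.hi = ((a.hi << 1) | (a.lo >> 15)) & 0xFFFFu;
    r.lo = (a.lo << 1) & 0xFFFFu;
    return r;
}

/* Parse decimal digits into a 32-bit value (mod 2^32), as a 32-bit
   target sees the constant: r = r * 10 + digit. */
static U32 u32_from_decimal(const char *s)
{
    U32 r;

    r.hi = 0;
    r.lo = 0;
    for (; *s >= '0' && *s <= '9'; s++) {
        U32 r2 = u32_shl1(r);               /* r * 2 */
        U32 r8 = u32_shl1(u32_shl1(r2));    /* r * 8 */
        U32 d;

        d.hi = 0;
        d.lo = (unsigned)(*s - '0');
        r = u32_add(u32_add(r8, r2), d);
    }
    return r;
}
```

Constant folding would need the remaining operations (subtraction, multiplication, division, shifts, comparisons) in the same style.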

And I've got that arithmetic implemented and tested in a separate app. Looks like it may add something around 3K of compiled code to the compiler and because of that I'm not rushing into putting it into the compiler. It's already at 53+K of 16-bit compiled code (data not counted, it's in a separate segment). I think, before I add that I should add structures. You've expressed a valid point about their importance. I think that a much more usable small C kind of C compiler should support structures while unions can be left unsupported.

...

Why do you care? The compiler can insert an implicit conversion from whatever type the conditional expression is into "_Bool", which may be the same size as int, but will only be equal to 0 or 1.

I kind of do the same in my compiler, where in the output assembly code all expressions are calculated as int (chars get converted to ints when loaded from memory). if/while/for statements emit code to compare (e)ax (where the result of the evaluated conditional expression ends up) to 0 and they also emit a conditional jump, je or jne, whichever is appropriate for the statement (if/while/for).

Alex

Rod Pemberton

Nov 28, 2012, 7:00:17 AM11/28/12

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:35275c1c-8987-48a9...@googlegroups.com...

> On Wednesday, November 28, 2012 1:09:06 AM UTC-8, Rod Pemberton
> wrote:
> > "Alexei A. Frounze" <alex...@gmail.com> wrote in message
> > news:52ffcced-1d7c-4332...@googlegroups.com...
> > > On Tuesday, November 27, 2012 4:27:02 AM UTC-8, Rod
> > > Pemberton

...

> > > > On x86, for 16-bit code, you've used 16-bit signed ints
> > > > for 32-bit integer code? Why?! Why didn't you just use
> > > > address-size and operand-size overrides?
>
> > > Um, because the compiler is written in C and I expect it to:
> > > - be compilable with other (plain/normal) C compilers
> > > - be able to compile itself
>

> > > Asm code in the compiler source code will make the
> > > compiler itself less portable.
>
> > It's in C? I thought you emitted NASM. From your Smaller C
> > wiki:
>
> > "Currently it generates 16-bit and 32-bit 80386+ assembly code
> > for NASM ..."
>
> Indeed, the compiler is in C and it emits NASM-consumable
> assembly code, 16- or 32-bit.

...

> If I compile my compiler with Turbo C/C++ or Open Watcom
> C/C++ into a 16-bit DOS .EXE, my compiler will have 16-bit
> ints in the .EXE. If I use this .EXE to compile some other code
> into 32-bit assembly code, the resultant 32-bit assembly code
> will be wrong in the places where 32-bit constants have been
> truncated because of ints being 16-bit and not 32-bit.

...

> Look, suppose I want to compile this line of code with my
> compiler:
>
> int x = 32768;
>
> The compiler will parse 32768 using whatever int it's got
> because that's the biggest integer type there is. If the
> compiler is a 16-bit binary, 32768 becomes an int equal
> -32768. If the compiler is a 32-bit binary, 32768 stays 32768.
> And then this constant makes it into the 32-bit assembly output
> as either -32768 (wrong) or 32768 (correct). In most cases the
> difference between the two values will be noticeable.

If the value is greater than 65535, then I can see you having a
truncation issue using your current numeric conversion method.

I probably would've attempted to convert the entire value as a
text string to hexadecimal without being limited by an integer
size. If you parse it "all-at-once", then you're limited to the
integer size that the executable supports, as you stated you are.
If you do it incrementally, i.e., a byte at a time, then you should be
able to convert larger numbers, unlimited actually. For
hexadecimal, this is very easy. You can do two hex digits, i.e.,
one byte, until you run out of hex digits, for an unlimited
length, as long as you have an area of bytes to stack/store every two
digits ... For decimal, it's a bit more complicated. You have to
keep track of how much of each digit goes to the current byte or
next byte.
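The decimal case can be sketched in C as follows. The function decimal_to_bytes() is hypothetical. It stores the number little-endian, one byte per array element, and for each digit multiplies the whole array by 10 and adds the digit, passing the overflow of each byte on to the next one. That carry is what "how much of each digit goes to the current byte or next byte" amounts to:

```c
/* Hypothetical sketch: convert a decimal digit string to a little-endian
   byte array of any length. For every digit: bytes = bytes * 10 + digit,
   carrying from each byte into the next. Returns the number of bytes
   used (0 for the value zero), or -1 if out[] is too small. */
static int decimal_to_bytes(const char *s, unsigned char *out, int cap)
{
    int used = 0, i;

    for (; *s >= '0' && *s <= '9'; s++) {
        unsigned carry = (unsigned)(*s - '0');

        for (i = 0; i < used; i++) {
            unsigned v = out[i] * 10u + carry; /* at most 255*10+9 = 2559 */
            out[i] = (unsigned char)(v & 0xFFu);
            carry = v >> 8;                    /* at most 9 */
        }
        if (carry != 0) {
            if (used == cap)
                return -1;                     /* number does not fit */
            out[used++] = (unsigned char)carry;
        }
    }
    return used;
}
```

Because the carry never exceeds 9, every intermediate value fits comfortably in a 16-bit unsigned int.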

The signed and unsigned representations for x86 are equivalent in
binary. Hexadecimal is sufficient to specify an exact binary
representation of any integer. In many cases, NASM will emit the
appropriate size for the code based on BITS 16 or BITS 32 and
whether the value needs an override to use mixed-mode code. E.g.,
for 32768 as hex 0x8000 or 08000h, NASM will emit 0x8000 for
16-bit code and 0x00008000 for 32-bit code. E.g., for 131071 as
hex 0x1FFFF or 01FFFFh, NASM will emit 0x0001FFFF for 16-bit code
inserting an operand size override (o32 - 66h) and will emit
0x0001FFFF for 32-bit code. Unfortunately, NASM doesn't seem to
emit an override in 16-bit code for an oversized PUSH. I'm not
sure about MOV or other instructions.

Whether you're limited or not, depends on how you parse and
convert the number to binary.

> Or, say, this is what we're compiling as 32-bit:
>
> printf("%d\n", 32768);
>
> The assembly code for the above will be either this (wrong):
>
> push -32768 ; dword push, -32768 is sign-extended to 32 bits

For both "wrong" and "correct", I'd think you'd want to use
hexadecimal, one or the other:

; NASM should compile the appropriate size PUSH
; with small value PUSH of 32768
; BITS 16 or BITS 32 sets the code size
; i.e., push 16-bit for 16-bit mode code
; i.e., push 32-bit for 32-bit mode code

push 0x8000
push 08000h

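; NASM needs an o32 for BITS 16
; with large value PUSH of 131071;
; "push dword" makes NASM emit the o32 (66h) prefix
; together with a 32-bit immediate in BITS 16 code,
; and is an ordinary push in BITS 32 code
; (a bare "o32" on its own line would emit only the
; prefix byte, followed by a 16-bit push)

%define CODE16 ; placeholder for "your size define"
%ifdef CODE16
BITS 16
%else
BITS 32
%endif

push dword 0x1FFFF
push dword 01FFFFh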

I think MOV's should probably do o32 automatically, but I didn't
confirm. You'll have to check the NASM manual to see if there is
an easier method to determine BITS 16 or BITS 32.

> So if I want 16-bit binaries of my compiler to be able to
> produce the same assembly output as 32-bit binaries, I
> need to add into the compiler's C code some 32-bit arithmetic
> implemented with ints, which can be as small as 16 bits.
> And that's the whole (or)deal.

Ok. I wasn't really considering input, only output ... At this
point, I think you're limited by the number conversion method you
used.

> Why do you care [if "if() and switch() are aware of the type
> size"]?

Well, you mentioned that Go attempted to fix type conversion
errors by requiring the programmer to be more specific. The
situations mentioned had fewer automatic type conversions. if's
and switches are a situation where automatic conversions seem to
be needed.

Rod Pemberton

Alexei A. Frounze

Nov 28, 2012, 12:55:15 PM11/28/12

It's not only that. I do constant folding, which requires 32-bit arithmetic for 32-bit C input / 32-bit assembly output.

> > Why do you care [if "if() and switch() are aware of the type
> > size"]?
>
> Well, you mentioned that Go attempted to fix type conversion
> errors by requiring the programmer to be more specific. The
> situations mentioned had fewer automatic type conversions. if's
> and switches are a situation where automatic conversions seem to
> be needed.

Yep. And that does not surprise me.

Alex

s_dub...@yahoo.com

Nov 28, 2012, 1:28:58 PM11/28/12

Good! works for me on debian too. Don't forget to update the .gz, I pulled this off raw.github by cut and paste.

Congrats,

Steve

Alexei A. Frounze

Nov 28, 2012, 1:53:32 PM11/28/12

Cool!

As for the .gz/.zip, I think it's github's duty to "update" it. I've never created it in the first place.

Alex

s_dub...@yahoo.com

Nov 28, 2012, 2:57:17 PM11/28/12

Note-
http://en.wikipedia.org/wiki/LALR_parser

"
Because the LALR parser performs a right derivation instead of the more intuitive left derivation, understanding how it works is quite difficult. This makes the process of finding a correct and efficient LALR grammar very demanding and time consuming. For the same reason error reporting can be quite hard because LALR parser errors cannot always be interpreted into messages meaningful for the end user. For this reason the recursive descent parser is sometimes preferred over the LALR parser. This parser requires more hand-written code because of less language recognition power. However, it does not have the special difficulties of the LALR parser because it performs left-derivation. Notable examples of this phenomenon are the C and C++ parsers of GCC. They started as LALR parsers but were later changed to recursive descent parsers.

AIUI the result of the expression, for if() and case:, is an int. Floats are not an option, and char and long would be cast to an int before the evaluation of the expression. ISTM, the rule of the cast would depend additionally on whether char is signed or unsigned.

> > > > I'm not sure I understand the question or know the answer to
> > > > it.
> > >
> > > What size is the "what_size_is_this_integer" integer below?
> > >
> > > if("what_size_is_this_integer") break;
> > >
> > > Isn't if() and switch() required to accept a variety of
> > > integer sizes and types: int's, char's, long's, signed and
> > > unsigned integers? (and expressions and perhaps floats too...)
> > >
> > > I.e., can if() and switch() truncate an ULL such as
> > > 0xF000000000000000 to become a zero, or not?
> > >
> > > If if() and switch() are required to accept multiple integer
> > > sizes and types, then they are "overloaded", yes?
> >
> > I'm not sure. Really, all if, while and for care about is
> > whether the conditional expression evaluates to true
> > (a non-zero value) or false (a zero value).
>

I think _Bool was an effort to clear up this dark corner.

>
> Well, yes, the expression evaluates to false/zero or not. That
> result will be represented as an integer. But, what size integer?
> The compiler must emit code appropriate for the size of the
> integer representing the result of the expression. If the
> expression is true/false, i.e., produced by logical and's and
> logical or's, it's not a problem. But, if someone places a large
> integer like the example above, what happens? Does the compiler
> only accept 16-bit integer or 32-bit integer, i.e., native size?
> Does it truncate? If the compiler can generate code for multiple
> sizes, then if() and switch() are aware of the type size.
>
> Rod Pemberton

C is messy. Say what you will about Pascal, at least it has a clean EBNF, but no one (hardly) uses it.

I'll share this recent find's snippet, check out the semicolon-empty statement in the 'while()'.

int scan (char c)
{
int tmpval;

switch (c)
{
case '{': /* keyword IF */
valsptr = valuestack;
opsptr = operatorstack;

while ((c = *bufptr++) != NEWLINE && c != END
&& flag == GREEN && scan(c)) ;

..omitted remainder..
Evaluation of 'while()' causes a recursive call to scan(c) until one of the checks fails; the empty statement ';' holds the while statement together. Hopefully, the order of evaluation is the same as what the programmer envisioned, for 'c'.

Steve

Alexei A. Frounze

Nov 28, 2012, 4:06:53 PM11/28/12

On Wednesday, November 28, 2012 11:57:17 AM UTC-8, s_dub...@yahoo.com wrote:
...
> AIUI the result of the expression, for if() and case:, is an int. Floats are not an option, and char and long would be cast to an int before the evaluation of the expression. ISTM, the rule of the cast would depend additionally on whether char is signed or unsigned.

The conditional expression has to be evaluated first. And then compared to 0. And only then cast to int. The expression may contain floats and pointers and whatnot. As long as it does not evaluate to an empty or void expression or a structure/union, it's good for if() because it can be compared to 0 and cast to int.

...

> I'll share this recent find's snippet, check out the semicolon-empty statement in the 'while()'.
>
> int scan (char c)
> {
> int tmpval;
>
> switch (c)
> {
> case '{': /* keyword IF */
> valsptr = valuestack;
> opsptr = operatorstack;
>
> while ((c = *bufptr++) != NEWLINE && c != END
> && flag == GREEN && scan(c)) ;
>
> ..omitted remainder..
> evaluation of 'while()' causes a recursive call to scan(c) until one of the checks fail; - the empty statement ';' holds the while statement together. Hopefully, the order of evaluation is the same as what the programmer envisioned, for 'c'.
>

The order of evaluation of the operands of && and || is well defined by the C standard: it's left to right, and the right operand is not evaluated at all when the left one already decides the result. The compiler can reorder them IFF it can prove that reordering results in no difference in observable side effects. Here, for example, it may be able to reorder evaluation of 'c != END' and 'flag == GREEN' with respect to each other.
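A small runnable imitation of that loop makes the behavior visible. The names are hypothetical stand-ins for NEWLINE, END and the recursive scan() call; the sketch counts how many times the rightmost operand is evaluated:

```c
/* Hypothetical imitation of the while() in scan(). The loop body is the
   empty statement; all the work is in the condition. Because && evaluates
   left to right and stops at the first false operand, visit() runs only
   for characters that already passed the newline and end-of-string tests. */
static int visits;

static int visit(char c)
{
    visits++;
    return c != 'x';      /* 'x' plays the role of scan() failing */
}

static int scan_until(const char *bufptr)
{
    char c;

    visits = 0;
    while ((c = *bufptr++) != '\n' && c != '\0' && visit(c))
        ;
    return visits;
}
```

For "ab\ncd" visit() runs twice (for 'a' and 'b'); the newline stops the loop before visit() is reached.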

Alex

Rod Pemberton

Nov 29, 2012, 3:52:45 AM11/29/12

<s_dub...@yahoo.com> wrote in message
news:15dfe7d8-3a60-4aa8...@googlegroups.com...

> On Wednesday, November 28, 2012 3:09:06 AM UTC-6, Rod Pemberton
> wrote:

...

> > I'm not real clear on what LALR(1) is other
> > than it's a type of parsing with lookahead of one token.
> >

> Note- [link to Wikipedia]

> "
> Because the LALR parser performs a right derivation instead of
> the more intuitive left derivation, understanding how it works
> is quite difficult. This makes the process of finding a correct
> and efficient LALR grammar very demanding and time consuming.
> For the same reason error reporting can be quite hard because
> LALR parser errors cannot always be interpreted into messages
> meaningful for the end user. For this reason the recursive
> descent parser is sometimes preferred over the LALR parser. This
> parser requires more hand-written code because of less language
> recognition power. However, it does not have the special
> difficulties of the LALR parser because it performs
> left-derivation. Notable examples of this phenomenon are the C
> and C++ parsers of GCC. They started as LALR parsers but were
> later changed to recursive descent parsers.
> "

I'll reveal a secret of mine...

For an experimental C parser of mine - one of many - I use a
sliding window of three characters. The middle character is where
the input stream is being parsed. The other two allow for
look-ahead and look-behind. So, you have the power or flexibility
of look-ahead and look-behind at the same time! This is very easy
to implement too. You only need three characters and a couple of
assignments in a read loop. The assignments are to perform the
character shift. I'm not sure what they'd name it, perhaps,
LALBLR(1)(1) ... ? ;-)
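A minimal C sketch of such a three-character window, assuming input comes from a NUL-terminated string rather than a file (all names are hypothetical). The example function uses both neighbours to tell a stand-alone '=' from one that is part of ==, !=, <= or >=:

```c
#include <stdio.h>

/* Hypothetical three-character sliding window over a string: prev is the
   look-behind, cur is the character being parsed, next is the look-ahead.
   EOF marks "no character". */
typedef struct {
    const char *src;
    int prev, cur, next;
} Window;

static int read_char(Window *w)
{
    return *w->src != '\0' ? (unsigned char)*w->src++ : EOF;
}

/* The "couple of assignments" that shift the window by one character. */
static void window_shift(Window *w)
{
    w->prev = w->cur;
    w->cur = w->next;
    w->next = read_char(w);
}

static void window_init(Window *w, const char *src)
{
    w->src = src;
    w->prev = w->cur = EOF;
    w->next = read_char(w);
    window_shift(w);          /* cur now holds the first character */
}

/* Example use: count '=' characters that are not part of
   ==, !=, <= or >=. Deciding that needs both neighbours. */
static int count_assignments(const char *src)
{
    Window w;
    int n = 0;

    window_init(&w, src);
    while (w.cur != EOF) {
        if (w.cur == '=' && w.next != '=' && w.prev != '=' &&
            w.prev != '!' && w.prev != '<' && w.prev != '>')
            n++;
        window_shift(&w);
    }
    return n;
}
```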

> C is messy. Say what you will about Pascal, at least it has a
> clean EBNF, but no one (hardly) uses it.

Perhaps, the Pascal EBNF should be reworked for near C
or almost C or something with the appearance of C ...

:-)

> I'll share this recent find's snippet, check out the
> semicolon-empty statement in the 'while()'.
>
> int scan (char c)
> {
> int tmpval;
>
> switch (c)
> {
> case '{': /* keyword IF */
> valsptr = valuestack;
> opsptr = operatorstack;
>
> while ((c = *bufptr++) != NEWLINE && c != END
> && flag == GREEN && scan(c)) ;
>
> ..omitted remainder..
> evaluation of 'while()' causes a recursive call to scan(c) until
> one of the checks fail; - the empty statement ';' holds the
> while statement together. Hopefully, the order of evaluation
> is the same as what the programmer envisioned, for 'c'.

IMO, for()'s without a body are far more common, but I also have
some while()'s without them too. They're convenient when
searching for something, e.g., string for a character, or
linked-list for a node, etc. Once the loop has found what is
being searched for, then the code continues. Also, you'll see
them when waiting for something too, like port I/O.
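Both search idioms look like this in C; the struct and function names are hypothetical. In the list search the && guard matters: n->key is read only after n != NULL has been checked.

```c
#include <stddef.h>

/* Hypothetical list node for this sketch. */
struct node {
    int key;
    struct node *next;
};

/* Empty-bodied for: the loop header does all the work. */
static struct node *find_node(struct node *head, int key)
{
    struct node *n;

    for (n = head; n != NULL && n->key != key; n = n->next)
        ;
    return n;                 /* NULL if key is not in the list */
}

/* The same idiom searching a string for a character. */
static const char *find_char(const char *s, char c)
{
    for (; *s != '\0' && *s != c; s++)
        ;
    return *s == c ? s : NULL;
}
```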

Rod Pemberton

Benjamin David Lunt

Dec 1, 2012, 7:57:00 PM12/1/12

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:35275c1c-8987-48a9...@googlegroups.com...

On Wednesday, November 28, 2012 1:09:06 AM UTC-8, Rod Pemberton wrote:

> I think, before I add that I should add structures. You've expressed
> a valid point about their importance. I think that a much more usable
> small C kind of C compiler should support structures while unions can
> be left unsupported.

Hi guys,

I have been following this thread with much interest. I have looked over
your code and have added a few things myself. For example, it will be
very easy for you to add #include files.

Under the '#define found' part of your code in GetToken(), add something
like:

} else if (!strcmp(TokenIdentName, "include")) {
FILE *fin;
int tok;

SkipSpace(0, 0);
tok = GetToken();
if (tok == tokLitStr) {
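if (InclFlag == INCMAX)
error("Include file nested too deep");

// make sure the name fits in inc_stack[].filename[]
if (strlen(GetTokenValueString()) >= sizeof inc_stack[0].filename)
error("Include file name too long");

if ((fin = fopen(GetTokenValueString(), "rt")) == NULL)
error("Cannot open file \"%s\"\n", GetTokenValueString());

InclFlag++;
strcpy(inc_stack[InclFlag].filename, GetTokenValueString());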
inc_stack[InclFlag].fin = fin;
inc_stack[InclFlag].linenum = 1;
inc_stack[InclFlag].linepos = 1;
inc_stack[InclFlag].CharQueueLen = 0;

// populate CharQueue[] with the initial file characters
ShiftChar();
p = inc_stack[InclFlag].CharQueue;
} else
error("Unknown parameter for #include");

Notice that I now use a structure array for the specific items.
You need to have a FILE *, linenum, linepos, and Queue and QueueLen
for each file, including the main .c file. Please also note that
you must update *p on invoke

// at end of GetToken()
// found end of file
// if nested includes, go back to last one and continue on.
fclose(inc_stack[InclFlag].fin);
if (InclFlag > 0) {
InclFlag--;
p = inc_stack[InclFlag].CharQueue;
} else
break;

and then also restore it at end of file as in the above code. Notice
that I moved the fclose() out of ShiftChar() and into GetToken().

You also need a struct in your main.c file.

int InclFlag = 0;
struct incstack {
FILE *fin;
int linenum;
int linepos;
char filename[128];
char CharQueue[MAX_CHAR_QUEUE_LEN];
int CharQueueLen;
} inc_stack[INCMAX+1]; // first is for base filename, etc.

Please note that this no longer makes it self-compilable, since Smaller C
doesn't support structures yet; however, I am sure you can come up with
something else.

There are a few other changes I needed to make for all of this to work,
but the above code saves the floating queue and comes back to it at end
of the included file. This code also allows up to INCMAX nested includes,
currently set at 10, but could be up to 100's if memory allows.

Anyway, just a few small comments. I also added the #pragma keyword
to set different items so that you can do it during compile time, not
at the command line.

Anyway, just my current thoughts. I have been following this thread
with great interest and anticipate updates daily. :-)

Thanks,
Ben

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Forever Young Software
http://www.fysnet.net/index.htm
http://www.fysnet.net/The_Universal_Serial_Bus.htm
To reply by email, please remove the zzzzzz's

Batteries not included, some assembly required.

Alexei A. Frounze

Dec 1, 2012, 9:47:42 PM

On Saturday, December 1, 2012 4:57:00 PM UTC-8, Benjamin David Lunt wrote:
> "Alexei A. Frounze" wrote in message
> news:35275c1c-8987-48a9...@googlegroups.com...
> On Wednesday, November 28, 2012 1:09:06 AM UTC-8, Rod Pemberton wrote:
>
> > I think, before I add that I should add structures. You've expressed
> > a valid point about their importance. I think that a much more usable
> > small C kind of C compiler should support structures while unions can
> > be left unsupported.
>
> Hi guys,
>
> I have been following this thread with much interest. I have looked over
> your code and have added a few things myself. For example, it will be
> very easy for you to add #include files.

Hi Ben!

Nice to see you guys trying it out!

I will soon add support for #include. And until I get structures supported, it will still be structure-less code.

The I/O-related code will have to be tweaked to allow for the differentiation between #include <> and #include "".

Also, as soon as we get the #include, living without #if/#ifdef/#if defined() is going to be uncomfortable. :) So, some of those should be added as well.

Alex

Benjamin David Lunt

Dec 1, 2012, 11:04:21 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:ecc3a038-3ff4-4f4c...@googlegroups.com...

> I will soon add support for #include. And until I get structures
> supported, it will still be structure-less code.
>
> The I/O-related code will have to be tweaked to allow for the
> differentiation between #include <> and #include "".
>
> Also, as soon as we get the #include, living without #if/#ifdef/#if
> defined() is going to be uncomfortable. :) So, some of those should be
> added as well.
>

The #if/#else/#endif, etc., are actually quite easy to implement. You simply
need an integer that starts at one, and as long as this integer is not zero,
you output text to the target file. There is no need to skip over C source
code. Continue to parse tokens, translate code, etc., and only check that
do_out is non-zero before sending the compiled code to the out file. This
also works with nested #if blocks.

The do_out flag is changed with the following no matter the depth of the
nested #if blocks:

if an #if (TRUE) is found, set do_out to one.
if an #if (FALSE) is found, set do_out to zero.
if an #elif (TRUE) is found, set do_out to one.
if an #elif (FALSE) is found, set do_out to zero.
if an #else is found, decrement do_out.
if an #endif is found, set do_out to one.

If I remember correctly, this works every time. However, it has been a few
years since I went over my code. I used this technique in NBASM and it
has worked so far. If someone finds a flaw in this technique, please
let me know, I may have forgotten something.

Anyway, this should be pretty simple to implement if you don't mind the
wasted time parsing code that is in a FALSE #if block.
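One caveat about the single-counter scheme: if an outer #if is false and a nested #if ... #endif pair closes inside it, setting do_out back to one re-enables output while the outer block is still false. A common alternative is a small stack with one entry per open conditional. The sketch below is hypothetical (CondIf, CondElif, CondElse, CondEndif, Active and the depth limit are made-up names). It takes already-evaluated conditions as ints, so expression evaluation is left out, and it does not diagnose an #elif or a second #else after an #else.

```c
#define MAX_COND_DEPTH 16 /* hypothetical nesting limit */

struct CondLevel {
  int parentActive; /* was output enabled when this #if was opened? */
  int taken;        /* has a branch of this #if/#elif chain been taken? */
  int active;       /* is the current branch producing output? */
};

static struct CondLevel condStack[MAX_COND_DEPTH];
static int condDepth = 0;

/* Output is enabled when no conditional is open or the innermost branch is active. */
static int Active(void) {
  return condDepth == 0 || condStack[condDepth - 1].active;
}

/* #if / #ifdef / #ifndef with an already-evaluated condition; 0 on overflow. */
static int CondIf(int cond) {
  int parent = Active();
  if (condDepth == MAX_COND_DEPTH)
    return 0;
  condStack[condDepth].parentActive = parent;
  condStack[condDepth].taken = parent && cond;
  condStack[condDepth].active = parent && cond;
  condDepth++;
  return 1;
}

/* #elif: active only if the parent is active and no earlier branch was taken. */
static int CondElif(int cond) {
  struct CondLevel *l;
  if (condDepth == 0)
    return 0; /* #elif without #if */
  l = &condStack[condDepth - 1];
  l->active = l->parentActive && !l->taken && cond;
  if (l->active)
    l->taken = 1;
  return 1;
}

/* #else behaves like an #elif whose condition is true. */
static int CondElse(void) {
  return CondElif(1);
}

/* #endif closes the innermost conditional. */
static int CondEndif(void) {
  if (condDepth == 0)
    return 0; /* #endif without #if */
  condDepth--;
  return 1;
}
```

As in Ben's approach, the compiler keeps parsing everything and only consults Active() before emitting output; the difference is that an inner #endif restores the enclosing level's state instead of forcing output on.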

Alexei A. Frounze

Dec 2, 2012, 12:42:18 AM

On Saturday, December 1, 2012 8:04:21 PM UTC-8, Benjamin David Lunt wrote:
> "Alexei A. Frounze" wrote in message

#if/#elif expect an expression, not just a number or a macro expanding to a number.

I'll begin with #if(n)def.

Alex

Benjamin David Lunt

Dec 2, 2012, 10:28:30 AM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:16824e0b-2eda-4bda...@googlegroups.com...

> #if/#elif expect an expression, not just a number or a macro expanding
> to a number.
>
> I'll begin with #if(n)def.

Indeed. Sorry if you feel I am telling you what and how to do it.
Besides filesystems, and of course, USB, I very much enjoy compiler
design.

Thanks for your work, and I still am expecting..., uhh, I mean am
anticipating daily updates. :-)

Alexei A. Frounze

Dec 2, 2012, 4:15:06 PM

On Sunday, December 2, 2012 7:28:30 AM UTC-8, Benjamin David Lunt wrote:
> "Alexei A. Frounze" wrote in message
> news:16824e0b-2eda-4bda...@googlegroups.com...
>
> > #if/#elif expect an expression, not just a number or a macro expanding
> > to a number.
> >
> > I'll begin with #if(n)def.
>
> Indeed. Sorry if you feel I am telling you what and how to do it.
> Besides filesystems, and of course, USB, I very much enjoy compiler
> design.
>
> Thanks for your work, and I still am expecting..., uhh, I mean am
> anticipating daily updates. :-)

You're very impatient. Santa may not like it. :)

Alex

Alexei A. Frounze

Dec 7, 2012, 6:42:20 AM

On Saturday, December 1, 2012 4:57:00 PM UTC-8, Benjamin David Lunt wrote:
> "Alexei A. Frounze" wrote in message

#include's been added.

#if(n)def/#else/#endif are baking.

Alex

Benjamin David Lunt

Dec 7, 2012, 10:37:32 AM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:00b1d9b6-8270-435c...@googlegroups.com...

>
> #include's been added.
>
> #if(n)def/#else/#endif are baking.
>

That was a *long* day. Santa comes by my place more frequently than
that... :-)

I will have a look to see how you implemented them. It will be
interesting to see the difference between yours and mine.

Thanks,
Ben

Alexei A. Frounze

Dec 7, 2012, 3:31:39 PM

On Friday, December 7, 2012 7:37:32 AM UTC-8, Benjamin David Lunt wrote:
> "Alexei A. Frounze" wrote in message
> news:00b1d9b6-8270-435c...@googlegroups.com...
>
> > #include's been added.
> >
> > #if(n)def/#else/#endif are baking.
>
> That was a *long* day. Santa comes by my place more frequently than
> that... :-)

What I initially thought would take a day or two max, took more as I had outsmarted myself and then some other things came up to interrupt the work. :)

> I will have a look to see how you implemented them. It will be
> interesting to see the difference between yours and mine.

I had some three (no, that's not threesome!:) implementations, all of which I scrapped as they weren't covering some case here or there or were unnecessarily complicating things elsewhere. Simplicity seemed elusive until last night when, I think, I finally thought through the problem and coded up the solution.

Alex
P.S. Santa's visit time hasn't yet come. :)

Benjamin David Lunt

Dec 8, 2012, 11:17:52 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:15a5abbd-8465-4263...@googlegroups.com...

:-)

A question:

+// Data structures to support #include
+int FileCnt = 0;
+char FileNames[MAX_INCLUDES][MAX_FILE_NAME_LEN + 1];
+FILE* Files[MAX_INCLUDES];
+char CharQueues[MAX_INCLUDES][3];
+int LineNos[MAX_INCLUDES];
+int LinePoss[MAX_INCLUDES];

+ // store the including file's position and buffered chars
+ LineNos[FileCnt - 1] = LineNo;
+ LinePoss[FileCnt - 1] = LinePos;
+ memcpy(CharQueues[FileCnt - 1], CharQueue, CharQueueLen);

Instead of doing the above, storing the current status of the
Queue, lineno, and pos, why can't you simply do something like:

// main C file initialization
FileCnt = 0;
if (OpenNewFile(FileCnt, command_line_parameter))
  FileCnt++;
else
  error("Cannot open file");

...

// Found #include "asdfasdf"
if (OpenNewFile(FileCnt, "asdfasdf"))
  FileCnt++;
else
  error("Cannot open file");

...

// Open a file into slot indx; returns nonzero on success
int OpenNewFile(const int indx, const char *filename) {
  strcpy(FileNames[indx], filename);
  if ((Files[indx] = fopen(filename, "r")) == NULL)
    return 0;
  memset(CharQueues[indx], 0, 3);
  CharQuePos[indx] = 0;
  LineNos[indx] = 0;
  LinePoss[indx] = 0;
  ShiftChar();
  return 1;
}

No need to "save" the current state. You simply "leave" the current
state behind, create a new state, and go from there. At end of file,
simply

int EndOfFiles(void) {
  fclose(Files[--FileCnt]);
  if (FileCnt == 0)
    return 1; // done, exit, finish, complete, insert fork, etc.
  return 0;   // resume the including file
}

and you are back exactly where you were when you found the #include.

Just a comment asking why. When I manipulated your code a bit to
support #include's earlier, this is what I did. Just asking why.

Ben

Benjamin David Lunt

Dec 8, 2012, 11:20:48 PM

"Benjamin David Lunt" <zf...@fysnet.net> wrote in message
news:ka13e3$m1r$1...@speranza.aioe.org...

>
> Just a comment asking why. When I manipulated your code a bit to
> support #include's earlier, this is what I did. Just asking why.

P.S. Don't take my comments as criticism, I am curious why you
saved and restored the current state, rather than simply doing a
"push"/"pop" style state. :-)

Alexei A. Frounze

Dec 9, 2012, 6:24:06 AM

On Saturday, December 8, 2012 8:20:48 PM UTC-8, Benjamin David Lunt wrote:
> "Benjamin David Lunt" <zf...@fysnet.net> wrote in message
> news:ka13e3$m1r$1...@speranza.aioe.org...
>
> > Just a comment asking why. When I manipulated your code a bit to
> > support #include's earlier, this is what I did. Just asking why.
>
> P.S. Don't take my comments as criticism, I am curious why you
> saved and restored the current state, rather than simply doing a
> "push"/"pop" style state. :-)

First, some intro.

The char queue (that's used in GetToken()) always contains at least 3 chars (to allow simple parsing of 3-char-long operators and punctuators such as "<<=" and "..."). The queue can grow bigger if a macro gets expanded, because it gets expanded back into the queue. If the input file does not have any more characters, the queue gets padded with '\0' until there are at least 3 chars again.

This is the design as it was intended originally and crystallized with the latest change.
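A minimal sketch of the queue discipline just described, assuming an in-memory string in place of the input file. The names InitQueue, FillQueue, ShiftQueue and PushFront, and the capacity, are hypothetical and not taken from Smaller C. The queue is topped up to 3 characters, padded with '\0' once input runs out, and a macro expansion is inserted at its front so it is read before the rest of the input.

```c
#include <string.h>

#define MAX_CHAR_QUEUE_LEN 256 /* hypothetical capacity */
#define MIN_QUEUE_LEN 3        /* enough to see "<<=" or "..." at once */

static const char *Input; /* stands in for the input file */
static char CharQueue[MAX_CHAR_QUEUE_LEN];
static int CharQueueLen;

/* Top the queue up to MIN_QUEUE_LEN chars, padding with '\0' at end of input. */
static void FillQueue(void) {
  while (CharQueueLen < MIN_QUEUE_LEN)
    CharQueue[CharQueueLen++] = *Input ? *Input++ : '\0';
}

static void InitQueue(const char *text) {
  Input = text;
  CharQueueLen = 0;
  FillQueue();
}

/* Consume the first queued char and refill. */
static void ShiftQueue(void) {
  memmove(CharQueue, CharQueue + 1, (size_t)(CharQueueLen - 1));
  CharQueueLen--;
  FillQueue();
}

/* Put a macro's expansion in front of the queued chars; 0 if it won't fit. */
static int PushFront(const char *text) {
  size_t n = strlen(text);
  if ((size_t)CharQueueLen + n > MAX_CHAR_QUEUE_LEN)
    return 0;
  memmove(CharQueue + n, CharQueue, (size_t)CharQueueLen);
  memcpy(CharQueue, text, n);
  CharQueueLen += (int)n;
  return 1;
}
```

For example, after the identifier TEN has been consumed from "TEN, y", the queue holds ", y"; PushFront("10") then leaves "10, y" (5 chars) in it, which is the situation Alex describes for "#define TEN 10".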

Now, when GetToken() gets to parse something like
"#include <string.h>\n#include <stdlib.h>\n",
parsing of the first #include ends right before "\n" (AFAIR) and the queue at that point has "\n#i" in it. I can't drop these queued chars there as these lines of yours seem to be suggesting:

memset(CharQueues[indx], 0, 3);

...
ShiftChar();

If I do so, when GetToken() comes back to the including file, it will have to parse "nclude <stdlib.h>\n", which isn't what it's supposed to parse. Of course, if there are enough new line characters or spaces after "#include <string.h>\n", it will work even if some of them get lost.

If you're asking about either of the following:

- why I still have LineNo and LinePos even though they're kind of duplicated in LineNos[] and LinePoss[]

- why I don't have a function for file opening

then the answer is kind of simple. I'm already planning a number of code clean-up sweeps and these parts will have their time. Also, I'm planning to make the file I/O indirect (via function pointers), so the compiler can be used as a library in a different program.

I hope that makes some sense and answers your questions.

Alex

Benjamin David Lunt

Dec 9, 2012, 12:47:55 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:93e82eb7-f353-49bb...@googlegroups.com...

On Saturday, December 8, 2012 8:20:48 PM UTC-8, Benjamin David Lunt wrote:
> "Benjamin David Lunt" <zf...@fysnet.net> wrote in message
>
> news:ka13e3$m1r$1...@speranza.aioe.org...
>

>First, some intro.
>
>The char queue (that's used in GetToken()) always contains at least
> 3 chars (to allow simple parsing of 3-char-long operators and
> punctuators such as "<<=" and "..."). The queue can grow bigger
> if a macro gets expanded, because it gets expanded back into the
> queue. If the input file does not have any more characters, the
> queue gets padded with '\0' until there're at least 3 chars again.
>
>This is the design as it was intended originally and crystallized
>with the latest change.
>
>Now, when GetToken() gets to parse something like
>"#include <string.h>\n#include <stdlib.h>\n",
>parsing of the first #include ends right before "\n" (AFAIR) and
>the queue at that point has "\n#i" in it. I can't drop these queued
> chars there as these lines of yours seem to be suggesting:
> memset(CharQueues[indx], 0, 3);
> ...
> ShiftChar();
>
>If I do so, when GetToken() comes back to the including file, it
>will have to parse "nclude <stdlib.h>\n", which isn't what it's
>supposed to parse. Of course, if there are enough new line
>characters or spaces after "#include <string.h>\n", it will work
>even if some of them get lost.

But if you look a little closer, it is not clearing out the \n#i; it is
actually making sure that the *next* queue is cleared before I start
to use it. Remember, indx is now 1, not zero.

>If you're asking about either of the following:
>
>- why I still have LineNo and LinePos even though they're kind of
>duplicated in LineNos[] and LinePoss[]
>
>- why I don't have a function for file opening
>
>then the answer is kind of simple. I'm already planning a number
>of code clean-up sweeps and these parts will have their time. Also,
>I'm planning to make the file I/O indirect (via function pointers),
>so the compiler can be used as a library in a different program.

I had figured something along these lines.

>I hope that makes some sense and answers your questions.

Anyway, please have a look at my previous post again.
At the first call to OpenNewFile(), indx = 0. After the file is opened,
indx is now 1. However, all of the:

+// Data structures to support #include
+int FileCnt = 0;
+char FileNames[MAX_INCLUDES][MAX_FILE_NAME_LEN + 1];
+FILE* Files[MAX_INCLUDES];
+char CharQueues[MAX_INCLUDES][3];
+int LineNos[MAX_INCLUDES];
+int LinePoss[MAX_INCLUDES];

items still use a zero-based index:

FileCnt = 1
CurFileIndx = 0;

Then when a new #include is found

FileCnt = 2
CurFileIndx = 1

In other words, FileCnt holds the count of nested open files, which,
when used as an index, also points to the *next* empty array entry,
while CurFileIndx is a zero-based index into the entry currently in use.
Then there is no need to save/restore queues. BTW,

+char CharQueues[MAX_INCLUDES][3];

will need to be

+char CharQueues[MAX_INCLUDES][MAXQUEUESIZE+1];

Does this make sense?

Thanks,
Ben

Alexei A. Frounze

Dec 9, 2012, 5:10:53 PM

On Sunday, December 9, 2012 9:47:55 AM UTC-8, Benjamin David Lunt wrote:
> "Alexei A. Frounze" wrote in message

I missed that bit, you're right.

Right.

> Then, there is no need to save/restore Queues. BTW,
>
> +char CharQueues[MAX_INCLUDES][3];
>
> will need to be
>
> +char CharQueues[MAX_INCLUDES][MAXQUEUESIZE+1];
>
> Does this make sense?

Not quite yet. I could use "full-size" queues at every level of file inclusion and do away with the single CharQueue[MAX_CHAR_QUEUE_LEN], but I didn't like the seemingly wasted space.

And it turns out that space really would be wasted if every queue were full-size.

You see, there are 3 chars in CharQueue[] at all times, unless a macro gets expanded.

For example, given

"#define TEN 10"

and then later on something like

"int x = TEN, y = x;"

CharQueue[] is ", y" (3 chars) before TEN is expanded and
CharQueue[] is "10, y" (5 chars) after TEN is expanded.

But you aren't supposed to ever see more than 3 chars in CharQueue[] when you #include, unless you do something pretty weird and illegal like:

#define INCLUDE #include "foo" bar();
INCLUDE

in which case CharQueue[] swells and at first becomes

"#include \"foo\" bar();"

and then, after "#include \"foo\" " is consumed,

"bar();"

which gives you 6 chars in CharQueue[] to save/restore.

But you aren't supposed to do things like that in the first place, they are illegal:

#define INCLUDE #include // can't put #include inside macro body
#include "foo" bar(); // can't write code after #include, comments OK though

If you find a legal situation where CharQueue[] can contain more than 3 chars after "#include somefile" has been extracted, then this is a bug. Until then, 3 chars per including file is the maximum amount of text that needs to be saved/restored.
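The save/restore approach can be sketched as follows, under stated assumptions: one working queue plus a 3-character save slot per including file. SaveQueue, RestoreQueue, Depth and the size limits are hypothetical names, not the repository's code. SaveQueue refuses to proceed when more than 3 characters are pending, which corresponds to the illegal cases above.

```c
#include <string.h>

#define MAX_INCLUDES 8         /* hypothetical nesting limit */
#define MAX_CHAR_QUEUE_LEN 256 /* hypothetical working-queue capacity */
#define SAVED_CHARS 3          /* lookahead a legal #include can leave behind */

static char CharQueue[MAX_CHAR_QUEUE_LEN];
static int CharQueueLen;

static char SavedQueues[MAX_INCLUDES][SAVED_CHARS];
static int SavedLens[MAX_INCLUDES];
static int Depth; /* number of including files currently suspended */

/* On #include: stash the including file's pending lookahead. */
static int SaveQueue(void) {
  if (Depth == MAX_INCLUDES || CharQueueLen > SAVED_CHARS)
    return 0; /* too deep, or more text pending than a legal #include leaves */
  memcpy(SavedQueues[Depth], CharQueue, (size_t)CharQueueLen);
  SavedLens[Depth] = CharQueueLen;
  Depth++;
  CharQueueLen = 0; /* the included file starts with an empty queue */
  return 1;
}

/* At end of the included file: bring the including file's lookahead back. */
static int RestoreQueue(void) {
  if (Depth == 0)
    return 0;
  Depth--;
  memcpy(CharQueue, SavedQueues[Depth], (size_t)SavedLens[Depth]);
  CharQueueLen = SavedLens[Depth];
  return 1;
}
```

With "#include <string.h>\n#include <stdlib.h>\n", the "\n#i" left in the queue is saved before string.h is read and put back afterwards, so parsing resumes at "\n#include <stdlib.h>".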

Or did I not understand you again?

Alex

Benjamin David Lunt

Dec 9, 2012, 9:09:46 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:c60c80ea-ba83-4807...@googlegroups.com...

No, you pretty much got exactly what I was talking about. The way
I did it, yes, it would be a full size QUEUE each time. However,
with only 256 bytes (average) in each queue, I wasn't too worried about
wasted space. I can now finally use 4 bytes for the date, instead of
the original 2. :-)

Anyway, I have enjoyed (and will enjoy) reading your code and your
progress. It has sparked my interest in picking up my compiler and
continuing to work on it. I have made great progress in the last few
weeks (as time allows). Thanks for that spark.

Ben

Alexei A. Frounze

Dec 9, 2012, 10:28:02 PM

On Sunday, December 9, 2012 6:09:46 PM UTC-8, Benjamin David Lunt wrote:
> "Alexei A. Frounze" wrote in message

I'm trying to keep the thing small.

What date, YYYY/MM/DD? What about it?

> Anyway, I have enjoyed (and will enjoy) reading your code and your
> progress. It has sparked my interest in picking up my compiler and
> continue working on it. I have made great progress in the last few
> weeks (as time allows). Thanks for that spark.

You're welcome. :)

Alex

Benjamin David Lunt

Dec 9, 2012, 10:50:35 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:c4755646-73fb-4357...@googlegroups.com...

>> No, you pretty much got exactly what I was talking about. The way
>> I did it, yes, it would be a full size QUEUE each time. However,
>> with only 256 bytes (average) in each queue, I wasn't to worried about
>> wasted space. I can now finally use 4 bytes for the date, instead of
>> the original 2. :-)
>
> I'm trying to keep the thing small.
>
> What date, YYYY/MM/DD? What about it?

:-) It is a "Bill Gates: 'We only need two bytes for the year'",
comment. With today's hardware, a single byte can use up to 4096 bytes.

Alexei A. Frounze

Dec 10, 2012, 4:38:51 AM

#undef, #ifdef, #ifndef, #else and #endif have been added.

Now we should be able to use include files for real, woo hoo!

Alex

Alexei A. Frounze

Dec 12, 2012, 10:01:30 PM

I've added the "-I dir" option to specify search paths for include files (required for <stdio.h>-kind of headers as their path isn't built-in).
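A sketch of one common search order for the two #include forms, assuming this is roughly what the -I option feeds into. The C standard leaves the exact search implementation-defined, except that a failed "..." search falls back to the <...> search. IncludeCandidates, OpenInclude, the '/' separator and the size limits are hypothetical. Building the list of candidate paths separately from opening them keeps the ordering testable.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 260  /* hypothetical path length limit */
#define MAX_CANDIDATES 16 /* hypothetical limit on paths tried */

/* Fill cand[] with the paths to try, in order, and return their count.
   quoted != 0 means #include "name"; 0 means #include <name>.
   Paths longer than MAX_PATH_LEN - 1 are silently truncated here. */
static int IncludeCandidates(const char *name, int quoted, const char *curDir,
                             const char *const *incDirs, int incDirCnt,
                             char cand[][MAX_PATH_LEN], int maxCand) {
  int cnt = 0, i;
  /* "name": the including file's directory comes first */
  if (quoted && cnt < maxCand)
    snprintf(cand[cnt++], MAX_PATH_LEN, "%s/%s", curDir, name);
  /* both forms: the -I directories, in command-line order */
  for (i = 0; i < incDirCnt && cnt < maxCand; i++)
    snprintf(cand[cnt++], MAX_PATH_LEN, "%s/%s", incDirs[i], name);
  return cnt;
}

/* Open the first candidate that can be opened, or return NULL. */
static FILE *OpenInclude(const char *name, int quoted, const char *curDir,
                         const char *const *incDirs, int incDirCnt) {
  char cand[MAX_CANDIDATES][MAX_PATH_LEN];
  int cnt = IncludeCandidates(name, quoted, curDir, incDirs, incDirCnt,
                              cand, MAX_CANDIDATES);
  int i;
  for (i = 0; i < cnt; i++) {
    FILE *f = fopen(cand[i], "r");
    if (f != NULL)
      return f;
  }
  return NULL;
}
```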

Alex

Benjamin David Lunt

Dec 12, 2012, 10:42:20 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:76341bbf-1091-46e6...@googlegroups.com...

Alex,

<off topic>
If I may, may I ask why you include all of the previous posts in your
two-line post? Since your newsreader (or something in between yours and
mine) adds double carriage returns to every line, I scrolled through almost
100 lines just to get to the two lines you quoted above. Is this a newsreader
problem, the app you are using to post, my newsreader, Google Groups, ?????
</off topic>

I have not added the difference between #include "" and <> yet in my code.
I have parsed the difference, but currently only combine the two and search
the current path or the path given. I think I will add this code next. I
do have the "-I dir" code already done, now I just need to start using it
:-)

As always, I enjoy reading your updates to see your ideas on doing the same
thing I am doing. :-)

Alexei A. Frounze

Dec 13, 2012, 12:22:18 AM

On Wednesday, December 12, 2012 7:42:20 PM UTC-8, Benjamin David Lunt wrote:
> Alex,
>
> <off topic>
> If I may, may I ask why you include all of the previous posts in your two
> line post? i.e.: Since yours (or somewhere in-between yours and my)
> newsreader
> adds double Carriage Returns for every line, I scrolled through almost 100
> lines just to get to the two lines you quoted above. Is this a newsreader
> problem, the app you are using to post, my newsreader, Google Groups, ?????
> </off topic>

At the moment I'm using groups.google.com to post. I don't know of a better way now that NNTP is not carried by many (most?) ISPs and my favorite (or much accustomed to) Outlook Express NNTP client is replaced by a very optional knockoff app called Windows Live Mail (presumably based on the same code base with a number of features removed or broken). Since I'm not doing heavy newsgroup reading/posting, I can live with groups.google.

I'm not doing anything special, nothing, in fact, so, if my posts come out odd-looking, award the blame to google, they need some.

I'll poke around to see if there's anything that can be tweaked on the website, though I don't have much hope there is.

> I have not added the difference between #include "" and <> yet in my code.
> I have parsed the difference, but currently only combine the two and search
> the current path or the path given. I think I will add this code next. I
> do have the "-I dir" code already done, now I just need to start using it
> :-)

Um, are you hacking away your own flavor of Smaller C? Do you mind sharing the features you've added that I haven't yet? I might include some of them.

> As always, enjoy reading your updates to see your ideas on doing the same
> thing I am doing. :-)

It's not only to let you know of what's happening but also to possibly give you a chance to get the new feature without needing you to implement it (unless you're so eager and can't wait:) and waste the effort. I know, I know, Small C has been done and redone and there are many C compilers out there, so this effort of mine is a waste to a certain degree as well. :) But it's a fun experience nonetheless.

Alex

Alexei A. Frounze

Dec 13, 2012, 12:27:03 AM

On Dec 12, 7:42 pm, "Benjamin David Lunt" <zf...@fysnet.net> wrote:
> "Alexei A. Frounze" <alexfrun...@gmail.com> wrote in message news:76341bbf-1091-46e6...@googlegroups.com...

>
> > On Monday, December 10, 2012 1:38:51 AM UTC-8, Alexei A. Frounze wrote:
>
> >> Now we should be able to use include files for real, woo hoo!
>
> > I've added the "-I dir" option to specify search paths for include files
> > (required for <stdio.h>-kind of headers as their path isn't built-in).
>
> Alex,
>
> <off topic>
> If I may, may I ask why you include all of the previous posts in your two
> line post? i.e.: Since yours (or somewhere in-between yours and my)
> newsreader
> adds double Carriage Returns for every line, I scrolled through almost 100
> lines just to get to the two lines you quoted above. Is this a newsreader
> problem, the app you are using to post, my newsreader, Google Groups, ?????
> </off topic>

I've just switched back to the old interface. Looks like they fucked
up the new one and that's where the empty lines are coming from.
New != (Good || Better).

Alex

Benjamin David Lunt

Dec 13, 2012, 1:23:26 AM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:03b08f05-9bc0-4d8c...@googlegroups.com...

On Wednesday, December 12, 2012 7:42:20 PM UTC-8, Benjamin David Lunt wrote:
>> Alex,
>>
>

>I'll poke around to see if there's anything that can be tweaked on the
>website, though I don't have much hope there is.

Looks much better.

>> I have not added the difference between #include "" and <> yet in my code.
>> I have parsed the difference, but currently only combine the two and search
>> the current path or the path given. I think I will add this code next. I
>> do have the "-I dir" code already done, now I just need to start using it
>> :-)
>
>Um, are you hacking away your own flavor of Smaller C? Do you mind sharing
>the features you've added that I haven't yet? I might include some of them.

No, I started my own more than 10 years ago.
http://myweb.cableone.net/benlunt/newbasic.htm
(Look at the NBC line)

I was surprised to see the date I had put on one comment within my code,
dating back more than 10 years. Anyway, I have gotten back at it again.
It is quite a bit different than your SmallerC code. I just enjoy looking
at how you have implemented a feature.

>> As always, enjoy reading your updates to see your ideas on doing the same
>> thing I am doing. :-)
>
>It's not only to let you know of what's happening but also to possibly give
>you a chance to get the new feature without needing you to implement it
>(unless you're so eager and can't wait:) and waste the effort. I know, I
>know, Small C has been done and redone and there are many C compilers out
>there, so this effort of mine is a waste to a certain degree as well. :)
>But it's a fun experience nonetheless.
>
>Alex

That is exactly the reason I am doing mine. A fun experience.

Thanks,
Ben

Alexei A. Frounze

Dec 13, 2012, 1:44:23 AM

On Dec 12, 10:23 pm, "Benjamin David Lunt" <zf...@fysnet.net> wrote:
> "Alexei A. Frounze" <alexfrun...@gmail.com> wrote in message news:03b08f05-9bc0-4d8c...@googlegroups.com...
...

> >Um, are you hacking away your own flavor of Smaller C? Do you mind sharing
> >the features you've added that I haven't yet? I might include some of them.
>
> No, I started my own more than 10 years ago.
> http://myweb.cableone.net/benlunt/newbasic.htm
> (Look at the NBC line)

Looked. What I saw was very informative. :)

> I was suprised to see the date I had put on one comment within my code
> dating more than 10 years ago. Anyway, I have gotten back at it again.

And I started working on my OS like what? 12+ years ago. :) Haven't
touched it for a few years. The last thing I did was the FAT12/16/32
code. And an unfinished over-UART boot loader.

> It is quite a bit different than your SmallerC code. I just enjoy looking
> at how you have implemented a feature.

Is there anything in particular worth mentioning?

Alex

Benjamin David Lunt

Dec 17, 2012, 6:30:10 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:05ae5be8-9723-40f2...@r10g2000pbd.googlegroups.com...

On Dec 12, 10:23 pm, "Benjamin David Lunt" <zf...@fysnet.net> wrote:
> "Alexei A. Frounze" <alexfrun...@gmail.com> wrote in
> message news:03b08f05-9bc0-4d8c...@googlegroups.com...
...
>> >Um, are you hacking away your own flavor of Smaller C? Do you mind
>> >sharing the features you've added that I haven't yet? I might include
>> >some of them.
>>
>> No, I started my own more than 10 years ago.
>> http://myweb.cableone.net/benlunt/newbasic.htm
>> (Look at the NBC line)
>
>Looked. What I saw was very informative. :)

:-) Just a reference to it, indicating that I have had it for a while.

>> I was suprised to see the date I had put on one comment within my code
>> dating more than 10 years ago. Anyway, I have gotten back at it again.
>
>And I started working on my OS like what? 12+ years ago. :) Haven't
>touched it for a few years. The last thing I did was the FAT12/16/32
>code. And an unfinished over-UART boot loader.

I have done a little here and there on my OS, but not much. Spent
a lot of time on the book, currently working on another(s) in the series.

I am starting to blame you for getting my interest back into this
C compiler stuff... :-) I can't seem to put it down. It's like a
good book.

>> It is quite a bit different than your SmallerC code. I just enjoy looking
>> at how you have implemented a feature.
>
>Is there anything in particular worth mentioning?

I don't have a "queue" like yours is. I have a pointer to the current
character in a line. Then I do a match for a specific keyword or
character. If found, I advance the "pointer" to just after it. If not
found, I continue to try to match other items.
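A minimal sketch of this pointer-and-match style, as opposed to a character queue. MatchWord and its rules are hypothetical, not NBC's code. It checks for a keyword or operator at the current position, rejects an identifier-like word that merely prefixes a longer identifier, and advances the pointer only on success. Callers would try longer operators first, so that "<<=" is not consumed as "<<".

```c
#include <ctype.h>
#include <string.h>

/* If *pp starts with word (which must be non-empty), advance *pp past it
   and return 1. A word ending in a letter, digit or '_' must not be
   followed by another such character. Otherwise leave *pp alone, return 0. */
static int MatchWord(const char **pp, const char *word) {
  size_t n = strlen(word);
  const char *p = *pp;
  if (strncmp(p, word, n) != 0)
    return 0;
  if ((isalnum((unsigned char)word[n - 1]) || word[n - 1] == '_') &&
      (isalnum((unsigned char)p[n]) || p[n] == '_'))
    return 0; /* e.g. "int" must not match the start of "interval" */
  *pp = p + n;
  return 1;
}
```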

I also have Structures, Enums, Casts, etc., implemented. The Structs
are not nestable (yet), but are coded so that it should be a simple
task to do so (I hope).

Casts are a little different than ANSI C, C89, C99, whatever it is called
now, but it is close. I think I can get a little more work on it and
it should be complete.

For example, currently the following cast (in my compiler)

i = (unsigned int) a_char_buffer[10];

is the equivalent of the following cast (in ANSI C, etc)

i = * (unsigned int *) &a_char_buffer[10];

both getting the 'unsigned int' sized word from byte offset 10
in a_char_buffer and placing that value in 'i'.
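For comparison: in standard C, (unsigned int) a_char_buffer[10] converts only that single char's value. The pointer-cast form reads a whole unsigned int, but it can fault on targets that require aligned loads and it breaks the effective-type (strict aliasing) rules for a char array. The usual portable way to read an unsigned int stored at a byte offset is memcpy, which uses the host's byte order. A small sketch, with ReadUIntAt as a hypothetical helper name:

```c
#include <string.h>

/* Read an unsigned int from buf at byte offset off without alignment or
   aliasing problems; the bytes are interpreted in the host's byte order. */
static unsigned int ReadUIntAt(const unsigned char *buf, size_t off) {
  unsigned int v;
  memcpy(&v, buf + off, sizeof v);
  return v;
}
```

Usage: i = ReadUIntAt((const unsigned char *)a_char_buffer, 10);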

I also have
bool, char, short, int, and long (all signed or unsigned)
implemented.

'bool' and 'char' both occupy 8 bits and 'short' occupies 16 bits; 'int'
occupies 16 bits in 16-bit mode and 32 bits in 32-bit mode, and 'long'
occupies 32 bits in either mode. I haven't done 'long long' yet, but with
the way it is coded, I should be able to implement the allocation quite
easily, though the assembly output will be quite a task to implement.

I also have 'const' implemented, but do not have the capability to
assign a local variable (at declare time) a value yet, so only
parameters and globals take advantage of the 'const' keyword.

Anyway, I have been working on my compiler for many years off and on.
I am sure within a few months you will be passing me up. Then it will
be really interesting :-)

Alexei A. Frounze

Dec 18, 2012, 7:43:55 AM

On Dec 17, 3:30 pm, "Benjamin David Lunt" <zf...@fysnet.net> wrote:
> "Alexei A. Frounze" <alexfrun...@gmail.com> wrote in message news:05ae5be8-9723-40f2...@r10g2000pbd.googlegroups.com...

> On Dec 12, 10:23 pm, "Benjamin David Lunt" <zf...@fysnet.net> wrote:
>
> > "Alexei A. Frounze" <alexfrun...@gmail.com> wrote in
> > message news:03b08f05-9bc0-4d8c...@googlegroups.com...
> ...
> > >> >Um, are you hacking away your own flavor of Smaller C? Do you mind
> > >> >sharing the features you've added that I haven't yet? I might include
> > >> >some of them.
>
> > >> No, I started my own more than 10 years ago.
> > >> http://myweb.cableone.net/benlunt/newbasic.htm
> >> (Look at the NBC line)
>
> >Looked. What I saw was very informative. :)
>
> :-). Just a referrence to it, indicating that I have had it for a while.
>
> >> I was suprised to see the date I had put on one comment within my code
> >> dating more than 10 years ago. Anyway, I have gotten back at it again.
>
> >And I started working on my OS like what? 12+ years ago. :) Haven't
> >touched it for a few years. The last thing I did was the FAT12/16/32
> >code. And an unfinished over-UART boot loader.
>
> I have done a little here and there on my OS, but not much. Spent
> a lot of time on the book, currently working on another(s) in the series.

The USB book? That's one area I haven't yet explored.

> I am starting to blame you for getting my interest back in to this
> C Compiler stuff... :-) I can't seem to put it down. It's like a
> good book.

:)

> >> It is quite a bit different than your SmallerC code. I just enjoy looking
> >> at how you have implemented a feature.
>
> >Is there anything in particular worth mentioning?
>
> I don't have a "queue" like yours is. I have a pointer to the current
> character in a line. Then I do a match for a specific keyword or
> character. If found, I advance the "pointer" to just after it. If not
> found, I continue to try to match other items.

That doesn't look like a major difference.

> I also have Structures, Enums, Casts, etc., implemented. The Structs
> are not nestable (yet), but are coded so that it should be a simple
> task to do so (I hope).

Structs will take some time to implement. Hopefully, not too much.
Casts should be easier and quicker to implement (just piggy back on
the code for sizeof(type)).

> Casts are a little different than ANSI C, C89, C99, whatever it is called
> now, but it is close. I think I can get a little more work on it and
> it should be complete.
>
> For example, currently the following cast (in my compiler)
>
> i = (unsigned int) a_char_buffer[10];
>
> is the equivilant of the following cast (in ANSI C, etc)
>
> i = * (unsigned int *) &a_char_buffer[10];
>
> both getting the 'unsigned int' sized word from byte offset 10
> in a_char_buffer and placing that value in 'i'.

Ouch.

> I also have
> bool, char, short, int, and long (all signed or unsigned)
> implemented.
>
> 'bool' and 'char' both occupy 8 bits, 'short' occupies 16 bits, 'int'
> occupies 16 bits in 16-bit mode and 32 bits in 32-bit mode, and 'long'
> occupies 32 bits in either mode. I haven't done 'long long' yet, but with the way it is
> coded, I should be able to implement the allocation quite easily, though
> the assembly output will be quite the task to implement.

Types bigger than the machine word are painful to implement.
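To illustrate why types wider than the machine word are painful, here is a sketch (for an ordinary C99 compiler; the dword_pair type is hypothetical) of a 64-bit addition carried out on two 32-bit halves, the way an x86 code generator would emit ADD for the low words followed by ADC for the high words. Every other operator (subtraction, comparison, shifts, multiplication, division) needs a similar multi-step expansion.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit words, like EDX:EAX. */
typedef struct {
    uint32_t lo; /* EAX */
    uint32_t hi; /* EDX */
} dword_pair;

/* Equivalent of ADD EAX,lo2 / ADC EDX,hi2: add the low halves, then
   add the high halves plus the carry out of the low addition. */
dword_pair add64(dword_pair a, dword_pair b)
{
    dword_pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;        /* wraps modulo 2^32 */
    carry = r.lo < a.lo;       /* a wrapped sum means a carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```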

I've been working for 3 days now adding just 'unsigned int'.
While it looks like a simple thing (and it is) to add when plain 'int'
is already supported, it drags in a number of other things and creates
additional work items:
- parsing of the base type (I now need to parse 'unsigned', 'unsigned
int', 'signed' and 'signed int' in addition to simply 'void', 'char'
and 'int')
- parsing of numeric constants (to either 'int' or 'unsigned int'
depending on the value and base)
- adjusting array dimension parsing (to take into account 'int' vs
'unsigned int' dimensions)
- adjusting sizeof (to return 'unsigned int' instead of 'int')
- adding implicit conversions of 'int' and 'char' to 'unsigned
int' (wherever needed)
- adjusting constant subexpression evaluation and folding (to support
both 'int' and 'unsigned int')
- morphing signed /,%,>>,/=,%=,>>= into special unsigned counterparts
(wherever needed)
- adding code to actually generate DIV and SHR
- changing some other things from 'int' to 'unsigned int'
- a few other things here and there
And that's already an extra 300 lines of code for just one 'unsigned
int'. :)
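The item about morphing /, %, >> and their assignment forms into unsigned counterparts comes down to this: the same 32-bit pattern gives different results depending on the operand type, so the code generator has to choose DIV instead of IDIV and SHR instead of SAR. A small sketch using fixed-width types (the helper names are illustrative):

```c
#include <stdint.h>

/* Same bit pattern, two interpretations. The compiler picks the
   instruction from the operand type, not from the bits. */
int32_t  sdiv(int32_t a, int32_t b)   { return a / b; }  /* IDIV */
uint32_t udiv(uint32_t a, uint32_t b) { return a / b; }  /* DIV  */
int32_t  smod(int32_t a, int32_t b)   { return a % b; }  /* IDIV remainder */
uint32_t umod(uint32_t a, uint32_t b) { return a % b; }  /* DIV remainder  */

/* Unsigned >> is a logical shift (SHR, zero-fill). Right-shifting a
   negative signed value is implementation-defined in C, which is why
   it is not demonstrated here. */
uint32_t ushr(uint32_t a, unsigned n) { return a >> n; }
```

For example, -7 / 2 is -3 as signed ints, but the same bits (0xFFFFFFF9) divided by 2 as unsigned give 0x7FFFFFFC.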

> I also have 'const' implemented, but I cannot yet initialize local
> variables at declaration time, so only parameters and globals take
> advantage of the 'const' keyword.

At the moment I'm not planning to meaningfully support 'const' or
'volatile'. I might just ignore them in the source code. Actually,
'volatile' is pretty much how variables behave right now since there's
no code optimization. And 'const' and 'volatile' can appear at
different levels, e.g. 'const char* const* const* const a[5];', and
then they can be inside of 'struct'... I don't think it's something
that's going to be worthy of my time.

> Anyway, I have been working on my compiler for many years off and on.
> I am sure within a few months you will be passing me up. Then it will
> be really interesting :-)

Not sure how much more spare time I'll have to work on mine. So far I
have spent 3.5 months (or some 500 hours, give or take?) since I
started working on the thing in September. Sadly, other things require
attention and time as well, so the development will slow down at some
point soon. But we'll see. :)

Alex

Benjamin David Lunt

Dec 18, 2012, 12:51:03 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:4ec440bf-aa70-45cd...@jj5g2000pbc.googlegroups.com...

>
>> I have done a little here and there on my OS, but not much. Spent
>> a lot of time on the book, currently working on another(s) in the series.
>
>The USB book? That's one area I haven't yet explored.

Without sounding like I am blowing my own horn here, if you (or anyone in
general) are thinking of adding USB support to your OS, want to control
USB devices directly at the hardware level, or just want to play with the
hardware, this is a book to explore. :)

I spent a few years doing research for it, enjoyed every minute of it, and
have kept enjoying it ever since. I have sold more copies than I ever
imagined. Speaking of which, I sent a copy to Slashdot for someone to
request, read, and review. If anyone is willing to write a review for
Slashdot, simply write to them requesting the book, write the review, then
send the review in. After that, the book is yours to keep.

Write a request to books [at] slashdot [dot] org with your background and
the desire to review the book, and if you are the first to be selected, the
book is yours to keep.

Anyway, it has been a very enjoyable project, which has not ended. Many
people have shown continued interest in the additions and hardware that
have followed.

Anyway, thanks for the comment. I enjoy writing about it (as you can see).
[
for more info, see
http://www.fysnet.net/The_Universal_Serial_Bus.htm
and my blog at
http://www.fysnet.net/blog/2012/12/
]

Thanks,
Ben

P.S. getting back to the subject at hand, when the following is used:

signed int i;

do you simply ignore the 'signed' keyword, not counting any error checking
of the line itself (i.e.:

signed unsigned int i;

), with no symbol flag or type denoting signed? i.e.: you only have a
flag stating 'unsigned', therefore if the 'unsigned' flag is not set, then
it is assumed 'signed'.

Just curious. This is what I do.

Alexei A. Frounze

Dec 18, 2012, 8:19:23 PM

On Dec 18, 9:51 am, "Benjamin David Lunt" <zf...@fysnet.net> wrote:
> "Alexei A. Frounze" <alexfrun...@gmail.com> wrote in message news:4ec440bf-aa70-45cd...@jj5g2000pbc.googlegroups.com...
>
> >> I have done a little here and there on my OS, but not much. Spent
> >> a lot of time on the book, currently working on another(s) in the series.
>
> >The USB book? That's one area I haven't yet explored.
>
> Without sounding like I am blowing my own horn here, if you (or anyone in
> general) are thinking of adding USB support to your OS, want to control
> USB devices directly at the hardware level, or just want to play with the
> hardware, this is a book to explore. :)

Point taken. :)

> I spent a few years doing research for it, enjoyed every minute of it, and
> have kept enjoying it ever since. I have sold more copies than I ever
> imagined. Speaking of which, I sent a copy to Slashdot for someone to
> request, read, and review. If anyone is willing to write a review for
> Slashdot, simply write to them requesting the book, write the review, then
> send the review in. After that, the book is yours to keep.
>
> Write a request to books [at] slashdot [dot] org with your background and
> the desire to review the book, and if you are the first to be selected, the
> book is yours to keep.
>
> Anyway, it has been a very enjoyable project, which has not ended. Many
> have continued interest in the additions and hardware that has followed.

USB3! :) Last year I looked at some USB specs and code "from afar".
The scene was frightening. Mounds of details, gigantic state machines,
vast amounts of code. How can one turn a relatively simple thing into
such a beast of documentation and code? :) And why?

> Anyway, thanks for the comment. I enjoy writing about it (as you can see).
> [
> for more info, see
> http://www.fysnet.net/The_Universal_Serial_Bus.htm
> and my blog at
> http://www.fysnet.net/blog/2012/12/
> ]
>
> Thanks,
> Ben
>
> P.S. getting back to the subject at hand, when the following is used:
>
> signed int i;
>
> do you simply ignore the 'signed' keyword, not counting any error checking
> of the line itself (i.e.:
>
> signed unsigned int i;
>
> ), with no symbol flag or type denoting signed? i.e.: you only have a
> flag stating 'unsigned', therefore if the 'unsigned' flag is not set, then
> it is assumed 'signed'.
>
> Just curious. This is what I do.

I've had different approaches (this code isn't the first one of mine
to parse C declarations). Right now I do something like this:

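int ParseBase(int tok, int* base)
{
  if (tok == tokVoid || tok == tokChar || tok == tokInt)
  {
    // plain 'void', 'char' or 'int'
    *base = tok;
    tok = GetToken();
  }
  else if (tok == tokSigned || tok == tokUnsigned)
  {
    // 'signed', 'signed int', 'unsigned' or 'unsigned int'
    int sign = tok;
    tok = GetToken();

    if (tok == tokChar)
      error("Error: ParseBase(): 'signed char' and 'unsigned char' not supported\n");

    if (tok == tokInt)
      tok = GetToken(); // 'int' after a sign keyword is optional

    // tokUnsigned stands for 'unsigned int', tokInt for (signed) 'int'
    if (sign == tokUnsigned)
      *base = tokUnsigned;
    else
      *base = tokInt;
  }
  // TBD!!! more error checks

  // hand the first token not consumed here back to the caller
  return tok;
}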

If I ever get to shorts, longs and other niceties, I'll add special
tokens to represent 'unsigned char', 'signed char', ..., 'unsigned
long long', 'long long'. There's no point in carrying around multiple
tokens representing parts of the whole, 'signed/unsigned', 'short/
long', 'int'. In the above code I'm using tokUnsigned to represent
'unsigned int' and tokInt to represent plain/'signed' 'int'.
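One way to get from the individual keywords to a single token per complete type is to count the specifiers while parsing the base of a declaration and map the combination at the end. The sketch below is hypothetical (the enum, struct and function names are not from Smaller C) and follows the C rules that specifiers may appear in any order, that "signed" alone means "int", that "signed unsigned" is an error, and that plain char is distinct from both signed char and unsigned char.

```c
/* Hypothetical single codes for complete base types. */
enum type_code {
    T_ERROR, T_CHAR, T_SCHAR, T_UCHAR, T_SHORT, T_USHORT,
    T_INT, T_UINT, T_LONG, T_ULONG, T_LLONG, T_ULLONG
};

/* How many times each specifier keyword was seen. */
struct spec_counts {
    int n_signed, n_unsigned, n_char, n_short, n_int, n_long;
};

enum type_code combine_specifiers(const struct spec_counts *s)
{
    int sign = s->n_signed + s->n_unsigned;
    int is_unsigned = s->n_unsigned > 0;

    /* "signed unsigned", "int int", "long long long", ... */
    if (sign > 1 || s->n_int > 1 || s->n_char > 1 ||
        s->n_short > 1 || s->n_long > 2)
        return T_ERROR;

    if (s->n_char)
    {
        if (s->n_short || s->n_int || s->n_long)
            return T_ERROR;
        /* plain char differs from both signed char and unsigned char */
        return sign == 0 ? T_CHAR : (is_unsigned ? T_UCHAR : T_SCHAR);
    }
    if (s->n_short)
        return s->n_long ? T_ERROR : (is_unsigned ? T_USHORT : T_SHORT);
    if (s->n_long == 2)
        return is_unsigned ? T_ULLONG : T_LLONG;
    if (s->n_long == 1)
        return is_unsigned ? T_ULONG : T_LONG;
    if (sign == 0 && s->n_int == 0)
        return T_ERROR; /* no specifiers at all; no implicit int here */

    /* "int", "signed", "signed int", "unsigned", "unsigned int" */
    return is_unsigned ? T_UINT : T_INT;
}
```

With this scheme "signed x;", "int x;" and "signed int x;" all produce T_INT, and "long long" is simply n_long == 2, whatever order the keywords came in.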

My general approach to parsing C code is simple and can be summarized
as follows.

I look at the token I've got and dispatch further parsing to other
methods (or "branches" within the current method) if possible and
error out if not.
Parsing continues while it makes sense and while there are more
tokens.
When parsing of a "grammatical construct" (e.g. the base portion of a
declaration, or a for statement) finishes, the method that's been
parsing it returns the next (not-yet-consumed) token to the caller.
This lets all subparsers peek at what's coming next and frees
them from having to put it back if it can't be consumed on the spot.
They just hand it back to the caller.

I'm fairly sure I'm missing a few checks in the code here and there,
but this is what works well.
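Here is a tiny sketch of that convention, assuming the tokens have already been split into an array. Each parse function receives the current token, consumes what it recognizes, and returns the first token it did not consume, so no "unget" is ever needed. The token kinds and the two-level grammar (sums of products of numbers) are hypothetical, and error handling is omitted.

```c
/* Hypothetical token stream for a toy grammar. */
enum tok_kind { TK_NUM, TK_PLUS, TK_STAR, TK_END };

struct token { enum tok_kind kind; int value; };

static const struct token *toks; /* token stream */
static int pos;                  /* index of the next token to fetch */

static const struct token *get_token(void) { return &toks[pos++]; }

/* term := NUM { '*' NUM } ; assumes t is a TK_NUM (no error checks) */
static const struct token *parse_term(const struct token *t, int *result)
{
    *result = t->value;
    t = get_token();
    while (t->kind == TK_STAR)
    {
        t = get_token();         /* the number after '*' */
        *result *= t->value;
        t = get_token();
    }
    return t;                    /* first token not part of the term */
}

/* expr := term { '+' term } */
static const struct token *parse_expr(const struct token *t, int *result)
{
    int rhs;

    t = parse_term(t, result);
    while (t->kind == TK_PLUS)
    {
        t = parse_term(get_token(), &rhs);
        *result += rhs;
    }
    return t;                    /* handed back to whoever called us */
}
```

Parsing "2 + 3 * 4 + 5" this way yields 19 and hands TK_END back to the caller, which decides whether that is acceptable.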

Alex

Benjamin David Lunt

Dec 18, 2012, 10:44:16 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:47528ede-1d38-46d0...@l3g2000pbq.googlegroups.com...

>
> USB3! :) Last year I looked at some USB specs and code "from afar".
> The scene was frightening. Mounds of details, gigantic state machines,
> vasts of code. How can one turn a relatively simple thing in such a
> beast of documentation and code? :) And why?

It actually isn't that bad, once you have a look at it and implement it.
My book does include USB 3.0, how to use it, and includes a few example
devices from the time it is plugged in, enumerated, and then configured,
step by step.

Thanks. I was curious. I simply ignore the 'signed' keyword (token),
and throughout my code, if the 'unsigned' flag is not set for a symbol,
it is assumed 'signed'.

Thanks,
Ben

Alexei A. Frounze

Dec 19, 2012, 10:32:08 AM

On Dec 18, 7:44 pm, "Benjamin David Lunt" <zf...@fysnet.net> wrote:
> "Alexei A. Frounze" <alexfrun...@gmail.com> wrote in message news:47528ede-1d38-46d0...@l3g2000pbq.googlegroups.com...
>
> > USB3! :) Last year I looked at some USB specs and code "from afar".
> > The scene was frightening. Mounds of details, gigantic state machines,
> > vasts of code. How can one turn a relatively simple thing in such a
> > beast of documentation and code? :) And why?
>
> It actually isn't that bad, once you have a look at it and implement it.
> My book does include USB 3.0, how to use it, and includes a few example
> devices from the time it is plugged in, enumerated, and then configured,
> step by step.

Once you've done it, it's not that bad anymore, true. :)

> >> P.S. getting back to the subject at hand, when the following is used:
>
> >> signed int i;
>
> >> do you simply ignore the 'signed' keyword, not counting any error
> >> checking of the line itself (i.e.:
>
> >> signed unsigned int i;
>
> >> ), with no symbol flag or type denoting signed? i.e.: you only have a
> >> flag stating 'unsigned', therefore if the 'unsigned' flag is not set,
> >> then it is assumed 'signed'.
>
> >> Just curious. This is what I do.

...

> Thanks. I was curious. I simply ignore the 'signed' keyword (token),
> and throughout my code, if the 'unsigned' flag is not set for a symbol,
> it is assumed 'signed'.

How about this one?

signed x;

It's a valid declaration equivalent to

int x;

and

signed int x;

Alex

Benjamin David Lunt

Dec 19, 2012, 10:35:11 AM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:704e01b0-fea2-47fc...@m4g2000pbd.googlegroups.com...

>> Thanks. I was curious. I simply ignore the 'signed' keyword (token),
>> and throughout my code, if the 'unsigned' flag is not set for a symbol,
>> it is assumed 'signed'.
>
>How about this one?:
>
> signed x;
>
>It's a valid declaration equivalent to
>
> int x;
>
>and
>
> signed int x;
>

Exactly. Mine treats all three of those the same. I was just
curious whether you had two different indicators, one for signed and
one for unsigned, whereas I have a single indicator, unsigned.
If that indicator does not indicate unsigned, then signed
is used.

Thanks,
Ben

Rod Pemberton

Dec 20, 2012, 4:30:51 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:47528ede-1d38-46d0...@l3g2000pbq.googlegroups.com...
...

> If I ever get to shorts, longs and other niceties, I'll add
> special tokens to represent 'unsigned char', 'signed char', ...,
> 'unsigned long long', 'long long'. There's no point in carrying
> around multiple tokens representing parts of the whole,
> 'signed/unsigned', 'short/long', 'int'. In the above code I'm
> using tokUnsigned to represent 'unsigned int' and tokInt to
> represent plain/'signed' 'int'.

I thought 'long long' was a token ...

It just has white space in the middle. :-)

I.e., 'unsigned' and 'signed' are type modifiers, while 'char' and
'long' and 'short' and 'int' and 'long long' are types.

Well, I thought I had implemented 'long long' as a single token in
my flex/bison-based C parser, but it seems I never got around to
that. Alternatively, I might've removed it for some reason, or
maybe it's in one of my other parsers. I probably still would do
that, though. Otherwise, 'long long' is the only (?) situation
where you have to combine two tokens to get the type. I keep
thinking there is one other situation where a token in C might
have whitespace in the middle, but I can't seem to recall what it
is.

Rod Pemberton

Rod Pemberton

Dec 20, 2012, 4:33:52 PM

"Benjamin David Lunt" <zf...@fysnet.net> wrote in message
news:kard7k$d35$1...@speranza.aioe.org...

> [...] I simply ignore the 'signed' keyword (token)
> and throughout my code if the 'unsigned' flag is not set for a
> symbol, it is assumed 'signed'.
>
>

For decades, I've coded all char types as 'unsigned char' in C.

There is only one case I can remember where it was
convenient to use a signed int. That was in the past few years.
I was incrementing and decrementing through zero. I didn't feel
like preserving an offset to make the code unsigned. Anyway, that
was an int, not a char.

Even so, over the past few years, I've reworked a bunch of my code
to eliminate casts on char types, or to eliminate sign change
warnings for calls to C library functions when using strict
compiler settings. Also, without unsigned chars, other people
won't be confused about the use of unsigned chars. AFAICT, there is
no need whatsoever for signed chars in C, except that that's what
the C library expects ...
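One concrete place where the signedness of plain char meets the C library is <ctype.h>: those functions take an int that must be EOF or a value representable as unsigned char. If plain char is signed, a byte such as 0xE9 widens to a negative int, and passing it directly is undefined behavior. A short sketch of the usual conversion (function names are illustrative):

```c
#include <ctype.h>

/* Convert through unsigned char before calling <ctype.h> functions,
   since plain char may be signed. */
int count_alpha(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))
            n++;
    return n;
}

/* The value of a byte after widening, regardless of char's signedness. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```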

Rod Pemberton

wolfgang kern

Dec 21, 2012, 3:21:53 PM

Benjamin David Lunt said in part:
...
and, as I understood it, has "signed int 32" as the default...

OK, I never wrote an assembler or any compiler, but my
disassembler and its associated code analyzer treat every
32-bit value as unsigned by default; any signed characteristic
comes with address rollover anyway. x86 CPUs don't know signed.
__
wolfgang

Alexei A. Frounze

Dec 24, 2012, 3:21:07 AM

In case anyone's following closely...

I've added in the past two weeks:
- unsigned int
- subtraction of a pointer from a pointer
- the comma operator
- macro definition via command line parameters: -D macro[=expansion text]

I've also made a few bugfixes and improvements here and there.
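Subtraction of a pointer from a pointer is more than a SUB: for two pointers into the same array, the result is the byte distance divided by the element size, of type ptrdiff_t. A sketch of the arithmetic a compiler has to emit, written as plain C for int elements (elem_distance is an illustrative name):

```c
#include <stddef.h>

/* What "p - q" computes for two int pointers into the same array:
   the byte distance divided by sizeof(int). The division must be a
   signed one (or an arithmetic shift when the size is a power of two),
   because the result may be negative. */
ptrdiff_t elem_distance(const int *p, const int *q)
{
    ptrdiff_t bytes = (const char *)p - (const char *)q;
    return bytes / (ptrdiff_t)sizeof(int);
}
```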

Alex

Alexander Kobets

Jan 15, 2013, 5:09:55 PM

On Friday, November 23, 2012, 12:03:30 UTC+4, Alexei A. Frounze wrote:

> While in the past 7 years I've had opportunities (if you can call it that) to piss off various people (officials included) in like 5 different countries, somehow I have resisted doing that and am still free, which, of course, means nothing. In Russia there's this old saying that goes something like "no one can be safe from poverty or prison". And the saying is still very relevant, in some respects more than ever. In Russia you don't seek the various freedoms that you take for granted in, say, the US. Not these days, not yet.

Are you really being denied something, or prohibited from doing something? What "special" thing are you doing that angers somebody?

Sorry for the off-topic question. Maybe this was discussed before, but I was unable to read all 77 messages in this topic.

Alexei A. Frounze

Nov 25, 2013, 3:32:42 AM

On Thursday, November 22, 2012 5:02:49 AM UTC-8, Alexei A. Frounze wrote:
> I've been working on a simple C compiler lately and here's what I've got so far. It's a fun project, I must tell. :)
>
> Steve and Rod may be interested to take a look.
>
> Code (ugly in some places and ways, but apparently functional):
> https://github.com/alexfru/SmallerC
>
> Alex

I've accumulated some changes and it's about time for an update. :)

Here's what has happened since around January 2013 in the compiler:
- MIPS code generator (I can compile small apps for RetroBSD, running on a MIPS CPU with 96K of application RAM); a small MIPS emulator as well
- output of generated asm code to a file (if given), else to the console
- support for asm("asm code");
- support '# linenum filename flags' lines from external, gcc-like, preprocessors when Smaller C is compiled with its own preprocessor disabled via the NO_PREPROCESSOR macro
- support for dimensionless [] as in extern char earr[]; and int main(int argc, char* argv[]);
- support for 1-d global (=file scope) array initialization
- support for static (file scope only)
- support for types signed char and unsigned char (in addition to the already existing plain char)
- support for implicit function declarations as in C89
- -verbose option to print warnings and names of the functions being compiled
- lots of bugfixes and improvements
- new test, a snake game, you may treat it as an early Thanksgiving present :)
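For reference, the '# linenum filename flags' lines that gcc-like preprocessors emit look like `# 42 "file.c" 1 3`. Below is a hypothetical sketch of parsing one into a line number and file name; it is not Smaller C's code. Per the gcc preprocessor documentation, the optional trailing flags mean 1 = start of a new file, 2 = returning to a file, 3 = system header, 4 = implicit extern "C"; this sketch ignores them and does not handle escaped quotes in file names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: returns 1 and fills *linenum and file[] if line
   is a gcc-style line marker, 0 otherwise (e.g. for "#define ..."). */
int parse_line_marker(const char *line, long *linenum,
                      char *file, size_t filesz)
{
    const char *q1, *q2;
    size_t len;

    /* '#', optional blanks, then a decimal line number */
    if (sscanf(line, " # %ld", linenum) != 1)
        return 0;
    q1 = strchr(line, '"');
    if (q1 == NULL)
        return 0;
    q2 = strchr(q1 + 1, '"');
    if (q2 == NULL)
        return 0;
    len = (size_t)(q2 - q1 - 1);
    if (len >= filesz)
        return 0;                /* name doesn't fit the buffer */
    memcpy(file, q1 + 1, len);
    file[len] = '\0';
    return 1;
}
```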

Work and other matters haven't let me do more, but I'm still making progress.
Right now I'm cleaning up error reporting to free up some memory for new stuff and to still fit into 64K code + 64K data in 16-bit mode. Looks like I've just beaten the original Small C in terms of functionality. :) I'm planning to add support for struct by Xmas/2014 or so.

Cheers,
Alex

Benjamin David Lunt

Nov 25, 2013, 5:04:50 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:b7e3451e-ea4d-4e37...@googlegroups.com...

On Thursday, November 22, 2012 5:02:49 AM UTC-8, Alexei A. Frounze wrote:
> I've been working on a simple C compiler lately and here's what I've got
> so far. It's a fun project, I must tell. :)
>
> Steve and Rod may be interested to take a look.
>
> Code (ugly in some places and ways, but apparently functional):
> https://github.com/alexfru/SmallerC
>
> Alex

Hi Alex,

I just grabbed the new files and am interested in looking at them.

Thank you,

Ben

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Forever Young Software
http://www.fysnet.net/index.htm

http://www.fysnet.net/the_universal_serial_bus.htm

To reply by email, please remove the zzzzzz's

Batteries not included, some Assembly required.

s_dub...@yahoo.com

Nov 26, 2013, 10:47:24 AM

Thanks for sharing, Alex! Happy Thanksgiving to you and everyone else on A.O.D.

Steve

Alexei A. Frounze

Dec 21, 2013, 6:48:49 PM

There's no struct yet, but there's something else:
- goto
- conditional/ternary operator ?:
- type casts!
- you can now (re)compile Smaller C into a 16-bit DOS .EXE using only
Smaller C and NASM (no linker!)

Happy holiday season everyone!
Alex

Alexei A. Frounze

Dec 29, 2013, 6:23:01 AM

I've just added support for the huge memory model via the -huge option.

It'll let you compile 32-bit code for DOS. Ints and pointers are 32-bit.
Pointers are 32-bit physical addresses.

No need to mess with segments. And you can have arrays larger than 64KB! :)

In this model the stack is limited to 64KB and the cumulative size of local
variables is limited to 32KB (this is a per-function limit).
Individual functions are limited to 32KB of code size.
These limitations should not pose problems normally.
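Since pointers in this model are 32-bit physical addresses, generated code must convert them to a real-mode segment:offset pair before accessing memory. The sketch below only shows the standard real-mode arithmetic (physical = segment * 16 + offset) and one normalized reverse mapping; how Smaller C's -huge code generator actually loads segment registers is not described in this post, and the function names are illustrative.

```c
#include <stdint.h>

/* Real-mode translation: physical = segment * 16 + offset. */
uint32_t phys_from_seg_off(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* A normalized segment:offset for a physical address (offset 0..15).
   Valid for physical addresses below 1 MB (0x100000). */
void seg_off_from_phys(uint32_t phys, uint16_t *seg, uint16_t *off)
{
    *seg = (uint16_t)(phys >> 4);
    *off = (uint16_t)(phys & 0xF);
}
```

A normalized pointer leaves almost the whole 64 KB offset range free, which is one simple way to let an access start anywhere without crossing a segment boundary.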

This is how you can (re)compile Smaller C using the huge memory model
(provided you have a 32-bit Smaller C already, compiled with gcc/DJGPP, Open
Watcom C/C++ for Windows/DOS4G, etc):

smlrc -huge -no-externs lbdos.c lbdos.asm
smlrc -huge -no-externs -label 1001 smlrc.c smlrc.asm
nasm -f bin smlrchg.asm -o smlrchg.exe

Enjoy and a happy New Year!
Alex

Benjamin David Lunt

Dec 29, 2013, 3:38:20 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:l9p0mo$a9i$1...@speranza.aioe.org...

Alex,

Looks like you are making great strides with this. Every time I
decide to take a break and have a look at what you have done, I
find that you have already added new features.

How close are you to structures?

One thing I would like to do, eventually, is re-write my loader
file(s) to use mostly C code. It currently is many lines of assembly
and is getting harder and harder to keep track of. [However,
I do use it to test my NBASM for errors :-)] Of all the C compilers
I have tried so far, yours looks to be the closest and easiest to
make that transition.

I made a few modifications to your previous code, a few weeks ago,
and was able to test some code transition. However, my loader file
uses structures, and NBASM (mostly) allows and supports structures
similar to C.

For example,

S_TEMP struct
member0 word ; a 16-bit structure member
member1 qword ; a 64-bit structure member
member2 dword ; a 32-bit structure member
etc byte ; an 8-bit structure member
S_TEMP ends

temp_struct st S_TEMP

mov eax,es:[ebx + S_TEMP->member2]
or
mov eax,es:[ebx + temp_struct.member2]
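A rough C counterpart of the NBASM structure above, for an ordinary C99 compiler (Smaller C had no 64-bit integers at this point, so uint64_t is only for illustration). The member access with a base register plus S_TEMP->member2 corresponds to base + offsetof(...). Exact offsets depend on the compiler's alignment rules: with natural alignment, padding appears before member1, which a packed NBASM layout may not have.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* C view of S_TEMP; offsets depend on the compiler's alignment. */
struct s_temp {
    uint16_t member0; /* word  */
    uint64_t member1; /* qword */
    uint32_t member2; /* dword */
    uint8_t  etc;     /* byte  */
};

/* Like "mov eax,es:[ebx + S_TEMP->member2]": read the member at its
   offset from a base address; memcpy avoids misaligned accesses. */
uint32_t load_member2(const unsigned char *base)
{
    uint32_t v;
    memcpy(&v, base + offsetof(struct s_temp, member2), sizeof v);
    return v;
}
```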

I am very interested in your progress on structures. Once you have
a working structure mechanism, I would start the transition and
be sure to keep you up to date with what I find.

Anyway, glad to hear you have made good progress.

Alexei A. Frounze

Dec 29, 2013, 10:30:04 PM

On Sunday, December 29, 2013 12:38:20 PM UTC-8, Benjamin David Lunt wrote:
> "Alexei A. Frounze" <...@gmail.com> wrote in message
> news:l9p0mo$a9i$1...@speranza.aioe.org...
> > I've just added support for the huge memory model via the -huge option.
> >
> > It'll let you compile 32-bit code for DOS. Ints and pointers are 32-bit.
> > Pointers are 32-bit physical addresses.
> >
> > No need to mess with segments. And you can have arrays larger than 64KB! :)
> >
> > In this model the stack is limited to 64KB and the cumulative size of
> > local variables is limited to 32KB (this is a per-function limit).
> > Individual functions are limited to 32KB of code size.
> > These limitations should not pose problems normally.
> >
> > This is how you can (re)compile Smaller C using the huge memory model
> > (provided you have a 32-bit Smaller C already, compiled with gcc/DJGPP,
> > Open Watcom C/C++ for Windows/DOS4G, etc):
> >
> > smlrc -huge -no-externs lbdos.c lbdos.asm
> > smlrc -huge -no-externs -label 1001 smlrc.c smlrc.asm
> > nasm -f bin smlrchg.asm -o smlrchg.exe
> >
> > Enjoy and a happy New Year!
>
> Alex,
>
> Looks like you are making great strides with this. Every time I
> decide to take a break and have a look at what you have done, I
> find that you have already added new features.

:)

> How close are you to structures?

Very close. In fact, I was about to start implementing them about a week
ago (I had worked out most of the needed logic), but then I realized
I really needed 32-bit ints for things like fseek() and ftell(). I
was also short on code space for new features: I run close to 63000
bytes of self-compiled code when using -seg16/-seg16t unless I cut off
the annotations and the minimal built-in preprocessor. Even if I
could squeeze in support for struct, 32-bit ints stood no chance.
But I really wanted a fully self-compilable version for DOS.
So I made a detour to experiment with the huge model, which looks like
a success. Understandably, to simplify things and to avoid getting too
intimate with instruction encodings in the compiler itself, I had to
compromise the quality of code generation for the huge memory model.
I believe the code generator may be improved somewhat to reduce the
losses, but it's not critical and can be addressed later.

I think within a month or two I can get basic structure support in.

> One thing I would like to do, eventually, is re-write my loader
> file(s) to use mostly C code. It currently is many lines of assembly
> and is getting harder and harder to keep track of. [However,
> I do use it to test my NBASM for errors :-)] Of all the C compilers
> I have tried so far, yours looks to be the closest and easiest to
> make that transition.

Exactly why do you think mine may be the easiest? Is it because it
generates asm code? Is it because you can tweak asm output format?
Is it because of the huge memory support? Or is it something else?
Perhaps, because I might be able to help out with a thing or two? :)

> I made a few modifications to your previous code, a few weeks ago,
> and was able to test some code transition. However, my loader file
> uses structures, and NBASM (mostly) allows and supports structures
> similar to C.
>
> For example,
>
> S_TEMP struct
> member0 word ; a 16-bit structure member
> member1 qword ; a 64-bit structure member
> member2 dword ; a 32-bit structure member
> etc byte ; an 8-bit structure member
> S_TEMP ends
>
> temp_struct st S_TEMP
>
> mov eax,es:[ebx + S_TEMP->member2]
>
> or
>
> mov eax,es:[ebx + temp_struct.member2]

In case you can almost taste it, 16-bit short, 32-bit long and structure
packing will arrive later. I'm not sure about structure initialization.
That may be far from now. And 64-bit ints and floating point, even later.
If ever.

> I am very interested in your progress on structures. Once you have
> a working structure mechanism, I would start the transition and
> be sure to keep you up to date with what I find.

Sure. It would be nice to have some users! :)

> Anyway, glad to hear you have made good progress.

Thanks!

Alex

Benjamin David Lunt

Dec 30, 2013, 2:17:54 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:l9qpc0$8cr$1...@speranza.aioe.org...

> Exactly why do you think mine may be the easiest? Is it because it
> generates asm code? Is it because you can tweak asm output format?
> Is it because of the huge memory support? Or is it something else?
> Perhaps, because I might be able to help out with a thing or two? :)

(smile)

I have studied your code enough that I would know what code it would
produce, and would be able to easily modify it if needed to tweak
something. Also, it only took me an hour to modify it to output
NBASM code. :-)

> In case you can almost taste it, 16-bit short, 32-bit long and structure
> packing will arrive later. I'm not sure about structure initialization.
> That may be far from now. And 64-bit ints and floating point, even later.
> If ever.

How about 64-bit ints via EDX:EAX?

Thanks and Happy New Year to you and all the a.o.d group,

Alexei A. Frounze

Dec 30, 2013, 5:13:15 PM

On Monday, December 30, 2013 11:17:54 AM UTC-8, Benjamin David Lunt wrote:
> "Alexei A. Frounze" <alexf...@gmail.com> wrote in message
> news:l9qpc0$8cr$1...@speranza.aioe.org...
>
> > Exactly why do you think mine may be the easiest? Is it because it
> > generates asm code? Is it because you can tweak asm output format?
> > Is it because of the huge memory support? Or is it something else?
> > Perhaps, because I might be able to help out with a thing or two? :)
>
> (smile)
>
> I have studied your code enough that I would know what code it would
> produce, and would be able to easily modify it if needed to tweak
> something. Also, it only took me an hour to modify it to output
> NBASM code. :-)

I see. Cool! :)

> > In case you can almost taste it, 16-bit short, 32-bit long and structure
> > packing will arrive later. I'm not sure about structure initialization.
> > That may be far from now. And 64-bit ints and floating point, even
> > later. If ever.
>
> How about 64-bit ints via EDX:EAX?

(E)AX is the implicit result container throughout the code.
Likewise, SizeOfWord is the implicit result size (in 8-bit bytes) and
pointer size throughout the code.

Going beyond one register and the number of bytes specified in SizeOfWord
is not easy. Lots of stuff would need to be rewritten.

This is why it's much easier to extend AX to EAX, which is what I did
with -huge, than to the seemingly equivalent DX:AX.

> Thanks and Happy New Year to you and all the a.o.d group,

¡Próspero Año Nuevo! (Happy New Year!)
С Новым Годом! (Happy New Year!)

Alex

Alexei A. Frounze

Feb 17, 2014, 1:54:17 AM

On Sunday, December 29, 2013 12:38:20 PM UTC-8, Benjamin David Lunt wrote:
...

> How close are you to structures?

Basic support for struct/union has just been submitted.

You can do everything with struct and union, except:
- bit-fields, flexible array members
- tight packing (members are naturally aligned)
- passing/returning to/from functions by value; pass/return by reference instead
- initializing at definition time via = { initializer(s) }

Assignment, ?:, ->, ., (unsigned)&((struct someTag*)0)->someMember are OK.
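Since passing and returning structures by value isn't supported, the usual workaround is to pass pointers to the inputs and to the result object. A sketch of the pattern (struct point and point_add are illustrative names; const may be dropped if the compiler doesn't handle it):

```c
/* Instead of "struct point add(struct point a, struct point b)",
   the caller supplies the result object by reference. */
struct point { int x, y; };

void point_add(struct point *out, const struct point *a,
               const struct point *b)
{
    /* Each member is read before it is written, so out may alias a or b. */
    out->x = a->x + b->x;
    out->y = a->y + b->y;
}
```

A call then reads point_add(&r, &a, &b); instead of r = add(a, b);.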

Enjoy. Please report bugs.

Alex

Benjamin David Lunt

Feb 17, 2014, 4:27:12 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:458c2c3b-6b99-49db...@googlegroups.com...

Thanks Alex, I will have a look.

Alexei A. Frounze

Feb 25, 2014, 3:16:32 AM

On Monday, February 17, 2014 1:27:12 PM UTC-8, Benjamin David Lunt wrote:
> "Alexei A. Frounze" <alexf...@gmail.com> wrote in message
> news:458c2c3b-6b99-49db...@googlegroups.com...
> > On Sunday, December 29, 2013 12:38:20 PM UTC-8, Benjamin David Lunt wrote:
> > ...
> >> How close are you to structures?
> >
> > Basic support for struct/union has just been submitted.
> >
> > You can do everything with struct and union, except:
> > - bit-fields, flexible array members
> > - tight packing (members are naturally aligned)
> > - passing/returning to/from functions by value; pass/return by reference instead
> > - initializing at definition time via = { initializer(s) }
> >
> > Assignment, ?:, ->, ., (unsigned)&((struct someTag*)0)->someMember are OK.
> >
> > Enjoy. Please report bugs.
>
> Thanks Alex, I will have a look.

New things:
- short
- long (32-bit and huge mode(l)s only)
- improved extern (now extern directives are generated for NASM automatically as needed, not based on whether or not you use extern in C)
- one bug fixed

Alex

Alexei A. Frounze

Mar 2, 2014, 12:30:42 AM

New things:
- typedef
- __func__
- bugfixes

Alexei A. Frounze

Mar 10, 2014, 4:45:55 AM

New things:
- enum
- #pragma pack()
- bugfix
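For readers who haven't used it: #pragma pack is not part of standard C, but gcc, clang and MSVC accept the push/pop form shown here; which forms Smaller C accepts is not stated in this post. Packing removes the padding that natural alignment would insert, which matters when a C structure must match an assembler-defined or on-disk layout.

```c
#include <stddef.h>

/* Members in the packed region get no alignment padding. */
#pragma pack(push, 1)
struct packed_hdr {
    char tag;
    int  value;   /* at offset 1 */
};
#pragma pack(pop)

/* Back to natural alignment: value is padded to int's alignment. */
struct natural_hdr {
    char tag;
    int  value;
};
```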

Alexei A. Frounze

Apr 20, 2014, 11:18:40 PM

Improvements/New things:
- bugfixes
- support initialization of multidimensional arrays
- support initialization of structures
- support initialization of structures and arrays inside functions
- support static inside functions
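The kinds of definitions these items refer to look like this in standard C: a multidimensional array initializer, a structure initializer, and static variables inside a function. This is an illustrative example, not a test taken from the Smaller C repository.

```c
/* Illustrative names; standard C initializer forms. */
struct range { int lo, hi; const char *name; };

int next_id(void)
{
    static int counter = 100;  /* initialized once, keeps its value between calls */
    return counter++;
}

int table_sum(void)
{
    int m[2][3] = { { 1, 2, 3 }, { 4, 5, 6 } };        /* re-initialized on each call */
    static const struct range r = { 0, 9, "digits" };  /* structure initializer */
    int i, j, sum = 0;

    for (i = 0; i < 2; i++)
        for (j = 0; j < 3; j++)
            sum += m[i][j];
    return sum + r.hi;         /* 21 + 9 */
}
```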

The compiler should now be much more usable.

I think Ben was missing some of these. :)

Alex

Alexei A. Frounze

Sep 14, 2014, 6:05:33 AM

I've uploaded binaries for DOS and most of the standard library.

Note: a few things are still missing from the library, most notably: <time.h> functionality, *scanf() functions.

Everyone's welcome to play with the compiler and report bugs.

How to install the .zip file downloaded from https://github.com/alexfru/SmallerC:

1. Create C:\SMLRC (any hard disk or name will do, but make the full path as short as possible) and copy into it the following subdirectories of the .zip file:
BIND
INCLUDE
LIB
TESTS

2. Include C:\SMLRC\BIND in the environment variable PATH

3. Don't forget to have NASM (2.10 is good) installed and also available via PATH

You should now be able to compile the following example from TESTS\hw.c:

/*
How to compile for DOS (all mode(l)s: tiny/.COM, small/.EXE, huge/.EXE):
smlrcc -dost hw.c -o hwdt.com
smlrcc -doss hw.c -o hwds.exe
smlrcc -dosh hw.c -o hwdh.exe
*/
#include <stdio.h>

int main(void)
{
  puts("Hello, World!");
  return 0;
}

I suggest sticking to the huge memory mode(l), as it supports 32-bit types such as long and the functions that consume or return these types. In this mode(l) you can allocate all the available conventional memory via malloc(), and you're not limited to objects smaller than 64KB.

The huge memory mode(l) is selected with the "-dosh" option, but you don't need to specify it explicitly when using a DOS version of smlrcc.
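Here is a minimal sketch of code that relies on what the huge mode(l) provides. It uses a 32-bit long accumulator, since 499500 doesn't fit in a 16-bit int. It also allocates a single 100000-byte object with malloc(). The function names and the sizes are arbitrary, chosen only for illustration.

```c
#include <stdlib.h>

/* Sum of 0..n-1 in a 32-bit long; a 16-bit int would overflow for n = 1000. */
long sum_below(long n)
{
  long i, total = 0;
  for (i = 0; i < n; i++)
    total += i;
  return total;
}

/* Allocates one object larger than 64KB (100000 bytes, an arbitrary size),
   fills it, and returns its size, or -1 if the allocation fails. */
long fill_big_buffer(void)
{
  long size = 100000L;
  long i;
  unsigned char* p = malloc(size);
  if (p == NULL)
    return -1;
  for (i = 0; i < size; i++)
    p[i] = (unsigned char)i;
  free(p);
  return size;
}
```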

What else to know?

smlrcc can consume one or more of .c, .asm, .o or .a files and make an executable out of them.

If the command line is too long (over some 120 characters), you can put it into a file, say, mycmd.txt, and invoke smlrcc with @mycmd.txt (note the @ prefix) and it will extract the command line parameters from the file mycmd.txt. The linker (smlrl) supports this as well.

smlrcc supports the following useful options:
-c (compile only, don't link)
-S (compile to assembly only)
-v (verbose; show executed commands)
-map <mapfile> (produce the map file together with the binary)

You can compile your .c/.asm/.o files directly to a .a library file if you invoke smlrcc like so:
smlrcc [options] <file(s)> -c -o mylib.a

There's more, but the documentation hasn't been updated for a while, so here I'm giving only the most basic info.

Enjoy.
Alex

Benjamin David Lunt

Sep 14, 2014, 4:46:55 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:94dbee89-6191-4cec...@googlegroups.com...

> I've uploaded binaries for DOS and most of the standard library.
>

> Everyone's welcome to play with the compiler and report bugs.
>
> How to install the .zip file downloaded from
> https://github.com/alexfru/SmallerC:

<snip>

> Enjoy.
> Alex

Hi Alex,

I think I have mentioned it before, but I am interested in this compiler
so that I may compile my OS' loader file(s). Currently, they are in
assembly and some are considerably long. If I move to C, this may
simplify some of them, make for smaller source files, and an easier
read.

I know that the current form of SmallerC won't work for this, but
with the past research I did with it, and since you release the source
(thank you for that), I think it will be a somewhat simple task to
change it.

Anyway, just a few comments to say thanks for the work. If I ever
get some time again, I will have a look at it again.

Ben

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Forever Young Software
http://www.fysnet.net/index.htm

http://www.fysnet.net/osdesign_book_series.htm

Alexei A. Frounze

Sep 14, 2014, 7:30:04 PM

On Sunday, September 14, 2014 1:46:55 PM UTC-7, Benjamin David Lunt wrote:
> Hi Alex,
>
> I think I have mentioned it before, but I am interested in this compiler
> so that I may compile my OS' loader file(s). Currently, they are in
> assembly and some are considerably long. If I move to C, this may
> simplify some of them, make for smaller source files, and an easier
> read.

I remember that.

> I know that the current form of SmallerC won't work for this, but
> with the past research I did with it, and since you release the source
> (thank you for that), I think it will be a somewhat simple task to
> change it.

I remember you wanted support for structures in the compiler. It's been in for several months now. Or did you actually need more than just that?

> Anyway, just a few comments to say thanks for the work. If I ever
> get some time again, I will have a look at it again.

Thanks!
Alex

Benjamin David Lunt

Sep 14, 2014, 11:29:59 PM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:84a7de34-8062-4efb...@googlegroups.com...

> On Sunday, September 14, 2014 1:46:55 PM UTC-7, Benjamin David Lunt wrote:
>
> I remember you wanted support for structures in the compiler. It's
> been in for several months now. Or did you actually need more than
> just that?

I haven't had a chance to look at it at all yet. I will soon, though.

Thanks,
Ben

Benjamin David Lunt

Oct 16, 2014, 9:29:51 PM

"Benjamin David Lunt" <zf...@fysnet.net> wrote in message

news:lv5mi8$mte$1...@speranza.aioe.org...

Alex,

I finally was able to take some time and look at SmallerC.

It has changed quite a bit from the last time I saw it. It has
quite a bit more functionality, and I was quite surprised at the
amount of work you have put into it. Good job.

1:
I added a little bit to the #pragma part. Rather than testing
if this is the #pragma pack() keyword, and returning an error
if it isn't, I changed it to allow other #pragma keywords and
to give an error at the end of the list if not found within this
list.

2:
I also added the 'X86OpLabelOff' operand so that the output
would be

mov eax,offset SomeVar ; MASM and NBASM syntax

compared to

mov eax,SomeVar ; NASM syntax

3:
I also found that you allow structures and, best of all,
nested structures.

4:
I also added a #pragma to assign:

SizeOfWord = 4

or

SizeOfWord = 2

For example, in a Loader.asm file (boot file), you will start
out with 16-bit code (SizeOfWord = 2), but after checking to make
sure you are on a 32-bit machine, you can then move to 32-bit code,
(SizeOfWord = 4). Works like a charm.

Thanks Alex. It was an interesting afternoon.

I will have to do some more work to see what else I can find for you.
I would sure like to see the 'bool' type added. However, there are
many opinions on the size of the 'bool' type. Should it be of type
'signed char', 'int', or 'unsigned int'?

I currently just use

typedef signed char bool;

Alexei A. Frounze

Oct 17, 2014, 5:15:18 AM

On Thursday, October 16, 2014 6:29:51 PM UTC-7, Benjamin David Lunt wrote:
> "Benjamin David Lunt" <...@fysnet.net> wrote in message

>
> Alex,
>
> I finally was able to take some time and look at SmallerC.
>
> It has changed quite a bit from the last time I saw it. It has
> quite a bit more functionality, and I was quite surprised at the
> amount of work you have put into it. Good job.

In the latest version you can find most of the standard library for DOS, as well as DOS binaries of all the tools, with which you can recompile the compiler, the driver, the linker and the library. I had a couple of weeks between jobs to make it all happen. :)

> 1:
> I added a little bit to the #pragma part. Rather than testing
> if this is the #pragma pack() keyword, and returning an error
> if it isn't, I changed it to allow other #pragma keywords and
> to give an error at the end of the list if not found within this
> list.

Sure.

> 2:
> I also added the 'X86OpLabelOff' operand so that the output
> would be
>
> mov eax,offset SomeVar ; MASM and NBASM syntax
>
> compared to
>
> mov eax,SomeVar ; NASM syntax

Of course, you can do that to support your assembler.

> 3:
> I also found that you allow structures and, best of all,
> nested structures.

Unless you try something truly weird, such as "sizeof(struct {int a;})", things should work. This particular example is rare. It's rather eccentric, esoteric and impractical, and it is outright banned in C++, but, apparently, it should work in C. I don't support declaring new struct/union/enum types inside sizeof() or in function parameters (in the latter case, such declarations wouldn't work in standard C the way one might expect anyway, due to scope/visibility rules).

> 4:
> I also added a #pragma to assign:
>
> SizeOfWord = 4
>
> or
>
> SizeOfWord = 2
>
> For example, in a Loader.asm file (boot file), you will start
> out with 16-bit code (SizeOfWord = 2), but after checking to make
> sure you are on a 32-bit machine, you can then move to 32-bit code,
> (SizeOfWord = 4). Works like a charm.

That one isn't how I envisioned one would use the compiler. So, proceed with caution. :)

OTOH, with NASM there's an easy escape:

asm("incbin 'somefile.bin'");

or

asm("%include 'somefile.asm'");

> Thanks Alex. It was an interesting afternoon.

You're welcome! :)

> I will have to do some more work to see what else I can find for you.

For me? It's still very long until xmas presents! :)

> I would sure like to see the 'bool' type added. However, there are
> many opinions on the size of the 'bool' type. Should it be of type
> 'signed char', 'int', or 'unsigned int'?
>
> I currently just use
>
> typedef signed char bool;

Well, making it larger than a char/byte is almost certainly wasteful and useless. It can't be made smaller than a char/byte, as it must be addressable through a pointer. Other than that, it doesn't really have a sign (since it can only take on the two values 0 and 1), and its values must always be representable in the range of signed int. This is just like [un]signed chars, which are smaller than int and thus always promote to signed int in ordinary expressions. Thus it can't be a synonym for unsigned int.

Anyhow, bool, which should expand into _Bool, must be a separate internal type, clearly distinct from all others, because it has special assignment and conversion semantics: assigning to _Bool and casting to _Bool must work differently than the same operations on all other integer types. It's not hard to do, and half of the logic is already in place. Casting to _Bool is implemented (under case tok_Bool: in exprval() and in GenFuse()/GenExpr1()/GenExpr0()), and the logical && and || operators insert these implicit casts to _Bool. Declarations of _Bool and assignment to it aren't implemented. Adding support for the type shouldn't be hard (easy, I'd think), but I haven't yet seen a compelling reason to. A lot of C code doesn't use it, either because it would require a C99 compiler or simply because the code predates C99.
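The special conversion semantics described above show up in a few lines of C99. This sketch needs a host compiler that supports _Bool, since Smaller C, as described here, doesn't yet accept _Bool declarations. Converting any nonzero value to _Bool yields 1. Converting the same value to a char type keeps only the low bits. That is the pitfall of a typedef'd char "bool": with 8-bit chars, 256 becomes 0 (false). An unsigned char stands in here to avoid implementation-defined signed conversions. The function names are hypothetical.

```c
/* Conversion to _Bool: any nonzero value becomes 1. */
int to_bool(int x)
{
  return (_Bool)x;
}

/* Conversion to unsigned char keeps only the low bits (8 bits assumed). */
int to_uchar(int x)
{
  return (unsigned char)x;
}

/* Assignment to _Bool follows the same rule, even from a floating-point value. */
int bool_from_double(double d)
{
  _Bool b = d;  /* 0.5 compares unequal to 0, so b becomes 1 */
  return b;
}

/* && yields exactly 0 or 1: 1 only if both operands compare unequal to 0. */
int logical_and(int a, int b)
{
  return a && b;
}
```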

Alex

Benjamin David Lunt

Oct 18, 2014, 1:57:25 AM

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message

news:ad17129d-2ed3-4d6c...@googlegroups.com...

On Thursday, October 16, 2014 6:29:51 PM UTC-7, Benjamin David Lunt wrote:
> "Benjamin David Lunt" <...@fysnet.net> wrote in message
>>

>> I will have to do some more work to see what else I can find for you.
>
>For me? It's still very long until xmas presents! :)

Alex,

Again, great job on SmallerC. I spent a while this evening working
with it some more.

I do have a question though. Why do you not allow a pack size larger
than SizeOfWord?

if ((PragmaPackValue <= 0) ||
    (PragmaPackValue > SizeOfWord) ||
    (PragmaPackValue & (PragmaPackValue - 1)))
  error("Invalid alignment value\n");

What if SizeOfWord is 2, but I want to align on a dword boundary?

Also, why must the pack size be a power of two? What if,
for some weird reason, I want to align on a 6-byte boundary?

Just curious why you chose to check these values this way.
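One way to look at both restrictions, offered as a reading of the check rather than Alex's stated rationale. First, #pragma pack(n) conventionally only lowers a member's alignment to min(n, natural alignment). Values above the largest natural alignment, presumably SizeOfWord here, would therefore change nothing. Second, the usual round-up-with-a-mask trick only works for powers of two, and C11 also requires every valid alignment to be a power of two. The helper names below are hypothetical; pack_value_ok() just restates the quoted check as a predicate.

```c
/* Restates the quoted validity check: positive, at most SizeOfWord,
   and a power of two (exactly one bit set). */
int pack_value_ok(int value, int size_of_word)
{
  return value > 0 && value <= size_of_word && (value & (value - 1)) == 0;
}

/* Rounds offset up to a multiple of align using a mask.
   Correct only when align is a power of two. */
unsigned align_up_mask(unsigned offset, unsigned align)
{
  return (offset + align - 1) & ~(align - 1);
}

/* General version using division; works for any nonzero align. */
unsigned align_up_div(unsigned offset, unsigned align)
{
  return (offset + align - 1) / align * align;
}
```

For example, align_up_mask(7, 6) gives 8, which isn't a multiple of 6, while align_up_div(7, 6) gives 12. For power-of-two values such as align_up_mask(5, 4), both give 8.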