MCC Compiler

Bart

Sep 28, 2023, 7:15:06 AM

My MCC project was intended to upgrade my BCC C compiler to use a
similar backend to my other compilers.

The aims were:

* A more solid code generator that would run more C programs with
fewer errors

* To apply the modest optimiser of that new backend

* To eliminate the intermediate ASM stages of BCC and directly
generate EXEs

* To have faster compilation because of that

* To have full ABI compliance, with proper handling of struct
passing

As it turned out, only the last of those has been achieved, since:

* The generated code, even if better, did not magically make C
programs work that didn't before. It is impractical to debug
large convoluted C programs to find a bug that exists elsewhere
entirely

* I started the direct EXE backend yesterday, but tedium had set in
and I abandoned it. So MCC is slower than BCC, partly because of the
extra IL stage, but also the ASM stages are handled discretely. In
BCC, the ASMs were kept in memory and the assembler was incorporated.

* BCC was a one-file solution (compiler + headers + assembler/linker).
MCC was intended as a 2-file one (windows.h was taken outside; it's
too big). As it is, MCC is a 3-file solution (mcc.exe + aa.exe +
windows.h) which also makes it slower as I said.

* I haven't bothered activating the optimiser.

* I found out that, even without the final EXE generation, it was still
slower than Tiny C, when there are lots of smaller files (with bigger
files, the speed would be about par I think)

So, MCC is still a usable product, and would be a better starting point
for further work (for example, to create an interpreter for its IL,
which opens up debugging capabilities), but I now have two private,
mediocre C compilers.

Something I added to both was support for wildcard inputs like '*.c'. I
did this to more easily build projects like Malcolm's BBX which could be
done like this:

mcc *.c freetype\* samplerate\*.c

To avoid having to type that lot, I put those inputs into an '@' options
file called 'bbx':

*.c freetype\* samplerate\*.c

Now I just type 'mcc @bbx'. I was surprised however when I tried to do:

gcc @bbx

because:

(1) gcc doesn't like backslashes in options files; they have to be "/"

(2) gcc doesn't expand wildcard filespecs like "*.c" inside options
files

Why not? Since this is not done by the OS, it must be done by the app
anyway.

But that's one more small thing at least that my compilers do better!

(The bigger projects I've used as test inputs were:

Lua
TCC
Pico C
SQL (sqlite3.c + shell.c)
BBX
LibJPEG
AA, QQ, CC (transpiled C from three of my projects)

The last three seem to run fine (my conservative C code always does!).
Most of the others seemed to work based on superficial tests, but I
didn't do in-depth testing.

But TCC didn't work. Version 0.9.27 (I tried also .25 and .28) went the
furthest: it could compile programs and produce EXEs, but the EXEs were
faulty.

At some point I will have another go at Seed7.)

Bart

Sep 28, 2023, 10:31:24 AM

On 28/09/2023 12:14, Bart wrote:
> My MCC project was intended to upgrade my BCC C compiler to use a
> similar backend to my other compilers.

> But TCC didn't work. Version 0.9.27 (I tried also .25 and .28) went the
> furthest: it could compile programs and produce EXEs, but the EXEs were
> faulty.

MCC can generate ASM source code with named locals and temps, or using
only their offsets to keep the size down, which is what I'd been using
but had forgotten about.

But there was a problem with the offsets for temps (used when I run out
of registers) which I have to look at.

When generating ASMs with fully named temps, Tiny C works. Further, the
exe produced can compile itself, to multiple generations. (BCC only
managed one generation.)

So it's not quite as poor as I'd thought. I might have a go at 0.9.28 again.

Tim Rentsch

Sep 28, 2023, 6:50:57 PM

Bart <b...@freeuk.com> writes:

> My MCC project was intended to upgrade my BCC C compiler to use a

> similar backend to my other compilers. [...]

Please confine your postings in comp.lang.c to topics and subjects
relevant to the C language. None of what you say in your posting
is topical in comp.lang.c. An obvious suggestion is the newsgroup
comp.compilers instead.

Bart

Sep 28, 2023, 7:24:48 PM

That's me told!

This project was first mentioned as a possibility at the end of August.
I was giving an update on it before moving on to my normal projects and
saner languages.

Maybe some people here (obviously not the old-timers) would find a blog
report on the tribulations of creating a personal C compiler interesting.

Or maybe someone is thinking of doing something of their own, and my
experience shows one-man compilers can be viable.

So how about letting them make up their own mind. If not, it is easy
enough to ignore the post or to killfile me.

Meanwhile I could equally suggest you post all your deadly-dull stuff
about C standards minutiae to comp.std.c

Kaz Kylheku

Sep 28, 2023, 8:38:13 PM

On 2023-09-28, Bart <b...@freeuk.com> wrote:
> On 28/09/2023 23:50, Tim Rentsch wrote:
>> Bart <b...@freeuk.com> writes:
>>
>>> My MCC project was intended to upgrade my BCC C compiler to use a
>>> similar backend to my other compilers. [...]
>>
>> Please confine your postings in comp.lang.c to topics and subjects
>> relevant to the C language. None of what you say in your posting
>> is topical in comp.lang.c. An obvious suggestion is the newsgroup
>> comp.compilers instead.
>
> That's me told!

comp.compilers *is* a nice newsgroup, though.

Mind you, it's moderated.

You can't go on long C sucks tirades and whatnot.

Tim Rentsch

Sep 29, 2023, 1:10:19 AM

Bart <b...@freeuk.com> writes:

> On 28/09/2023 23:50, Tim Rentsch wrote:
>
>> Bart <b...@freeuk.com> writes:
>>
>>> My MCC project was intended to upgrade my BCC C compiler to use a
>>> similar backend to my other compilers. [...]
>>
>> Please confine your postings in comp.lang.c to topics and subjects
>> relevant to the C language. None of what you say in your posting
>> is topical in comp.lang.c. An obvious suggestion is the newsgroup
>> comp.compilers instead.
>
> That's me told!
>
> This project was first mentioned as a possibility at the end of
> August. I was giving an update on it before moving on to my normal
> projects and saner languages.
>
> Maybe some people here (obviously not the old-timers) would find a
> blog report on the tribulations of creating a personal C compiler
> interesting.
>
> Or maybe someone is thinking of doing something of their own, and my
> experience shows one-man compilers can be viable.

None of that changes the reality that your comments are not
about the C language or C programs and so are not topical here.

> So how about letting them make up their own mind. If not, it is easy
> enough to ignore the post or to killfile me.

I know that your MO is to be a rude, self-centered, insecure
jerk. You have been for years and there is no indication that
will change any time soon. There is plenty of evidence that
people in the newsgroup here are not interested in what you have
to say about your experience working on compilers, especially
since those compilers are not written in C and aren't faithful to
the C language (and it isn't clear whether you don't know that or
if you just don't care). So people have made up their minds;
the only question is how long it will take you to realize it, if
ever.

> Meanwhile I could equally suggest you post all your deadly-dull stuff
> about C standards minutiae to comp.std.c

My comments about the details of C pertain to the C language and
how to write C programs. They are topical here. Discussions in
comp.std.c are supposed to deal with how to change the C standard
or how to read the C standard in areas where there is some
question about what is meant. If you think my comments are
deadly dull you are free to ignore the postings or put me in
your kill file. But whether you do that or not, my comments
are topical, and yours are not.

Bart

Sep 29, 2023, 6:21:44 AM

On 29/09/2023 06:10, Tim Rentsch wrote:

> I know that your MO is to be a rude, self-centered, insecure
> jerk.

What's yours?

Malcolm McLean

Sep 29, 2023, 6:35:37 AM

Oh don't lower the tone by replying to stuff like that.

I think Tim is in a mood. He posted something negative about me as
well. It's not normal for him.

Bart

Sep 29, 2023, 6:49:02 AM

On 29/09/2023 06:10, Tim Rentsch wrote:
> Bart <b...@freeuk.com> writes:

> to say about your experience working on compilers, especially
> since those compilers are not written in C

Is that really relevant? Does a compiler for language L have to be
written in L to be taken seriously?

In any case it can be transpiled to C in 60ms:

c:\cx>tm mc -c cc
Compiling cc.m---------- to cc.c
TM: 0.06

> and aren't faithful to
> the C language (and it isn't clear whether you don't know that or
> if you just don't care).

In which ways? My product compiles a C 'subset' but does not formally
define what it is. Yet it manages to compile 100s of 1000s of lines of C
applications (not my generated code) which would be challenging for many
such small compilers.

But it no longer supports my own extensions, like the many that gcc
provides, and it no longer tries to improve on the language or fail
programs on features I feel ought to be deprecated.

Bart

Sep 29, 2023, 6:59:18 AM

I think it is!

He seems to like policing this /unmoderated/ newsgroup according to what
/he/ finds interesting or relevant.

Meanwhile many here are happy to participate in endless discussions
about the 'halting' problem.

Michael S

Sep 29, 2023, 7:06:22 AM

Modus operandi.
Is there kitchen.lang.latin newsgroup?

Michael S

Sep 29, 2023, 7:07:50 AM

Many? You are wildly exaggerating.

Malcolm McLean

Sep 29, 2023, 7:36:22 AM

There's one poster who occasionally posts to comp.lang.c who believes he has
found a flaw in the straightforward proof that halting is non-computable that
everyone knows (the gist: run the halt detector on itself, then loop forever if
it detects halting, and halt if it detects non-halting behaviour). The volume of
discussion that this has created is unbelievable and, yes, I have participated
myself.

Mostly he posts to comp.theory however, where it is at least technically on topic.

Kenny McCormack

Sep 29, 2023, 7:56:02 AM

In article <2c2f6ab4-949a-46ff...@googlegroups.com>,

It *is* normal for him.

That's all he does - posts the most boring, uninteresting drivel
imaginable, but, yet, at least by the looser standard now generally used in
CLC, on-topic.

(Note and clarification of the above: By the usual standard of most groups,
as applied to the specific rules of CLC, *nothing* is on-topic here. But
of course, that standard is rarely applied, since it would lead to, well,
you know, nothing ever being posted)

--
Just like Donald Trump today, Jesus Christ had a Messiah complex.

And, in fact, the similarities between the two figures are quite striking.
For example, both have a ragtag band of followers, whose faith cannot be shaken.

Kenny McCormack

Sep 29, 2023, 7:58:12 AM

In article <uf6alm$809b$2...@dont-email.me>, Bart <b...@freeuk.com> wrote:
>On 29/09/2023 11:35, Malcolm McLean wrote:
>> On Friday, 29 September 2023 at 11:21:44 UTC+1, Bart wrote:
>>> On 29/09/2023 06:10, Tim Rentsch wrote:
>>>
>>>> I know that your MO is to be a rude, self-centered, insecure
>>>> jerk.
>>> What's yours?
>>>
>> Oh don't lower the tone by replying to stuff like that.
>>
>> I think Tim is in a mood. He posted something negative about me as
>> well. It's not normal for him.
>>
>
>I think it is!

Yes. It is.

>He seems to like policing this /unmoderated/ newsgroup according to what
>/he/ finds interesting or relevant.

Yup.

>Meanwhile many here are happy to participate in endless discussions
>about the 'halting' problem.

Yup.

The rules are obviously applied, to put it charitably, inconsistently.

Topicality rules for thee, but not for me.

--
Pensacola - the thinking man's drink.

David Brown

Sep 29, 2023, 10:40:43 AM

On 29/09/2023 12:59, Bart wrote:
> On 29/09/2023 11:35, Malcolm McLean wrote:
>> On Friday, 29 September 2023 at 11:21:44 UTC+1, Bart wrote:
>>> On 29/09/2023 06:10, Tim Rentsch wrote:
>>>
>>>> I know that your MO is to be a rude, self-centered, insecure
>>>> jerk.
>>> What's yours?
>>>
>> Oh don't lower the tone by replying to stuff like that.
>>
>> I think Tim is in a mood. He posted something negative about me as
>> well. It's not normal for him.
>
> I think it is!
>
> He seems to like policing this /unmoderated/ newsgroup according to what
> /he/ finds interesting or relevant.
>

Tim is an extraordinarily patronising grumpy old git. (And that is
coming from someone who knows he can often be patronising!) He is also
very rarely wrong about the technicalities of C, and thus a useful
resource for this group.

I don't think you need fear the consequences of being on his bad side -
he killfiled me years ago, without it doing me any harm.

I do appreciate that it's useful for there to be occasional "policing"
of the group - polite reminders when threads are straying too far from
topicality. We want to keep the group strongly focused on C. (And I
say that as someone who is on the receiving end of such reminders every
now and again - I don't always follow them immediately, but I approve of
the principle.)

Discussions about specific compiler details have traditionally been
off-topic in comp.lang.* groups - it's too easy to get bogged down by
compiler-specific threads rather than threads about the language itself.
Some mention of common compilers is inevitable.

However, I do think that a thread about your C compiler is very much
on-topic. You might get into a heated discussion, but that's hardly
unusual. You might get tips or ideas, or answers to questions.

> Meanwhile many here are happy to participate in endless discussions
> about the 'halting' problem.

Actually, almost all the comp.lang.c regulars were strongly against such
discussions, at least when it became clear that the person behind them
was off his head and not the slightest bit interested in listening to
rational arguments or mathematical realities.

But "whataboutism" is not a good argument anyway.

Richard Harnden

Sep 29, 2023, 10:48:08 AM

On 29/09/2023 11:35, Malcolm McLean wrote:

Perhaps ...

He can tell intuitively that 7 things annoy him, but he can only be
concurrently irritated about 3 of them.

...?

No? I'll get my coat.

Anton Shepelev

Sep 29, 2023, 11:18:14 AM

Kenny McCormack about Tim:

> That's all he does - posts the most boring, uninteresting
> drivel imaginable, but, yet, at least by the looser
> standard now generally used in CLC, on-topic.

Technical discussions can be boring to an observer who has
mastered the subject, but I for one appreciate Tim's
feedback to my articles.

--
() ascii ribbon campaign -- against html e-mail
/\ www.asciiribbon.org -- against proprietary attachments

Bart

Sep 29, 2023, 11:46:29 AM

On 29/09/2023 15:40, David Brown wrote:

> However, I do think that a thread about your C compiler is very much
> on-topic. You might get into a heated discussion, but that's hardly
> unusual. You might get tips or ideas, or answers to questions.

You might be pleased to know that the revised version is less verbose by
default when more than one file is involved.

But it is still somewhat more so than gcc:

c:\luac>gcc @lua

c:\luac>mcc @lua
Compiling 33 files to lua.exe

Tim Rentsch

Oct 3, 2023, 11:21:46 AM

Anton Shepelev <anto...@gmail.moc> writes:

> Kenny McCormack about Tim:
>
>> That's all he does - posts the most boring, uninteresting
>> drivel imaginable, but, yet, at least by the looser
>> standard now generally used in CLC, on-topic.

Apparently Kenny has forgotten a recent posting of mine,
responding to his posting asking something, that was meant
to provide a helpful answer to his question.

> Technical discussions can be boring to an observer who has
> mastered the subject, but I for one appreciate Tim's
> feedback to my articles.

Thank you, it's nice to hear that someone appreciates my
comments.

Tim Rentsch

Oct 15, 2023, 9:10:44 PM

I presume this question is meant to be rhetorical. Still
it might be useful to write down some of the precepts I
try to follow. Please add "I try to", or words to that
effect, at the start of each of the following.

Distinguish facts, beliefs, and opinion.

Emphasize communication and exchange of views, seeking
to convey rather than convince; prefer discussion to
debate. The first goal is for each person to understand
the other.

Avoid oversimplification: I should at least mention
when a statement is only approximately true rather than
completely accurate.

Double check any statements for correctness, accuracy,
and completeness.

Stay out of arguments about programming style. I very
much want to understand why people prefer to write code
the way they do, but in most cases there is little to be
gained in having an argument about it.

Follow general rules of good writing - be concise, direct,
unambiguous, accurate, and easy to read. Always review
and revise; do at least one final reading before sending.

Observe topicality guidelines. Many times when I am
wondering about whether to post a followup, noticing
that the discussion has strayed well outside the lines
of what is topical makes the decision easy: don't post.

Post only when what I would say seems to add value. If
someone else has answered the question, there is no reason
to reply unless I have something useful to add.

Be calm before following up. Usually it helps to let hot
blood cool down before continuing an emotional exchange.

Respond to what is written, not to the personality of the
writer. Avoid ad hominem remarks.

Practice active listening - both making an effort to hear
and understand what other people are saying, and checking
back to make sure they are following what I am saying.

Both respect others' preferences and stay faithful to my
own judgments about what to say and how to say it. Be
polite and considerate but also firm. I like to think
I'm getting better at this one; I am working on it.

Strive to improve. Needless to say I am not always
successful in following these principles so when I
notice a shortcoming I try to do better next time.

(No doubt there are some additional habits or principles
I have left out, but I am stopping here at least for now.)

Tim Rentsch

Oct 15, 2023, 10:27:12 PM

Bart <b...@freeuk.com> writes:

> On 29/09/2023 06:10, Tim Rentsch wrote:
>
>> Bart <b...@freeuk.com> writes:
>>
>> to say about your experience working on compilers, especially
>> since those compilers are not written in C
>
> Is that really relevant? Does a compiler for language L have to be
> written in L to be taken seriously?

What matters is whether you have something to say about
programming in C or about the C language. Your compiler
isn't written in C so anything you have to say about the
compiler itself isn't relevant here, whether or not it
should be taken seriously.

In talking about how the compiler behaves, you don't say
anything that pertains to the C language. I don't think
this notion is very hard to understand. Talking about the
environment in which the compiler is built, even though it
might be interesting in other contexts, still has nothing
to do with the C language.

>> and aren't faithful to
>> the C language (and it isn't clear whether you don't know that or
>> if you just don't care).
>
> In which ways?

No one knows but you, and it's not even clear if you know.
Ironically, if you were to go through and make up a list of
differences between what your compiler accepts and what the
C standard requires, and present that list here, that WOULD
be topical, especially if there were reasons related to how
easy or hard some aspects of C are to compile. For reasons
beyond understanding you leave out exactly the pieces of
information that would make it relevant in comp.lang.c. I'm
at a loss to understand why you do that.

> My product compiles a C 'subset' but does not formally
> define what it is.

An informal definition is a lot better than no description
at all, and at least is something about the C language.
So figure out what the discrepancies are (and only you can
do that), and tell us about it. That's why we're here!

> [speed of compiler]

There's nothing wrong with feeling pride in having written
a fast program (in a non-C language), but it's not a subject
of interest in this newsgroup.

> But it no longer supports my own extensions, like the many that gcc
> provides, and it no longer tries to improve on the language or fail
> programs on features I feel ought to be deprecated.

What would be relevant and possibly interesting is to say
what C language constructs you think should be considered
or left out. Not what constructs IN OTHER LANGUAGES you
think are nice but what additions or changes to C you think
could add value. Explain what, and leave it at that; let
other people form their own opinions about how valuable
each change might be.

Bart

Oct 16, 2023, 8:41:21 AM

On 16/10/2023 03:26, Tim Rentsch wrote:
> Bart <b...@freeuk.com> writes:

>> My product compiles a C 'subset' but does not formally
>> define what it is.
>
> An informal definition is a lot better than no description
> at all, and at least is something about the C language.
> So figure out what the discrepancies are (and only you can
> do that), and tell us about it. That's why we're here!

OK, an informal definition is the subset of C that I personally use, and
that I saw being commonly used on open source projects, at the start of
2017.

It would loosely be C99 but missing Complex, VLAs, designated
initialisers, compound literals, but with _Generic from C11 (that one
could be trivially implemented in about 40 lines of code).

Since 2017, those missing features are much more commonly used, but I
haven't changed the front end of the compiler.

There are also a whole bunch of restrictions and points of non-conformance.

Still, the compiler works on these projects that were used for testing:

* sqlite3.c + shell.c
* Tiny C 0.9.27 (0.9.28 uses extra features)
* Lua 5.4
* Pico C interpreter
* BBX (Malcolm's resource compiler)
* LIBJPEG 9e
* Piet (an interpreter for programs that look like Mondrian paintings)

Plus a bunch of smaller ones. Plus all the generated C code from my
tools, but that uses an even smaller subset (eg. there are no #includes
and no macros).

>> [speed of compiler]
>
> There's nothing wrong with feeling pride in having written
> a fast program (in a non-C language), but it's not a subject
> of interest in this newsgroup.

I didn't mention the speed of the compiler. I said I can /transpile/ the
source code to C more or less instantly, since the choice of the
implementation language seemed to be an issue for you.

The actual speed of MCC is limited by having to use an intermediate ASM
stage.

Tim Rentsch

Oct 20, 2023, 10:14:57 AM

Bart <b...@freeuk.com> writes:

> On 16/10/2023 03:26, Tim Rentsch wrote:
>
>> Bart <b...@freeuk.com> writes:
>
>
>>> My product compiles a C 'subset' but does not formally
>>> define what it is.
>>
>> An informal definition is a lot better than no description
>> at all, and at least is something about the C language.
>> So figure out what the discrepancies are (and only you can
>> do that), and tell us about it. That's why we're here!
>
> OK, an informal definition is the subset of C that I personally
> use, and that I saw being commonly used on open source projects,
> at the start of 2017.
>
> It would loosely be C99 but missing Complex, VLAs, designated
> initialisers, compound literals, but with _Generic from C11 (that
> one could be trivially implemented in about 40 lines of code).

Given that, a good target would be to aim for conformance with
C11, where Complex and VLAs are optional. Compound literals
and designated initializers shouldn't be that hard to add, and are
both very useful. If you can get your compiler up to the level
of C11 conformance, or at least close enough, I would consider
trying it with my current project (which is written entirely
in C). The code in that project uses both compound literals and
designated initializers. Do you handle bitfields? I use those
also.

> Since 2017, those missing features are much more commonly used,
> but I haven't changed the front end of the compiler.
>
> There are also a whole bunch of restrictions and points of
> non-conformance.

Please elaborate. These details are just the sort of thing
that the group is here to discuss.

> Still, the compiler works on these projects [list]

This doesn't tell me anything since I am not familiar with
these code bases.

>>> [speed of compiler]
>>
>> There's nothing wrong with feeling pride in having written
>> a fast program (in a non-C language), but it's not a subject
>> of interest in this newsgroup.
>

> I didn't mention the speed of the compiler. [...]

Sorry, I misread your earlier comment. Those comments were on a
subject not of interest in the newsgroup, but indeed they were not
about the speed of the compiler. My apologies for the incorrect
summary.

Tim Rentsch

Oct 21, 2023, 12:13:36 AM

Malcolm McLean <malcolm.ar...@gmail.com> writes:

> On Friday, 29 September 2023 at 11:21:44 UTC+1, Bart wrote:
>
>> On 29/09/2023 06:10, Tim Rentsch wrote:
>>
>>> I know that your MO is to be a rude, self-centered, insecure
>>> jerk.
>>
>> What's yours?
>
> Oh don't lower the tone by replying to stuff like that.
>
> I think Tim is in a mood.

If I am in a mood it's because a few self-centered individuals
repeatedly engage in voluminous and protracted discussions whose
overlap with material germane to the topic of the newsgroup can
be measured in at most small numbers of square angstroms, and who
persist in doing so despite multiple requests not to.

> He posted something negative about me as well.

That's wrong. My comments were about what you wrote, not about
you. I recommend to everyone that they learn to tell the
difference between the two.

> It's not normal for him.

It is unusual for me to make direct ad hominem remarks. I
confess that I don't respond well to flagrantly rude and
inconsiderate behavior.

BGB

Oct 21, 2023, 4:15:13 PM

FWIW, in my C compiler (BGBCC, as-is):
  _Complex exists, but isn't well tested;
  VLAs sort of exist, but:
    They only work for 1D arrays of primitive types (no structs).
  Basically, C89 with parts of the newer standards glued on ad-hoc.
    Features were added more "as useful".
  Supports some parts of C23 as well.
    Supports _BitInt, partial 'auto', ...
    Also supports the proposed (C++ style) lambda syntax.
      Though, semantics don't exactly match C++ lambdas (*1).
    Also the new attribute syntax and similar.
    Along with UTF-8 strings and similar as well.
    Also the 'stdbit.h' stuff, "0b" literals, digit separators, ...
  ...

Does not support '_Generic()' though as of yet.

*1: alloca, VLAs, and (automatic) lambdas are implemented with automatic
storage being implemented via heap allocation (with automatic freeing
via an implicit linked list). Note that my compiler and ABI design don't
allow for the size of stack-frames to change at runtime.

It is written in C, but currently only targets my own ISA (though,
previously, did target the Hitachi SuperH SH-2 and SH-4 ISA's; but this
hasn't been tested in a while).

The ISA design had distantly evolved out of SH-4, but has now mutated
beyond recognition. It is a LIW/VLIW design with 64 GPRs, 1-3
instructions per bundle, and variable-length instructions (16/32/64/96
or 32/64/96-bit, depending on ISA variant).

But, main reason for a custom C compiler here being mostly that no other
compilers support my ISA design, and trying to add a new target to GCC
or LLVM looked like far more pain than it was worth.

The C dialect supported, as well as the design of the command-line, is
generally more like that of MSVC (it borrows some amount of extension
keywords/attributes from MSVC, including a lot of its "__declspec"
modifiers and similar).

There are various language extensions as well, but mostly as-useful.
Most commonly used are __int128 and similar (which adds 128-bit integer
types), as well as extensions for SIMD vectors and similar (with
notation and semantics loosely derived from the OpenGL shader language /
GLSL).

In this case, borrowing GLSL features being "less awful" than the
"xmmintrin.h" system used by MSVC; and generally more usable than GCC's
vector extension (also partly supported, for a limited selection of
vector sizes and types). Some aspects of MSVC's vector system were
retained, but with different semantics (__m64 and __m128 still exist but
serve as more abstract types for bit-level casting between types and
formats).

So, for example, it is possible to coerce a double to an integer type like:
double f;
unsigned long long fi;
f=3.14159;
fi=(__m64)f;
This first converts the floating-point value to __m64, which, when then
converted to an integer type, yields the bit pattern of the
floating-point value (the reverse is also possible).

Note that my compiler isn't terribly smart, and isn't able to optimize
away the memory accesses from more traditional methods, for example:
fi=*(unsigned long long *)(&f);
This requires storing 'f' to memory and then reloading the value from
memory, whereas the '__m64' cast can sidestep the memory load.

Both 'union' and 'memcpy()' strategies suffer from the same basic
problems as the pointer deref in this case.

Note that in this case, 'double' is defined as using the IEEE Binary64
format.

Where:
double x; //Binary64
float y; //Binary32 (*2)
short float z; //Binary16 (*2)
long double w; //Binary128 (*3)

*2: For small scalar floating point types, typically Binary64 is always
used in registers; in this case, the smaller representation only exists
when the value is stored in memory.

*3: Though, using this is handled by falling back to software emulation,
so 'long double' comes at a significant speed penalty (this differs some
from MSVC behavior, where 'long double' decays to 'double').

Though, more like GCC, 'sizeof(long)'=='sizeof(void *)' in most
contexts, as it seemed more useful to keep 'long' as consistent with the
size of a native pointer, than to keep 'long' as 32 bits.

There are also occasionally-used extensions for working with dynamic types.
Technically, it is possible to write C with using more JavaScript-like
types, but this comes with a performance penalty (most operations
involve runtime calls), and the type-systems interact in potentially
rather non-trivial ways in more non-trivial situations (it isn't really
possible to provide a "seamless" integration if mixing data between
them, as the way C types and data structures behave is very different
from how things work as dynamic types).

Many other extensions mostly exist in terms of the C library, such as
support for things like:
zone based memory allocation;
  Allocations can be categorized during alloc,
  with a bulk free for any memory objects in a given category
Allocation meta categories
  Such as for requesting memory with read/write/execute access, ...
"__memlzcpy", which implements overlapping copy as needed for LZ77
  Copying memory over itself generating a repeating pattern of bytes.

Say:
unsigned char *arr;
...
__memlzcpy(arr+1, arr, 256);
Behaving similarly to:
memset(arr+1, *arr, 256);

But, supporting arbitrary strides, and trying to "efficiently" implement
the various cases (a naive byte-for-byte copy loop is horribly slow; but
writing these sorts of copy-operations in a way that is efficient, and
works correctly, across targets, tends to be rather ugly and unwieldy).
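
In terms of semantics only, the overlapping copy described here behaves like a strictly forward byte loop. The following is a rough reference sketch, not the actual "__memlzcpy" implementation (which is not shown in this post); the name `lz_copy_ref` is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Reference semantics only: copy strictly forward, one byte at a time, so
   that bytes written earlier in the same call are re-read when dst overlaps
   src at a higher address. This is the slow "naive" loop, not a tuned one. */
unsigned char *lz_copy_ref(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return dst;
}
```

With `dst == src + 1` this fills the destination with copies of `src[0]`; with `dst == src + 2` it repeats a two-byte pattern, which is the "arbitrary strides" case.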

Note that this is nearly the exact opposite behavior in this case from
something like "memmove()" (when copying memory to a higher address in
the case of overlap). Note that if copying memory to a lower address,
it will behave more like "memmove()" (and will behave more like
"memcpy()" in the case of non-overlap).

There is also "__memlzcpyf()" which does similar, but makes the
optimizing assumption that the user doesn't care about the bytes
directly following the end of the destination (so, we can allow the
following 16 bytes or so to be stomped by the copy; rather than have the
copy be bogged down by trying to copy an exact number of bytes).
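
One way to picture the "allowed to stomp past the end" trade-off is a chunked copy that never bothers with a partial tail. This is only a sketch under my own assumptions: the name `lz_copy_sloppy`, the 8-byte chunk size, and the restriction to a match distance of at least one chunk are all hypothetical, and the real "__memlzcpyf()" may work quite differently:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK = 8 };   /* hypothetical chunk size for this sketch */

/* May write up to CHUNK-1 bytes past dst+n; the caller must guarantee that
   much slack after the destination. Only handles dst at least CHUNK bytes
   above src, so each fixed-size memcpy has non-overlapping operands. */
unsigned char *lz_copy_sloppy(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst > src && (size_t)(dst - src) >= CHUNK);
    for (size_t i = 0; i < n; i += CHUNK)
        memcpy(dst + i, src + i, CHUNK);   /* whole chunks only, no exact tail */
    return dst;
}
```

The loop has no byte-granular tail handling, which is where the saving comes from; the cost is that the bytes just past the destination are not preserved.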

...

Tim Rentsch

Oct 22, 2023, 10:57:10 AM

For a compiler to be one I might be able to use, it must

* conform to either the C99 or C11 standard (some
documented shortcomings might be acceptable, depending
on what they are)

* produce .o files in at least one environment I can use
(right now that is only GNU/Linux), including support
for a -fPIC option

* generate code of at least reasonable quality; not
necessarily at the -O2 level of gcc or clang, but
better than -O1

If you get to something roughly or possibly approximating the
list above, please keep the group posted!

BGB

Oct 22, 2023, 5:18:15 PM

It mostly supports roughly the common subset of C99 and C11.
Though, most of the code I have ported to my ISA has been C89, so newer
functionality isn't well tested.

> * produce .o files in at least one environment I can use
> (right now that is only GNU/Linux), including support
> for a -fPIC option
>

At present, generating code on any mainstream targets was not a
priority, as MSVC/GCC/Clang already address these cases fairly effectively.

My previous small-scale attempts at generating code for ARM targets had
given horribly bad performance relative to GCC, so it didn't really seem
worthwhile.

As-is, it produces "RIL3" as its "object file" output, which is a
stack-oriented bytecode along vaguely similar lines to .NET bytecode (it
does the preprocessing and parsing, then translates the resulting AST
into a stack-oriented bytecode; generally using implicit types which are
carried along with the stack, like .NET and unlike JVM, ...).
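
To make the "implicit types carried along with the stack" idea concrete, here is a toy stack machine in which a single ADD opcode works for either type because every stack slot carries its type. This is not RIL3: the opcodes and structures are invented for the sketch, and a real compiler backend would track these types at translation time rather than at run time as this little interpreter does:

```c
#include <assert.h>

typedef enum { T_INT, T_DOUBLE } Type;
typedef struct { Type type; union { long i; double d; } u; } Value;

/* Hypothetical opcodes: note there is one ADD, not IADD/DADD as in the JVM */
typedef enum { OP_PUSH_I, OP_PUSH_D, OP_ADD, OP_END } Opcode;
typedef struct { Opcode op; long i; double d; } Insn;

Value run(const Insn *code)
{
    Value stack[16];
    int sp = 0;
    for (; code->op != OP_END; code++) {
        switch (code->op) {
        case OP_PUSH_I:
            assert(sp < 16);
            stack[sp].type = T_INT;
            stack[sp].u.i = code->i;
            sp++;
            break;
        case OP_PUSH_D:
            assert(sp < 16);
            stack[sp].type = T_DOUBLE;
            stack[sp].u.d = code->d;
            sp++;
            break;
        case OP_ADD: {
            assert(sp >= 2);
            Value b = stack[--sp];
            Value a = stack[--sp];
            assert(a.type == b.type);      /* operand types come from the stack */
            Value r;
            r.type = a.type;
            if (a.type == T_INT)
                r.u.i = a.u.i + b.u.i;
            else
                r.u.d = a.u.d + b.u.d;
            stack[sp++] = r;
            break;
        }
        case OP_END:
            break;
        }
    }
    assert(sp == 1);
    return stack[0];
}
```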

Though, there are larger scale architectural differences (metadata is
managed differently; and my design assumes a C-like or "bare metal"
environment, rather than a "Managed VM", ...).
Though, actual .NET bytecode would be a crappy way of expressing C and
similar (making it less desirable at the time), and is used mostly as a
stand-in for object files and static libraries (with the final output
being in the form of native-code binaries). Though, one could argue, at
least .NET's bytecode would be "less bad" at representing C than the
JVM's ".class" file-format (in that, at least, doing so is not "comical").

Though, things "could be better", but I have not come up with any "clearly
better" option (native-code objects and a three-address-code IR are all
tradeoffs); and other factors are mostly matters of data-packaging and
how to best represent the metadata, ...

I had on-off considered whether to consider using IR images and AOT
compiling them per target. Would preferably need a lighter weight
backend though. Current backend loads and unpacks all of the bytecode
and metadata; saving memory would require ability to translate functions
one-at-a-time, and determining things like what is reachable, preferably
without needing to first translate everything into 3-address form, ...

All this would require a fair bit of redesign though (and a different
packaging scheme for the IR bytecode and metadata; as the current
structure is not well suited to random access as it was designed around
the assumption of a sequential loader). Things like RAM footprint of
reading in and accessing the bytecode images would also need to be
considered, etc. Well, along with the battle over the relative merits of
a stack-machine vs three-address-code representation for the IR, etc.

Note that many aspects of the target machine can be glossed over in the
IR, though things like "sizeof(void *)" and similar remain as thorny
issues (so, if relevant, one may still need to build versions of the
libraries for each potential pointer size).
As-is, it sort of works, but trying to mix/match here mostly tends to
result in a bunch of hairy bugs typically revolving around "#ifdef's"
and "typedef" and similar (which affect other code, even if the C
library itself can be written to be mostly pointer-size agnostic).

Well, and relatedly, there is another "__ifarch(feature)" extension that
mostly allows enabling/disabling functions or blocks of code depending
on target-specific options, however (unlike "#ifdef") requires the code
to be structurally and syntactically valid (and the function signatures
need to match across all ifarch paths, etc). This differing from
"#ifdef" mostly in that it is handled much later in the compiler pipeline.

An example of this would be to allow the same context-switch code to be
used on targets with 32 and 64 GPRs (where half of the registers don't
exist in the 32-register configuration). But, would be annoying to
compile separate versions of the runtime libraries based on things like
whether the configuration is using 32 or 64 GPRs, which variant of the C
ABI is being used, etc.

This feature is generally also available in ASM code and inline ASM and
similar as well (and also can enable or disable the use of ASM versions
of functions vs C counterparts, etc).

The backend then translates this to native code, at present emitting
binaries in a PE/COFF variant (loosely derived from the WinCE/SH
variant); but with some tweaks and typically LZ4 compressed (the LZ4
compression is integrated with the PE/COFF loaders, via the modified
format).

The LZ4 decompression is faster than reading more data from an SDcard
running at 13MHz in SPI mode. Where, I am using 13MHz mostly as I could
not get reliable operation much over this speed (and have not
implemented the logic to support UHS, and even if I did, I couldn't get
much more bandwidth over the existing MMIO bus).

When operating in the LZ4-compressed mode, the PE/COFF checksums also
use a different (slightly stronger) algorithm, as the original PE/COFF
checksum algorithm was insufficient to detect many of the problems a
misbehaving LZ4 decoder could introduce.

The code produced is "mostly" position independent, but does still rely
on "base relocations" for some things. The ABI and format was tweaked
slightly to allow running multiple logical instances of a given binary
within a single shared address space. Effectively, the
'.text'/'.rdata'/etc sections being accessed separately from '.data'/'.bss'.

Where, in this case, modifiable sections are separately allocated and
accessed relative to the "global pointer", with the global pointer
pointing to a lookup table which allows each PE image (EXE or DLL) to
locate its corresponding version of the section (updating the global
pointer to point to its own data section). This global pointer is
callee-save, so on return, the caller's global-pointer is restored.
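
A very rough C model of the scheme, with the global pointer played by an ordinary variable. Everything here is an assumption made for illustration (the `DataSection` layout, the image indices, and the convention that each data section starts with a pointer to the per-process lookup table); the real ABI does this in function prologs/epilogs using a register:

```c
#include <assert.h>
#include <stddef.h>

enum { IMG_EXE = 0, IMG_DLL = 1, IMG_COUNT = 2 };   /* hypothetical image indices */

/* Hypothetical layout: each writable data section starts with a pointer to
   the per-process table of data sections, followed by the image's globals. */
typedef struct DataSection {
    struct DataSection **table;
    int counter;                 /* stand-in for one global variable */
} DataSection;

static DataSection *gp;          /* models the global-pointer register */

/* A "DLL export": switches gp to its own data on entry, restores on exit */
static int dll_bump(void)
{
    DataSection *saved = gp;     /* gp is callee-save */
    gp = gp->table[IMG_DLL];     /* locate this image's data in this process */
    int result = ++gp->counter;
    gp = saved;                  /* caller's gp is back in place on return */
    return result;
}

static int exe_main(void)
{
    gp->counter += 10;           /* the EXE's own global */
    return dll_bump();
}

/* Run one logical process instance; the code is shared, the data is not */
int run_instance(DataSection *sections[IMG_COUNT])
{
    assert(sections != NULL);
    for (int i = 0; i < IMG_COUNT; i++)
        sections[i]->table = sections;
    gp = sections[IMG_EXE];
    return exe_main();
}
```

Two instances started with different `sections` arrays run the same code but never see each other's globals.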

Typically, the base-relocations were also split up, with one part being
applied to the executable sections when initially loaded; and the other
to the data sections when a new process instance is created.

Note that saving/restoring/updating the global-pointer is skipped for:
Functions which are not callable as either function pointers or DLL
exports ("__declspec(dllexport)" or similar);
Leaf functions which don't access any global variables.

This strategy seemed to be lower overhead than the mechanism used in ELF
FDPIC (where this hackery was mostly handled on the caller-side by using
multiple entries in the GOT and by having function pointers as pointers
into the GOT).

All of this running on a sort of quick/dirty makeshift OS (sort of
vaguely resembling a weird Linux/Windows/MS-DOS hybrid); which as-is
treats the use of virtual memory as an optional/experimental feature
(all this stuff was designed to not assume the use of virtual memory).

Had partly started work on another compiler built around the goal of
being lightweight (preferably less code and memory footprint; goal being
to try to keep it under 50k lines).

This compiler would be closer to a more conventional design, but was
using a non-standard "WOFF" object format, which was effectively sort of
like COFF but more simplistic and effectively built on a variant of the
WAD format (sort of like what was used in the Doom engine games):
  Lumps that began with '.' being sections;
    Such as ".text" / ".data"
  Lumps starting with '$' being metadata;
    Such as the symbol/reloc tables ("$symtab"/"$rlctab").
  Typically, section lumps preceding metadata lumps.
  Section lump index was used as section-number, just starting at 1.
    Section 0 being a "NULL section" as far as relocs go.
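
As a sketch of how the naming convention above could be applied to a WAD-style directory: the entry layout below is the classic Doom one (32-bit file offset, 32-bit size, 8-byte name), whereas the WOFF variant's actual layout is not given here, so treat the struct and the helper names (`lump_kind`, `section_number`) as assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* Classic Doom WAD directory entry; the WOFF variant may differ */
typedef struct {
    int32_t filepos;   /* offset of lump data in the file */
    int32_t size;      /* size of lump data in bytes */
    char    name[8];   /* not necessarily NUL-terminated */
} WadLump;

typedef enum { LUMP_SECTION, LUMP_METADATA, LUMP_OTHER } LumpKind;

/* Classify a lump by the first character of its name */
LumpKind lump_kind(const WadLump *lump)
{
    if (lump->name[0] == '.') return LUMP_SECTION;
    if (lump->name[0] == '$') return LUMP_METADATA;
    return LUMP_OTHER;
}

/* Section number for the lump at a 0-based directory index: index + 1 for
   section lumps, 0 (the "NULL section") for anything else. This relies on
   section lumps coming first in the directory. */
int section_number(const WadLump *dir, int count, int index)
{
    assert(index >= 0 && index < count);
    return lump_kind(&dir[index]) == LUMP_SECTION ? index + 1 : 0;
}
```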

Otherwise, the metadata was "similar" to what existed in COFF (just
using a single pair of symbol and reloc tables, rather than giving each
section its own tables). Likewise, using the same entry format for both
the symbol and relocation tables (just one entry defines where to find
the symbol, and the other what to fixup with the address of the symbol).

A variant of WOFF was also considered as a possible option for bytecode
objects/images as well, but in this case, its main "competitor" being
the use of RIFF with a file-structure vaguely similar to the RIFF AVI
format (and some past musings for possible WAD/RIFF hybrid formats, ...).

Though, for "whatever-ar", it would make sense to stick with the
traditional "!<arch>" format for static libraries and similar in any
case (regardless of the use of COFF/ELF/WOFF whatever for the object
files); though technically, "whatever-ar" could probably also use a WAD
variant or similar and nothing would likely notice (and sidesteps the
hair of the "arch" format using ASCII-based data fields for "who knows
what reason"); partly as each cross compiler provides its own versions
of all the binutils (but, either way).

Though, partly, this was because COFF seemed to have some needless
complexity vs "just throw the assembler's output in a WAD file".
Also skipped ELF mostly as it also seemed needlessly complex, and
doesn't really match as well to my ISA and ABI design.

This compiler would have also been intended to more closely mimic GCC's
interface (hopefully close enough that it would be less of a stretch to
try to use it as a cross-compiler in autotools).

This effort kinda stalled though by me going and working on other stuff.

And, I didn't really get the compiler anywhere near complete and it was
already at 30 kLOC, so it doesn't look like it would achieve the "less
than 50 kLOC" goal.

For contrast, my existing compiler weighs in at closer to 250 kLOC.

Though, relatedly, there was still the goal of "compile a program using
less than 4MB of RAM or so", where my existing compiler needs
significantly more than this.

Like, 180MB to recompile something like GLQuake from source effectively
precludes running it on the FPGA boards, at least not without using
virtual memory (this being a big drawback of the "do all the code
generation for everything all at once" strategy; but could be kept lower
with separate compilation and linking).

> * generate code of at least reasonable quality; not
> necessarily at the -O2 level of gcc or clang, but
> better than -O1
>

Code generation is still kinda hit/miss.

By default, it assumes some relatively conservative semantics:
  Integer wrap on overflow (1);
  Does not perform aliasing optimizations by default (2).

*1: Some old programs, particularly my ROTT port, seem rather sensitive
to integer overflow behavior (so, if built with GCC, typically need
"-fwrapv -fno-strict-aliasing" and similar).

*2: It may cache memory loads, but treats any explicit memory store that
can't be proven not to alias as potentially aliasing.
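
For code that depends on wrapping but has to build with compilers that treat signed overflow as undefined, the wrap can be spelled out via unsigned arithmetic. A minimal sketch (the helper name `add_wrap` is made up; the conversion back to `int` is implementation-defined rather than undefined, and gives the two's-complement result on mainstream compilers):

```c
#include <assert.h>
#include <limits.h>

/* This sketch assumes a two's-complement int */
static_assert(INT_MIN == -INT_MAX - 1, "two's-complement int expected");

/* Signed add that wraps on overflow without needing -fwrapv */
int add_wrap(int a, int b)
{
    /* unsigned addition is defined to wrap modulo UINT_MAX+1 */
    return (int)((unsigned)a + (unsigned)b);
}
```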

Had experimented with TBAA, but I don't really consider it "sufficiently
safe" to be treated as a default option.
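
A small illustration of what is at stake with TBAA, using only standard C (function names are made up for the sketch). In the second function, a compiler using type-based alias analysis may assume the `float` store cannot touch `*ip`; a conservative compiler reloads `*ip` after the store:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Type punning that stays valid under TBAA: memcpy may alias anything */
uint32_t float_bits(float f)
{
    static_assert(sizeof(uint32_t) == sizeof(float), "32-bit float expected");
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Under TBAA the compiler may return 1 without re-reading *ip, because an
   int object and a float object are assumed not to overlap. */
int store_then_load(int *ip, float *fp)
{
    *ip = 1;
    *fp = 0.0f;
    return *ip;
}
```

Old code that passes the same address as both `ip` and `fp` (via a cast) is the kind that tends to need "-fno-strict-aliasing".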

Generally, it also assumes that most pointers may be unaligned, with a
partial exception for things like "__m128 *" and similar, which assume
aligned access (there is a separate "__m128u" type for unaligned access).

There are "__aligned" and "__unaligned" modifiers for this, with
"__packed" also assumed as implying "__unaligned" (and is also implied
as a default if "#pragma pack(1)" or similar is used, ...).
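
For comparison, the portable way to express a possibly-unaligned load in standard C, without any "__unaligned"-style modifier, is to go through memcpy and let the compiler pick the instruction sequence. A minimal sketch (the function name is made up):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from an address with no alignment guarantee.
   Byte order is whatever the host uses. */
uint32_t load_u32_unaligned(const void *p)
{
    assert(p != NULL);
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```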

Though, some of this would matter more for SuperH (which by default
assumed aligned memory access), whereas in my ISA, most memory access
instructions are unaligned (with the main exception of the 128-bit
load/store instructions). Though, performance may still be better if
alignment is preserved (also packed structs have a penalty in that the
fixed-displacement load/store ops only encode aligned cases, so
accessing a misaligned struct member requires a multi-instruction sequence).

Register allocation tends to be more naive than GCC, as it generally
divides registers into one of two categories:
  Statically assigned across the entire function;
  Dynamically assigned within each basic-block.
    Any such variables are spilled to memory at the end of the block.

In many cases, this does increase the register pressure needed to get
"decent" performance, which is part of the incentive for my ISA design
having 64 GPRs (this makes it easier to often static-assign *everything*
into GPRs for many functions).

Though, have noted that there seems to be a relative performance
advantage for code using large numbers of local variables or large
amounts of state being updated inside of loops.

Relative to x86-64, "work per clock cycle" for my ISA seems to beat out
x86-64 for things like functions with 100+ local variables that update
"most of them" on each loop iteration (as a single giant loop body).
Granted, this is a bit niche...

But, if you write something like:
  for(i=0; i<1000000; i++)
    dst[i]=src[i];
The relative performance of my ISA is crap...

Also, it doesn't currently understand (or perform) any such
optimizations as loop unrolling or function inlining.
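
For reference, this is roughly what unrolling such a copy loop by hand looks like when the compiler does not do it. It is a generic sketch, not tuned for any particular ISA; the function name and the unroll factor of 4 are arbitrary:

```c
#include <assert.h>
#include <stddef.h>

/* Copy n ints, four per iteration, with a scalar loop for the remainder */
void copy_unrolled4(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; i++)          /* leftover 0..3 elements */
        dst[i] = src[i];
}
```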

But, the code generation kinda falls on its face for targets with a
smaller number of registers (such as 16).

Comparably, GCC is able to handle individual values flowing from one
basic block to another, which holds up better on targets that can't
throw a large number of GPRs at the problem (eg: doesn't turn into
excessive amounts of spill-and-fill).

> If you get to something roughly or possibly approximating the
> list above, please keep the group posted!

Not sure...

Trying to compete with GCC or friends on their "home turf" wasn't really
something I had considered.

Bart

Oct 25, 2023, 11:29:47 AM

On 16/10/2023 03:26, Tim Rentsch wrote:
> Bart <b...@freeuk.com> writes:
>
>> On 29/09/2023 06:10, Tim Rentsch wrote:

>>> and aren't faithful to
>>> the C language (and it isn't clear whether you don't know that or
>>> if you just don't care).
>>
>> In which ways?
>
> No one knows but you, and it's not even clear if you know.
> Ironically, if you were to go through and make up a list of
> differences between what your compiler accepts and what the
> C standard requires, and present that list here, that WOULD
> be topical, especially if there were reasons related to how
> easy or hard some aspects of C are to compile. For reasons
> beyond understanding you leave out exactly the pieces of
> information that would make it relevant in comp.lang.c. I'm
> at a loss to understand why you do that.

Rather than list all the shortcomings of my compiler, easier to say that
the front-end needs a rewrite.

The overhaul I did in September this year was in replacing the backend,
which addressed some urgent code-gen issues and hopefully made it less
buggy.

At the moment I'm not looking at working on the front-end, and the
project remains a personal one used for C code I write, the C I
generate, and whatever external C I come across that is within its
capabilities. That includes most stuff posted here.

> if there were reasons related to how easy or hard some aspects of C
> are to compile

One thing that makes it hard IMO is that a lot of C isn't very
rigorously defined. You can easily have a program that either passes,
fails, or passes with warnings, depending on compiler and options.

Or take this sequence:

int a = 0;
int b = {0};
int c = {{0}};

The first is OK, so is the second (why?), but the third gives a warning
on gcc. (Mcc passes the first two and fails the third.)

I know that when an array/struct is initialised, extra braces are not
allowed, but fewer braces are OK:

int a[2][3] = {1,2,3,4,5,6};
int b[2][3] = {{1,2,3}, {4,5,6}};

Here, gcc passes both, even though the shape of the data doesn't match
the type of 'a'.

There is some algorithm involved to figure out which braces can be
legally left out. Mcc doesn't use it; it requires braces to exactly
match the type's data structure, so 'a' fails, and 'b' parses.
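
For what it's worth, the two forms produce identical objects on a compiler that implements brace elision, which is easy to check. A small sketch with made-up names (gcc accepts the first form but warns under -Wmissing-braces):

```c
#include <assert.h>

int flat[2][3]   = {1, 2, 3, 4, 5, 6};        /* inner braces elided */
int nested[2][3] = {{1, 2, 3}, {4, 5, 6}};    /* braces match the type */

static_assert(sizeof flat == sizeof nested, "same type, same size");

/* Returns 1 if both arrays hold the same values element by element */
int same_contents(void)
{
    for (int r = 0; r < 2; r++)
        for (int c = 0; c < 3; c++)
            if (flat[r][c] != nested[r][c])
                return 0;
    return 1;
}
```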

Here's one more:

static int a;
extern int a;
static int a;

The above is passed by gcc, but this fails (Mcc just passes both):

extern int a;
static int a;
extern int a;
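
My reading of why the two orderings differ (C11 6.2.2p4 and 6.2.2p7, so treat this as an interpretation): an 'extern' declaration inherits the linkage of a visible prior declaration, so after 'static' it stays internal, whereas a leading 'extern' gives external linkage and the later 'static' then conflicts with it. The accepted ordering, as a compilable sketch with a made-up variable name:

```c
#include <assert.h>

static int counter;   /* internal linkage (tentative definition) */
extern int counter;   /* prior declaration visible: linkage stays internal */
static int counter;   /* still internal, so consistent */

/* The rejected ordering would be:
       extern int other;    -- external linkage
       static int other;    -- internal: same identifier with both linkages
*/

int next_count(void)
{
    assert(counter >= 0);   /* one object, zero-initialized */
    return ++counter;
}
```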

There is tons of this stuff. Somehow I don't think that rewrite is going
to happen. To write a compiler you need to have clear rules about what
constitutes a valid program.

C is just too lax and too open-ended, sorry.

Ben Bacarisse

Oct 25, 2023, 3:53:07 PM

Bart <b...@freeuk.com> writes:

> ... To write a compiler you need to have clear rules about what constitutes
> a valid program.

You don't think the rules exist, or you don't consider them to be clear?

--
Ben.

Keith Thompson

Oct 25, 2023, 4:42:33 PM

Bart <b...@freeuk.com> writes:
[...]

> One thing that makes it hard IMO is that a lot of C isn't very
> rigorously defined. You can easily have a program that either passes,
> fails, or passes with warnings, depending on compiler and options.

Any behavior of a compiler invoked in non-conforming mode (including
most C compilers in their default) is irrelevant to the question of how
rigorously C is defined.

> Or take this sequence:
>
> int a = 0;
> int b = {0};
> int c = {{0}};
>
> The first is OK, so is the second (why?), but the third gives a
> warning on gcc. (Mcc passes the first two and fails the third.)

The first and second are OK because the standard explicitly says so.
N1570 6.7.9p11:

The initializer for a scalar shall be a single expression,
optionally enclosed in braces.

As for the third, with "-std=c17 -pedantic-errors", gcc warns "braces
around scalar initializer", while clang gives a fatal error "too many
braces around scalar initializer".

One could argue that "enclosed in braces" allows both {...} and {{...}}.
I don't think I'd make that argument myself. But the requirement is in
the Semantics section, not Constraints, so violating it is not a
constraint violation. IMHO it *should* have been a constraint. I think
clang is incorrect to reject it in conforming mode, though a warning is
certainly appropriate.

So yes, in this specific case the standard is a bit vague about the
validity of `int c = {{0}};`.

> I know that when an array/struct is initialised, extra braces are not
> allowed, but fewer braces are OK:
>
> int a[2][3] = {1,2,3,4,5,6};
> int b[2][3] = {{1,2,3}, {4,5,6}};
>
> Here, gcc passes both, even though the shape of the data doesn't match
> the type of 'a'.
>
> There is some algorithm involved to figure out which braces can be
> legally left out. Mcc doesn't use it; it requires braces to exactly
> match the type's data structure, so 'a' fails, and 'b' parses.

Other than {{scalar}}, the rules for braces in initializers are
complicated but unambiguous. Allowing nested braces to be omitted lets
{0} be a valid initializer for any object type, which can be very
convenient. (C23 allows {}.)

[...]

> C is just too lax and too open-ended, sorry.

I don't think it's nearly as lax as you say it is.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

bart

Oct 25, 2023, 4:46:30 PM

The rules probably do their best trying to formalise a messy, untidy
language due to diverse implementations.

But no I don't consider them to be clear.

bart

Oct 25, 2023, 8:31:14 PM

It also allows this:

int a[2][3] = {1,2,4,5,6};

Element '3' has been left out by mistake. Now all elements are out of
step, and the last has value 0 instead of 6.

I couldn't believe it when I discovered this.

(I also wasn't that keen on missing elements being set to zero, even
when the braces are correct, since you can have a similar problem when
elements are left out by mistake, or a comma is omitted: {1,2,34,5,6}.
I'd have preferred a syntax like a trailing ... here to denote padding
with zeros.)
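
The shifted-element effect described above can be shown directly (array and function names are made up for this sketch):

```c
#include <assert.h>

int shifted[2][3] = {1, 2, 4, 5, 6};   /* '3' omitted by mistake */

/* Bounds-checked accessor, just so the values can be inspected */
int shifted_at(int row, int col)
{
    assert(row >= 0 && row < 2 && col >= 0 && col < 3);
    return shifted[row][col];
}
```

Here `shifted[0]` is {1, 2, 4} and `shifted[1]` is {5, 6, 0}: every value after the gap moves up one slot and the final element is zero-filled.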

>
> [...]
>
>> C is just too lax and too open-ended, sorry.
>
> I don't think it's nearly as lax as you say it is.

Perhaps you haven't tried to implement it!

Keith Thompson

Oct 25, 2023, 9:18:22 PM

C allows something you don't like. You're shocked. Lather, rinse, repeat.

It deliberately allows omitting all but the outer braces, so that a
multi-dimensional array can be initialized with a single brace-enclosed
list of values. Any trailing values (struct members or array elements)
that are not specified are initialized to zero.

It also allows:

int a[2][3] = {42};

which sets a[0][0] to 42 and all other elements to 0, and the universal
initializer:

some_type obj = {0};

falls out of the rules (C++ and C23 allow {}).

If you want to avoid that kind of error, don't omit the inner braces.
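
To illustrate that point with the same kind of slip: with the inner braces written out, a dropped value only affects its own row. A small sketch with made-up names:

```c
#include <assert.h>

int grid[2][3] = {{1, 2}, {4, 5, 6}};   /* '3' omitted from the first row */

/* Bounds-checked accessor, just so the values can be inspected */
int grid_at(int row, int col)
{
    assert(row >= 0 && row < 2 && col >= 0 && col < 3);
    return grid[row][col];
}
```

Only `grid[0][2]` ends up zero; the second row is still {4, 5, 6}.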

Large initializers are likely to be automatically generated, so barring
a bug in the code generator this problem is unlikely to occur.

[...]

Please keep in mind that nobody here is either responsible for the
current rules or in any position to change them.

bart

Oct 26, 2023, 8:10:24 AM

You've said this kind of thing before; you can't just avoid inadvertent
errors. You might as well take out ALL error checking from a compiler,
and advise users to get around that by never making a mistake!

> Large initializers are likely to be automatically generated, so barring
> a bug in the code generator this problem is unlikely to occur.
>
> [...]
>
> Please keep in mind that nobody here is either responsible for the
> current rules or in any position to change them.

I'm in a position to do something about that, in a very small way.
Somebody using my compiler /has/ to write {{1,2,3},{4,5,6}} not
{1,2,3,4,5,6}, so that particular error can't get through.

Such code is still valid C compilable anywhere. And that user may get
into the habit of always getting the braces matching the data structure.

At the minute, if you see `= {0}` being used to initialise something in
this strictly typed language, it can be initialising ANY scalar type, ANY
array of any number of dimensions, any size and any element type, and
ANY struct.

That is not a bit lax? OK..

David Brown

Oct 26, 2023, 9:50:35 AM

On 26/10/2023 14:10, bart wrote:
> On 26/10/2023 02:17, Keith Thompson wrote:

>
>> It deliberately allows omitting all but the outer braces, so that a
>> multi-dimensional array can be initialized with a single brace-enclosed
>> list of values. Any trailing values (struct members or array elements)
>> that are not specified are initialized to zero.
>>
>> It also allows:
>>
>> int a[2][3] = {42};
>>
>> which sets a[0][0] to 42 and all other elements to 0, and the universal
>> initializer:
>>
>> some_type obj = {0};
>>
>> falls out of the rules (C++ and C23 allow {}).
>>
>> If you want to avoid that kind of error, don't omit the inner braces.
>
> You've said this kind of thing before; you can't just avoid inadvertent
> errors. You might as well take out ALL error checking from a compiler,
> and advise users to get around that by never making a mistake!
>

In this particular case, you most certainly /can/ avoid inadvertent
errors - and you can avoid omitting the inner braces.

You know the dimensions of the array you are initialising - it's not
hard to get the braces right. You might accidentally get the number of
elements wrong, but the braces are easy.

So how do you avoid getting the number of elements wrong? As Keith
said, for many large initialised arrays, the data is automatically
generated in some way. It is also very common for there to be a layout
pattern that makes errors more obvious. Having too many initialisers is
a constraint error (IIRC) - compilers should complain about that. Too
few initialisers is allowed, and missing initialisers will be filled
with zeros - that's considered a feature in C, not a bug (whether or not
you like it). If I had reason to suspect that I might be missing a
value in my own code, I would probably just add a fake value (so that I
have too many initialisers, assuming there was no missing value) and
check for the error or warning when compiling it.

More commonly, however, I simply don't give the length explicitly for
initialised arrays. So rather than writing:

enum { no_of_elems = 5 };
int xs[no_of_elems] = { 1, 2, 3, 4, 5 };

I'd write :

int xs[] = { 1, 2, 3, 4, 5 };
enum { no_of_elems = sizeof(xs) / sizeof(xs[0]) };

And often I'd have a static_assert on "no_of_elems".

That gives pretty much no chance of errors in the number of initialisers.
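
Put together as a complete C11 fragment, the idiom might look like this (the expected count of 5 and the `sum_xs` helper are only there to make the sketch self-contained):

```c
#include <assert.h>   /* static_assert (C11) */

int xs[] = { 1, 2, 3, 4, 5 };
enum { no_of_elems = sizeof(xs) / sizeof(xs[0]) };

/* Fails at compile time if an initialiser is added or dropped by mistake */
static_assert(no_of_elems == 5, "xs must have exactly 5 initialisers");

/* Example consumer that iterates using the derived count */
int sum_xs(void)
{
    int total = 0;
    for (int i = 0; i < no_of_elems; i++)
        total += xs[i];
    return total;
}
```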

For your own compiler, you are of course free to add warnings to check
initialiser counts and bracketing. Indeed, I would encourage it - that
helps users keep to a subset of C that you feel is safer. But in the
interests of compiling code from other sources, such warnings need to be
optional (or at least non-fatal).

>
>
>
>> Large initializers are likely to be automatically generated, so barring
>> a bug in the code generator this problem is unlikely to occur.
>>
>> [...]
>>
>> Please keep in mind that nobody here is either responsible for the
>> current rules or in any position to change them.
>
> I'm in a position to do something about that, in a very small way.
> Somebody using my compiler /has/ to write {{1,2,3},{4,5,6}} not
> {1,2,3,4,5,6}, so that particular error can't get through.
>
> Such code is still valid C compilable anywhere. And that user may get
> into the habit of always getting the braces matching the data structure.
>

That's a good habit (IMHO). But if you are going to accept code from
other people, you need optional warnings rather than fatal errors.

> At the minute, if you see `= {0}` being used to initialise something in
> this strictly typed language, it can be initialising ANY scalar type, ANY
> array of any number of dimensions, any size and any element type, and
> ANY struct.
>
> That is not a bit lax? OK..
>

You say "lax", others might say "flexible" or "consistent". And Keith
did not say that C was not lax - he merely said he does not think it is
as lax as you said it was.

You might not like the rules (and I would agree with you in this case),
but that does not make the rules bad, unclear or "lax" - it just means
they follow a different set of priorities and design rules than you
personally would prefer.

Tim Rentsch

Oct 26, 2023, 12:34:53 PM

Keith Thompson <Keith.S.T...@gmail.com> writes:

> Bart <b...@freeuk.com> writes:
> [...]

>
>> Or take this sequence:
>>
>> int a = 0;
>> int b = {0};
>> int c = {{0}};
>>
>> The first is OK, so is the second (why?), but the third gives a
>> warning on gcc. (Mcc passes the first two and fails the third.)
>
> The first and second are OK because the standard explicitly says
> so. N1570 6.7.9p11:
>
> The initializer for a scalar shall be a single expression,
> optionally enclosed in braces.
>
> As for the third, with "-std=c17 -pedantic-errors", gcc warns
> "braces around scalar initializer", while clang gives a fatal
> error "too many braces around scalar initializer".
>
> One could argue that "enclosed in braces" allows both {...} and
> {{...}}. I don't think I'd make that argument myself. But the
> requirement is in the Semantics section, not Constraints, so
> violating it is not a constraint violation. IMHO it *should* have
> been a constraint. I think clang is incorrect to reject it in
> conforming mode, [...]

I don't see how you reach the conclusion about clang. If you
think that in 6.7.9p11 the phrase "optionally enclosed in braces"
is meant to allow one pair of braces but not more than one, how
is it that you also think clang is not within its rights to
reject the declaration with two pairs of braces? The two
inferred statements seem inconsistent with respect to each other.
Is it that you think that violating a syntax rule or violating a
constraint are the only things that give an implementation
license to refuse to translate a source file? (Let me exclude
things like #error and #pragma from the discussion here.) I have
to admit to being baffled.

Keith Thompson

Oct 26, 2023, 3:03:30 PM

I did miss something. This:

int c = {{0}};

has undefined behavior (assuming that "optionally enclosed in braces"
isn't meant to allow multiple braces). A compiler may reject a program
that has undefined behavior. Or it may successfully translate it, or
make demons fly out of your nose during execution.

The question is why the standard made it UB rather than a constraint
violation. I can't think of any good reason. Perhaps it was meant
to cater to one or more implementations that happen to accept
it; personally, I wouldn't consider that to be a *good* reason.
(Note that this case of UB is explicitly mentioned in Annex J,
which might suggest that it was a deliberate choice.)

It's possible that clang is choosing to reject it because it has
undefined behavior (which raises questions about what happens if it's
enclosed in `if (0) { ... }`, but I won't get into that). My
impression, however, is that clang is treating the requirement
as if it were a constraint.

Do you think that clang is allowed to reject `int c = {{0}};`?

If so, on what basis do you think so?

(To be clear, I'm asking about a basis for rejecting it specifically
because of the extra braces.)

Kaz Kylheku

Oct 26, 2023, 4:59:32 PM

On 2023-10-26, Keith Thompson <Keith.S.T...@gmail.com> wrote:

> bart <b...@freeuk.com> writes:
>> It also allows this:
>>
>> int a[2][3] = {1,2,4,5,6};
>>
>> Element '3' has been left out by mistake. Now all elements are out of
>> step, and the last has value 0 instead of 6.
>>
>> I couldn't believe it when I discovered this.
>
> C allows something you don't like. You're shocked. Lather, rinse, repeat.

If the compilers that I use diagnose something, I don't necessarily care if
that is not required.

$ gcc -Wall -W braces.c
braces.c:1:15: warning: missing braces around initializer [-Wmissing-braces]
 int a[2][3] = {1,2,4,5,6};
               ^
               { }{ }

The situations in which ISO C requires a diagnostic are rather sparse, and
there is no requirement to give any details about what is wrong and where.

A production compiler has to diagnose a lot more than that standard-required
minimum, and in more informative ways. This is a competitive area. If two
compilers optimize about equally well, and are similar in other attributes, you
might choose the one with better diagnostics (more diagnostics, more
configurable diagnostics, ...).

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazi...@mstdn.ca
NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.

Tim Rentsch

Oct 29, 2023, 12:17:53 PM

I believe the intent of the C standard is that an implementation
may decline to accept a program containing this declaration and
still be conforming.

> If so, on what basis do you think so?

The standard requires implementations to accept any program that
is strictly conforming. Part of the definition of being strictly
conforming is that the program use only those features specified
in the standard. The standard does specify the case without any
braces, and also the case with exactly one pair of braces. I
agree with your reading of the phrase "optionally enclosed in
braces", from which follows the conclusion that the standard does
not specify using two pairs of braces around the '0' in this
declaration. Because the program is using a feature not
specified in the standard, the program is not strictly
conforming, and thus does not have to be accepted.