Cake - C23 to C99 transpiler

Thiago Adams

Aug 18, 2022, 9:44:57 PM

With C23 approaching and bringing a lot of small changes, I decided
to focus my C front end on compiling C11/C23 to C99.

Here is the online version
http://thradams.com/web3/playground.html

The project is here.
https://github.com/thradams/cake

It is not complete but it can:
* Convert C11 _Generic selections
* C11 u8 literals
* C23 digit separators
* C23 binary literals
* C23 nullptr, bool, static_assert, true, false, ...
* C23 {} empty initializer
* C23 typeof
* C23 #embed
* C23 #warning
* C23 #elifdef, #elifndef
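
To make the list above concrete, here is a hand-written sketch of how a few of the C23 spellings could be rendered in C99. It is not Cake's actual output; the demo_ names are made up for illustration, and the C23 source appears only in a comment because a C99 compiler would reject it.

```c
/* C23 source (hypothetical input):
 *
 *     int      demo_population = 1'000'000;
 *     unsigned demo_flags      = 0b1010'0101;
 *     int     *demo_ptr        = nullptr;
 */

/* One possible C99 rendering of the same declarations. */
int      demo_population = 1000000;     /* digit separators dropped */
unsigned demo_flags      = 0xA5;        /* 0b10100101 rewritten in hex */
int     *demo_ptr        = (void *)0;   /* nullptr as a null pointer constant */
```

The values are unchanged; only the spelling differs, which is why this group of features is comparatively easy to translate.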

missing
* C11 atomics
* enums with fixed underlying types
* _BitInt, decimal floating types, and more
* attributes are very incomplete
* __VA_OPT__

extensions

* defer
* lambdas without capture
* try catch blocks
* C++17 if with initializer
* repeat for (;;)
* typeid
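
As an illustration of what an extension like defer has to turn into, here is one plausible C99 shape for it. This is an assumption about the general technique (a single cleanup label), not a description of how Cake generates code; the defer syntax in the comment and the function name fill_buffer are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical source using a defer extension:
 *
 *     char *buf = malloc(n);
 *     if (buf == NULL) return -1;
 *     defer free(buf);
 *     if (n < 4) return -1;
 *     memset(buf, 0, n);
 *     return 0;
 */

/* Plain C99 with the deferred call moved to a single exit label. */
int fill_buffer(size_t n)
{
    int result = -1;
    char *buf = malloc(n);
    if (buf == NULL)
        return -1;          /* nothing to release yet */

    if (n < 4)
        goto cleanup;       /* early exit still reaches free() */

    memset(buf, 0, n);
    result = 0;

cleanup:
    free(buf);
    return result;
}
```

Every return after the defer becomes a jump to the label, so the release runs exactly once on each path.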

The idea is not to create a new language but to implement and
try ideas that fit into C.

Blue-Maned_Hawk

Sep 6, 2022, 2:48:40 AM
Why?
--
/blu.mɛin.dʰak/ | shortens to "Hawk" | he/him/his/himself/Mr.
bluemanedhawk.github.io
I think my Usenet provider stores their passwords in plain text. If i'm
acting suspiciously, chances are that that backfired on them.


Thiago Adams

Sep 6, 2022, 1:08:39 PM
On Tuesday, September 6, 2022 at 3:48:40 AM UTC-3, Blue-Maned_Hawk wrote:
> Why?

The initial idea was to experiment with new ideas for C by compiling them to standard C.

Transpiling C23 to C99 is a similar task; the difference is that one day C compilers
will have the features natively.

Some people may need to compile from C23 to C99 because they
have C23 on many platforms, but one platform lacks C23
while C99 is available there. (But this was not my personal motivation.)

The end of the road map is to have a normal C compiler (like tcc) with some extensions.
I need to choose a backend now.




Vir Campestris

Sep 7, 2022, 7:27:30 AM
On 06/09/2022 18:08, Thiago Adams wrote:
> I need to choose a backend now.

If your intent is to produce C99 shouldn't you be testing against every
compiler you can get your hands on, rather than picking one?

Andy

Thiago Adams

Sep 7, 2022, 8:45:44 AM
By "I need to choose a backend now" I meant
an x86 or intermediate backend, or creating an interpreter.

But choosing a target compiler for the generated C is also possible.
Some features may exist (like thread-local storage) but be spelled differently
depending on the compiler. So, given the target compiler, the
generator could generate compiler-specific code for that feature.

Also, an auxiliary header could be added. For instance, static_assert is
not part of C99 and I am removing it. But it could be a macro in this
auxiliary header.
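
A minimal sketch of what such an auxiliary header could contain, assuming the classic negative-array-size trick. The AUX_ names are made up for this example and are not taken from Cake; a generator could equally emit a macro named static_assert when it knows the target is strictly C99.

```c
/* Hypothetical auxiliary header contents; the AUX_ names are made up. */
#define AUX_CONCAT_(a, b) a##b
#define AUX_CONCAT(a, b)  AUX_CONCAT_(a, b)

/* A negative array size is a constraint violation, so a false condition
   stops compilation. The message only documents intent. */
#define AUX_STATIC_ASSERT(cond, msg) \
    typedef char AUX_CONCAT(aux_static_assert_line_, __LINE__)[(cond) ? 1 : -1]

AUX_STATIC_ASSERT(sizeof(char) == 1, "char is one byte");
AUX_STATIC_ASSERT(sizeof(long) >= sizeof(int), "long is at least as wide as int");

/* Small function so the checks above sit in a complete translation unit. */
int aux_checks_passed(void)
{
    return 1;
}
```

The line number keeps the typedef names distinct within one file; two checks on the same source line would collide, which is a known limitation of this approach.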






Bart

Sep 7, 2022, 10:35:51 AM
On 07/09/2022 13:45, Thiago Adams wrote:
> On Wednesday, September 7, 2022 at 8:27:30 AM UTC-3, Vir Campestris wrote:
>> On 06/09/2022 18:08, Thiago Adams wrote:
>>> I need to choose a backend now.
>> If your intent is to produce C99 shouldn't you be testing against every
>> compiler you can get your hands on, rather than picking one?
>
> Here "I need to choose a backend now." I was talking about
> a x86 or intermediate backend or create a interpreter.
>
> But choosing a compiler target for C.. yes it also is possible.
> Some features may exist (like thread storage) and be different
> depending on the compiler . So choosing the target compiler the
> generator could generate code for that specific feature.

The Seed7 language, if you were to build it from C sources, comes with
nearly 20 different makefiles for different compilers and platforms.

There's also a configure program which creates and runs some 100
different test programs to collate information about the C
implementation, resulting in a configuration header file describing the
environment.

Since Seed7 also uses C as a target language, I can't remember if that
config file was for the compiler, or compiling the generated C
intermediates, or both.

Using C as a target is not that simple!

When I used to target C, I produced a single C source for the entire
program, but there were three versions, since it didn't have conditional
elements:

* For Windows
* For Linux
* For a Neutral OS (runs on either but with limitations)

I think also the code assumed a 64-bit implementation; a 32-bit target,
if I was to still bother with it, would need separate versions.

I used to try and support half a dozen C compilers, which was hard as
they all had different limitations. Now I support only tcc and gcc.

No special options are required (other than ones like -O2 and -o for
gcc), and no special extensions (a few things expected to be in C99 like
anonymous structs and unions).

The main limitation of tcc (a significant one for me) is that it doesn't
support '$' in identifiers; most C compilers do. Which means taking
account of that in code generators.
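
For comparison with the three-file approach described above, a generator that is able to emit preprocessor conditionals could keep a single file and select per platform. This is only a sketch, not Bart's method: _WIN32 and __linux__ are the macros commonly predefined by compilers targeting those systems, and TARGET_OS_NAME is a made-up name.

```c
#if defined(_WIN32)
#  define TARGET_OS_NAME "windows"
#elif defined(__linux__)
#  define TARGET_OS_NAME "linux"
#else
#  define TARGET_OS_NAME "neutral"   /* portable fallback with limitations */
#endif

/* Reports which branch the preprocessor selected. */
const char *target_os_name(void)
{
    return TARGET_OS_NAME;
}
```

The trade-off is that the generator must then understand which parts of its output are platform-dependent, which is exactly the complexity the separate-files approach avoids.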

Ben Bacarisse

Sep 7, 2022, 11:38:57 AM
Bart <b...@freeuk.com> writes:

> The main limitation of tcc (a significant one for me) is that it
> doesn't support '$' in identifiers; most C compilers do. Which means
> taking account of that in code generators.

-fdollars-in-identifiers works for me.

--
Ben.

Bart

Sep 7, 2022, 12:04:43 PM
What a strange thing to have as an option (and an odd thing to have as
an essential requirement in the build instructions for your app).

Just supporting '$' anyway would be a one-line change in the tcc source
code (although that only fixed my copy of it when I tried it).

Anton Shepelev

Sep 7, 2022, 12:14:58 PM
Bart:

> Just supporting '$' anyway would be a one-line change in
> the tcc source code (although that only fixed my copy of
> it when I tried it).

The current solution allows running the compiler with
more conservative settings, enforcing more standard and
portable code.

--
() ascii ribbon campaign - against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]

Ben Bacarisse

Sep 7, 2022, 12:32:07 PM
Bart <b...@freeuk.com> writes:

> On 07/09/2022 16:38, Ben Bacarisse wrote:
>> Bart <b...@freeuk.com> writes:
>>
>>> The main limitation of tcc (a significant one for me) is that it
>>> doesn't support '$' in identifiers; most C compilers do. Which means
>>> taking account of that in code generators.
>> -fdollars-in-identifiers works for me.
>
> What a strange thing to have as an option (and an odd thing to have as
> an essential requirement in the build instructions for your app).

Why? Because you want it as the default, tcc should make me write
-fno-dollars-in-identifiers to get this non-portable feature diagnosed?

> Just supporting '$' anyway would be a one-line change in the tcc
> source code (although that only fixed my copy of it when I tried it).

Or a zero line change if you use that option.

--
Ben.

Bart

Sep 7, 2022, 12:40:34 PM
On 07/09/2022 17:31, Ben Bacarisse wrote:
> Bart <b...@freeuk.com> writes:
>
>> On 07/09/2022 16:38, Ben Bacarisse wrote:
>>> Bart <b...@freeuk.com> writes:
>>>
>>>> The main limitation of tcc (a significant one for me) is that it
>>>> doesn't support '$' in identifiers; most C compilers do. Which means
>>>> taking account of that in code generators.
>>> -fdollars-in-identifiers works for me.
>>
>> What a strange thing to have as an option (and an odd thing to have as
>> an essential requirement in the build instructions for your app).
>
> Why? Because you want it as the default, tcc should make me write
> -fno-dollars-in-identifiers get this non-portable feature diagnosed?

Because '$' was supported on nearly every other compiler I tried, run
with default options.

(The exception was lccwin32, which only allowed it either as a starter or
inside the identifier; I don't recall which.)

Also, generally there is no downside to allowing it. '$' doesn't have
any other use in C source code, does it?

>> Just supporting '$' anyway would be a one-line change in the tcc
>> source code (although that only fixed my copy of it when I tried it).
>
> Or a zero line change if you use that option.

But all build instructions for every user for every project that depends
on '$' would forever more need to include '-fdollars-in-identifiers'.
IMV that was the wrong choice.

Ben Bacarisse

Sep 7, 2022, 12:49:54 PM
But you don't think using IDs with non-standard characters might also
have been a poor choice?

--
Ben.

Keith Thompson

Sep 7, 2022, 1:29:02 PM
Sure, it would be easy to implement it. That's not the issue.

Supporting '$' in identifiers is an extension. The C standard doesn't
even mention it as a common extension. (I don't think I've ever used it
other than in a tiny test program or on VMS.)

And you want to remove the ability to warn about it?

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

David Brown

Sep 7, 2022, 2:06:09 PM
Dollars are not part of the standard character set for C, so it seems
entirely reasonable to support it as an optional extension requiring a
flag. I don't know the details of tcc's standards support, but aiming
for greater conformance by default is a good idea.

Note that on several processors, the standard assembly makes use of
dollar signs for other purposes, such as referring to registers, local
labels, or hexadecimal constants. For such targets, having dollar signs
in identifiers may complicate things, so it is not supported on all
compilers. And some linkers might not support it either.

The strange thing, as I see it, is for a code generator to make C code
that has dollars in the identifiers. The code generator could use
whatever naming system it wants - it could mangle the identifiers so
that the source code could use more "letters", or other features such as
overloading, or encoding type information for extra error checking when
linking multiple object files. Clearly there are also advantages in
keeping the identifier naming unchanged - it makes the generated C code
easier to read, and will make it easier to use a debugger along with the
C code.
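
A sketch of the kind of mangling mentioned above, under an assumed escape scheme that is not taken from any of the generators discussed here: '$' becomes "_s" and '_' becomes "_u", so every underscore in the output introduces an escape and distinct inputs stay distinct. The function name mangle_dollars is hypothetical.

```c
#include <stddef.h>

/* Hypothetical escape scheme: '_' becomes "_u" and '$' becomes "_s".
   Writes a NUL-terminated result into dst and returns its length,
   or 0 if dst is too small (an empty input also yields 0). */
size_t mangle_dollars(const char *src, char *dst, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return 0;

    for (; *src != '\0'; ++src) {
        const char *esc = NULL;
        if (*src == '$')
            esc = "_s";
        else if (*src == '_')
            esc = "_u";

        if (esc != NULL) {
            if (n + 2 >= cap)       /* two characters plus the final NUL */
                return 0;
            dst[n++] = esc[0];
            dst[n++] = esc[1];
        } else {
            if (n + 1 >= cap)       /* one character plus the final NUL */
                return 0;
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
    return n;
}
```

The cost is the one already noted in this thread: the output is longer and harder to read in a debugger than the original name. A name that starts with '$' or '_' would also produce a leading underscore, so a real generator would likely add a fixed prefix.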

David Brown

Sep 7, 2022, 2:17:31 PM
On 07/09/2022 18:40, Bart wrote:
> On 07/09/2022 17:31, Ben Bacarisse wrote:
>> Bart <b...@freeuk.com> writes:
>>
>>> On 07/09/2022 16:38, Ben Bacarisse wrote:
>>>> Bart <b...@freeuk.com> writes:
>>>>
>>>>> The main limitation of tcc (a significant one for me) is that it
>>>>> doesn't support '$' in identifiers; most C compilers do. Which means
>>>>> taking account of that in code generators.
>>>> -fdollars-in-identifiers works for me.
>>>
>>> What a strange thing to have as an option (and an odd thing to have as
>>> an essential requirement in the build instructions for your app).
>>
>> Why?  Because you want it as the default, tcc should make me write
>> -fno-dollars-in-identifiers get this non-portable feature diagnosed?
>
> Because '$' was supported on nearly every other compiler I tried, run
> with default options.
>
> (The exception was lccwin32 which only allowed it either as a starter or
> inside the identifier; I don't recall.)
>
> Also, generally there is no downside to allowing it. '$' doesn't have
> any other use in C source code, does it?
>

I believe (but I might have this wrong) that the use of dollar signs in
identifiers is undefined behaviour according to the standard. This
means that the compiler is free to accept it, without complaint. But it
also means that there is no guarantee that it will be accepted by a
compliant compiler. And while it might have worked on the compilers
/you/ tried, there are hundreds of other compilers that you have
/not/ tried.

Maximum portability is, of course, not always particularly relevant.
When your language targets Windows and Linux, you don't care if the
generated C code compiles on a toolchain for a small microcontroller or
for DOS. That's fine, and you get to choose both the features of the
generated source code and the requirements for compilers to handle that
code. But other people have other needs and preferences, and pick
different defaults from the ones /you/ happen to want.

>>> Just supporting '$' anyway would be a one-line change in the tcc
>>> source code (although that only fixed my copy of it when I tried it).
>>
>> Or a zero line change if you use that option.
>
> But all build instructions for every user for every project that depends
> on '$' would forever more need to include '-fdollars-in-identifiers'.
> IMV that was the wrong choice.
>

I've only seen one other person ever using dollars in identifiers, and
that was someone in comp.lang.c++ where doing so was a particularly
questionable choice, given that there are serious proposals for using
dollar signs for a different purpose in C++ (for metaclasses and
reflection). It would surprise me if future C versions used the dollar
sign for some new purpose, but it is not impossible.

Bart

Sep 7, 2022, 2:27:38 PM
On 07/09/2022 17:49, Ben Bacarisse wrote:
> Bart <b...@freeuk.com> writes:
>
>> On 07/09/2022 17:31, Ben Bacarisse wrote:
>>> Bart <b...@freeuk.com> writes:
>
>>>> Just supporting '$' anyway would be a one-line change in the tcc
>>>> source code (although that only fixed my copy of it when I tried it).
>>>
>>> Or a zero line change if you use that option.
>>
>> But all build instructions for every user for every project that
>> depends on '$' would forever more need to include
>> '-fdollars-in-identifiers'. IMV that was the wrong choice.
>
> But you don't think using IDs with non-standard characters might also
> have been a poor choice?
>

The dollar symbols come up when transpiling to C code from syntax that
uses $, for example, to represent fully qualified names since C lacks namespaces,
or for special identifiers.

There, $ was considered a better and more visible and obvious indicator
of a special identifier than underscore.

Underscores suffer from poor visibility; consecutive underscores can
appear to run together depending on font; and they are also popular as
separators in user-identifiers, creating a bit more confusion.

Ben Bacarisse

Sep 7, 2022, 2:58:53 PM
David Brown <david...@hesbynett.no> writes:

> I believe (but I might have this wrong) that the use of dollar signs
> in identifiers is undefined behaviour according to the standard.

I don't think it's undefined. Recent C drafts permit "other
implementation defined characters" in the syntax, so I'm not sure how it
could be undefined.

The C23 draft permits an XID_Start character followed by XID_Continue
characters. These may, in fact, include $ but I got lost down the
rabbit hole of referenced standards so I can't be sure.

--
Ben.

Ben Bacarisse

Sep 7, 2022, 3:03:11 PM
So no, you consider it a good choice. So good, in fact, that you are
happy to exclude tcc from being used as the C compiler. But adding a
line to the documentation to say "add -fdollars-in-identifiers when
using tcc" is too much.

--
Ben.

Bart

Sep 7, 2022, 4:03:26 PM
I supported Tcc but I needed to translate names using $ into something
that is acceptable. That sort of worked, but gave less readable, longer,
more unsatisfactory output.

The advantage is that it doesn't need to be mentioned in build docs:

gcc linux: gcc prog.c -oprog -lm -ldl
tcc linux: tcc prog.c -oprog -lm -ldl -fdollars-in-identifiers
gcc windows: gcc prog.c -oprog.exe
tcc windows: tcc prog.c -luser32 -fdollars-in-identifiers
bcc windows: bcc prog

It sort of sticks out, doesn't it?

Also, gcc gets away with allowing $ by default; tcc generally copies gcc
in terms of how a compiler is invoked, so this surprisingly goes against
that.

Keith Thompson

Sep 7, 2022, 4:26:57 PM
Ben Bacarisse <ben.u...@bsb.me.uk> writes:
> David Brown <david...@hesbynett.no> writes:
>
>> I believe (but I might have this wrong) that the use of dollar signs
>> in identifiers is undefined behaviour according to the standard.
>
> I don't think it's undefined. Recent C drafts permit "other
> implementation defined characters" in the syntax, so I'm not sure how it
> could be undefined.

Right. So if the implementation defines identifier-nondigit to include
'$', then using '$' in an identifier is well defined; if it doesn't,
it's simply a syntax error.

> The C23 draft permits an XID_Start character followed by XID_Continue
> characters. These may, in fact, include $ but I got lost down the
> rabbit hole of referenced standards so I can't be sure.

I don't believe '$' is included in XID_Start or XID_Continue, but I'm
not 100% certain of that.

https://unicode.org/reports/tr31/
https://unicode.org/reports/tr44/

Keith Thompson

Sep 7, 2022, 4:52:15 PM
Bart <b...@freeuk.com> writes:
[...]
> Also, gcc gets away with allowing $ by default; tcc generally copies
> gcc in terms of how a compiler is invoked, so this surprisingly goes
> against that.

gcc allowing $ in identifiers is a documented extension for most
targets, so it doesn't warn about them even with "-pedantic". One could
argue that this isn't necessarily an "extension", since "other
implementation-defined characters" are explicitly permitted by C11
6.4.2.1.

Interestingly, the C23 draft doesn't have that wording. Instead, it
allows XID_Start and XID_Continue characters, so it expands the set of
characters that *all* implementations must support, but doesn't permit
'$' (assuming '$' isn't in XID_Start or XID_Continue).

A conforming C23 implementation can still accept '$' in identifiers
as an extension, but my understanding is that it must issue a
warning. (Of course no warning is required in non-conforming mode.)

In any case, as of C11 a conforming C compiler is not required to accept
'$' in identifiers *at all*, even with a command-line option, and may
reject any program that uses it. You can of course choose to rely on
the behavior of specific compilers, but if you use '$' in identifiers
then your code is not 100% portable. (REMINDER: "not 100% portable" is
not necessarily a criticism.)
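
The distinction can be sketched as a small checker in which accepting '$' is a switch, mirroring an implementation that either does or does not list it among its implementation-defined identifier characters. This is a simplification that covers only the basic Latin letters, digits and underscore (no universal character names or XID_Start/XID_Continue handling), and the function name is made up.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true if s is a non-empty identifier made of letters, digits
   and '_', not starting with a digit. When allow_dollar is true, '$'
   is accepted as well, as an implementation might choose to do. */
bool is_basic_identifier(const char *s, bool allow_dollar)
{
    if (*s == '\0' || isdigit((unsigned char)*s))
        return false;

    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        bool ok = isalnum(c) || c == '_' || (allow_dollar && c == '$');
        if (!ok)
            return false;
    }
    return true;
}
```

With the switch off, a name such as sys$open is simply not an identifier, which corresponds to the syntax-error reading; with it on, the same name is accepted.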

Thiago Adams

Sep 7, 2022, 6:28:13 PM
I have two output modes:

1 - One preserves all macros, includes, etc., and makes edits (additions/deletions)
to the current source code. The generated code can be used on different
platforms. Some changes are made only in active "if groups".

2 - The other generates code as the compiler "sees" it. Macros are expanded, which is useful
for direct compilation. The generated code is discarded after compilation.

I think an x86 emulator, for instance, could be useful for me, because I can practice
code generation, see what is necessary, and at the same time have an interpreter.
This could be a separate project as well, and I would create something like a linker
for the interpreted languages. There are many things I don't understand, and I would
like to someday have the compiler generate the exe like tcc does. Many small C compilers
don't generate the exe and depend on gcc, for instance.

Sometimes it is very hard to generate C code.
(Lambdas were the most difficult part for me and still have bugs.)

Ben Bacarisse

Sep 7, 2022, 6:42:59 PM
Keith Thompson <Keith.S.T...@gmail.com> writes:

> Ben Bacarisse <ben.u...@bsb.me.uk> writes:

>> The C23 draft permits an XID_Start character followed by XID_Continue
>> characters. These may, in fact, include $ but I got lost down the
>> rabbit hole of referenced standards so I can't be sure.
>
> I don't believe '$' is included in XID_Start or XID_Continue, but I'm
> not 100% certain of that.
>
> https://unicode.org/reports/tr31/
> https://unicode.org/reports/tr44/

I found those but also ended up unsure. If it mattered to me I'd read
them in detail, but a quick scan was inconclusive.

--
Ben.

Bart

Sep 7, 2022, 7:52:13 PM
I think interpreting x64 code (don't bother with 32-bit x86) is an
unnecessary complication (it would also be a major project of its own,
and may still involve generating binary x64 machine code).

Interpretation, if you want that, could either be done from an AST
representation, or from some intermediate VM language that you devise.

> This could be a separated project as well and I would create something like a linker
> for the interpreted languages. There are many things I don't understand and I would
> like to someday have the compiler generating the exe like tcc. Many small c compilers
> don't generate the exe and depends on gcc for instance.

Actually, gcc doesn't generate EXE either, or not directly. It produces
a temporary .s file containing assembly code (in the ghastly AT&T
syntax), then invokes the assembler 'as' to produce an object file (.o
or .obj).

Finally, the 'ld' linker is invoked to turn the object file into an
executable.

So, while direct exe generation is desirable in that there are no
dependencies, it's a lot of work.

It's best to start by generating textual ASM code if compiling to native
code.

Tiny C cuts quite a few corners, for example there is no intermediate
ASM to even look at, and it has fewer passes than are recommended for a
compiler.

Thiago Adams

Sep 8, 2022, 8:21:07 AM
The emulation of x86 or x64 would not be complete. I was thinking of just
having some elements that are similar, which would be useful in the future for generating
real machine assembly. For instance, using some virtual registers and a similar function-call mechanism.
Also making data sections and using a stack of bytes.


> > This could be a separated project as well and I would create something like a linker
> > for the interpreted languages. There are many things I don't understand and I would
> > like to someday have the compiler generating the exe like tcc. Many small c compilers
> > don't generate the exe and depends on gcc for instance.
> Actually, gcc doesn't generate EXE either, or not directly. It produces
> a temporary .s file containing assembly code (in the ghastly AT&T
> syntax), then invokes the assembler 'as' to produce an object file (.o
> or .obj).
>
> Finally, the 'ld' linker is invoked to turn the object file into an
> executable.

I think MSVC has a separate linker, but I am not sure whether object files go through
an intermediate step like gcc's.

> So, while direct exe generation is desirable in that there are no
> dependencies, it's a lot of work.
>
> It's best to start by generating textual ASM code if compiling to native
> code.

The problem with this textual ASM is that it works with just one assembler, right?
For instance nasm or gcc or masm.


Bart

Sep 8, 2022, 11:41:30 AM
On 08/09/2022 13:20, Thiago Adams wrote:
> On Wednesday, September 7, 2022 at 8:52:13 PM UTC-3, Bart wrote:

>> I think interpreting x64 code (don't bother with 32-bit x86) is an
>> unnecessary complication (it would also be a major project of its own,
>> and may still involve generating binary x64 machine code).
>>
>> Interpretation, if you want that, could either be done from an AST
>> representation, or from some intermediate VM language that you devise.
>
> The emulation of x86 or x64 would not be complete.. I was thinking in just
> have some elements that are similar and makes useful in the future to generate
> real machine assembler. For instance, using some virtual registers and similar function call.
> Also making data sections and using a stack of bytes.

So it's a VM.

>
>>> This could be a separated project as well and I would create something like a linker
>>> for the interpreted languages. There are many things I don't understand and I would
>>> like to someday have the compiler generating the exe like tcc. Many small c compilers
>>> don't generate the exe and depends on gcc for instance.
>> Actually, gcc doesn't generate EXE either, or not directly. It produces
>> a temporary .s file containing assembly code (in the ghastly AT&T
>> syntax), then invokes the assembler 'as' to produce an object file (.o
>> or .obj).
>>
>> Finally, the 'ld' linker is invoked to turn the object file into an
>> executable.
>
> I think MSVC has a separated linker but not sure if object files have
> a intermediate step like gcc.
>
>> So, while direct exe generation is desirable in that there are no
>> dependencies, it's a lot of work.
>>
>> It's best to start by generating textual ASM code if compiling to native
>> code.
>
> The problem with this textual ASM is that it works in just one assembler right?
> for instance nasm or gcc or masm.


There are syntax differences between different assemblers, but they're
not great. (Only gcc's 'gas' or 'AT&T' format is very different, but
even that has an option to accept the more standard Intel-style format.)

If you were to generate ASM in a simple fashion, such as directly
writing the text so that those differences are hardcoded throughout your
program, then switching assemblers would be a lot of work.

The method I use is to generate a more independent representation of x64
code, then I just need a different routine to dump that data structure
into ASM source. So a few hundred lines instead of a few thousand.

But also bear in mind that even x64 code will vary according to
platform, because of ABI differences.
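
The idea of keeping an assembler-independent representation and writing one small dump routine per syntax can be sketched as follows. This is not Bart's code: the types, the two-instruction subset and the function names are made up, and only register-to-register 64-bit operands are handled.

```c
#include <stdio.h>

typedef enum { OP_MOV, OP_ADD } Opcode;
typedef enum { REG_RAX, REG_RBX } Reg;

/* Assembler-independent form of "dst = dst op src". */
typedef struct {
    Opcode op;
    Reg dst;
    Reg src;
} Instr;

static const char *const op_names[]  = { "mov", "add" };
static const char *const reg_names[] = { "rax", "rbx" };

/* Intel style: destination first, bare register names. */
int dump_intel(const Instr *in, char *buf, size_t cap)
{
    return snprintf(buf, cap, "%s %s, %s",
                    op_names[in->op], reg_names[in->dst], reg_names[in->src]);
}

/* AT&T style: source first, '%' register prefix, 'q' size suffix. */
int dump_att(const Instr *in, char *buf, size_t cap)
{
    return snprintf(buf, cap, "%sq %%%s, %%%s",
                    op_names[in->op], reg_names[in->src], reg_names[in->dst]);
}
```

For the instruction { OP_MOV, REG_RAX, REG_RBX } the two routines produce "mov rax, rbx" and "movq %rbx, %rax". Switching assemblers then means adding another dump routine rather than touching the code generator.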

Scott Lurndal

Sep 8, 2022, 12:43:46 PM
Bart <b...@freeuk.com> writes:
>On 08/09/2022 13:20, Thiago Adams wrote:

>There are syntax differences between different assemblers, but they're
>not great. (Only gcc's 'gas' or 'AT&T' format is very different, but
>even that has an option to accept the more standard Intel-style format.)

Define "standard" in this context. The AT&T syntax preceded the Intel
syntax (as it was originally designed for the PDP-11 where source
operands always preceded destination operands) by several years.
Not to mention that the majority of mainframe assembler syntaxes
also had destination operands following source operands, even those
without general purpose registers.

The AT&T syntax is far more concise and readable without all the
useless annotations (e.g. DWORD everywhere).

YMMV and opinions differ.

Thiago Adams

Sep 8, 2022, 12:59:58 PM
On Thursday, September 8, 2022 at 12:41:30 PM UTC-3, Bart wrote:
> On 08/09/2022 13:20, Thiago Adams wrote:
> > On Wednesday, September 7, 2022 at 8:52:13 PM UTC-3, Bart wrote:
>
> >> I think interpreting x64 code (don't bother with 32-bit x86) is an
> >> unnecessary complication (it would also be a major project of its own,
> >> and may still involve generating binary x64 machine code).
> >>
> >> Interpretation, if you want that, could either be done from an AST
> >> representation, or from some intermediate VM language that you devise.
> >
> > The emulation of x86 or x64 would not be complete.. I was thinking in just
> > have some elements that are similar and makes useful in the future to generate
> > real machine assembler. For instance, using some virtual registers and similar function call.
> > Also making data sections and using a stack of bytes.
> So it's a VM.

Yes. This kind of VM also may be useful inside the compiler to analyse code.
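
A minimal sketch of such a VM, assuming four virtual registers and a byte stack as described earlier in the thread. The instruction set, names and sizes are invented for illustration and do not correspond to any existing Cake component.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VM_LOADI, VM_ADD, VM_PUSH, VM_POP, VM_HALT };

typedef struct {
    uint8_t op;     /* one of the VM_ opcodes */
    uint8_t a;      /* first register operand */
    uint8_t b;      /* second register operand */
    int32_t imm;    /* immediate value for VM_LOADI */
} VmInstr;

typedef struct {
    int32_t reg[4];     /* virtual registers r0..r3 */
    uint8_t stack[64];  /* stack of raw bytes */
    size_t  sp;         /* number of stack bytes in use */
} Vm;

/* Runs until VM_HALT; returns 0 on success, -1 on a malformed program. */
int vm_run(Vm *vm, const VmInstr *code, size_t count)
{
    size_t pc;
    for (pc = 0; pc < count; ++pc) {
        const VmInstr *in = &code[pc];
        if (in->a >= 4 || in->b >= 4)
            return -1;
        switch (in->op) {
        case VM_LOADI:
            vm->reg[in->a] = in->imm;
            break;
        case VM_ADD:    /* signed overflow is not checked in this sketch */
            vm->reg[in->a] += vm->reg[in->b];
            break;
        case VM_PUSH:
            if (vm->sp + sizeof(int32_t) > sizeof vm->stack)
                return -1;
            memcpy(&vm->stack[vm->sp], &vm->reg[in->a], sizeof(int32_t));
            vm->sp += sizeof(int32_t);
            break;
        case VM_POP:
            if (vm->sp < sizeof(int32_t))
                return -1;
            vm->sp -= sizeof(int32_t);
            memcpy(&vm->reg[in->a], &vm->stack[vm->sp], sizeof(int32_t));
            break;
        case VM_HALT:
            return 0;
        default:
            return -1;
        }
    }
    return -1;  /* ran off the end without VM_HALT */
}
```

Because the stack holds bytes rather than typed slots, values of other widths could be pushed the same way, which is closer to how a real machine stack behaves.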

Bart

Sep 8, 2022, 2:23:22 PM
The context was x64 and x86, which were successors to 8086 which was a
development of 8080, all Intel products. AFAIK those have always used a
destination operand on the left in their assemblers.

Tim Rentsch

Sep 14, 2022, 9:40:26 AM
Ben Bacarisse <ben.u...@bsb.me.uk> writes:

> David Brown <david...@hesbynett.no> writes:
>
>> I believe (but I might have this wrong) that the use of dollar signs
>> in identifiers is undefined behaviour according to the standard.
>
> I don't think it's undefined. Recent C drafts permit "other
> implementation defined characters" in the syntax, so I'm not sure how it
> could be undefined.

If the implementation's documentation lists dollar sign among
the set of implementation-defined characters for identifiers,
the behavior is defined. Otherwise, the presence of dollar
sign (in source that hasn't been filtered out by preprocessor
directives) results in a syntax error, which makes the behavior
undefined.

> The C23 draft permits an XID_Start character followed by XID_Continue
> characters. These may, in fact, include $ but I got lost down the
> rabbit hole of referenced standards so I can't be sure.

Looking at some Unicode reference material and also looking
at some web search results, the evidence seems pretty strong
that $ and @ are not included in the XID_Start and XID_Continue
default sets. However, it isn't clear (at least it isn't to
me) whether the C23 draft admits the possibility that $ and @
may be accepted under an implementation-defined umbrella.

Tim Rentsch

Sep 14, 2022, 9:45:58 AM
Keith Thompson <Keith.S.T...@gmail.com> writes:

> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>
>> David Brown <david...@hesbynett.no> writes:
>>
>>> I believe (but I might have this wrong) that the use of dollar signs
>>> in identifiers is undefined behaviour according to the standard.
>>
>> I don't think it's undefined. Recent C drafts permit "other
>> implementation defined characters" in the syntax, so I'm not sure how it
>> could be undefined.
>
> Right. So if the implementation defines identifier-nondigit to include
> '$', then using '$' in an identifier is well defined; if it doesn't,
> it's simply a syntax error.
>
>> The C23 draft permits an XID_Start character followed by XID_Continue
>> characters. These may, in fact, include $ but I got lost down the
>> rabbit hole of referenced standards so I can't be sure.
>
> I don't believe '$' is included in XID_Start or XID_Continue, but I'm
> not 100% certain of that.
>
> https://unicode.org/reports/tr31/
> https://unicode.org/reports/tr44/

In addition to these documents I have done some web searching,
and I think the evidence is pretty strong that neither '$' nor
'@' is in XID_Start or XID_Continue, as the Unicode documentation
details them.

Editorial comment: the two Unicode reference documents may be
the very worst reference documentation I have ever read.

Tim Rentsch

Sep 14, 2022, 10:14:52 AM
Keith Thompson <Keith.S.T...@gmail.com> writes:

> Bart <b...@freeuk.com> writes:
> [...]
>
>> Also, gcc gets away with allowing $ by default; tcc generally copies
>> gcc in terms of how a compiler is invoked, so this surprisingly goes
>> against that.
>
> gcc allowing $ in identifiers is a documented extension for most
> targets, so it doesn't warn about them even with "-pedantic". One could
> argue that this isn't necessarily an "extension", since "other
> implementation-defined characters" are explicitly permitted by C11
> 6.4.2.1.
>
> Interestingly, the C23 draft doesn't have that wording. Instead, it
> allows XID_Start and XID_Continue characters, so it expands the set of
> characters that *all* implementations must support,

I believe that conclusion is not correct. The C23 draft n3047 says
this:

An XID_Start character is an implementation-defined character
whose corresponding code point in ISO/IEC 10646 has the
XID_Start property. An XID_Continue character is an
implementation-defined character whose corresponding code point
in ISO/IEC 10646 has the XID_Continue property.

The presence of the modifying adjective "implementation-defined" in
both cases surely means, at the very least, that implementations
are allowed to subset the Unicode-specified XID_Start/XID_Continue
sets with regard to what characters are allowed in identifiers.


> but doesn't permit '$' (assuming '$' isn't in XID_Start or
> XID_Continue).

I'm not sure this statement is right either (and agreeing with the
assumption that '$' is not in XID_Start or XID_Continue). The
Unicode reference documentation is so bad that I can't make out
with any degree of certainty whether it allows various domains to
extend the XID_Start/XID_Continue sets for the application in
question. The presence of "implementation-defined" further muddies
the waters. To be clear, neither am I saying that I think the
statement is wrong; only that at present there is not enough
information to be confident of either conclusion.


> A conforming C23 implementation can still accept '$' in identifiers
> as an extension, but my understanding is that it must issue a
> warning.

If accepting '$' is only an extension, and not a consequence of
some implementation-defined behavior, then using '$' in an
identifier results in a syntax error, which requires a diagnostic.
Where is the uncertainty?


> In any case, as of C11 a conforming C compiler is not required to accept
> '$' in identifiers *at all*, even with a command-line option, and may
> reject any program that uses it. You can of course choose to rely on
> the behavior of specific compilers, but if you use '$' in identifiers
> then your code is not 100% portable. (REMINDER: "not 100% portable" is
> not necessarily a criticism.)

Yes.

Keith Thompson

Sep 14, 2022, 1:59:42 PM
You're right. I managed to miss the phrase "implementation-defined".

So ISO/IEC 10646 defines which characters have the "XID_Start" property,
but an "XID_Start" character is a member of a C implementation-defined
subset of those characters. The terminology is a bit confusing, but I'm
not sure how I'd improve it.

>> but doesn't permit '$' (assuming '$' isn't in XID_Start or
>> XID_Continue).
>
> I'm not sure this statement is right either (and agreeing with the
> assumption that '$' is not in XID_Start or XID_Continue). The
> Unicode reference documentation is so bad that I can't make out
> with any degree of certainty whether it allows various domains to
> extend the XID_Start/XID_Continue sets for the application in
> question. The presence of "implementation-defined" further muddies
> the waters. To be clear, neither am I saying that I think the
> statement is wrong; only that at present there is not enough
> information to be confident of either conclusion.

If '$' can't be an XID_Start or XID_Continue character, then an attempt
to use '$' in an identifier is a syntax error. I don't see any wiggle
room there. The open question is whether '$' is, or can be, an
XID_Start or XID_Continue character.

[...]

Tim Rentsch

Sep 16, 2022, 2:47:14 AM
I did not consider the question of how to improve the wording in
the ISO C standard.

>>> but doesn't permit '$' (assuming '$' isn't in XID_Start or
>>> XID_Continue).
>>
>> I'm not sure this statement is right either (and agreeing with the
>> assumption that '$' is not in XID_Start or XID_Continue).

To clarify my earlier statement, I am agreeing with (what I think
is your) assumption that '$'s ISO/IEC 10646 code point does not
have either the XID_Start property or the XID_Continue property.

>> The
>> Unicode reference documentation is so bad that I can't make out
>> with any degree of certainty whether it allows various domains to
>> extend the XID_Start/XID_Continue sets for the application in
>> question. The presence of "implementation-defined" further muddies
>> the waters. To be clear, neither am I saying that I think the
>> statement is wrong; only that at present there is not enough
>> information to be confident of either conclusion.
>
> If '$' can't be an XID_Start or XID_Continue character, then an attempt
> to use '$' in an identifier is a syntax error. I don't see any wiggle
> room there. The open question is whether '$' is, or can be, an
> XID_Start or XID_Continue character.

Yes, I believe these statements are consistent with the most
natural reading of the current C23 draft (subject to a condition
that the consistency property can be objectively quantified).