
Alternative compilation model for C

Thiago Adams

Jul 13, 2022, 9:08:15 AM
(This is a cross-post, also on reddit compilers)

I was thinking of a new compilation model for C (just as an experiment) that does not change the language syntax.

It may break some code, so this compilation mode would be optional.

For #include, instead of text inclusion, it would do something like a "symbol import".

Today in C, if we write

#define X
int f(void) { return 0; }
#include "file.h"

The macro X and the function f can be used inside file.h, because the text is expanded in place and everything ends up in the same scope.

Although this is possible, common practice is for a header to include the other headers it needs and not rely on the including scope. The exception is global macros such as DEBUG.

So one difference in this compilation model is that #include is NOT text inclusion. It means "import symbols", and the external context is not accessible.

Another common practice in C is to use include guards to avoid including the same symbols twice. In this model that is the default: symbols are included just once.

Instead of compiling one source file at a time, this model can compile many sources together, and the parsed header files can be "reserved in memory" to be included again without re-parsing. This works because, unlike normal C text inclusion, the header file does not change depending on where it is included.

To handle global macros like DEBUG, the compiler settings work as if they were an implicit first include. We can also have our own config.h that is included in each file; some C projects actually do this.

I forgot to say something: my idea is also to move all the preprocessor phases into the compile phase, again in a way that does not break code (or at least leaves a big common subset that works in both models).

Moving the preprocessor phases involves a lot of details, and each problem deserves its own topic. Macro expansion is the difficult part.

How could this work? Back to #include: it is now handled in the parser phase.

When the compiler finds #include "file.h", it checks whether the file is already loaded (parsed). If so, its symbols are injected into the current context.

If not, the file is first loaded (parsed) with an empty context (X and f from the initial sample are not present), and then its symbols are injected.
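
A minimal sketch of the "parse once, reuse afterwards" lookup described above. Everything here is hypothetical and only for illustration: the names header_cache, parsed_header, load_header and parse_with_empty_context are made up, the fixed-size array stands in for a real table, and the "parser" only counts how often it was called.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HEADERS 64

/* Hypothetical record for one parsed header. */
struct parsed_header {
    const char *path;
    int parse_count;    /* how many times the file was actually parsed */
};

/* Hypothetical cache shared by all sources compiled in one run. */
struct header_cache {
    struct parsed_header items[MAX_HEADERS];
    size_t count;
};

/* Stand-in for the real parser: a real one would start from an empty
   context, so nothing from the including file is visible here. */
static void parse_with_empty_context(struct parsed_header *h)
{
    h->parse_count++;
}

/* Returns the parsed header for 'path', parsing it only on first use.
   Returns NULL if the (assumed) fixed-size cache is full. */
struct parsed_header *load_header(struct header_cache *cache, const char *path)
{
    for (size_t i = 0; i < cache->count; i++) {
        if (strcmp(cache->items[i].path, path) == 0)
            return &cache->items[i];    /* already parsed: reuse it */
    }
    if (cache->count == MAX_HEADERS)
        return NULL;

    struct parsed_header *h = &cache->items[cache->count++];
    h->path = path;
    h->parse_count = 0;
    parse_with_empty_context(h);
    return h;
}
```

Calling load_header twice with "file.h" on a zero-initialized cache returns the same entry both times, and the entry is parsed only once. The sketch compares path strings literally; a real implementation would have to decide when two spellings of a path name the same file.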

Files included inside included files will also inject their symbols into the current context. Something like a private include could also be considered, but then the source could not be used with the old model.

One way to implement this is a for loop that injects each symbol into the external scope. If the symbol already exists, it is an error.

Another way I was thinking of is to inject the whole new scope at once, making the global scope a collection of scopes. But I would need to check whether each symbol already exists anyway, so it does not help much.
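
The for-loop approach could look roughly like the sketch below. It rests on several assumptions: struct scope, scope_contains and inject_symbols are hypothetical names, symbols are reduced to bare names, and the scope is a fixed-size array. It only covers the name-clash check; skipping a header that was already imported would be handled separately.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 256

/* Hypothetical scope: just a list of symbol names. */
struct scope {
    const char *names[MAX_SYMBOLS];
    size_t count;
};

static int scope_contains(const struct scope *s, const char *name)
{
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->names[i], name) == 0)
            return 1;
    }
    return 0;
}

/* Injects every symbol of 'header' into 'outer'.
   Returns 0 on success, -1 if a symbol already exists or there is no room.
   All checks run before anything is copied, so a failed injection
   leaves 'outer' unchanged. */
int inject_symbols(struct scope *outer, const struct scope *header)
{
    if (header->count > MAX_SYMBOLS - outer->count)
        return -1;

    for (size_t i = 0; i < header->count; i++) {
        if (scope_contains(outer, header->names[i]))
            return -1;    /* duplicate symbol: reported as an error */
    }
    for (size_t i = 0; i < header->count; i++)
        outer->names[outer->count++] = header->names[i];
    return 0;
}
```

The linear search makes each injection quadratic in the worst case; a hash table would be the obvious replacement in a real compiler.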

Scott Lurndal

Jul 13, 2022, 10:53:04 AM
Thiago Adams <thiago...@gmail.com> writes:
>(This is cross post, also on reddit compilers)
>
>I was thinking in a new compilation model for C (just to experiment) without changing the language syntax.
>
<snip>
>Instead of compiling one source each time this model can compile many sources and the parsed header files can be "reserved in memory" to be included again without re-parsing. This works because differently from normal C text inclusion the header file does not change depending where it is included.

https://en.wikipedia.org/wiki/Precompiled_header

<snip>

Keith Thompson

Jul 13, 2022, 1:48:19 PM
Thiago Adams <thiago...@gmail.com> writes:
> (This is cross post, also on reddit compilers)
>
> I was thinking in a new compilation model for C (just to experiment) without changing the language syntax.
>
> Maybe breaking some code - so this compilation mode is optional.
>
> For the #include instead of text inclusion something like "symbol import".
[...]

Changing the behavior of #include is not going to happen.

I'll note that C++ is *adding* a new feature called "import" that's
probably similar to what you're suggesting. Adding import to C might be
feasible.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

bart c

Jul 14, 2022, 4:37:54 PM
On Wednesday, 13 July 2022 at 14:08:15 UTC+1, Thiago Adams wrote:
> (This is cross post, also on reddit compilers)

So you must be 'u/thradams'.

> I was thinking in a new compilation model for C (just to experiment) without changing the language syntax.

I saw your thread on reddit and replied to it (as 'till-one'). But it came across as a rather broad question.

If you are trying to bolt a new module system onto C, or your new version of it, that could be worth mentioning.

Thiago Adams

Jul 14, 2022, 10:33:05 PM
I haven't started, but I plan one day to write a C parser that can read a normal C file but
with a different model for the preprocessor.
Before starting this project I want to think about how this could work.
One item to "solve" is #include.
We can (even in the parser phase) do text inclusion. It would just be a diversion
from normal parsing. But I think we have to consider other alternatives.

Something I can do as a test is to compile header files (without linking) from C libraries etc.
to see whether they depend on some previous state (macros, declarations) or whether each header is
self-sufficient.

In any project we can compile the header files,
for instance with gcc -c header.h,
to see if they are self-sufficient. This is not a perfect guarantee,
because a header can have #ifdef X, and X may or may not be defined
by previous headers.

I can also prepare a normal compiler to detect this and emit a warning
if a header uses a macro or declaration from the external world.

Chris M. Thomasson

Jul 15, 2022, 1:50:37 AM
Make sure it can compile code using chaos-pp:

https://github.com/rofl0r/chaos-pp

;^)

bart c

Jul 15, 2022, 6:51:58 AM
C's 'header' system is really unsophisticated and a long way from proper modules. (And yet it is also full of complications, such as the implementation-defined search algorithms for nested header files; nested imports are not a desirable feature in a module system.)

However I suggest leaving that alone. You can implement a better module scheme on top of C's existing preprocessor, but it may need some external tools.

The first step is deciding how an improved module scheme may work:

* Modules consist of an implementation part (file.c) and an interface (which can still be file.h)
* An advanced feature is for the interface to be generated automatically from the implementation (however this has problems when you have circular import chains)
* A module may create a new namespace, which means accesses to imported names need qualifying: file.f(), which is not valid in C unless 'file' is somehow made into a struct. But if source has to be transformed into legal C anyway, this can be turned into file_f() or similar
* Imports might be done with 'import file'. These don't nest; if file.c has its own imports, these are not visible from the module that imports 'file'.
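
To make the namespace point in the list above concrete, here is a small sketch in plain C of the two spellings it mentions. The names are hypothetical: file_f is the prefixed name a source transformation might generate, and struct file_module is one possible way to make 'file' a struct so that file.f() is valid C as written.

```c
/* Hypothetical function from the module file.c, written with the
   prefixed name that a source transformation could generate. */
int file_f(int x)
{
    return x + 1;
}

/* Alternative: make 'file' a struct of function pointers so that the
   qualified call file.f(...) is itself valid C. */
struct file_module {
    int (*f)(int);
};

const struct file_module file = { file_f };

/* Example caller using both spellings of the same function. */
int call_both(int x)
{
    return file.f(x) + file_f(x);
}
```

The struct form costs an indirect call unless the compiler can see through the constant table; the prefixed form has no such cost but needs the transformation step.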

Etc. These are just ideas. Any worthwhile module scheme will involve extending C in some way, but you need to decide what its features are.

I just don't think it's worth tinkering with C's textual include files.

Thiago Adams

Oct 3, 2022, 12:14:58 PM

I just want to add here something I found in the standard.

"
7.1.2
...
Standard headers may be included in any order; each may be included more than once in a given
scope, with no effect different from being included only once, except that the effect of including
<assert.h> depends on the definition of NDEBUG (see 7.2)
"

Although some C features like "#include" are just text inclusion, the way programmers
use and think about them is as an "import". That includes the standard itself.
Something like NDEBUG is also a define that is global.
(Of course, someone may also define it just in front of a particular header instead of globally.)
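
The NDEBUG exception can be observed directly, because the standard says assert is redefined according to the current state of NDEBUG each time <assert.h> is included. The sketch below is only an illustration; record_check, checks_run and the two count_* functions are made-up names used to observe whether the argument of assert was evaluated.

```c
/* Counter with a side effect, so we can tell whether assert()
   evaluated its argument. */
int checks_run = 0;

int record_check(void)
{
    checks_run++;
    return 1;
}

#undef NDEBUG
#define NDEBUG            /* external state that the header depends on */
#include <assert.h>

int count_with_ndebug(void)
{
    checks_run = 0;
    assert(record_check());   /* expands to ((void)0): not evaluated */
    return checks_run;
}

#undef NDEBUG
#include <assert.h>       /* second inclusion: assert is redefined */

int count_without_ndebug(void)
{
    checks_run = 0;
    assert(record_check());   /* evaluated this time */
    return checks_run;
}
```

Here count_with_ndebug() returns 0 and count_without_ndebug() returns 1, even though both functions contain the same line. This is exactly the kind of dependency on external state that a pure "import" model has to treat as a special case.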

Scott Lurndal

Oct 3, 2022, 12:26:22 PM
Thiago Adams <thiago...@gmail.com> writes:
>
>I just want to add here something I found in the standard.
>
>"
>7.1.2
>...
>Standard headers may be included in any order; each may be included more than once in a given
>scope, with no effect different from being included only once, except that the effect of including
><assert.h> depends on the definition of NDEBUG (see 7.2)
>"
>
>Although some C features like "#include " are just text inclusion, the way programmers
>use and think about them is "import". Including the standard.

The standard is only describing headers defined _by_ the standard;
headers which are idempotent (except <assert.h>) and independent of each other.

There is no requirement that headers not described by the standard behave
similarly.

Thiago Adams

Oct 4, 2022, 8:14:42 AM
Sure.

One way to ensure headers are self-sufficient is to add a 'header compilation' step
to the build pipeline, without linking.

I tried this with gcc; it works, apart from one inconvenient warning:

gcc header.h -c

warning: #pragma once in main file
    1 | #pragma once
      |         ^~~~
