OK.
I had considered a fully custom format at one point; the original idea
would have been something vaguely resembling an AR or TAR file, say:
Program Header;
Segment Header;
LZ compressed segment;
Segment Header;
LZ compressed segment;
...
Where, say, each segment may contain one or more sections, specify a
load address, and may include additional commands for what to do with it
(such as interpreting it as base relocs rather than as part of the
final image).
Technically, this would have also been along vaguely similar lines to
the Mach-O format.
But I ended up opting for a modified PE/COFF, as it already had most of
what I wanted (and I was already reasonably familiar with the format).
My case differed slightly in that I didn't need to care about whether
Windows or existing tools could understand the binaries, and so "PEL4"
is sort of its own format in a way as well.
Technically, it still fit what I wanted to do better than ELF would
have, though I ended up tweaking some things to allow it to support
loading multiple binaries into the same address space:
All binaries include reloc tables;
The read-only and read/write sections are split up into two different
segments (using the Global Pointer data-directory entry to effectively
define the read/write section, with the Global Pointer pointed to the
start of this segment on program start-up).
I had looked into a few possible compression schemes, and LZ4 gave the
best properties for binaries.
I have a different (byte-oriented) compression scheme that tends to give
better ratios on general-purpose data, but LZ4 seemed to give better
results with executable code in this case.
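Say, as a rough illustration (the header fields here are made up, not
the actual PEL4 structures), unpacking one LZ4-compressed segment with
the reference LZ4 library could look like:

  #include <stdint.h>
  #include <lz4.h>   /* reference LZ4 library */

  /* Hypothetical segment header; not the actual PEL4 layout. */
  typedef struct {
      uint32_t vaddr;      /* load address of the segment            */
      uint32_t raw_size;   /* size of the LZ4-compressed payload     */
      uint32_t mem_size;   /* size of the segment once unpacked      */
      uint32_t flags;      /* e.g. "treat this segment as base relocs" */
  } seg_header_t;

  /* Unpack one segment into its target location; returns 0 on success. */
  static int load_segment(const seg_header_t *sh, const char *payload, char *dest)
  {
      int n = LZ4_decompress_safe(payload, dest,
                                  (int)sh->raw_size, (int)sh->mem_size);
      return (n >= 0 && (uint32_t)n == sh->mem_size) ? 0 : -1;
  }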
>
>>> Each DLL exports certain symbols such as the addresses of functions
>>> and variables. So no reason you can't access a variable exported from
>>> any DLL, unless perhaps multiple instances of the same DLL have to
>>> share the same static data, but that sounds very unlikely, as little
>>> would work.
>>>
>>
>> Much past roughly Win9x or so, it has been possible to use
>> "__declspec(dllimport)" on global variables in Windows (in an earlier
>> era, it was not possible to use the __declspec's, but instead
>> necessary to manage DLL import/exports by writing out lists in ".DEF"
>> files).
>>
>> It isn't entirely transparent, but yes, on actual Windows, it is very
>> much possible to share global variables across DLL boundaries.
>>
>>
>> Just, this feature is not (yet) supported by my compiler. Personally,
>> I don't see this as a huge loss (even if it did work; I personally see
>> it as "poor coding practice").
>
> This is a language issue. Or, in C, it is compiler related.
>
> I've never been quite sure how you tell a C compiler to export a certain
> symbol when creating a DLL. Sometimes it just works; I think it just
> exports everything that is not static (it may depend on a compiler
> option too).
>
> And some compilers may need this __declspec business, but I've never
> bothered with it.
>
Yeah. GCC seems to be "export everything".
MSVC needs either __declspec or ".DEF" files.
I am not sure exactly when __declspec started being used for this,
seemingly sometime between "Visual C++ 4.0" and "Visual Studio 2003";
it isn't documented well enough online to narrow it down much further.
In my case, I went with a similar approach to MSVC, namely explicit
export, where the normal "extern" storage class is shared between
translation units but does not cross DLL boundaries.
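For example, with the usual MSVC-style spellings (a sketch; whether my
compiler spells it exactly this way is beside the point here), the split
looks roughly like:

  /* --- foo.c, built into foo.dll: explicitly exported symbols --- */
  __declspec(dllexport) int foo_counter = 0;
  __declspec(dllexport) int foo_add(int x) { return foo_counter += x; }

  /* Plain extern: shared between translation units within the DLL,
     but not exported across the DLL boundary. */
  int foo_internal_state;

  /* --- bar.c, linked against foo.dll: explicit import --- */
  __declspec(dllimport) extern int foo_counter;
  __declspec(dllimport) int foo_add(int x);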
> Mine just exports all not-static names. So this program:
>
> int abc;
> static int def;
>
> void F(void) {}
> static void G(void) {}
>
> if compiled as: 'mcc -dll prog', produces a file prog.dll which, if I
> dump it, shows this export table:
>
> Export Directory
>
> 0 00000000 0 Fun F
> 1 00000000 0 Var abc
>
> (There's something in it that distinguishes functions from variables,
> but I can't remember the details.)
>
> In any case, in C it can be hit and miss. In my own language, it is more
> controlled: I used an 'export' prefix to export symbols from a program.
>
> (It also conventiently creates interface files to be able to use the DLL
> library from a program. The equivalent of prog.h for my example
> containing the API needed to use it. Rolling that out to C is not
> practical however as my 'export' applies also to things like types and
> enums.)
>
In my case, there are two major ways of invoking the compiler, say:
bgbcc /Fefoo.dll foo.c
Or:
bgbcc -o foo.dll foo.c
Where the compiler looks at the output file's extension; if it is DLL,
it assumes you want a DLL:
EXE: EXE file, "PBO ABI", fully relocatable.
DLL: DLL file, "PBO ABI", fully relocatable.
SYS: Bare-metal EXE, ABI more like traditional Win32 EXEs.
BIN: ROM image, no EXE headers, no relocs, ...
RIL: RIL Bytecode
OBJ: RIL Bytecode
O: Also RIL Bytecode
S: ASM output.
If no output file is given, it looks at whether it is trying to mimic
GCC-style command-line behavior:
No: Assume "foo.exe" as default output.
Yes: Assume "a.exe" as default output.
Where, say, for the GCC-like mode:
-o <name> Output file.
-c Compile only
-E Preprocess only
-S ASM only.
-I<path> Add include path
-L<path> Add library path
-S<path> Add source path (excludes '-S' by itself)
-l<name> Add library
-D<name>[=<value>] #define something
-W<opt> Warning option
-m<tgt> Specify target machine
-f<opt> Specify target option/flag
-O<opt> Specify optimizer option.
-Z<opt> Specify debug option.
-g<opt> Also specify debug option.
...
For libraries, it checks the library path, where for "-l<name>" it will
look for:
lib<name>.<arch>.ril
lib<name>.ril
Assuming static linking in this case (the handling for DLLs is a little
different).
Note that, sort of like with MSVC, any debug data is dumped into
external files, and is not held within the EXE itself. Thus far, it is
fairly limited, mostly a big ASCII text file with a structure vaguely
similar to "nm" output.
Note for -E and -S, if no output file is specified, output is dumped to
stdout; and a bare '-' option indicates to read the input from stdin.
This was also partly to mimic GCC behavior.
Technically, to mimic GCC, for Linux and similar, it also symlinks other
tool names back to the BGBCC binary:
bjx2-pel-cc
bjx2-pel-gcc
bjx2-pel-ld
bjx2-pel-as
bjx2-pel-ar
...
Where, if it is called with a name in this form, it assumes that it
needs to try to emulate the respective command-line interface and
behavior (but, this part is still fairly incomplete).
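Say, the argv[0] dispatch looks conceptually something like this (a
sketch; the mode names and helpers here are made up):

  #include <stdio.h>
  #include <string.h>

  /* Which traditional tool the binary should behave as. */
  enum tool_mode { MODE_CC, MODE_LD, MODE_AS, MODE_AR };

  /* Pick a mode from the name the binary was invoked as (argv[0]). */
  static enum tool_mode pick_mode(const char *argv0)
  {
      const char *name = strrchr(argv0, '/');   /* strip any path prefix */
      name = name ? name + 1 : argv0;

      if (strstr(name, "-ld")) return MODE_LD;
      if (strstr(name, "-as")) return MODE_AS;
      if (strstr(name, "-ar")) return MODE_AR;
      return MODE_CC;                           /* default: compiler driver */
  }

  int main(int argc, char **argv)
  {
      static const char *names[] = { "cc", "ld", "as", "ar" };
      printf("acting as: %s\n", names[pick_mode(argv[0])]);
      (void)argc;
      return 0;
  }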
Technically, this 'ar' is very nonstandard and currently can't entirely
emulate the standard behavior.
But, apart from things like 'ar -c libname.a objfile*', the 'ar' tool
doesn't see much use (so its inability to incrementally update archive
contents is mostly a non-issue; I could in theory add proper support
for '.a' files if it became one).
In this case, the main compiler binary functions like a sort of hydra
that takes over the roles of the entirety of "binutils".
>>> This [somes] my experience of software originating in Linux. This is
>>> why Windows had to acquire CYGWIN then MSYS then WSL. You can't build
>>> the simplest program without involving half of Linux.
>>
>> Yes, and it is really annoying sometimes.
>>
>>
>> For the most part, Linux software builds and works fairly well... if
>> one is using a relatively mainline and relatively up-to-date Linux
>> distro...
>>
>>
>> But, if one is not trying to build in or for a typical Linux style /
>> GNU based userland; it is straight up pain...
>>
>> Like, typically either the "./configure" script is going to go down in
>> a crap-storm of error messages (say, if the shell is not "bash", or
>> some commands it tries to use are absent or don't accept the same
>> command-line arguments, etc); or libraries are going to be missing; or
>> the build just ends up dying due to compiler errors (say, which
>> headers exist are different, or their contents are different, ...).
>
> ./configure is an abomination anyway; I've seen 30,000-line scripts
> which take forever to run, and test things like whether 'printf' is
> supported.
>
Yeah, and it is seemingly a bit of an uphill battle to try to make it
work in any environment that is not "GNU userland with GCC".
In the case of Clang, it seems to actively lie about its identity to try
to make configure and similar willing to accept it.
> But the biggest problem with them is when someone expects a Windows user
> to use that same build process. Of course, ./configure is a Bash script
> using Linux utilities.
>
> It's like someone providing a .BAT file and expecting Linux users to do
> something with it.
>
Yes.
For simple programs, one-liner ".bat" or ".sh" files are fairly effective.
And, then, "Makefile.tgt" or similar for more involved cases.
A lot of the more complex build systems are often either unnecessary, or
indicate a more fundamental problem with the program and its dependency
management (along with other annoyances, like indirectly making
"perl"/"python"/"nodejs"/etc effectively prerequisites to get the
program built).
>>
>> Within the code itself, it often doesn't take much looking to find one
>> of:
>> Pointer arithmetic on "void *";
>> Various GCC specific "__attribute__((whatever))" modifiers;
>> Blobs of GAS specific inline ASM;
>> ...
>>
>>
>> Whereas in more cross-platform code, one will usually find stuff like:
>> #ifdef __GNUC__
>> ... GCC specific stuff goes here ...
>> #endif
>> #ifdef _MSC_VER
>> ... MSVC specific stuff goes here ...
>> #endif
>> ...
>
> Those conditional blocks never list my compiler, funnily enough. (#ifdef
> __MCC__ will do it.)
>
>
I didn't list my own either, which uses __BGBCC__, ...
But, in practice, I can often partly overlap the __BGBCC__ and _MSC_VER
blocks, as a lot of the dialect-specific functionality is closer to MSVC
than GCC (but does support some GCC extensions as well).
There were some differences, like I ended up aligning with GCC and
making it so that "sizeof(long)==sizeof(void *)" rather than
"sizeof(long)==4" for 64-bit targets.
Summarized history:
~ 2001: (During high school) Wrote a Scheme interpreter.
~ 2003: Started writing the first BGBScript interpreter.
This was around the end of high school for me.
This interpreter used XML DOM for the ASTs
And AST walking for the interpreter.
It was dead slow...
The language design somewhat resembled JavaScript / ES3.
~ 2006: Rewrote BGBScript interpreter.
Reused much of the core of the Scheme interpreter as a base.
Started gluing on features from ActionScript.
Went over to a bytecode interpreter.
Started experimenting with JIT.
~ 2007:
First BGBCC was written, as a fork off the 2003 BGBScript.
Idea was to try to allow using C as a scripting language.
But, C was not a good scripting language...
BGBCC was repurposed as an FFI generator for BGBScript.
Still used XML-DOM based ASTs, with a Stack-Machine IR.
~ 2008-2013:
Wrote a 3D engine that was originally Doom3-like.
Was using some Half-Life based file formats (for maps/models/etc).
Was using dynamic Phong lighting and stencil shadows (like Doom3).
But then shifted to copying Minecraft (with a Doom3-style renderer).
Its performance and memory usage was "not good"...
~ 2014: Made BGBScript2 VM
This was a redesign of BGBScript made to more resemble Java and C#.
Simplified some stuff, and made it primarily static typed.
Used stack-machine bytecode
Translated into 3AC traces for interpretation.
This strategy was a lot faster than direct interpretation.
Architecturally, it was similar to the Java JVM.
~ 2015/2016: Made a 2nd Minecraft-like 3D engine
Was written in a mixture of C and BS2.
Core engine was C, most game code was BS2.
Was intended to be simpler/faster/lighter than its predecessor.
~ 2016: Started taking an interest in ISA design stuff.
BGBCC was revived, and was made to target SuperH / SH-4.
Ended up going with the WinCE PE/COFF variant for binaries.
Was also using GCC built for SH-4 / PE-COFF.
This mutated into my "BJX1" ISA, which was a modified SH-4.
Though, BJX1 turned into a horrid mess.
~ 2018:
The ISA design was rebooted into BJX2.
Basically, a new encoding scheme that was "less horrible".
The new ISA could mostly reuse the old ASM with minor tweaks.
The compiler backend was partly reworked for the new ISA encoding.
But, most of the compiler backend was copy/pasted from BJX1.
~ 2019-present:
The BJX2 effort had continued and expanded somewhat.
The ISA design has mutated a fair bit since it started.
My compiler's backend has, however, turned into a horrible mess.
Not much reason to target BGBCC at mainstream ISAs:
MSVC, GCC, Clang, etc., do well enough...
Some past small-scale experiments at generating native code on ARM
performed horribly. It seems like, unless I could fix some of the issues
that still plague code generation for my own ISA, there is basically no
hope of being able to compete with GCC on ARM (which seemed not
particularly forgiving of crappy code generation, at least on the A53
and A55).
In some ways, BGBCC is a little bit of a throwback vs my BS2VM
(BGBScript2 VM design).
BGBCC:
Originally XML-based ASTs:
Organized in linked-lists;
Using strings for node/attribute names;
...
Now, object-based ASTs faking the original XML-based ASTs.
No more string pointers for tag/attribute names.
The Bytecode was mostly unstructured.
Loading the bytecode is effectively a purely linear process.
You run the stack model, and the ops build all the 3AC and metadata.
BS2VM:
Object-based ASTs (conceptually JSON-like);
The bytecode uses a TLV based container format for bytecode.
Stuff is organized into sections and tables.
The metadata has an actual structure.
At present, in both cases, the ASTs use a similar structure internally:
Key-value pairs, with 16-bit keys and 64-bit values.
BGBCC uses type-tagged keys, BS2VM used a different tagging scheme.
Each node holds up to a fixed number of key/value pairs.
If this limit is exceeded, the nodes break up B-Tree style.
Currently, this limit is 8, chosen as a balance for memory use.
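A rough sketch of such a node (field names made up here; not the actual
BGBCC/BS2VM structures):

  #include <stdint.h>

  #define AST_NODE_MAXKV 8   /* per-node limit; exceeding it splits the node */

  typedef struct ast_node_s {
      uint16_t  key[AST_NODE_MAXKV];   /* 16-bit keys (type-tagged in BGBCC) */
      uint64_t  val[AST_NODE_MAXKV];   /* 64-bit values                      */
      int       nkv;                   /* pairs currently in use             */
  } ast_node_t;

  /* Fetch the value for a key within one node; returns 1 if found. */
  static int ast_node_get(const ast_node_t *n, uint16_t k, uint64_t *out)
  {
      for (int i = 0; i < n->nkv; i++)
          if (n->key[i] == k) { *out = n->val[i]; return 1; }
      return 0;
  }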
At one point I did make a 3rd Minecraft-like 3D engine, but mostly
because the prior engine was still too heavyweight to run on an FPGA
board (and I wanted "something" that could run).
Say, my 2nd 3D engine needed around 256MB of RAM to work.
But, the FPGA board I was using has 128MB of RAM, and realistically
going much over 48-64MB of memory use is "seriously pushing it".
So, there was a bunch of effort in trying to make a small, "basically
functional" Minecraft-like 3D engine fit into around 40MB or so of RAM.
Was mostly successful, at least assuming one doesn't go out far enough
that it is generating new chunks (which somewhat increases its RAM
requirements).
Had a major difference from the second engine in how it managed world
drawing:
Second engine:
Figure out potentially visible chunks (16x16x16 blocks);
Build a vertex array for every potentially visible chunk;
Draw all the visible chunks.
Third engine:
Do spherical raycasts from the camera position;
Build a list of block faces that a ray had hit;
Draw all of the block faces into a vertex array;
Draw the vertex array.
Both engines ended up still using 16x16x16 block chunks, however:
2nd engine had 16x16x16 chunk regions (256x256x256 blocks);
3rd engine had 8x8x8 chunk regions (128x128x128 blocks).
Both used a similar scheme for chunks:
Single block type: no per-block index data;
2-16 block types: 4 bits per block;
17-256 block types: 8 bits per block;
257+:
Raw 32-bit block entries (2nd engine)
Unsupported (3rd engine).
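A sketch of how a per-chunk palette like this might be decoded (names
made up; the actual in-memory layout differs):

  #include <stdint.h>

  /* One 16x16x16 chunk, stored with a small per-chunk block palette. */
  typedef struct {
      uint32_t palette[256];  /* 32-bit block entries (type/attr/light/...)      */
      int      npal;          /* 1 = uniform chunk, <=16 = 4-bit idx, else 8-bit */
      uint8_t  index[4096];   /* packed indices; 2 blocks per byte in 4-bit mode */
  } chunk_t;

  /* Fetch the 32-bit block entry at (x,y,z) within the chunk. */
  static uint32_t chunk_get_block(const chunk_t *c, int x, int y, int z)
  {
      int i = (z << 8) | (y << 4) | x;           /* 0..4095 */
      if (c->npal <= 1)
          return c->palette[0];                  /* single block type: no indices */
      if (c->npal <= 16) {
          uint8_t b = c->index[i >> 1];          /* 4 bits per block */
          return c->palette[(i & 1) ? (b >> 4) : (b & 0x0F)];
      }
      return c->palette[c->index[i]];            /* 8 bits per block */
  }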
With a block layout sorta like, say:
( 7: 0): Block Type
(11: 8): Block Attribute
(15:12): Sky Light (15 if direct view of the sky)
(19:16): Block Light Intensity
(23:20): Block Light Color
(31:24): Depends on engine (eg, block flags).
Most chunks have fewer than 16 unique blocks (the sky being purely air
at a constant sky-light=15; underground being mostly solid stone at
sky-light=0, ...). When rebuilding the vertex array, the per-block
sky-light level is multiplied by the current light level of the sky
(based on a day/night cycle), with the block-light intensity and color
added on, to give the final face vertex color.
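For instance, pulling the light fields out of a block entry and
combining them might look roughly like this (a sketch; the exact scaling
is illustrative):

  #include <stdint.h>

  /* Combine sky light (scaled by time of day) with block light to get a
     vertex brightness in 0..255; color handling omitted for brevity. */
  static int face_brightness(uint32_t block, int sky_level /* 0..255, day/night */)
  {
      int sky_light   = (block >> 12) & 0x0F;   /* 15 = direct view of the sky */
      int block_light = (block >> 16) & 0x0F;   /* emitted light intensity     */

      int sky  = (sky_light * sky_level) / 15;  /* modulate by day/night level */
      int emit = (block_light * 255) / 15;      /* block light at full scale   */

      int v = sky + emit;                       /* additive combine            */
      return v > 255 ? 255 : v;
  }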
Though, one big tradeoff is that the computational cost of the third
engine's strategy scales very poorly with draw distance.
And, unlike Wolfenstein3D or ROTT, the number of rays needed to fully
cover the screen with a ray sweep is impractical in 3D:
Wolf3D/ROTT: 320 rays per sweep, in 2D;
Minecraft-like: roughly 2000, if we disregard block faces smaller than
4 pixels (at 320x200).
Though, one can reduce the ray-sweep density by applying a random jitter
to the rays and discarding any faces that haven't been hit recently.
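Say, something along these lines for the jitter (a sketch, not the
engine's actual code):

  #include <stdlib.h>
  #include <math.h>

  /* Generate one jittered ray direction near a base (yaw, pitch) sample. */
  static void jitter_ray_dir(float yaw, float pitch, float jitter, float dir[3])
  {
      float y = yaw   + jitter * ((rand() / (float)RAND_MAX) - 0.5f);
      float p = pitch + jitter * ((rand() / (float)RAND_MAX) - 0.5f);
      dir[0] = cosf(p) * cosf(y);
      dir[1] = cosf(p) * sinf(y);
      dir[2] = sinf(p);
  }

  /* Faces track how many sweeps ago a ray last hit them; anything not hit
     within the last few sweeps gets dropped from the visible set. */
  #define FACE_MAX_AGE 4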
I had used a full spherical sweep rather than a frustum sweep: with an
asynchronous "run the ray sweep at 4 times per second or so" approach, a
frustum sweep results in big holes in the world whenever one turns the
camera. With a spherical sweep, everything is already there (so looking
around doesn't result in big ugly holes being visible), but this does
lessen the number of rays one can cast in the forward direction, along
with increasing the number of visible block faces (since faces are still
processed even when outside the area being looked at).
Partly to limit both the cost and the required ray density, a ray will
simply stop after a certain distance if it hasn't hit anything.
Note that if one sets a limit of, say, 16000 block faces, then this also
sets an upper limit on how much memory is needed for the vertex arrays
(which is also somewhat less than the memory required to build full
vertex arrays for every chunk within the current draw distance).
Though, with a slow periodic update and a small draw distance, it is
possible to outrun the visible part of the terrain (as the raycast and
vertex arrays lag behind the camera's current position), along with
temporary holes opening up whenever previously occluded areas come into
view. These would be less of an issue with a faster raycast update
(say, 10 or 15 Hz), but on a 50MHz CPU this is asking a lot.
Didn't really make any interesting game out of this, it was more of a
technical experiment than anything else.
...