
What I've learned in comp.lang.c


bart

Feb 4, 2024, 8:09:22 PM

In no particular order.

* Software development can ONLY be done on a Unix-related OS

* It is impossible to develop any software, let alone C, on pure Windows

* You can spend decades developing and implementing systems languages at
the level of C, but you still apparently know nothing of the subject

* You can spend a decade developing whole-program compilers and a
suitably matched language, and somebody can still lecture you on exactly
what a whole-program compiler is, because you've got it wrong

* No matter how crazy the interface or behaviour of some Linux utility,
no one is ever going to admit there's anything wrong with it

* Every single tool I've written, is a toy.

* Every single project I've worked on, is a toy (even if it made my
company millions)

* No one should post or link code here, unless it passes '-std=c99
-pedantic-errors'

* Discussing build systems for C, is off-topic

* Discussing my C compiler, is off-topic, but discussing gcc is fine

* Nobody here apparently knows how to build a program consisting purely
of C source files, using only a C compiler.

* Simply enumerating the N files and submitting them to the compiler in
any of several easy methods seems to be out of the question. Nobody has
explained why.

* Nearly everyone here is working on massively huge and complex
projects, which all take from minutes to hours for a full build.

* Hardly anybody here has a project which can be built simply by
compiling and linking all the modules. Even Tim Rentsch's simplest
project has a dizzying set of special requirements.

* Funnily enough, every project I /have/ managed to build with my
compilers after eventually getting through the complexity, /has/ reduced
down to a simple list of .c files.

* The Tiny C compiler, is a toy. Even though you'd have trouble telling,
from the behaviour of a binary, whether or not it was built with tcc.

* Actually, any C compiler that is not gcc, clang, or possibly MSVC, is
a toy. Unless you have to buy it.

* There is nothing wrong with AT&T assembly syntax

* There's especially nothing wrong with AT&T syntax written as a series
of string literals, with extra % symbols, together with \n and \t escapes.

* There is not a single feature of my alternate systems language that is
superior to the C equivalent

* There is not even a single feature that is worth discussing as a
possible feature of C

* There is nothing in my decades of implementing such languages (even
implementing C), that makes my views on such possible features have any
weight at all

* Having fast compilation speed of C is of no use to anyone and
impresses nobody.

* Having code where you naughtily cast an object pointer to or from a
function pointer is a no-no. No matter that the whole of C is widely
regarded as unsafe.

* Nobody here is interested in a simple build system for C. Not even my
idea of a README simply listing the files needed, and any special steps,
to accompany the usual makefiles.

* There is no benefit at all in having a tool like a compiler, be a
small, self-contained executable.

* Generated C code is not real C code.

* I should use makefiles myself for my own language, even though the
build-process is always one, simple, indivisible command that usually
completes in 1/10th of a second.

* Makefiles should be for everything.

* There's no problem in having to specify those pesky .c extensions to
compiler input files, or adding that -o option

* But it's too much work to specify a filename to 'make', or to even
remember what your project is called
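
For contrast, the sort of makefile being argued over can be tiny; a
hypothetical minimal one for the "list of .c files" case (names
illustrative, and note the recipe line must start with a tab):

```makefile
# Minimal makefile: rebuild myprog whenever any .c file changes.
CC   = gcc
SRCS = $(wildcard *.c)

myprog: $(SRCS)
	$(CC) -o $@ $(SRCS)
```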

* Linux /does/ use .c and .s extensions to distinguish between file contents

* But Linux also uses a.out to mean both an executable and an object
file. Huh.

* C added a 'text' mode to convert \n to/from CRLF when Windows came
along.

* Somebody who's only developed under Unix, and using a plethora of
ready-made tools and utilities, is not in a bubble.

* But somebody who's developed under a range of other environments
spanning eras, is the one who's been in their own bubble.

* I was crazy to write '1M' lines of code (I've no idea how much) in my
private language

* I am apparently ignorant, a moron and might even be a BOT.

* I am allowed to have strong opinions, but I will always be wrong.



Shall I post this pile of crap or not?

I really need to get back to some of those pointless, worthless toy
projects of mine.

So here goes....

Kaz Kylheku

Feb 5, 2024, 12:59:08 AM
On 2024-02-05, bart <b...@freeuk.com> wrote:
>
> In no particular order.
>
> * Software development can ONLY be done on a Unix-related OS
>
> * It is impossible to develop any software, let alone C, on pure Windows

I've developed on DOS and Windows, as well as for DSP chips and some
microcontrollers. I find most of the crap that you say is simply wrong.

Speaking of Windows, the CL.EXE compiler does not know where its
include files are. You literally cannot do "cl program.c".
You have to give it options which tell it where the SDK is installed:
where the headers and libraries are.

The Visual Studio project-file-driven build system passes all
those details to every invocation of CL.EXE. Your project file (called
a "solution" nowadays) includes information like the path where your SDK
is installed. In the GUI there is some panel where you specify it.

If I'm going to be doing programming on Windows today, it's either going
to be some version of that CL.EXE compiler from Microsoft, or GCC.

> * You can spend decades developing and implementing systems languages at
> the level of C, but you still apparently know nothing of the subject

There is forty years of experience, and then there is 8 years of
experience repeated five times over.

> * You can spend a decade developing whole-program compilers and a
> suitably matched language, and somebody can still lecture you on exactly
> what a whole-language compiler is, because you've got it wrong

Writing a compiler is pretty easy, because the bar can be set very low
while still calling it a compiler.

Whole-program compilers are easier because there are fewer requirements.
You have only one kind of deliverable to produce: the executable.
You don't have to deal with linkage and produce a linkable format.

> * No matter how crazy the interface or behaviour of some Linux utility,
> no one is ever going to admit there's anything wrong with it

That is false; the stuff has a lot of critics, mostly from the inside
now. (Linux outsiders are mostly a lunatic fringe nowadays. The tables
have turned.)

You don't seem to understand that the interfaces of tools that are not
directly invoked by people don't matter, as long as they are reliable.

And then, interfaces that are exposed to users are hard to change, even
if we don't like them, because changes break things. Everyone hates
breaking changes more than they hate the particular syntax of a tool.

The environment is infinitely customizable. Users have their private
environments which work the way they want. At the command line,
you can use aliases and shell functions to give yourself the ideal
commands you want.

You only have to use the standard commands when writing scripts to be
used by others. And even then, you can include functions which work
the way you want, and then use your functions.
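
As a sketch of that (the names here are invented shorthand, not standard
commands):

```shell
# Personal shorthand at the interactive prompt:
alias ll='ls -l'

# A shell function works even where aliases don't (e.g. in scripts):
ccq() {
    # derive the output name from the first .c file, then compile the lot
    out="${1%.c}"
    gcc -O2 -o "$out" "$@"
}
# usage: ccq main.c util.c   ->   gcc -O2 -o main main.c util.c
```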

> * Discussing my C compiler, is off-topic, but discussing gcc is fine

GCC is maintained by people who know what a C compiler is, and GCC can
be asked to be one.

You've chosen not to read the C standard, which leaves you unqualified
to even write test cases to validate that something is a C compiler.

Your idea of writing a C compiler seems to be to pick some random
examples of code believed to be C and make them work. (Where "work"
means that they compile and show a few behaviors that look like
the expected ones.)

Basically, you don't present a very credible case that you've actually
written a C compiler.

> * Nobody here apparently knows how to build a program consisting purely
> of C source files, using only a C compiler.
>
> * Simply enumerating the N files and submitting them to the compiler in
> any of several easy methods seems to be out of the question. Nobody has
> explained why.
>
> * Nearly everyone here is working on massively huge and complex
> projects, which all take from minutes to hours for a full build.

That's the landscape. Nobody is going to pay you for writing small
utilities in C. That sort of thing all went to scripting languages.
(It happens from time to time as a side task.)

I currently work on a firmware application that compiles to a 100
megabyte (stripped!) executable.

> * There is not a single feature of my alternate systems language that is
> superior to the C equivalent

The worst curve ball someone could throw you would be to
be eagerly interested in your language, and ask for guidance
in how to get it installed and start working in it.

Then you're screwed.

As long as you just post to comp.lang.c, you're safe from that.

> * Having fast compilation speed of C is of no use to anyone and
> impresses nobody.

Not as much as fast executable code, unfortunately.

If it takes 10 extra seconds of compilation to shave 100
milliseconds off a program, it's worth it if millions of copies of that
program are used.

Most of GCC's run time is spent in optimizing. It's a lot faster
with -O0.

I just measured a 3.38X difference compiling a project with -O0 versus
its usual -O2. This means it's spending over 70% of its time on
optimizing.

The remaining 30% is still kind of slow.

But it's not due to scanning lots of header files.

If I run it with the "-fsyntax-only" option so that it parses all
the syntax, but doesn't produce output, it gets almost 4X faster
(versus -O0, and thus about 13.5X faster compared to -O2).

Mode: | -fsyntax-only | -O0 | -O2  |
Time: |      1.0      | 4.0 | 13.5 |

Thus, about 7.4% is spent on scanning, preprocessing and parsing,
22.2% on the intermediate code processing and target
generation activities, and 70.4% on optimization.
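
The split follows directly from the measured ratios; a quick check,
taking the 1.0 / 4.0 / 13.5 figures as given:

```python
# Derive the phase breakdown from the normalized compile times above.
syntax_only, o0, o2 = 1.0, 4.0, 13.5

front_end = syntax_only / o2          # scanning, preprocessing, parsing
middle    = (o0 - syntax_only) / o2   # IR processing, code generation
optimize  = (o2 - o0) / o2            # optimization passes

print(f"front end: {front_end:.1%}, middle: {middle:.1%}, "
      f"optimizing: {optimize:.1%}")
# front end: 7.4%, middle: 22.2%, optimizing: 70.4%
```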

Is it due to decades of legacy code in GCC? Clang is a newer
implementation, so you might think it's faster than GCC. But it
manages only to be about the same.

Compilers that blaze through large amounts of code in the blink of an
eye are almost certainly skimping on optimization. And because they
don't need the internal /architecture/ to support the kinds of
optimizations they are not doing, they can speed up the code generation
also. There is no need to generate an intermediate representation like
SSA; you can pretty much just parse the syntax and emit assembly code in
the same pass. Particularly if you only target one architecture.

A poorly optimizing retargetable compiler that emits an abstract
intermediate code will never be as blazingly fast as something equally
poorly optimizing that goes straight to code in one pass.

> * Having code where you naughtily cast a function pointer to or from a
> function pointer is a no-no.

Nobody said that, but it was pointed out that this isn't a feature of
the ISO C standard dialect. It's actually a common extension, widely
exploited by programs. There is nothing wrong with using it, but people
who know C understand that it's not "maximally portable". Most code
does not have to be anywhere near "maximally portable".

> * There is no benefit at all in having a tool like a compiler, be a
> small, self-contained executable.

Not as much as there used to be, decades ago.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazi...@mstdn.ca

Chris M. Thomasson

Feb 5, 2024, 1:50:06 AM
On 2/4/2024 9:58 PM, Kaz Kylheku wrote:
> On 2024-02-05, bart <b...@freeuk.com> wrote:
>>
>> In no particular order.
>>
>> * Software development can ONLY be done on a Unix-related OS
>>
>> * It is impossible to develop any software, let alone C, on pure Windows
>
> I've developed on DOS,

TSR's?

> Windows as well as for DSP chips and some
> microcontrollers. I find most of the crap that you say is simply wrong.
[...]



Kaz Kylheku

Feb 5, 2024, 2:03:33 AM
On 2024-02-05, Chris M. Thomasson <chris.m.t...@gmail.com> wrote:
> On 2/4/2024 9:58 PM, Kaz Kylheku wrote:
>> On 2024-02-05, bart <b...@freeuk.com> wrote:
>>>
>>> In no particular order.
>>>
>>> * Software development can ONLY be done on a Unix-related OS
>>>
>>> * It is impossible to develop any software, let alone C, on pure Windows
>>
>> I've developed on DOS,
>
> TSR's?

I did make a couple of TSRs back in the day, but only as a hobby.

Not in C.

Tim Rentsch

Feb 5, 2024, 2:37:03 AM
bart <b...@freeuk.com> writes:

> [...]
>
> * Hardly anybody here has a project which can be built simply by
> compiling and linking all the modules.

Not everyone is working on a project where the deliverable is a
single executable. It's much more difficult to work on a project
where the deliverables form a set of related files that make up a
third-party library, and target multiple platforms.

> Even Tim Rentsch's simplest project has a dizzying set of special
> requirements.

This statement is a misrepresentation, and undoubtedly a deliberate
one. Furthermore how it is expressed is petty and childish.

Chris M. Thomasson

Feb 5, 2024, 2:51:30 AM
On 2/4/2024 11:03 PM, Kaz Kylheku wrote:
> On 2024-02-05, Chris M. Thomasson <chris.m.t...@gmail.com> wrote:
>> On 2/4/2024 9:58 PM, Kaz Kylheku wrote:
>>> On 2024-02-05, bart <b...@freeuk.com> wrote:
>>>>
>>>> In no particular order.
>>>>
>>>> * Software development can ONLY be done on a Unix-related OS
>>>>
>>>> * It is impossible to develop any software, let alone C, on pure Windows
>>>
>>> I've developed on DOS,
>>
>> TSR's?
>
> I did make a couple of TSRs back in the day, but only as a hobby.
>
> Not in C.
>

Nice. I only messed around with them a couple of times. There was a cool
one, iirc, called Key Correspondence (KEYCOR). It was
programmable, and could be used with any program. I used it for a
reporting system and to control WordPerfect 5.1. I still have it! lol.
For legacy purposes.

Chris M. Thomasson

Feb 5, 2024, 2:52:24 AM
Writing WordPerfect 5.1 macros was a fun time.... ;^o

Malcolm McLean

Feb 5, 2024, 3:29:50 AM
On 05/02/2024 01:09, bart wrote:
>
> In no particular order.
>
> * Software development can ONLY be done on a Unix-related OS
>
> * It is impossible to develop any software, let alone C, on pure Windows
>
> * You can spend decades developing and implementing systems languages at
> the level of C, but you still apparently know nothing of the subject
>

The tone's currently rather bad, and somehow it has developed that you
and I are on one side and pretty much everyone else on the other. We
both have open source projects which are or at least attempt to be
actually useful to other people, whilst I don't think many of the others
can say that, and maybe that's the underlying reason. But who knows.

I'm trying to improve the tone. It's hard because people have got lots
of motivations for posting, and some of them aren't very compatible with
a good-humoured, civilised group. And we've got a lot of bad behaviour,
not all of it directed at us by any means. However, whilst you're very
critical of other people's design decisions, I've rarely if ever heard
you criticise someone's general character because of them. But finally
tolerance has snapped.


--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm

Jan van den Broek

Feb 5, 2024, 3:37:05 AM
2024-02-05, Chris M. Thomasson <chris.m.t...@gmail.com> wrote:
> On 2/4/2024 11:51 PM, Chris M. Thomasson wrote:

[Snip]

>> Nice. I only messed around with them a couple of times. There was a cool
>> one, iirc, called key correspondence (KEYCOR), iirc. It was
>> programmable, and could be used with any program. I used it for a
>> reporting system and to control WordPerfect 5.1. I still have it! lol.
>> For legacy purposes.
>
> Writing WordPerfect 5.1 macros was a fun time.... ;^o

Writing my own macro-compiler was also fun.
--
Jan v/d Broek
balg...@dds.nl
Look out, here he comes again
The kid with the replaceable head

Dan Purgert

Feb 5, 2024, 6:03:27 AM
On 2024-02-05, bart wrote:
> [...]
> * Nobody here apparently knows how to build a program consisting purely
> of C source files, using only a C compiler.

What does this one mean, exactly? I thought the "compiler" and "linker"
were separate tools used under the umbrella term "compiling".

--
|_|O|_|
|_|_|O| Github: https://github.com/dpurgert
|O|O|O| PGP: DDAB 23FB 19FA 7D85 1CC1 E067 6D65 70E5 4CE7 2860

David Brown

Feb 5, 2024, 7:15:38 AM
On 05/02/2024 02:09, bart wrote:
>
> In no particular order.
>

<snip>

>
> Shall I post this pile of crap or not?

No. It is, after all, a pile of crap.

What we have learned about Bart in c.l.c. :

* Bart generally does not read what other people post.

* Bart exaggerates /everything/ - including things no one ever wrote.

* Bart extrapolates /everything/ to get a binary black-or-white
all-or-nothing straw man that he can complain about. If one person says
they do X and not Y, Bart takes that to mean /everyone/ does X and /no
one/ does Y.

* Bart prefers tilting at windmills to learning about C, or indeed
anything in the real world.

* Bart has no interest in what anyone else does, wants or needs.

>
> I really need to get back to some of those pointless, worthless toy
> projects of mine.
>

No, you should go back to doing something you actually /like/. Stop
fighting imaginary battles about problems that don't exist. Stop
winding yourself into a frenzy over nothing.

You don't like C. Find something else that you /do/ like, and use that
instead.

I can't speak for anyone else, but I'd rather you were happily doing
something you are comfortable with, instead of endless gripes and rants
here.

Ben Bacarisse

Feb 5, 2024, 9:09:56 AM
David Brown <david...@hesbynett.no> writes:

> On 05/02/2024 02:09, bart wrote:
>> In no particular order.
>>
>
> <snip>
>
>> Shall I post this pile of crap or not?
>
> No. It is, after all, a pile of crap.

Why do you reply so much? I get why Bart posts, but not why you reply
so much. It's not as if gcc (to take but one example) needs to be
defended from people saying it does it all wrong!

He sometimes posts things that, in my opinion, benefit from a reply.
His "C is so mysterious, how can anyone use it" posts, for example,
benefit from a reply or two that explains what C's rules really are so
that people coming along later will know what the facts of the matter
are. But there have been none of those lately.

--
Ben.

Scott Lurndal

Feb 5, 2024, 9:52:55 AM
I have reached the point where it's not worth my time to respond
to bart, even to correct his misrepresentations of what I and
others have said.

Michael S

Feb 5, 2024, 11:24:11 AM
On Mon, 5 Feb 2024 05:58:55 -0000 (UTC)
Kaz Kylheku <433-92...@kylheku.com> wrote:

> On 2024-02-05, bart <b...@freeuk.com> wrote:
> >
> > In no particular order.
> >
> > * Software development can ONLY be done on a Unix-related OS
> >
> > * It is impossible to develop any software, let alone C, on pure
> > Windows
>
> I've developed on DOS, Windows as well as for DSP chips and some
> microcontrollers. I find most of the crap that you say is simply
> wrong.
>
> Speaking of Windows, the CL.EXE compiler does not know where its
> include files are. You literally cannot do "cl program.c".
> You have to give it options which tell it where the SDK is installed:
> where the headers and libraries are.
>

It depends on definitions.
cl.exe called from a random command prompt, either cmd.exe or powershell,
does not know.
cl.exe called from the "x64 Native Tools Command Prompt for VS 2019" that
I have installed on the computer I'm writing this message on, knows
very well where they are, because when I clicked on the shortcut they were
written into environment variables, respectively named Include and Lib.
So, from this prompt I can do "cl program.c".
In practice, I'd likely prefer "cl -W4 -O1 -MD program.c", but that's
because I am more concerned than most people about unimportant details.
Call it a defect of character.

> The Visual Studio project-file-driven build build system passes all
> those details to every invocation of CL.EXE. Your project file (called
> a "solution" nowadays) includes information like the path where your
> SDK is installed. In the GUI there is some panel where you specify
> it.
>
> If I'm going to be doing programming on Windows today, it's either
> going be some version of that CL.EXE compiler from Microsoft, or GCC.
>

Native C language gcc programming under MSYS2 and native C language
clang programming under MSYS2 have an extremely similar look and feel. I
can't think of any technical reasons to prefer one over the other.

Michael S

Feb 5, 2024, 11:32:48 AM
On Mon, 5 Feb 2024 05:58:55 -0000 (UTC)
Kaz Kylheku <433-92...@kylheku.com> wrote:
>
> Is it due to decades of legacy code in GCC? Clang is a newer
> implementatation, so you might think it's faster than GCC. But it
> manages only to be about the same.
>

I still believe that "decades of legacy" are the main reason.
clang *was* much faster than gcc 10-12 years ago. Since then it has
accumulated a decade of legacy. And this particular decade mostly
consisted of code written by people that are (a) less experienced
than gcc maintainers and (b) care about speed of compilation even less
than gcc maintainers. Well, for the latter, I don't really believe that
it is possible, but I need to bring a plausible explanation, don't I?




Michael S

Feb 5, 2024, 12:02:25 PM
On Mon, 5 Feb 2024 05:58:55 -0000 (UTC)
Kaz Kylheku <433-92...@kylheku.com> wrote:
> > * Nearly everyone here is working on massively huge and complex
> > projects, which all take from minutes to hours for a full build.
>
> That's the landscape. Nobody is going to pay you for writing small
> utilities in C. That sort of thing all went to scripting languages.
> (It happens from time to time as a side task.)
>
> I currently work on a a firmware application that compiles to a 100
> megabyte (stripped!) executable.
>

My before last firmware project compiles from scratch in 0m1.623s
despite using bloated STmicro libraries and headers.
On Windows, with antivirus running, using 10 y.o. PC.
With a brand new CPU, bare-metal Linux and a modern NVMe SSD it would
likely finish 3 times faster.
Windows by itself is not a measurable slowdown, but antivirus is, and
until now I didn't find a way to get antivirus-free Windows at work.

Projects that small are not typical in my embedded development
practice. But embedded projects that on somewhat beefier 5 y.o.
hardware compile from scratch in less than 5 sec are typical.

As to PC development, the project that I am trying to fix right now uses
link-time code generation, so it takes ~8 seconds (VS 2019, msbuild,
command line tools) to rebuild when just one file changed. I accept it
because it's not my own project. If it was mine, I'd probably want to
improve it. Besides, I have grown older, and am more tolerant of delays
than I was 25-30 years ago.






Jim Jackson

Feb 5, 2024, 1:02:01 PM
Did you expect anything else?

David Brown

Feb 5, 2024, 2:53:55 PM
Early clang was faster than gcc at compilation and static error checking.
And it had much nicer formats and outputs for its warnings. But it
wasn't close to gcc for optimisation and generated code efficiency, and
had less powerful checking.

Over time, clang has gained a lot more optimisation and is now similar
to gcc in code generation (each is better at some things), while gcc has
sped up some aspects and greatly improved the warning formats.

clang is now a similar speed to gcc because it does a similar job. It
turns out that doing a lot of analysis and code optimisation takes effort.


Kaz Kylheku

Feb 5, 2024, 3:54:03 PM
It takes more and more effort for diminishing results.

A compiler can spend a lot of time just searching for the conditions
that allow a certain optimization, where those conditions turn out to be
false most of the time. So that in a large code base, there will be just
a couple of "hits" (the conditions are met, and the optimization can
take place). Yet all the instruction sequences in every basic block in
every file had to be looked at to determine that.

Many of these conditions are specific to the optimization. Another
kind of optimization has its own conditions that don't reuse anything
from that one. So the more optimizations you add, the more work it takes
just to determine applicability.

The optimizer may have to iterate on the program graph. After certain
optimizations are applied, the program graph changes. And that may
"unlock" more opportunities to do optimizations that were not possible
before. But because the program graph changed, its properties have to be
recalculated, like liveness of variables/temporaries and whatnot.
More time.
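
A toy sketch of that iterate-until-nothing-changes structure (nothing
like GCC's real pass pipeline, just the shape of the loop; the mini-IR
and pass names are invented):

```python
# Toy fixed-point optimizer over a tiny (op, dest, args...) instruction list.
def const_fold(prog):
    """Fold 'add' of two integer constants into a 'const'."""
    out = []
    for op, *args in prog:
        if op == "add" and all(isinstance(a, int) for a in args[1:]):
            out.append(("const", args[0], args[1] + args[2]))
        else:
            out.append((op, *args))
    return out

def propagate(prog):
    """Replace uses of known-constant names with their values."""
    env, out = {}, []
    for op, *args in prog:
        if op == "const":
            env[args[0]] = args[1]
            out.append((op, *args))
        else:
            out.append((op, args[0], *[env.get(a, a) for a in args[1:]]))
    return out

def optimize(prog):
    while True:
        new = const_fold(propagate(prog))
        if new == prog:          # fixed point: no pass changed anything
            return prog
        prog = new               # a change may unlock further optimizations

prog = [("const", "x", 2), ("add", "y", "x", 3), ("add", "z", "y", 4)]
print(optimize(prog))
# [('const', 'x', 2), ('const', 'y', 5), ('const', 'z', 9)]
```

Note that folding "y" is what makes "z" foldable on the next round, which
is why the loop has to recompute from the changed program each time.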

Kenny McCormack

Feb 5, 2024, 5:58:48 PM
In article <XC6wN.397626$p%Mb.3...@fx15.iad>,
Scott Lurndal <sl...@pacbell.net> wrote:
...
>I have reached the point where it's not worth my time to respond
>to bart, even to correct his misrepresentations of what I and
>other have said.

Being killfiled by Scotty is almost as good a thing as being kf'd by
Keith. Keep it up!

--
Life's big questions are big in the sense that they are momentous. However, contrary to
appearances, they are not big in the sense of being unanswerable. It is only that the answers
are generally unpalatable. There is no great mystery, but there is plenty of horror.
(https://en.wikiquote.org/wiki/David_Benatar)

Lawrence D'Oliveiro

Feb 5, 2024, 6:28:18 PM
On Mon, 5 Feb 2024 19:02:09 +0200, Michael S wrote:

> Windows by itself is not a measurable slowdown, but antivirus is, and
> until now I didn't find a way to get antivirus-free Windows at work.

But if you don’t have antivirus on your build machine, the sad fact of
development on Windows is that there are viruses that will insinuate
themselves into the build products.

Lawrence D'Oliveiro

Feb 5, 2024, 6:29:18 PM
On Mon, 5 Feb 2024 01:09:10 +0000, bart wrote:

> * Software development can ONLY be done on a Unix-related OS

Does the term “butthurt” mean anything to you?

Richard Harnden

Feb 5, 2024, 6:41:16 PM
Reflections on Trusting Trust?

Michael S

Feb 5, 2024, 6:46:28 PM
No, if I use Windows there is no danger of viruses like these.
Besides, it's not like antivirus could have helped against viruses if
I was stupid enough to catch them. To the opposite, I suspect that the
presence of antivirus increases the attack surface.

Chris M. Thomasson

Feb 5, 2024, 7:03:49 PM
There can be viruses hidden in source code for public domain code...
Build it and they will come! ;^o

Chris M. Thomasson

Feb 5, 2024, 7:06:22 PM
Other viruses can be built in, not infected... Run it, BAM!!

David Brown

Feb 6, 2024, 3:44:37 AM
Yes.


> A compiler can spend a lot of time just searching for the conditions
> that allow a certain optimization, where those conditions turn out to be
> false most of the time. So that in a large code base, there will be just
> a couple of "hits" (the conditions are met, and the optimization can
> take place). Yet all the instruction sequences in every basic block in
> every file had to be looked at to determine that.

This is always the case with optimisations. Each pass might only give a
few percent increase in speed - but when you have 50 passes, this adds
up to a lot. And some passes (that is, some types of optimisation) can
open up new opportunities if you redo previous passes. And the same
applies to static error checking - there is quite an overlap in the
kinds of analysis used for optimisations and for static error checking.

>
> Mnay of these conditions are specific to the optimization. Another
> kind of optimization has its own conditions that don't reuse anything
> from that one. So the more optimizations you add, the more work it takes
> just to determine applicability.
>
> The optimizer may have to iterate on the program graph. After certain
> optimizations are applied, the program graph changes. And that may
> "unlock" more opportunities to do optimizations that were not possible
> before. But because the program graph changed, its properties have to be
> recalculated, like liveness of variables/temporaries and whatnot.
> More time.
>

Yes.

For a great lot of code, it is not necessary to squeeze out as much
speed as possible. But IMHO it is usually a good idea to have as much
static error checking as you reasonably can without too high a risk of
false positives.

Major compilers aren't really bothered about the speed of compilation of
C code - it is usually fast enough that it is of little concern. Those
that are building a lot, use make (or other build tools), perhaps
ccache, and usually use machines with plenty of cores and plenty of ram.

It's C++ that is the concern, especially big projects. And there you
/do/ need at least some optimisation effort, because C++ is generally
full of little functions that are expected to "disappear" entirely by
inlining. So that is where the compiler developer effort goes for
compiler speed, analysis, and optimisation.

Programmers are notoriously bad at determining which bits of their code
need to be efficient. And if they know their compiler is poor at
optimising, they do "manual optimisation". They use pointers where
arrays would be clearer. They reuse "temp" variables instead of making
new ones. They write jumbles of "gotos" instead of breaking code into
multiple functions. They write "(x << 3) + x" instead of "x * 9". It
is much better to write the clearest source code you can, and let the
compiler do its job and generate efficient object code.

It's never a bad thing if a compiler is faster. But IMHO it is more
important for the compiler to be /better/ - better warnings and checks
that catch issues earlier, and better optimisation because that allows
people to write code in the clearest, safest and most maintainable way
while still getting good results.

David Brown

Feb 6, 2024, 3:50:24 AM
Nonsense. Well, /almost/ nonsense. When thinking about security, you
should not rule out anything entirely.

And of course there are those two or three unfortunate people that have
to work with embedded Windows.

David Brown

Feb 6, 2024, 3:54:50 AM
My experience is that antivirus programs rarely catch anything unless
the user is very gullible, or very unlucky. I have seen antivirus
programs block valid programs with false positives more often than I
have seen them catch actual malware. (And that's company wide, not just
my machines.) There is no major antivirus software that has not killed
at least some Windows machines by false-positive blocking of critical
Windows components.

And yes, there have been many successful attacks and hacks that get into
Windows machines via flaws in the massively over-complicated "security"
software.

Chris M. Thomasson

Feb 6, 2024, 4:01:46 AM
;^)

Chris M. Thomasson

Feb 6, 2024, 4:04:01 AM
On 2/6/2024 12:44 AM, David Brown wrote:
[...]
> Programmers are notoriously bad at determining which bits of their code
> need to be efficient.

This brings me back to a code base I was asked to take a look at. Well,
the keyword register was all over the place! Spooky...



[...]

Michael S

Feb 6, 2024, 6:42:12 AM
On Tue, 6 Feb 2024 09:44:20 +0100
David Brown <david...@hesbynett.no> wrote:
>
>
> > A compiler can spend a lot of time just searching for the conditions
> > that allow a certain optimization, where those conditions turn out
> > to be false most of the time. So that in a large code base, there
> > will be just a couple of "hits" (the conditions are met, and the
> > optimization can take place). Yet all the instruction sequences in
> > every basic block in every file had to be looked at to determine
> > that.
>
> This is always the case with optimisations. Each pass might only
> give a few percent increase in speed - but when you have 50 passes,
> this adds up to a lot. And some passes (that is, some types of
> optimisation) can open up new opportunities if you redo previous
> passes.

Except that gcc, at least, by design never redoes previous passes. What's
more, it does not even try to compare the result of optimisation with a
certain pass against the result without that pass and take the better of
the two.

I don't know if the same applies to clang; I never had conversations
with clang maintainers (I've had plenty with gcc maintainers).
However, the bottom line for the last 2-3 years is that when I compare
the speed of gcc-compiled code vs clang-compiled code, both can do a
good job and both can do ordinary stupid things, but clang is much more
likely than gcc to do astonishingly stupid things. Like, for example,
vectorization that reduces the speed by a factor of 3 vs the
non-vectorized variant.
So, most likely, clang also proceeds pass after pass after pass and
never ever looks back. Seems like they took the lesson of Lot's wife
very seriously.

David Brown

Feb 6, 2024, 7:08:32 AM
On 06/02/2024 12:41, Michael S wrote:
> On Tue, 6 Feb 2024 09:44:20 +0100
> David Brown <david...@hesbynett.no> wrote:
>>
>>
>>> A compiler can spend a lot of time just searching for the conditions
>>> that allow a certain optimization, where those conditions turn out
>>> to be false most of the time. So that in a large code base, there
>>> will be just a couple of "hits" (the conditions are met, and the
>>> optimization can take place). Yet all the instruction sequences in
>>> every basic block in every file had to be looked at to determine
>>> that.
>>
>> This is always the case with optimisations. Each pass might only
>> give a few percent increase in speed - but when you have 50 passes,
>> this adds up to a lot. And some passes (that is, some types of
>> optimisation) can open up new opportunities if you redo previous
>> passes.
>
> Except that gcc, at least, by design never redoes previous passes. What's
> more, it does not even try to compare the result of optimisation with a
> certain pass against the result without that pass and take the better of
> the two.

AFAIUI (I am not a gcc developer), gcc redoes certain types of
optimisations after later passes - even if it calls them different pass
numbers. For example, constant propagation and dead code elimination
are done early on in functions. Then after inlining and IPA passes,
they are done again using the new information.
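A small sketch of why redoing those passes pays off (my illustration, with hypothetical names, not code from the thread): before inlining, the argument at the call site is opaque; after the inlining pass, a second round of constant propagation can fold the whole callee away.

```c
/* Before inlining, the compiler cannot fold square(3) because the
   body of square() is a separate function.  After inlining, a repeat
   of constant propagation and dead code elimination typically reduces
   nine() to a plain "return 9;". */
static int square(int x)
{
    return x * x;
}

int nine(void)
{
    return square(3);
}
```

The observable behaviour is unchanged either way; the repeated pass only affects the generated code.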

I expect you are correct that it does not try to compare the results
from pass to pass. I think that would quickly be infeasible. You can't
just compare the result of applying optimisation B after A to see if it
is better or worse than A alone, and then decide which to keep before
moving to step C. Maybe A was better than AB, but ABC is better than
AC. You'd need to keep comparing all sorts of combinations, and it
would be a scalability nightmare.

>
> I don't know if the same applies to clang, I never had
> conversations with clang maintainers (had plenty with gcc maintainers).
> However, the bottom line for last 2-3 years is that when I compare
> speed of gcc-compiled code vs clang-compiled then both can do good
> job and both can do ordinary stupid things, but clang is much more
> likely then gcc to do astonishingly stupid things. Like, for example,
> vectorization that reduces the speed by factor of 3 vs non-vectorized
> variant.

I see the same, though I have not used clang very seriously for real
work. It does, however, seem a bit over-enthusiastic about vectorising
code.

Lawrence D'Oliveiro

Feb 6, 2024, 6:23:26 PM
On Tue, 6 Feb 2024 09:44:20 +0100, David Brown wrote:

> They reuse "temp" variables instead of making new ones.

I like to limit the scope of my temporary variables. In C, this is as easy
as sticking a pair of braces around a few statements.
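A sketch of that idiom (my illustration, hypothetical names): the braces confine the temporary to exactly the statements that need it.

```c
/* Swap two ints; 'temp' is scoped to just the braces around the swap,
   so it cannot be accidentally reused later in the function. */
void swap_ints(int *a, int *b)
{
    {   /* temp exists only inside these braces */
        int temp = *a;
        *a = *b;
        *b = temp;
    }
    /* temp is out of scope here */
}
```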

Lawrence D'Oliveiro

Feb 6, 2024, 6:24:56 PM
On Tue, 6 Feb 2024 09:50:02 +0100, David Brown wrote:

> And of course there are those two or three unfortunate people that have
> to work with embedded Windows.

I thought this had pretty much gone away, pushed aside by Linux.

bart

Feb 6, 2024, 8:35:50 PM
On 05/02/2024 08:29, Malcolm McLean wrote:
> On 05/02/2024 01:09, bart wrote:
>>
>> In no particular order.
>>
>> * Software development can ONLY be done on a Unix-related OS
>>
>> * It is impossible to develop any software, let alone C, on pure Windows
>>
>> * You can spend decades developing and implementing systems languages
>> at the level of C, but you still apparently know nothing of the subject
>>
>
> The tone's currently rather bad, and somehow it has developed that you
> and I are on one side and pretty much everyone else on the other. We
> both have open source projects which are or at least attempt to be
> actually useful to other people, whilst I don't think many of the others
> can say that, and maybe that's the underlying reason. But who knows.
>
> I'm trying to improve the tone. It's hard because people have got lots
> of motivations for posting, and some of them aren't very compatible with
> a good humoured, civilised group. And we've got a lot of bad behaviour,
> not all of it directed at us by any means. However, whilst you're very
> critical of other people's design decisions, I've rarely if ever heard
> you go on to criticise someone's general character.
>


Well, we've both posted code of sizeable, actual and practical projects.
Very few on the 'other side' have. Maybe it's proprietary or there are
other reasons. But it means their own output can't be criticised here.

Myself I've also pretty much given up on discussing new features for C
or new directions. The thread on build systems lies outside the
language. But few regulars are that interested in that side of it; only
in what C does right now.

From what I can see, the most fascinating topics for them are pedantic
details of the C standard, and the most low level technical details of
Unix-like systems.

> But finally tolerance has snapped.

I usually argue against ideas, not people. But there are only so many
personal insults that you can take.

bart

Feb 6, 2024, 9:19:06 PM
On 05/02/2024 05:58, Kaz Kylheku wrote:
> On 2024-02-05, bart <b...@freeuk.com> wrote:

> Writing a compiler is pretty easy, because the bar can be set very low
> while still calling it a compiler.

> Whole-program compilers are easier because there are fewer requirements.
> You have only one kind of deliverable to produce: the executable.
> You don't have to deal with linkage and produce a linkable format.

David Brown suggested that they were harder than I said. You're saying
they are easier.

BTW your statements are wrong, but I'm not going to argue about it.

My whole-program compiler is here:

https://github.com/sal55/langs/blob/master/MCompiler.md

It has a dozen different outputs.


> GCC is maintained by people who know what a C compiler is, and GCC can
> be asked to be one.

So what is it when it's not a C compiler? What language is it compiling
here:

c:\qx>gcc qc.c
c:\qx>

This program passes. Mine does the same:

c:\qx>mcc qc.c
Compiling qc.c to qc.exe

Whatever language mcc processes must be similar to the one that gcc
processes.

Yet it is true that gcc can be tuned to a particular standard, dialect,
set of extensions and a set of user-specified behaviours. Which means it
can also compile some Frankensteinian version of 'C' that anyone can devise.

Mine at least is a more rigid subset.

> Your idea of writing a C compiler seems to be to pick some random
> examples of code believed to be C and make them work. (Where "work"
> means that they compile and show a few behaviors that look like
> the expected ones.)

That's what most people expect!


> Basically, you don't present a very credible case that you've actually
> written a C compiler.

Well, don't believe it if you don't want. There are 1000s of amateur 'C'
compilers about; it must be the most favoured language for such projects
(since it looks deceptively simple).

Among such compilers, mine is quite accomplished by comparison. One task
it is used for is to take APIs defined by C header files and turn them
into bindings in my two languages. It does that as well as any such tool
can. So fuck you.

> I currently work on a a firmware application that compiles to a 100
> megabyte (stripped!) executable.

And yet 90% of the executables on my PC are under 1MB. SOMEBODY must be
writing small programs!

The NASM.EXE program is a bit larger at 1.3MB, for example; that's 98.7%
smaller than your giant program.

You want to make me feel bad about my stuff because you work on a big
project and mine are small. Let me go and find that length of rope then...


>> * There is not a single feature of my alternate systems language that is
>> superior to the C equivalent
>
> The worst curve ball someone could throw you would be to
> be eagerly interested in your language, and ask for guidance
> in how to get it installed and start working in it.

That happened 2-3 years ago and I was able to help out. However, I'm not
pushing my actual language, which is anyway volatile as it is a vehicle
for new ideas; I was only discussing the utility of certain features.

Surely somebody can do that without going to the trouble of creating and
implementing a whole language, and using the feature over years, as
proof of concept.

But when someone actually does that, THEN they are not worth listening to?

I mean, where is YOUR lower-level system language? Where is anybody's? I
don't mean the Zigs and Rusts because that would be like comparing a
40-tonne truck with a car.

My language is a modernish family car compared with C's Model T.

> Not as much as fast executable code, unfortunately.

And yet most people code in Python and JavaScript and a whole pile of
slow languages.

> Compilers that blaze through large amounts of code in the blink of an
> eye are almost certainly dodging on the optimization.

Yes, probably. But the optimisation is overrated. Do you really need
optimised code to test each of those 200 builds you're going to do today?

Not for a language at the level of C. (Maybe for C++ code as it needs it
to collapse the mountain of redundant code that templates etc will produce.)

For the programs I write, gcc -O3 makes them 1.5 to 2.0 times faster
typically, for 100 times longer compile time.

And if I do want the boost, I can transpile to C to use gcc -O3. I don't
need the super-optimisation within my own product.

> And because they
> don't need the internal /architecture/ to support the kinds
> optimizations they are not doing, they can speed up the code generation
> also. There is no need to generate an intermediate representation like
> SSA; you can pretty much just parse the syntax and emit assembly code in
> the same pass. Particularly if you only target one architecture.
>
> A poorly optimizing retargetable compiler that emits an abstract
> intermediate code will never be as blazingly fast as something equally
> poorly optimizing that goes straight to code in one pass.

My non-C compiler uses multiple passes including an IL stage. It is not
much slower than TCC, which is one pass, but generally produces faster code.

It can compile itself at about 15Hz. (That is, 15 new generations per
second. Unoptimised.)

>> * There is no benefit at all in having a tool like a compiler, be a
>> small, self-contained executable.
>
> Not as much as there used to, decades ago.

Simplicity is always good. Somebody deletes one of the 1000s of files of
your gcc installation. Is it something that is essential? Who knows.

But if your compiler is the one file mm.exe, it's easy to spot if it's
missing!

Kaz Kylheku

Feb 6, 2024, 9:26:17 PM
On 2024-02-07, bart <b...@freeuk.com> wrote:
> Well we've both posted code of sizeable, actual and practical projects.
> Very few on the 'other side' have. Maybe it's proprietary or there are
> other reasons. But it means their own output can't be criticised here.

Posting large amounts of code into discussion groups isn't practical,
and it is against netiquette.

The right thing is to host your code somewhere (which it behooves you to
do for obvious other reasons) and post a link to it.

People used to share code via comp.sources.*. Some well-known old
projects first made their appearance that way. E.g. Dick Grune posted
the first version of CVS in what was then called mod.sources in 1986.

David Brown

Feb 7, 2024, 2:54:33 AM
Generally, you want to have the minimum practical scope for your local
variables. It's rare that you need to add braces just to make a scope
for a variable - usually you have enough braces in loops or conditionals
- but it happens.

However, the context here was compiler optimisation. Not all compilers
have good optimisation. In the embedded world, there are vast numbers
of C compilers, many of which are much more limited than the modern and
advanced tools most of us use today. These weaker compilers are much
rarer now, as are many of the ISAs they served - 32-bit ARM "M" cores
are dominant along with gcc. But in the old days, an embedded C
programmer had to write their code in a way that suited the compiler if
they wanted the best out of their microcontroller - and efficient code
means cheaper devices, lower power and longer battery life. Some of
these weaker tools would allocate registers to local variables on a
first come, first served basis, with no lifetime analysis or reuse
inside a function. Thus you re-used your temporary variables.

Making some "temp" variables and re-using them was also common for some
people in idiomatic C90 code, where all your variables are declared at
the top of the function.
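As a sketch of that C90 idiom (my illustration, hypothetical names): everything is declared at the top, and one "temp" carries each intermediate value.

```c
/* Idiomatic C90: all variables declared at the top of the function,
   with a single 'temp' re-used for each intermediate value. */
int sum_of_squares(const int *v, int n)
{
    int i;
    int temp;
    int sum;

    sum = 0;
    for (i = 0; i < n; i++) {
        temp = v[i] * v[i];   /* temp is overwritten every iteration */
        sum += temp;
    }
    return sum;
}
```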

David Brown

Feb 7, 2024, 2:56:35 AM
It was never common in the first place, and yes, it is almost entirely
non-existent now. I'm sure there are a few legacy products still
produced that use some kind of embedded Windows, but few more than that
- which is what I was hinting at in my post.

David Brown

Feb 7, 2024, 3:30:44 AM
On 07/02/2024 03:18, bart wrote:
> On 05/02/2024 05:58, Kaz Kylheku wrote:
>> On 2024-02-05, bart <b...@freeuk.com> wrote:
>
>> Writing a compiler is pretty easy, because the bar can be set very low
>> while still calling it a compiler.
>
>> Whole-program compilers are easier because there are fewer requirements.
>> You have only one kind of deliverable to produce: the executable.
>> You don't have to deal with linkage and produce a linkable format.
>
> David Brown suggested that they were harder than I said. You're saying
> they are easier.

I described what /I/ see as "whole program compilers", and where I see
them being used as serious tools that give better results than
traditional compile-and-link toolchains. The key here is whole program
/optimisation/ and static analysis. And I think there can be little
doubt that this is a far harder task than the much more limited tools
you are talking about.

Maybe it was unreasonable of me to conflate "whole program compiler" and
"whole program optimiser", even though I see no real-world use of the
former without the latter. Using your definition of the term, your tool
is a "whole program compiler".

And I think Kaz was using the term in the same way as you do when he
says he thinks it is easier. I don't know either way, but it would
certainly skip several things that are otherwise necessary in a
traditional setup - assembly generation, an assembler, and a linker.
You also don't have to deal with linking object files from other sources.

(For the record, I think there are many things that cannot be done with
C and traditional compile-link setups, that could be done with some kind
of whole-program analysis and a suitable language. Rust's borrow
checker, and XMOS XC's thread analysis, are two examples.)

>
>> GCC is maintained by people who know what a C compiler is, and GCC can
>> be asked to be one.
>
> So what is it when it's not a C compiler? What language is it compiling
> here:
>

You walked right into that one - how many times has the difference
between standard C and sort-of-C been explained to you? As always, I
must point out that a tool does not have to be standards compliant -
that's a choice of the tool developer. But when the distinction is
made, and Kaz was clearly making that distinction, a "C compiler" is one
that follows the C standards (one or more published version) accurately
in terms of what it accepts or does not accept, the minimum guaranteed
behaviour, and the minimum required diagnostics. As has been explained
many times, "gcc" is not, in those terms, a "C compiler" by default - it
needs flags to put it in a compliant mode. Your tool, AFAIK, has never
claimed to be a standards-compliant C compiler.

>
> Whatever language that mcc processes must be similar to that that gcc
> processes.

Yes. Both accept some version of sort-of-C, with a common subset. (The
common subset in this example code may also, by coincidence, be standard
C. I haven't looked at it to see.)

>
> Yet it is true that gcc can be tuned to a particular standard, dialect,
> set of extensions and a set of user-specified behaviours. Which means it
> can also compile some Frankensteinian version of 'C' that anyone can
> devise.
>
> Mine at least is a more rigid subset.
>

Your idea of "rigid" is other people's idea of "inflexible". Rigid is
fine for one user.


>> Your idea of writing a C compiler seems to be to pick some random
>> examples of code believed to be C and make them work.  (Where "work"
>> means that they compile and show a few behaviors that look like
>> the expected ones.)
>
> That's what most people expect!
>

No, it is not. /I/ expect a compiler to be written by people who have
extensive knowledge of the C standards and who do their best to get the
compiler correct /by design/. Not by luck or trial and error. By
/design/. And I expect it to have an extensive test suite of both
simple code and extreme code and corner cases, because even the best
designers can get things wrong sometimes and testing helps catch bugs.


Malcolm McLean

Feb 7, 2024, 3:59:27 AM
On 07/02/2024 07:54, David Brown wrote:
> On 07/02/2024 00:23, Lawrence D'Oliveiro wrote:
>> On Tue, 6 Feb 2024 09:44:20 +0100, David Brown wrote:
>>
>>> They reuse "temp" variables instead of making new ones.
>>
>> I like to limit the scope of my temporary variables. In C, this is as
>> easy
>> as sticking a pair of braces around a few statements.
>
> Generally, you want to have the minimum practical scope for your local
> variables.  It's rare that you need to add braces just to make a scope
> for a variable - usually you have enough braces in loops or conditionals
> - but it happens.
>
The two common patterns are to give each variable the minimum scope, or
to declare all variables at the start of the function and give them all
function scope.

The case for minimum scope is the same as the case for scope itself. The
variable is accessible where it is used and not elsewhere, which makes
it less likely it will be used in error, and means there are fewer names
to understand.

However there are also strong arguments for function scope. A function
is a natural unit. And all the variables used in that unit are listed
together and, ideally, commented. So at a glance you can see what is in
scope and what is being operated on. And there are only three levels of
scope. A variable is global, or it is file scope, or it is scoped to the
function.

I tend to prefer function scope for C. However I use a lot of C++ these
days, and in C++ local scope is often better, and in some cases even
necessary. So I find that I'm tending to use local scope in C more.

--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm

Malcolm McLean

Feb 7, 2024, 4:04:34 AM
On 07/02/2024 02:18, bart wrote:
> On 05/02/2024 05:58, Kaz Kylheku wrote:
>>
>
>> Basically, you don't present a very credible case that you've actually
>> written a C compiler.
>
> Well, don't believe it if you don't want. There are 1000s of amateur 'C'
> compilers about; it must be the most favoured language for such projects
> (since it looks deceptively simple).
>

It's absolutely clear to me that Bart has written a C compiler, and this
statement by Kaz is ridiculous.

Ben Bacarisse

Feb 7, 2024, 5:04:41 AM
David Brown <david...@hesbynett.no> writes:

> Making some "temp" variables and re-using them was also common for some
> people in idiomatic C90 code, where all your variables are declared at the
> top of the function.

The comma suggests (I think) that it is C90 that mandates that all one's
variables are declared at the top of the function. But that's not the
case (as I am sure you know). The other reading -- that this is done in
idiomatic C90 code -- is also something that I'd question, but not
something that I'd want to argue.

I comment just because there seems to be a myth that "old C" had to have
all the declarations at the top of a function. That was true once, but
so long ago as to be irrelevant. Even K&R C allowed declarations at the
top of a compound statement.

--
Ben.

Michael S

Feb 7, 2024, 5:10:07 AM
Is there any digital oscilloscope that is not Windows under the hood?
How about medical equipment?
The first question is mostly rhetorical, the second is not.



Ben Bacarisse

Feb 7, 2024, 5:48:01 AM
Malcolm McLean <malcolm.ar...@gmail.com> writes:

> On 07/02/2024 07:54, David Brown wrote:
>> On 07/02/2024 00:23, Lawrence D'Oliveiro wrote:
>>> On Tue, 6 Feb 2024 09:44:20 +0100, David Brown wrote:
>>>
>>>> They reuse "temp" variables instead of making new ones.
>>>
>>> I like to limit the scope of my temporary variables. In C, this is as
>>> easy
>>> as sticking a pair of braces around a few statements.
>> Generally, you want to have the minimum practical scope for your local
>> variables.  It's rare that you need to add braces just to make a scope
>> for a variable - usually you have enough braces in loops or conditionals
>> - but it happens.
>>
> The two common patterns are to give each variable the minimum scope, or to
> declare all variables at the start of the function and give them all
> function scope.

The term "function scope" has a specific meaning in C. Only labels have
function scope. I know you are not very interested in using exact
terms, but some people might like to know the details.

Since you want to argue for the peculiar (but common) practice of giving
names the largest possible scope (without altering their linkage) you
need a term for the outer-most block scope, but "function scope" is
taken.

> The case for minimum scope is the same as the case for scope itself.

Someone might well misinterpret the term "minimum scope" since it would
require adding lots of otherwise redundant braces. I *think* you mean
declaring names at the point of first use. The resulting scope is not
minimum because it often extends beyond the point of last use.

Other people, not familiar with "modern" C, might interpret the term to
mean declaring names at the top of the inner-most appropriate block.

> The
> variable is accessible where it is used and not elsewhere, which makes it
> less likely it will be used in error, and means there are fewer names to
> understand.

The case for declaration at first use is much stronger than this. It
almost always allows for a meaningful initialisation at the same point,
so the initialisation does not need to be hunted down and checked. For
me, this is a big win. (Yes, some people then insist on a dummy
initialisation when the proper one isn't known, but that's a fudge that
is, to my mind, even worse.)
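A sketch contrasting the two styles (my illustration, hypothetical names): in the first, you must hunt down where each variable is given its value; in the second, every declaration carries a meaningful initialiser.

```c
#include <string.h>

/* Top-of-function style: declarations and initialisations separated. */
size_t count_spaces_old(const char *s)
{
    size_t n;
    size_t i;
    size_t len;

    n = 0;
    len = strlen(s);
    for (i = 0; i < len; i++)
        if (s[i] == ' ')
            n++;
    return n;
}

/* Declaration at first use: each variable is initialised where it is
   introduced, so there is nothing to hunt down. */
size_t count_spaces_new(const char *s)
{
    size_t n = 0;
    size_t len = strlen(s);

    for (size_t i = 0; i < len; i++)
        if (s[i] == ' ')
            n++;
    return n;
}
```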

> However there are also strong arguments for function scope. A function is a
> natural unit. And all the variables used in that unit are listed together
> and, ideally, commented. So at a glance you can see what is in scope and
> what is being operated on.

You should not need an inventory of what's being operated on. Any
function so complex that I can't tell immediately what declaration
corresponds to which name needs to be re-written. I'd argue that
this is also a big win for "short scopes". A policy that leads to early
triggers for refactoring is worth considering.

> And there are only three levels of scope. A
> variable is global, or it is file scope, or it is scoped to the
> function.

You are mixing up scope and lifetime. C has no "global scope". A name
may have external linkage (which is probably what you are referring to),
but that is not directly connected to its scope.
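To sketch that distinction (my illustration, hypothetical names): a declaration can have block scope yet still refer, via external linkage, to a file-scope object, which is why scope and linkage are separate notions.

```c
int counter = 42;   /* file scope, external linkage */

int read_counter(void)
{
    /* Block scope, external linkage: the scope of this declaration is
       just this function body, yet it denotes the same object as the
       file-scope 'counter' above.  Scope and linkage are independent. */
    extern int counter;
    return counter;
}
```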

> I tend to prefer function scope for C.

We could call it outer-most block scope rather than re-use a term with
an existing, but different, technical meaning.

> However I use a lot of C++ these
> days, and in C++ local scope is often better, and in some cases even
> necessary. So I find that I'm tending to use local scope in C more.

Interesting. Is it just that using C++ has given you what you would
think of as a bad habit in C, or has using C++ led you to see that your
old preference was not the best one?

--
Ben.

bart

Feb 7, 2024, 5:48:11 AM
On 07/02/2024 02:26, Kaz Kylheku wrote:
> On 2024-02-07, bart <b...@freeuk.com> wrote:
>> Well we've both posted code of sizeable, actual and practical projects.
>> Very few on the 'other side' have. Maybe it's proprietary or there are
>> other reasons. But it means their own output can't be criticised here.
>
> Posting large amounts of code into discussion groups isn't practical,
> and against netiquette.
>
> The right thing is to host your code somewhere (which it behooves you to
> do for obvious other reasons) and post a link to it.
>
> People used to share code via comp.sources.*. Some well-known old
> projects first made their appearance that way. E.g. Dick Grune posted
> the first version of CVS in what was then called mod.sources in 1986.
>

Directly including source code as part of a post is not that practical
beyond a few hundred lines of code.

Clearly that wouldn't count as 'sizeable'. That would need to be done
via a link.

bart

Feb 7, 2024, 6:04:58 AM
On 07/02/2024 10:47, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes:

>> However there are also strong arguments for function scope. A function is a
>> natural unit. And all the variables used in that unit are listed together
>> and, ideally, commented. So at a glance you can see what is in scope and
>> what is being operated on. [typos fixed]
>
> You should not need an inventory of what's being operated on. Any
> function so complex that I can't tell immediately what declaration
> corresponds to which name needs to be re-written.

But if you keep functions small, e.g. so that the whole body is visible
at the same time, then there is less need for declarations to clutter up
the code. They can go at the top, so that you can literally just glance
there.

>> And there are only three levels of scope. A
>> variable is global, or it is file scope, or it is scoped to the
>> function.

> You are mixing up scope and lifetime. C has no "global scope". A name
> may have external linkage (which is probably what you are referring to),
> but that is not directly connected to its scope.

Funny, I use the same definitions of scope:

int abc;         // inter-file scope, may be imported or exported
static int def;  // file scope

void F(void) {
    int ghi;     // function scope
}

If I look inside my compiler, I can see these sets of enums to describe
scope (not C code):

    (function_scope, "Fn"),  !within a function (note imported/exported
                             !names can be declared in a block scope)
    (local_scope,    "Loc"), !file-scope/not exported
    (imported_scope, "Imp"), !imported from another module
    (exported_scope, "Exp")  !file-scope/exported
end

Within a function, there is an additional mechanism to deal with block
scopes. Plus another overall to deal with namespaces.



Malcolm McLean

unread,
Feb 7, 2024, 7:44:48 AMFeb 7
to
On 07/02/2024 10:47, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>
>> On 07/02/2024 07:54, David Brown wrote:
>>> On 07/02/2024 00:23, Lawrence D'Oliveiro wrote:
>>>> On Tue, 6 Feb 2024 09:44:20 +0100, David Brown wrote:
>>>>
>>>>> They reuse "temp" variables instead of making new ones.
>>>>
>>>> I like to limit the scope of my temporary variables. In C, this is as
>>>> easy
>>>> as sticking a pair of braces around a few statements.
>>> Generally, you want to have the minimum practical scope for your local
>>> variables.  It's rare that you need to add braces just to make a scope
>>> for a variable - usually you have enough braces in loops or conditionals
>>> - but it happens.
>>>
>> The two common patterns are to give each variable the minimum scope, or to
>> decare all variables at the start of the function and give them all
>> function scope.
>
> The term "function scope" has a specific meaning in C. Only labels have
> function scope. I know you are not very interested in using exact
> terms, but some people might like to know the details.
>
To explain this, if we have

void function(void)
{
    int i;

    for (i = 0; i < 10; i++)
        dosomething();
    if (condition)
    {
        int i;

        for (i = 0; i < 11; i++)
            dosomething();
        if (i == 10)    /* always false */
            dosomething();
    }
}

The first i is not in scope when we test for i == 10, and the test will
be false (the inner i is 11 when its loop ends). So "function scope"
isn't the term.

However if we have this:

void function(void)
{
label:
    dosomething();
    if (condition)
    {
    label:
        dosomething();
    }
    goto label;
}

Then it is an error. Both labels are in scope and that isn't allowed.

> Since you want to argue for the peculiar (but common) practice of giving
> names the largest possible scope (without altering their linkage) you
> need a term for the outer-most block scope, but "function scope" is
> taken.
>
So "function scope" isn't the correct term, and we need another. I
expect that at this point someone will jump in and say it must be
"Malcolm scope". As you say, it's common enough to need a term for it.


>> The case for minimum scope is the same as the case for scope itself.
>
> Someone might well misinterpret the term "minimum scope" since it would
> require adding lots of otherwise redundant braces. I *think* you mean
> declaring names at the point of first use. The resulting scope is not
> minimum because it often extends beyond the point of last use.
>

Yes, I don't mean literally the minimum scope that would be possible by
artificially ending a block when a variable is used for the last time.
No one would do that. I mean that the variable is either declared at the
point of first use or, if this isn't allowed because of the C version,
at the top of the block in which it is used. But also that variables are
not reused if in fact the value is discarded between statements or
especially between blocks.

> Other people, not familiar with" modern" C, might interpret the term to
> mean declaring names at the top of the inner-most appropriate block.
>
Top of the block or point of first use?
>> The
>> variable is accessible where it is used and not elsewhere, which makes it
>> less likely it will be used in error, and means there are fewer names to
>> understand.
>
> The case for declaration at first use is much stronger than this. It
> almost always allows for a meaningful initialisation at the same point,
> so the initialisation does not need to be hunted down and checked. For
> me, this is a big win. (Yes, some people then insist on a dummy
> initialisation when the proper one isn't known, but that's a fudge that
> is, to my mind, even worse.)
>
If you go for top of block and you don't have a value, you either
initialise, usually to zero, or leave it wild. Neither is ideal. But it
rarely makes a big difference. However if you go for policy two, all the
variables are either given initial values at the top of the function or
they are not given initial values at the top of the function, and so you
can easily check, and ensure that all the initial values are consistent
with each other.

>
> We could call it outer-most block scope rather than re-use a term with
> an existing, but different, technical meaning.
>
The variable has scope within the function, within the whole of the
function, and the motive is that the function is the natural unit of
thought. So I think we need the word "function".

>> However I use a lot of C++ these
>> days, and in C++ local scope is often better, and in some cases even
>> necessary. So I find that I'm tending to use local scope in C more.
>
> Interesting. Is it just that using C++ has given you what you would
> think of as a bad habit in C, or has using C++ led you to see that your
> old preference was not the best one?
>

Not sure. If I thought it was a terrible habit of course I wouldn't do
it. I do think it makes the code look a little bit less clear. But it's
slightly easier to write and hack, which is why I do it.

David Brown

unread,
Feb 7, 2024, 8:01:43 AMFeb 7
to
On 07/02/2024 09:59, Malcolm McLean wrote:
> On 07/02/2024 07:54, David Brown wrote:
>> On 07/02/2024 00:23, Lawrence D'Oliveiro wrote:
>>> On Tue, 6 Feb 2024 09:44:20 +0100, David Brown wrote:
>>>
>>>> They reuse "temp" variables instead of making new ones.
>>>
>>> I like to limit the scope of my temporary variables. In C, this is as
>>> easy
>>> as sticking a pair of braces around a few statements.
>>
>> Generally, you want to have the minimum practical scope for your local
>> variables.  It's rare that you need to add braces just to make a scope
>> for a variable - usually you have enough braces in loops or
>> conditionals - but it happens.
>>
> The two common patterns are to give each variable the minimum scope, or
> to declare all variables at the start of the function and give them all
> function scope.
>
> The case for minimum scope is the same as the case for scope itself. The
> variable is accessible where it is used and not elsewhere, which makes
> it less likely it will be used in error, and means there are fewer names
> to understand.
>

It makes code simpler, clearer, easier to reuse, easier to see that it
is correct, and easier to see if there is an error. It is very much
easier for automatic tools (static warnings) to spot issues.

> However there are also strong arguments for function scope.

Not in my experience and in my opinion.

> A function
> is a natural unit.

True, but irrelevant.

> And all the variables used in that unit are listed
> together and, ideally, commented.

In reality, not commented. And if commented, then commented incorrectly.

Rather than trying to write vague comments to say what something is or how
it is used, it is better to write the code so that it is clear. Giving
variables appropriate names is part of that. For the most part, I'd say
if you think a variable needs a comment, your code is not clear enough
or has poor structure.

It is /massively/ simpler and clearer to write :

for (int i = 0; i < 10; i++) { ... }

than

int i;

/* ... big gap ... */

for (i = 0; i < 10; i++) { ... }

It doesn't help if you have "int loop_index;" or add a comment to the
variable definition. Putting it at the loop itself is better.


> So at a glance you can see what is in
> scope and what is being operated on. And there are only three levels of
> scope. A variable is global, or it is file scope, or it is scoped to the
> function.

Every block is a new scope. Function scope in C is only for labels.

>
> I tend to prefer function scope for C. However I use a lot of C++ these
> days, and in C++ local scope is often better, and in some cases even
> necessary. So I find that I'm tending to use local scope in C more.
>

I hate having to work with code written in long-outdated "declare
everything at the top of the function" style. I realise style and
experience are subjective, but I have not seen any code or any argument
that has led me to doubt my preferences.



Richard Harnden

unread,
Feb 7, 2024, 8:21:54 AMFeb 7
to
We could have 'malcolm-scope' ?!

(sorry :) )

David Brown

unread,
Feb 7, 2024, 8:22:11 AMFeb 7
to
On 07/02/2024 12:04, bart wrote:
> On 07/02/2024 10:47, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>
>>> However there are also strong arguments for function scope. A
>>> function is a
>>> natural unit. And all the variables used in that unit are listed
>>> together
>>> and, ideally, commented. So at a glance you can see what is in scope and
>>> what is being operated on. [typos fixed]
>>
>> You should not need an inventory of what's being operated on.  Any
>> function so complex that I can't tell immediately what declaration
>> corresponds to which name needs to be re-written.
>
But if you keep functions small, e.g. when the whole body is visible at the
same time, then there is less need for declarations to clutter up the
code. They can go at the top, so that you can literally just glance
there.
>

With a small enough function, the benefits of minimum practical scope
(or "define on first use") are reduced, but not removed. The perceived
benefits of "declare everything at the start of the function" disappear
entirely.

>>> And there are only three levels of scope. A
>>> varibale is global, or it is file scope, or it is scoped to the
>>> function.
>
>> You are mixing up scope and lifetime.  C has no "global scope".  A name
>> may have external linkage (which is probably what you are referring to),
>> but that is not directly connected to its scope.
>
> Funny, I use the same definitions of scope:
>

For discussions of C, it's best to use the well-defined C terms for
scope and lifetime. Other languages may use different terms.


Malcolm McLean

unread,
Feb 7, 2024, 8:42:51 AMFeb 7
to
On 07/02/2024 13:01, David Brown wrote:
> On 07/02/2024 09:59, Malcolm McLean wrote:
>>
>> The case for minimum scope is the same as the case for scope itself.
>> The variable is accessible where it is used and not elsewhere, which
>> makes it less likely it will be used in error, and means there are
>> fewer names to understand.
>>
>
> It makes code simpler, clearer, easier to reuse, easier to see that it
> is correct, and easier to see if there is an error.  It is very much
> easier for automatic tools (static warnings) to spot issues.
>
This is all true, but only in one way. Whilst it's easier to see that there
are errors in one way, because you have to look at a smaller section of
code, it's harder in others, for example because that small section is
more cluttered. From experience with automatic tools, they give too many
false warnings for correct code, and then programmers often rewrite the
code less clearly to suppress the warning.
>> However there are also strong arguments for ducntion scope.
>
> Not in my experience and in my opinion.
>
That's not a legitimate response. The correct thing to say is "you have
given an argument there but I don't think it is a strong one". Unless you
are claiming to be experienced in arguing with people over scope, and I
don't think that is what you mean to say.

>> A function is a natural unit.
>
> True, but irrelevant.
>
>> Adn all the varibales used in that unit are listed together and,
>> ideally, commented.
>
> In reality, not commented.  And if commented, then commented incorrectly.
>
Variable names mean something. The classic name for a variable is "x".
This usually means either "the value that is given" or "the horizontal
value on an axis". But it can of course mean "a value which we shall
calculate that doesn't have an obvious other name", or even maybe "the
number of times the letter "x" appears in the data". It depends on
context. However the important thing is that x should always mean the
same thing within the same function. So if it's a real on the horizontal
axis of a graph, we don't also use "x" for an integer we need to
factorise, in the same function. And if it isn't clear (x is such a
strong convention that it seldom needs a comment), we need to say how
"x" is being used and what it means in that function. Function, and not
block, is the unit for that.

>
> Rather than trying to write vague comments to say what something is or how
> it is used, it is better to write the code so that it is clear.  Giving
> variables appropriate names is part of that.  For the most part, I'd say
> if you think a variable needs a comment, your code is not clear enough
> or has poor structure.
>
I prefer short variable names because it is the mathematical convention
and because it makes complex expressions easier to read. But of course
then they can't be as meaningful. So to use a short name and add a
comment is a reasonable way to achieve both goals.
>
> It is /massively/ simpler and clearer to write :
>
>     for (int i = 0; i < 10; i++) { ... }
>
> than
>
>     int i;
>
>     /* ... big gap ... */
>
>     for (i = 0; i < 10; i++) { ... }
>
> It doesn't help if you have "int loop_index;" or add a comment to the
> variable definition.  Putting it at the loop itself is better.
>
This pattern is quite common in C.

for (i = 0; i < N; i++)
    if (x[i] == 0)
        break;
if (i == N) /* no zero found */

So you can't scope the counter to the loop.

i is always a loop index. Usually I just put one at the top so it is
hanging around and handy.
>
>
> I hate having to work with code written in long-outdated "declare
> everything at the top of the function" style.  I realise style and
> experience are subjective, but I have not seen any code or any argument
> that has led me to doubt my preferences.
>
I quite often work with code which was written a very long time ago and
is still useful. That's one of the big strengths of C. It is subjective
however. It's not about making life easier for the compiler. It's about
what is clearer. That depends on the way people read code and think
about it, and that won't necessarily be the same for every person.

David Brown

unread,
Feb 7, 2024, 8:50:03 AMFeb 7
to
"Function scope" is not the term, because - as has been explained to you
- "function scope" has a specific meaning in C, and this is not it.

Everyone can figure out what you are trying to say - you mean the
outermost block scope of the function. It's just block scope, as normal.

(By the way, you do know that Thunderbird has a pretty good spell
checker? I don't want to get hung up on this, and don't want to start a
new branch or argument, but avoiding the silly typos in your posts would
improve them.)

>
> However if we have this:
>
> void fucntion(void)
> {
>    label:
>    dosomething();
>    if (condition)
>    {
>       label:
>       dosomething();
>    }
>    got label:
> }
>
> Then it is a error. Both labels are in scope and that isn't allowed.

Yes, that's because labels have function scope in C.

>
>> Since you want to argue for the peculiar (but common) practice of giving
>> names the largest possible scope (without altering their linkage) you
>> need a term for the outer-most block scope, but "function scope" is
>> taken.
>>
> So "function scope" isn't the correct term. So we need another. I expect
> that at this point someone will jump in and say it must be "Malcolm
> scope". As you say, it's common enough to need a term for it.
>

We don't need a new term. We have the terms in the C standards. Block
scope is fine.

Note that there is another very big difference between "function scope"
and "block scope". Labels in function scope are in scope within the
function, even before they are declared. For identifiers in block
scope, their scope does not start until they are declared.

>
>>> The case for minimum scope is the same as the case for scope itself.
>>
>> Someone might well misinterpret the term "minimum scope" since it would
>> require adding lots of otherwise redundant braces.  I *think* you mean
>> declaring names at the point of first use.  The resulting scope is not
>> minimum because it often extends beyond the point of last use.
>>
>
> Yes, I don't mean literally the minimum scope that would be possible by
> artificially ending a block when a variable is used for the last time.
> No one would do that. I mean that the variable is either declared at
> point of first use or, if this isn't allowed because of the C version,
> at the top of the block in which it is used. But also that variables are
> not reused if in fact the value is discarded between statements or
> especially between blocks.
>
>> Other people, not familiar with "modern" C, might interpret the term to
>> mean declaring names at the top of the inner-most appropriate block.
>>
> Top of the block or point of first use?

In C90, you have to declare your variables before any statements within
the block. In C99, you can intermingle declarations and statements.
But even in C90, you can have declarations at the top of any block, not
just at the top of the function.

>>> The
>>> variable is accessible where it is used and not elsewhere, which
>>> makes it
>>> less likely it will be used in error, and means there are fewer names to
>>> understand.
>>
>> The case for declaration at first use is much stronger than this.  It
>> almost always allows for a meaningful initialisation at the same point,
>> so the initialisation does not need to be hunted down and checked.  For
>> me, this is a big win.  (Yes, some people then insist on a dummy
>> initialisation when the proper one isn't known, but that's a fudge that
>> is, to my mind, even worse.)
>>
> If you go for top of block and you don't have a value, you either
> initialise, usually to zero, or leave it wild. Neither is ideal.

Leaving it uninitialised is /much/ better, unless you are using weak
tools or don't know how to use them properly. (There can be
circumstances where code is too complex for compilers to be sure that a
variable is never used uninitialised, and you might find it appropriate
to give a dummy initialisation in that case. But such cases are rare.)

Even better, of course, is not to declare the variable at all until you
have something sensible to put in it. (And then consider making it
"const" if it does not change.)

> But it
> rarely makes a big difference. However if you go for policy two, all the
> variables are either given initial values at the top of the function or
> they are not given initial values at the top of the function, and so you
> can easily check, and ensure that all the initial values are consistent
> with each other.
>

If you declare your variables when you have a value for them, then the
initial values are all clear and consistent, and have no artificial
values, and in many cases, they never change. Having your variables
unchanging makes code /much/ easier to understand and check for correctness.

>>
>> We could call it outer-most block scope rather than re-use a term with
>> an existing, but different, technical meaning.
>>
> The variable has scope within the function, within the whole of the
> function, and the motive is that the function is the natural unit of
> thought. So I think we need the word "function".

No, we don't. And no, the scope is /not/ the entire function.

David Brown

unread,
Feb 7, 2024, 8:52:00 AMFeb 7
to
On 07/02/2024 11:04, Ben Bacarisse wrote:
> David Brown <david...@hesbynett.no> writes:
>
>> Making some "temp" variables and re-using them was also common for some
>> people in idiomatic C90 code, where all your variables are declared at the
>> top of the function.
>
> The comma suggests (I think) that it is C90 that mandates that all one's
> variables are declared at the top of the function. But that's not the
> case (as I am sure you know).

Yes.

> The other reading -- that this is done in
> idiomatic C90 code -- is also something that I'd question, but not
> something that I'd want to argue.

"Idiomatic" is perhaps not the best word. (And "idiotic" is too
strong!) I mean written in a way that is quite common in C90 code.

>
> I comment just because there seems to be a myth that "old C" had to have
> all the declarations at the top of a function. That was true once, but
> so long ago as to be irrelevant. Even K&R C allowed declarations at the
> top of a compound statement.
>

It's good to make it clear.

David Brown

unread,
Feb 7, 2024, 9:03:29 AMFeb 7
to
On 07/02/2024 11:09, Michael S wrote:
> On Wed, 7 Feb 2024 08:56:15 +0100
> David Brown <david...@hesbynett.no> wrote:
>
>> On 07/02/2024 00:24, Lawrence D'Oliveiro wrote:
>>> On Tue, 6 Feb 2024 09:50:02 +0100, David Brown wrote:
>>>
>>>> And of course there are those two or three unfortunate people that
>>>> have to work with embedded Windows.
>>>
>>> I thought this has pretty much gone away, pushed aside by Linux.
>>
>> It was never common in the first place, and yes, it is almost
>> entirely non-existent now. I'm sure there are a few legacy products
>> still produced that use some kind of embedded Windows, but few more
>> than that
>> - which is what I was hinting at in my post.
>>
>
> Is there any digital oscilloscope that is not Windows under the hood?

Yes, most that I know of. (There are some older ones that are Windows,
and high-end ones almost never used Windows.)

> How about medical equipment?

A great deal.

> The first question is mostly rhetorical, the second is not.
>

It used to be more common to have embedded Windows. Embedded Linux, and
RTOS's with GUI's (using, for example, QT) have long ago taken over.

There are some hold-outs, of course - no company wants to re-do their
systems and software if they can avoid it, and if they made the bad bet
to use embedded Windows before, they may stick to it.



bart

unread,
Feb 7, 2024, 9:24:42 AMFeb 7
to
Many of the terms used in the C grammar remind me exactly of the 'twisty
little passages' variations from the original text Adventure game.

In my program, I choose to use identifiers that make more sense to me,
and that match my view of how the language works.

David Brown

unread,
Feb 7, 2024, 10:18:09 AMFeb 7
to
On 07/02/2024 14:42, Malcolm McLean wrote:
> On 07/02/2024 13:01, David Brown wrote:
>> On 07/02/2024 09:59, Malcolm McLean wrote:
>>>
>>> The case for minimum scope is the same as the case for scope itself.
>>> The variable is accessible where it is used and not elsewhere, which
>>> makes it less likely it will be used in error, and means there are
>>> fewer names to understand.
>>>
>>
>> It makes code simpler, clearer, easier to reuse, easier to see that it
>> is correct, and easier to see if there is an error.  It is very much
>> easier for automatic tools (static warnings) to spot issues.
>>
> This is all true, but only in one way. Whilst it's easier to see that there
> are errors in one way, because you have to look at a smaller section of
> code, it's harder in others, for example because that small section is
> more cluttered.

No, it is not - unless you write it very badly.

> From experience with automatic tools, they give too many
> false warnings for correct code, and then programmers often rewrite the
> code less clearly to suppress the warning.

You need to use good tools, and you need to know how to use them. It is
unfortunately the case that some people are poor programmers - they
write bad code, and they don't know how to get the best from their tools.

But is that an excuse for /you/ not to write the best code you can, in
the clearest and most maintainable manner, using the best practical
tools to help catch any errors?

>>> However there are also strong arguments for function scope.
>>
>> Not in my experience and in my opinion.
>>
> That's not a legitimate response. The correct thing to say is "you have
> given an argument there but I don't think it is a strong one".


My experience and opinion is that there are no strong arguments in
favour of "all declarations at the top of the function." That is what I
meant to say, and it is a legitimate response.

> Unless you
> are claiming to be experienced in arguing with people over scope, and I
> don't think that is what you mean to say.
>

/Please/ get a spell checker! Or type more carefully.


>>> A function is a natural unit.
>>
>> True, but irrelevant.
>>
>>> Adn all the varibales used in that unit are listed together and,
>>> ideally, commented.
>>
>> In reality, not commented.  And if commented, then commented incorrectly.
>>
> Variable names mean something. The classic name for a variable is "x".
> This usually means either "the value that is given" or "the horizontal
> value on an axis". But it can of course mean "a value which we shall
> calculate that doesn't have an obvious other name", or even maybe "the
> number of times the letter "x" appears in the data". It depends on
> context. However the important thing is that x should always mean the
> same thing within the same function.

No.

The important thing is that the purpose of a variable should be clear
within its scope and use. It is completely artificial to suggest it
should be consistent within a function - you could equally well say it
should be consistent within a file, or within a block.

> >
>> Rather than trying to write vague comments to say what something is
>> how it is used, it is better to write the code so that it is clear.
>> Giving variables appropriate names is part of that.  For the most
>> part, I'd say if you think a variable needs a comment, your code is
>> not clear enough or has poor structure.
>>
> I prefer short variable names because it is the mathematical convention
> and because it makes complex expressions easier to read. But of course
> then they can't be as meaningful. So to use a short name and add a
> comment is a reasonable way to achieve both goals.

Or, far better, use small scopes and then variables can have short names
without comments and be clear.

>>
>> It is /massively/ simpler and clearer to write :
>>
>>      for (int i = 0; i < 10; i++) { ... }
>>
>> than
>>
>>      int i;
>>
>>      /* ... big gap ... */
>>
>>      for (i = 0; i < 10; i++) { ... }
>>
>> It doesn't help if you have "int loop_index;" or add a comment to the
>> variable definition.  Putting it at the loop itself is better.
>>
> This pattern is quite common in C.
>
> for (i = 0; i < N; i++)
>   if (x[i] == 0)
>      break;
> if (i == N) /* no zero found */
>

If you need to do that, you need a bigger scope for "i". But it would
be insane to use worse code style for 95% of your loops for the 5% (or
less) that need this.

> So you can't scope the counter to the loop.
>
> i is always a loop index. Usually I just put one at the top so it is
> hanging around and handy.

Laziness is not good.

Ben Bacarisse

unread,
Feb 7, 2024, 10:30:28 AMFeb 7
to
David Brown <david...@hesbynett.no> writes:

> On 07/02/2024 11:04, Ben Bacarisse wrote:
>> David Brown <david...@hesbynett.no> writes:
>>
>>> Making some "temp" variables and re-using them was also common for some
>>> people in idiomatic C90 code, where all your variables are declared at the
>>> top of the function.
>> The comma suggests (I think) that it is C90 that mandates that all one's
>> variables are declared at the top of the function. But that's not the
>> case (as I am sure you know).
>
> Yes.
>
>> The other reading -- that this is done in
>> idiomatic C90 code -- is also something that I'd question, but not
>> something that I'd want to argue.
>
> "Idiomatic" is perhaps not the best word. (And "idiotic" is too strong!)
> I mean written in a way that is quite common in C90 code.

The most common meaning of "idiomatic", and the one I usually associate
with it in this context, is "containing expressions that are natural and
correct". That's not how I would describe eschewing declarations in
inner blocks.

--
Ben.

Ben Bacarisse

unread,
Feb 7, 2024, 10:36:30 AMFeb 7
to
bart <b...@freeuk.com> writes:

> On 07/02/2024 10:47, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>
>>> However there are also strong arguments for function scope. A function is a
>>> natural unit. And all the variables used in that unit are listed together
>>> and, ideally, commented. So at a glance you can see what is in scope and
>>> what is being operated on. [typos fixed]
>> You should not need an inventory of what's being operated on. Any
>> function so complex that I can't tell immediately what declaration
>> corresponds to which name needs to be re-written.
>
> But if you keep functions small, e.g. when the whole body is visible at the same
> time, then there is less need for declarations to clutter up the code. They
> can go at the top, so that you can literally just glance there.

Declarations don't clutter up the code, just as the code does not
clutter up the declarations. That's just your own spin on the matter.
They are both important parts of a C program.

>>> And there are only three levels of scope. A
>>> varibale is global, or it is file scope, or it is scoped to the
>>> function.
>
>> You are mixing up scope and lifetime. C has no "global scope". A name
>> may have external linkage (which is probably what you are referring to),
>> but that is not directly connected to its scope.
>
> Funny, I use the same definitions of scope:

You can use any definition you like, provided you don't insist that others
use your own terms. I was just pointing out the problems
associated with using the wrong terms in a public post.

I'll cut the text where you use the wrong terms, because there is
nothing to be gained from correcting your usage.

--
Ben.

Malcolm McLean

unread,
Feb 7, 2024, 10:45:23 AMFeb 7
to
No. It means writing the code in a way which is common in C and has
certain advantages, but is not common in other languages.

Ben Bacarisse

unread,
Feb 7, 2024, 11:13:29 AMFeb 7
to
Malcolm McLean <malcolm.ar...@gmail.com> writes:

> On 07/02/2024 10:47, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>>
>>> On 07/02/2024 07:54, David Brown wrote:
>>>> On 07/02/2024 00:23, Lawrence D'Oliveiro wrote:
>>>>> On Tue, 6 Feb 2024 09:44:20 +0100, David Brown wrote:
>>>>>
>>>>>> They reuse "temp" variables instead of making new ones.
>>>>>
>>>>> I like to limit the scope of my temporary variables. In C, this is as
>>>>> easy
>>>>> as sticking a pair of braces around a few statements.
>>>> Generally, you want to have the minimum practical scope for your local
>>>> variables.  It's rare that you need to add braces just to make a scope
>>>> for a variable - usually you have enough braces in loops or conditionals
>>>> - but it happens.
>>>>
>>> The two common patterns are to give each variable the minimum scope, or to
>>> declare all variables at the start of the function and give them all
>>> function scope.
>> The term "function scope" has a specific meaning in C. Only labels have
>> function scope. I know you are not very interested in using exact
>> terms, but some people might like to know the details.
>>
> To explain this, if we have

What is the "this" that you are explaining?

> void function(void)
> {
>     int i;
>
>     for (i = 0; i < 10; i++)
>         dosomething();
>     if (condition)
>     {
>         int i;
>
>         for (i = 0; i < 11; i++)
>             dosomething();
>         if (i == 10)
>             ;  /* always false */
>     }
> }
>
> The first i is not in scope when we test for i == 10 and the test will be
> false. So "function scope" isn't the term.

"function scope" is not the term because only labels have function
scope. This example does not explain anything about the term
"functions scope" -- even why it's the wrong term.

> However if we have this:
>
> void fucntion(void)
> {
>     label:
>     dosomething();
>     if (condition)
>     {
>         label:
>         dosomething();
>     }
>     got label:
(you mean "goto label;")
> }
>
> Then it is a error. Both labels are in scope and that isn't allowed.

The key thing about the scope of labels is that they can be used before
they are defined:

int *f(int *p)
{
    if (!p) goto error;
    ...
error:
    return p;
}

>> Since you want to argue for the peculiar (but common) practice of giving
>> names the largest possible scope (without altering their linkage) you
>> need a term for the outer-most block scope, but "function scope" is
>> taken.
>>
> So "function scope" isn't the correct term. So we need another. I expect
> that at this point someone will jump in and say it must be "Malcolm
> scope". As you say, it's common enough to need a term for it.

I see no reason not to call it "the outer-most block scope".

>>> The case for minimum scope is the same as the case for scope itself.
>> Someone might well misinterpret the term "minimum scope" since it would
>> require adding lots of otherwise redundant braces. I *think* you mean
>> declaring names at the point of first use. The resulting scope is not
>> minimum because it often extends beyond the point of last use.
>
> Yes, I don't mean literally the minimum scope that would be possible by
> artificially ending a block when a variable is used for the last time. No
> one would do that. I mean that the variable is either declared at point of
> first use or, if this isn't allowed because of the C version, at the top of
> the block in which it is used. But also that variables are not reused if in
> fact the value is discarded between statements or especially between
> blocks.
>
>> Other people, not familiar with "modern" C, might interpret the term to
>> mean declaring names at the top of the inner-most appropriate block.
>>
> Top of the block or point of first use?

I don't know what you are asking. I was trying to point out these two
possible meanings for "minimum scope".

>>> The
>>> variable is accessible where it is used and not elsewhere, which makes it
>>> less likely it will be used in error, and means there are fewer names to
>>> understand.
>> The case for declaration at first use is much stronger than this. It
>> almost always allows for a meaningful initialisation at the same point,
>> so the initialisation does not need to be hunted down and checked.  For
>> me, this is a big win. (Yes, some people then insist on a dummy
>> initialisation when the proper one isn't known, but that's a fudge that
>> is, to my mind, even worse.)
>>
> If you go for top of block and you don't have a value, you either
> initialise, usually to zero, or leave it wild. Neither is ideal. But it
> rarely makes a big difference. However if you go for policy two, all the
> variables are either given initial values at the top of the function or
> they are not given initial values at the top of the function, and so you can
> easily check, and ensure that all the initial values are consistent with
> each other.

What?

>> We could call it outer-most block scope rather than re-use a term with
>> an existing, but different, technical meaning.
>>
> The variable has scope within the function, within the whole of the
> function, and the motive is that the function is the natural unit of
> thought. So I think we need the word "function".

You need the word function. I don't.

--
Ben.

Keith Thompson

unread,
Feb 7, 2024, 11:21:37 AMFeb 7
to
Malcolm McLean <malcolm.ar...@gmail.com> writes:
> On 07/02/2024 10:47, Ben Bacarisse wrote:
[...]
>> Since you want to argue for the peculiar (but common) practice of giving
>> names the largest possible scope (without altering their linkage) you
>> need a term for the outer-most block scope, but "function scope" is
>> taken.
>>
> So "function scope" isn't the correct term. So we need another. I
> expect that at this point someone will jump in and say it must be
> "Malcolm scope". As you say, it's common enough to need a term for it.

Please, no, not "Malcolm scope". That's the kind of thing that gets
suggested as a last resort, or as a joke, when you insist on using
existing terminology with your own idiosyncratic meaning.

"Outermost block scope" is a clear and correct description of what
you're talking about. Though what you're probably talking about is
outermost block scope before any statements. Or just "at the top of the
function definition".

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Scott Lurndal
Feb 7, 2024, 11:21:38 AM

Malcolm McLean <malcolm.ar...@gmail.com> writes:
>On 07/02/2024 07:54, David Brown wrote:
>> On 07/02/2024 00:23, Lawrence D'Oliveiro wrote:
>>> On Tue, 6 Feb 2024 09:44:20 +0100, David Brown wrote:
>>>
>>>> They reuse "temp" variables instead of making new ones.
>>>
>>> I like to limit the scope of my temporary variables. In C, this is as
>>> easy
>>> as sticking a pair of braces around a few statements.
>>
>> Generally, you want to have the minimum practical scope for your local
>> variables.  It's rare that you need to add braces just to make a scope
>> for a variable - usually you have enough braces in loops or conditionals
>> - but it happens.
>>
>The two common patterns are to give each variable the minimum scope, or
>to declare all variables at the start of the function and give them all
>function scope.
>
>The case for minimum scope is the same as the case for scope itself. The
>variable is accessible where it is used and not elsewhere, which makes
>it less likely it will be used in error, and means there are fewer names
>to understand.

And it means the compiler can re-use the local storage (if any was
allocated) for subsequent minimal scope variables (or even same scope
if the compiler knows the original variable is never used again),
so long as the address of the variable isn't taken.

Scott Lurndal
Feb 7, 2024, 11:26:00 AM

Wind River is still popular, I believe, but the Linux kernel + busybox is
probably the most common.

bart
Feb 7, 2024, 1:05:48 PM

On 07/02/2024 15:36, Ben Bacarisse wrote:
> bart <b...@freeuk.com> writes:
>
>> On 07/02/2024 10:47, Ben Bacarisse wrote:
>>> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>>
>>>> However there are also strong arguments for function scope. A function is a
>>>> natural unit. And all the variables used in that unit are listed together
>>>> and, ideally, commented. So at a glance you can see what is in scope and
>>>> what is being operated on. [typos fixed]
>>> You should not need an inventory of what's being operated on. Any
>>> function so complex that I can't tell immediately what declaration
>>> corresponds to which name needs to be re-written.
>>
>> But if you keep functions small, eg. the whole body is visible at the same
>> time, then there is less need for declarations to clutter up the code. They
>> can go at the top, so that you can literally just glance there.
>
> Declarations don't clutter up the code, just as the code does not
> clutter up the declarations. That's just your own spin on the matter.
> They are both important parts of a C program.


That sounds like your opinion against mine. It's nothing to do with
spin, whatever that means.

I would argue however that if you take a clear, cleanly written
language-neutral algorithm, and then introduce type annotations /within/
that code rather than segregated, then it is no longer quite as clear or
as clean looking.

As a related example, suppose you had this function:

void F(int a, double* b) {...}

All the parameters are specified with their names and types at the top.
Now imagine if only the names were given, but the types specified only
at their first usage within the body:

void F(a, b) {...}

Now you no longer have an instant picture of the interface to the
function. The declarations could also be shadowed within the body, so
you can't tell whether a definition for 'a' refers to a parameter
without checking for definitions in an outer scope.

Imagine further that even the parameter names were specified within the
body ...

I /like/ having a summary of both parameters and locals at the top. I
/like/ code looking clean, and as aligned as possible (some decls will
push code to the right). I /like/ knowing that there is only one
instance of a variable /abc/, and it is the one at the top.

So it might be my opinion but also my preference.

>>>> And there are only three levels of scope. A
>>>> varibale is global, or it is file scope, or it is scoped to the
>>>> function.
>>
>>> You are mixing up scope and lifetime. C has no "global scope". A name
>>> may have external linkage (which is probably what you are referring to),
>>> but that is not directly connected to its scope.
>>
>> Funny, I use the same definitions of scope:
>
> You can use any definition you like, provided you don't insist that
> others use your own terms. I was just pointing out the problems
> associated with using the wrong terms in a public post.
>
> I'll cut the text where you use the wrong terms, because there is
> nothing to be gained from correcting your usage.

That's a shame. I think there is something to be gained by not sticking
slavishly to what the C standard says (which very few people will study)
and using more colloquial terms or ones that more can relate to.

Apparently both 'typedef' and 'static' are forms of 'linkage'. But no
identifiers declared with those will ever be linked to anything!

Scott Lurndal
Feb 7, 2024, 1:26:21 PM

Now imagine if the moon was made from green cheese. It's just as
likely, and neither are C.

bart
Feb 7, 2024, 2:54:08 PM

On 07/02/2024 18:26, Scott Lurndal wrote:
> bart <b...@freeuk.com> writes:
>> On 07/02/2024 15:36, Ben Bacarisse wrote

>>> Declarations don't clutter up the code, just as the code does not
>>> clutter up the declarations. That's just your own spin on the matter.
>>> They are both important parts of a C program.
>>
>>
>> That sounds like your opinion against mine. It's nothing to do with
>> spin, whatever that means.
>>
>> I would argue however that if you take a clear, cleanly written
>> language-neutral algorithm, and then introduce type annotations /within/
>> that code rather than segregated, then it is no longer quite as clear or
>> as clean looking.
>>
>> As a related example, suppose you had this function:
>>
>> void F(int a, double* b) {...}
>>
>> All the parameters are specified with their names and types at the top.
>> Now imagine if only the names were given,
>
> Now imagine if the moon was made from green cheese. It's just as
> likely, and neither are C.

It's perfectly possible as an extension. Old C had something similar
that was halfway there.

But it was a hypothetical illustration to elicit a response to this
question: would it make it harder or easier to understand what the
function is doing?

Because it is related to whether the locals used by a function are
declared all at the top, or buried within the code at random places.

BTW I've just done a quick survey of some codebases; functions tend to
have 3 local variables on average.

Is it really worth spreading them out in nested block scopes?

Here is a histogram for tcc.c: the first column is how many locals, and
the second is how many functions with that number:

0 161
1 118
2 73
3 42
4 29
5 15
6 12
7 14
8 11
9 6
10 9
11 6
12 3
13 5
14 3
16 4
17 1
18 2
19 2
20 1
21 2
25 1
27 1
31 1
32 1
33 1
35 1

In one of my own programs, 92% of functions have 6 locals or fewer. (The
figures include extra temporary locals created as part of the
transpilation to C.)


David Brown
Feb 7, 2024, 3:31:12 PM

OK, I suppose. But if you want to talk about C with other people, it
makes sense to use the same terms they are using, in the same way.

I can certainly agree that there are bits of the C standards that are
not as clear as I would like. The definitions of scope are not one of
those parts.

David Brown
Feb 7, 2024, 3:37:44 PM

On 07/02/2024 19:05, bart wrote:

> That's a shame. I think there is something to be gained by not sticking
> slavishly to what the C standard says (which very few people will study)
> and using more colloquial terms or ones that more can relate to.

There is something to be said for explaining the technical terms from
the C standards in more colloquial language to make it easier for others
to understand. There is nothing at all to be said for using C standard
terms in clearly and obviously incorrect ways. That's just going to
confuse these non-standard-reading C programmers when they try to find
out more, no matter where they look for additional information.

>
> Apparently both 'typedef' and 'static' are forms of 'linkage'. But no
> identifiers declared with those will ever be linked to anything!

Could you point to the paragraph of the C standards that justifies that
claim? Or are you perhaps mixing things up? (I can tell you the
correct answer, with references, if you are stuck - but I'd like to give
you the chance to show off your extensive C knowledge first.)

David Brown
Feb 7, 2024, 3:44:40 PM

Some people do feel it is more "natural" to have all their declarations
at the start of their functions (and never declare variables in any
inner block scopes). It's common, and their code can be correct. You
and I both think there are usually better ways to structure code, but
does that mean it is not "idiomatic" ? I'm not sure there is a good
answer here. Unfortunately the C standards don't define the term
"idiomatic" :-(

If you can think of a better term to use here, I'd be happy to hear it -
otherwise I think we all know the kind of code structure I meant, which
was the most important point.

>>
> No. It means writing the code in a way which is common in C and has
> certain advantages, but is not so in other languages.

An idiom in C could also be an idiom in C++, Python, or any other
language. Nothing in "idiomatic" implies that it is unique to a
particular language, just that it is commonly used in that language.

David Brown
Feb 7, 2024, 3:50:05 PM

VxWorks, you mean? Yes, that is still used in what might be called
"big" embedded systems. There are other RTOS's that have been common
for embedded systems with screens (and no one would bother with embedded
Windows without a screen!), including QNX, Integrity, eCOS, and Nucleus.

(There are many small RTOS's, but they are competing in a different field.)

Chris M. Thomasson
Feb 7, 2024, 4:04:38 PM

Fwiw, I think the last one I used was Quadros a long time ago.

Lawrence D'Oliveiro
Feb 7, 2024, 4:34:50 PM

On Wed, 7 Feb 2024 14:01:27 +0100, David Brown wrote:

> It makes code simpler, clearer, easier to reuse, easier to see that it
> is correct, and easier to see if there is an error. It is very much
> easier for automatic tools (static warnings) to spot issues.

Here’s an example of how granular I like to make my scopes:

    struct pollfd topoll[MAX_WATCHES + 1];
    int total_timeout = -1; /* to begin with */
    for (int i = 0; i < nr_watches; ++i)
    {
        DBusWatch * const watch = watches[i];
        struct pollfd * const entry = topoll + i;
        entry->fd = dbus_watch_get_unix_fd(watch);
        entry->events = 0; /* to begin with */
        if (dbus_watch_get_enabled(watch))
        {
            const int flags = dbus_watch_get_flags(watch);
            if ((flags & DBUS_WATCH_READABLE) != 0)
            {
                entry->events |= POLLIN | POLLERR;
            } /*if*/
            if ((flags & DBUS_WATCH_WRITABLE) != 0)
            {
                entry->events |= POLLOUT | POLLERR;
            } /*if*/
        } /*if*/
    } /*for*/
    {
        struct pollfd * const entry = topoll + nr_watches;
        entry->fd = notify_receive_pipe;
        entry->events = POLLIN;
    }
    for (int i = 0; i < nr_timeouts; ++i)
    {
        DBusTimeout * const timeout = timeouts[i];
        if (dbus_timeout_get_enabled(timeout))
        {
            const int interval = dbus_timeout_get_interval(timeout);
            if (total_timeout < 0 or total_timeout > interval)
            {
                total_timeout = interval;
            } /*if*/
        } /*if*/
    } /*for*/
    const long timeout_start = get_milliseconds();
    bool got_io;
    {
        const int sts = poll(topoll, nr_watches + 1, total_timeout);
        fprintf(stderr, "poll returned status %d\n", sts);
        if (sts < 0)
        {
            perror("doing poll");
            die();
        } /*if*/
        got_io = sts > 0;
    }

Michael S
Feb 7, 2024, 4:37:21 PM

On Wed, 7 Feb 2024 21:49:52 +0100
David Brown <david...@hesbynett.no> wrote:

> On 07/02/2024 17:25, Scott Lurndal wrote:
> > David Brown <david...@hesbynett.no> writes:
> >> On 07/02/2024 00:24, Lawrence D'Oliveiro wrote:
> >>> On Tue, 6 Feb 2024 09:50:02 +0100, David Brown wrote:
> >>>
> >>>> And of course there are those two or three unfortunate people
> >>>> that have to work with embedded Windows.
> >>>
> >>> I thought this has pretty much gone away, pushed aside by Linux.
> >>
> >> It was never common in the first place, and yes, it is almost
> >> entirely non-existent now. I'm sure there are a few legacy
> >> products still produced that use some kind of embedded Windows,
> >> but few more than that
> >> - which is what I was hinting at in my post.
> >
> > Wind river is still popular, I believe, but the linux kernel +
> > busybox is probably the most common.
>
> VxWorks, you mean? Yes, that is still used in what might be called
> "big" embedded systems. There are other RTOS's that have been common
> for embedded systems with screens (and no one would bother with
> embedded Windows without a screen!),

Then our company and I personally are no-ones 1.5 times.

The first time it was WinCE on a small Arm-based board that served as
Ethernet interface and control-plane controller for big boards that
were important building blocks for very expensive industrial
equipment. The equipment as a whole was not ours; we were a sub-contractor
for this particular piece. This instance of Windows never ever had a
display or keyboard.
We still make a few boards per year more than 15 years later.

The second one was/is [part of] our own product, a regular Windows
Embedded, starting with XP, then 7, then 10. It runs on an SBC that
functions as a host of a Compact PCI frame with various I/O boards mostly
of our own making. SBC does both control plane and partial data plane
processing and handles Ethernet communication with the rest of the
system. It's a completely different industry; the system as a whole is not
nearly as expensive as the first one, but still expensive enough for
this particular computer to be a small part of the total cost.
The system does have connectors for display, keyboard and mouse.
Sometimes it is handy to connect them during manufacturing testing. But
they are never connected in the fully assembled product. However, since they
exist, with relation to this system I count myself as half-a-no-one
rather than a full no-one.

Lawrence D'Oliveiro
Feb 7, 2024, 4:38:16 PM

On Wed, 7 Feb 2024 19:53:53 +0000, bart wrote:

> BTW I've just done a quick survey of some codebases; functions tend to
> have 3 local variables on average.
>
> Is really worth spreading them out in nested block scopes?

If you write “average” functions, you know what the answer is.

Some of us don’t write “average” functions.

Lawrence D'Oliveiro
Feb 7, 2024, 4:42:00 PM

On Wed, 07 Feb 2024 16:25:45 GMT, Scott Lurndal wrote:

> ... the linux kernel + busybox is probably the most common.

That “Ingenuity” Mars helicopter, that recently met its end after breaking
a rotor blade, ran Linux and other open-source software.

It was very much an afterthought project, a proof of concept, only meant
to last maybe 30 days. It ended up making dozens of flights over 3 years.

bart
Feb 7, 2024, 5:52:36 PM

* The standard talks a lot about Linkage but there are no specific
lexical elements for those.

* Instead the standard uses lexical elements called 'storage-class
specifiers' to control what kind of linkage is applied to identifiers

* Because of this association, I use 'linkage symbol' to refer to those
particular tokens

* The tokens include 'typedef extern static'

6.2.2p3 says: "If the declaration of a file scope identifier for an
object or a function contains the storage-class specifier static, the
identifier has internal linkage."

So it talks about statics as having linkage of some kind. What did I
say? I said statics will never be linked to anything.

6.2.2p6 excludes typedefs (by omission). Or rather it says they have 'no
linkage', which is one of the three kinds of linkage (external,
internal, none).

So as far as I can see, statics and typedefs are still lumped into the
class of entities that have a form of linkage, and are part of the set
of tokens that control linkage.

---------------------------------------------------

This to me is all a bit mixed up. Much as you dislike other languages
being brought in, they can give an enlightening perspective.

So for me, linking applies to all named entities that occupy memory, and
that have global/export scope.

But global/export scope applies also to all other named entities,
whether they occupy memory or not. I can show that here in this chart:

                   M Scope?   M Link?   C Linkage?

Function names        Y          Y      Y (internal/external)
Variable names        Y          Y      Y (internal/external)
Enum names            Y          N      ??
Named constants       Y          N      --
Type names            Y          N      Y (none)
Macro names           Y          N      ??
Module names          Y          N      --

(Type names include C's struct tags. Enum tags are not listed.)

In the M language, ALL user identifiers declared at file scope can be
imported and exported automatically by the language across modules.

This is the primary control method for visibility.

There is a special mechanism to import into a program/library, or export
from one. This is the only place linkage comes up, where those names
need to appear in EXE, DLL and OBJ file formats. Only functions and
variables (entities that have an address) are involved.

In the C column, ?? marks identifiers that usually can't appear in a
declaration with a storage class. And -- is for things not meaningful in C.



Kaz Kylheku
Feb 7, 2024, 6:24:31 PM

On 2024-02-07, bart <b...@freeuk.com> wrote:
> On 05/02/2024 05:58, Kaz Kylheku wrote:
>> On 2024-02-05, bart <b...@freeuk.com> wrote:
>
>> Writing a compiler is pretty easy, because the bar can be set very low
>> while still calling it a compiler.
>
>> Whole-program compilers are easier because there are fewer requirements.
>> You have only one kind of deliverable to produce: the executable.
>> You don't have to deal with linkage and produce a linkable format.
>
> David Brown suggested that they were harder than I said. You're saying
> they are easier.

I'm saying it's somewhat easier to make a compiler which produces an
object file than to produce a compiler that produces object files *and*
a linker that combines them.

There is all that code you don't have to write to produce object files,
read them, and link them. You don't have to solve the problem of how to
represent unresolved references in an externalized form in a file.

David made it clear he was referring to whole program optimization.

>> GCC is maintained by people who know what a C compiler is, and GCC can
>> be asked to be one.
>
> So what is it when it's not a C compiler? What language is it compiling
> here:
>
> c:\qx>gcc qc.c
> c:\qx>

Yes, sorry. It is compiling C also: a certain revision of GNU C,
which is a family of dialects in the C family.

> Mine at least is a more rigid subset.

Rigid? Where is this subset documented, other than in the code?

GNU C is documented, and tested.

>> Your idea of writing a C compiler seems to be to pick some random
>> examples of code believed to be C and make them work. (Where "work"
>> means that they compile and show a few behaviors that look like
>> the expected ones.)
>
> That's what most people expect!

That may be a verbal way of expressing what a lot of developers
want, but it has to be carefully interpreted to avoid a fallacy.

"Most people" expect the C compiler to work on /their/ respective code
they care about, which is different based on who you ask. The more
people you include in a sample of "most people", the more code that is.

Most people don't just expect a compiler to work on /your/ few examples.

>> Basically, you don't present a very credible case that you've actually
>> written a C compiler.
>
> Well, don't believe it if you don't want.

Oh I want to believe; I just can't do that which I want, without
proper evidence.

Do you have a reference manual for your C dialect, and is it covered by
tests? What programs and constructs are required to work in your C dialect?
What are required to be diagnosed? What is left undefined?

If you make changes to the compiler which accidentally cause it to stray
from the dialect, how are you informed?

> The NASM.EXE program is bit larger at 1.3MB for example, that's 98.7%
> smaller than your giant program.

That's amazingly large for an assembler. Is that stripped of debug info?

> I mean, where is YOUR lower-level system language? Where is anybody's? I
> don't mean the Zigs and Rusts because that would be like comparing a
> 40-tonne truck with a car.

I'm not interested in working on lower-level systems languages.

I work on the implementation of a Lisp dialect.

As far as low-level systems goes, I'm quite satisfied with the C
language and its mainstream implementations.

>> Compilers that blaze through large amounts of code in the blink of an
>> eye are almost certainly dodging on the optimization.
>
> Yes, probably. But the optimisation is overrated. Do you really need
> optimised code to test each of those 200 builds you're going to do today?

Yes, because of the principle that you should test what you ship.

>>> * There is no benefit at all in having a tool like a compiler, be a
>>> small, self-contained executable.
>>
>> Not as much as there used to, decades ago.
>
> Simplicity is always good. Somebody deletes one of the 1000s of files of
> your gcc installation. Is it something that is essential? Who knows.

That someone will have to hack the superuser account, since those
files are writable only to root, sitting in directories that are
writable only to root.

You will know when something complains about the file not being found.

(A problem will arise if the file is part of a search, such that another
file will be found if that one is missing.)

> But if your compiler is the one file mm.exe, it's easy to spot if it's
> missing!

What if a bit randomly flips in mm.exe? Is it in a byte that is
essential? Who knows ... I don't see where this is going.

Software installations big and small can be damaged.

It seems disadvantageous for a compiler to have no satellite files. If
you have to fix something in <stdlib.h>, and that's buried in the
executable, you have to roll out a whole new mm.exe. The user has to
believe you when you say that you changed nothing else.

If the satellite files are kept reasonably small in number, and not
proliferated throughout a complex tree, that could be a good thing.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazi...@mstdn.ca

Malcolm McLean
Feb 7, 2024, 7:29:17 PM

Most functions are short and trivial. But those functions tend to be
easy to understand and unlikely to have bugs. What matters is how you
write the longer functions.

Malcolm McLean
Feb 7, 2024, 7:33:56 PM

On 07/02/2024 20:44, David Brown wrote:
> On 07/02/2024 16:45, Malcolm McLean wrote:
>> On 07/02/2024 15:30, Ben Bacarisse wrote:
>>>
>>> The most common meaning of "idiomatic", and the one I usually associate
>>> with it in this context, is "containing expressions that are natural and
>>> correct".  That's not how I would describe eschewing declarations in
>>> inner blocks.
>>>
>> No. It means writing the code in a way which is common in C and has
>> certain advantages, but is not so in other languages.
>
> An idiom in C could also be an idiom in C++, Python, or any other
> language.  Nothing in "idiomatic" implies that it is unique to a
> particular language, just that it is commonly used in that language.
>
We must be able to point to at least one other language where it is not
the idiom, in order to say that it is an idiom.

Kaz Kylheku
Feb 7, 2024, 8:13:35 PM

On 2024-02-07, bart <b...@freeuk.com> wrote:
> * The standard talks a lot about Linkage but there are no specific
> lexical elements for those.

Yes; linkage doesn't have a dedicated phrase element in the syntax.

> * Instead the standard uses lexical elements called 'storage-class
> specifiers' to control what kind of linkage is applied to identifiers

Yes. "storage-class specifier" is just a syntactic category, a "part of
speech" of C.

Not everything that is syntactically a "storage class specifier"
determines the kind of storage for an object.

It's not really a great situation and it has gotten worse with the
introduction of new storage class keywords for alignment and whatnot.

> * Because of this association, I use 'linkage symbol' to refer to those
> particular tokens

Your one term for everything is equally flawed and just gratuitously
different.

> * The tokens include 'typedef extern static'

> 6.2.2p3 says: "If the declaration of a file scope identifier for an
> object or a function contains the storage-class specifier static, the
> identifier has internal linkage."

Yes. If it's a function it has no storage class. If it's an object
at file scope, its storage duration is static. The concept of storage
class doesn't apply to anything at file scope, in other words.

The syntax is instead abused for determining linkage.

> So it talks about statics as having linkage of some kind. What did I
> say? I said statics will never be linked to anything.

file scope statics have internal linkage for the reason that they
are allowed to be multiply declared. You're thinking of linkage as some
object-file-level resolution mechanism.

In C, linkage just refers to the situation when multiple declarations of
an identifier are permitted, and refer to a single entity according
to some rule.

At file scope "static int x;" has linkage because you can
have a situation like this:

  static int x; // declaration / tentative definition

  void foo(void)
  {
      int x;
      {
          extern int x; // links to the file scope static
      }
  }

  static int x; // declaration / tentative definition

  static int x = 42; // definition

The linkage is internal because all these occurrences of x do not
refer to a file scope x in another translation unit.

This internal linkage is not necessarily handled by a linker. Since it
happens in one translation unit, the compiler can take care of it so
that the resulting object file has resolved all the internal linkage;
then the linkage of multiple translation units into a single program
only deals with external linkage. That can make external linkage appear
more "real".

> 6.2.2p6 excludes typedefs (by omission). Or rather it says they have 'no
> linkage', which is one of the three kinds of linkage (external,
> internal, none).
>
> So as far as I can see, statics and typedefs are still lumped into the
> class of entities that have a form of linkage, and are part of the set
> of tokens that control linkage.

No; typedef is just the "part of C speech" called "storage class
specifier". When a declaration has "typedef" storage class, it's
understood as defining a typedef name in that scope: file scope
or lexical.

The phrase element called "storage class" serves as a general "command
verb" kind of thing in the declaration (which may be omitted).

It should be renamed accordingly. Maybe "declaration kind" or
"declaration category" or "declaration class" or what have you.

Mainly, get rid of the word "storage".

Good names for entities are important. Sometimes the systems we use
don't get them right.

The naming of "storage class" is less important than the multiple
meanings of static and whatnot. Unlike grammar terminology, that can't
be fixed without breaking programs.

> This to me is all a bit mixed up. Much as you dislike other languages
> being brought in, they can give an enlightening perspective.

Right, nobody here knows anything outside of C, or can think outside of
the C box, except for you.

You're the newsgroup's Prometheus.

The vultures eat your liver daily and everything.

Kaz Kylheku
Feb 7, 2024, 8:30:39 PM

There are two meanings of *idiom*.

The "strong" meaning of idiom is that it's a meaning arbitrarily
assigned to a canned combination of words which otherwise make no
sense or are ungrammatical.

The "weak" meaning refers to some often used phrase.

Your proposed rule has a logical flaw because it requires us to confirm
that something is not an idiom in order to confirm that it is.
Even though that is in separate languages, it is still a problem.

Suppose that all the literal translations of some phrase X into all
known languages are naively considered idioms by all their respective
speakers.

Then according to your criterion, none of the languages have the right
to consider it to be an idiom.

Suppose that English speakers are the first to realize this problem, and
choose to stop considering the English translation of X an idiom.

At that point, the other remaining languages may keep it as an idiom:
their speakers can now point to at least one language where it isn't.

Problem is, the choice of which group must stop treating it as an idiom,
so that the others may, is arbitrary.

This is not a well-founded definition for a term!

Lawrence D'Oliveiro
Feb 7, 2024, 8:38:25 PM

On Thu, 8 Feb 2024 00:33:35 +0000, Malcolm McLean wrote:

> We must be able to point to at least one other language where it is not
> the idiom, in order to say that it is an idiom.

How about pointing to alternative ways it might be said in the same
language, and then proclaiming that “for some reason, nobody who uses the
language is supposed to do it that way”?

bart
Feb 7, 2024, 8:47:12 PM

On 07/02/2024 23:24, Kaz Kylheku wrote:
> On 2024-02-07, bart <b...@freeuk.com> wrote:
>> On 05/02/2024 05:58, Kaz Kylheku wrote:
>>> On 2024-02-05, bart <b...@freeuk.com> wrote:
>>
>>> Writing a compiler is pretty easy, because the bar can be set very low
>>> while still calling it a compiler.
>>
>>> Whole-program compilers are easier because there are fewer requirements.
>>> You have only one kind of deliverable to produce: the executable.
>>> You don't have to deal with linkage and produce a linkable format.
>>
>> David Brown suggested that they were harder than I said. You're saying
>> they are easier.
>
> I'm saying it's somewhat easier to make a compiler which produces an
> object file than to produce a compiler that produces object files *and*
> a linker that combines them.

Is there a 'than' missing above? Otherwise it's contradictory.

> There is all that code you don't have to write to produce object files,
> read them, and link them. You don't have to solve the problem of how to
> represent unresolved references in an externalized form in a file.

Programs that generate object files usually invoke other people's linkers.

But your comments are simplistic. EXE formats can be as hard to generate
as OBJ files. You still have to resolve the dynamic imports into an EXE.

You need to either have a ready-made language designed for whole-program
work, or you need to devise one.

Plus, if the minimal compilation unit is now all N source modules of a
project rather than just 1 module, then you'd better have a pretty fast
compiler, and some strategies for dealing with scale.

If your project involves only OBJ format, then you can also choose to
devise your own simple file format, then linking is a trivial though
fiddly task.

> David made it clear he was referring to whole program optimization.
>
>>> GCC is maintained by people who know what a C compiler is, and GCC can
>>> be asked to be one.
>>
>> So what is it when it's not a C compiler? What language is it compiling
>> here:
>>
>> c:\qx>gcc qc.c
>> c:\qx>
>
> Yes, sorry. It is compiling C also: a certain revision of GNU C,
> which is family of dialects in the C family.
>
>> Mine at least is a more rigid subset.
>
> Rigid? Where is this subset documented, other than in the code?

In early versions of the compiler, there was a specification. Now, it's
a personal tool and I don't bother. So shoot me.

> GNU C is documented, and tested.
>
>>> Your idea of writing a C compiler seems to be to pick some random
>>> examples of code believed to be C and make them work. (Where "work"
>>> means that they compile and show a few behaviors that look like
>>> the expected ones.)
>>
>> That's what most people expect!
>
> That may be a verbal way of expressing what a lot of developers
> want, but it has to be carefully interpreted to avoid a fallacy.
>
> "Most people" expect the C compiler to work on /their/ respective code
> they care about, which is different based on who you ask. The more
> people you include in a sample of "most people", the more code that is.
>
> Most people don't just expect a compiler to work on /your/ few examples.

A C compiler that works on any arbitrary existing code is years of
effort. That's hard enough to achieve even with better and more mature
products than mine.

>>> Basically, you don't present a very credible case that you've actually
>>> written a C compiler.
>>
>> Well, don't believe it if you don't want.
>
> Oh I want to believe; I just can't do that which I want, without
> proper evidence.
>
> Do you have a reference manual for your C dialect, and is it covered by
> tests? What programs and constructs are required to work in your C dialect?
> What are required to be diagnosed? What is left undefined?

So no one can claim to write a 'C' compiler unless it does everything as
well as gcc which started in 1987, has had hordes of people working with
it, and has had feedback from myriads of users?

I had some particular aims with my project, most of which were achieved,
boxes ticked.

>> The NASM.EXE program is a bit larger at 1.3MB for example, that's 98.7%
>> smaller than your giant program.
>
> That's amazingly large for an assembler. Is that stripped of debug info?

The as.exe assembler for gcc/TDM 10.3 is 1.8MB. For mingw 13.2 it was 1.5MB.

Mine is about 100KB, but it covers a subset of x86 opcodes and outputs
only a limited number of formats.

But the size of NASM is not an issue; it's an example of a modestly sized
application, which seems rare here. People here claim their apps are always so
massive and complicated that a 'toy' compiler will never work.

>> Yes, probably. But the optimisation is overrated. Do you really need
>> optimised code to test each of those 200 builds you're going to do today?
>
> Yes, because of the principle that you should test what you ship.

Then you're being silly. You're not shipping build#139 of 200 that day,
not even #1000 that week. You're debugging a logic bug that is nothing
to do with optimisation.


bart
Feb 7, 2024, 9:09:16 PM

Well, quite. AFAIK, nobody here HAS (1) used a comparable language to C;
(2) over such a long term; (3) which they have invented themselves; (4)
have implemented themselves; (5) is similar enough to C yet different
enough in how it works to give that perspective.

See, I gave an interesting comparison of how my module scheme works
orthogonally across all kinds of entities, compared with the confusing
mess of C, and you shut down that view.

You're never in a million years going to admit that my language has some
good points, are you? Exactly as I said in my OP.

So what's the rule here, that people can only think INSIDE the C box?

Is the point of this group only to show off your master knowledge of C,
or the ins and outs of 300 kinds of Linux systems?

Malcolm McLean
Feb 7, 2024, 9:22:10 PM

So how do you say "My French is lousy" in idiomatic French?
According to a Frenchman, it is "Doucement. Le Francais n'est pas ma
langue maternelle."
Now that means literally "Softly. The French is not my maternal language".
You wouldn't say that in English. You'd say "go easy" instead of
"softly". It would be "French" rather than "the French". And whilst you
might say "maternal language" it would be rare. Normally it would be
"native language".
So the French has one idiom and the English another, and we say things
in a slightly different way. What is the convention in one is not so in
the other, and that is what makes it idiom.

And of course the Frenchman made the point that whilst his information
was correct, to actually use his translation would be self-refuting.

Kaz Kylheku
Feb 7, 2024, 9:50:28 PM

On 2024-02-08, bart <b...@freeuk.com> wrote:
> On 07/02/2024 23:24, Kaz Kylheku wrote:
>> On 2024-02-07, bart <b...@freeuk.com> wrote:
>>> On 05/02/2024 05:58, Kaz Kylheku wrote:
>>>> On 2024-02-05, bart <b...@freeuk.com> wrote:
>>>
>>>> Writing a compiler is pretty easy, because the bar can be set very low
>>>> while still calling it a compiler.
>>>
>>>> Whole-program compilers are easier because there are fewer requirements.
>>>> You have only one kind of deliverable to produce: the executable.
>>>> You don't have to deal with linkage and produce a linkable format.
>>>
>>> David Brown suggested that they were harder than I said. You're saying
>>> they are easier.
>>
>> I'm saying it's somewhat easier to make a compiler which produces an
>> object file than to produce a compiler that produces object files *and*
^^^^
>> a linker that combines them.
>
> Is there a 'than' missing above? Otherwise it's contradictory.

Other "than" that one? Hmm.

>> There is all that code you don't have to write to produce object files,
>> read them, and link them. You don't have to solve the problem of how to
>> represent unresolved references in an externalized form in a file.
> Programs that generate object files usually invoke other people's linkers.
>
> But your comments are simplistic. EXE formats can be as hard to generate
> as OBJ files. You still have to resolve the dynamic imports into an EXE.

Generating just the EXE format is objectively less work than generating
OBJ files and linking them into that ... same EXE format, right?

> You need to either have a ready-made language designed for whole-program
> work, or you need to devise one.
>
> Plus, if the minimal compilation unit is now all N source modules of a
> project rather than just 1 module, then you'd better have a pretty fast
> compiler, and some strategies for dealing with scale.

Easy; just drop language conformance, diagnostics, optimization.

>>> Well, don't believe it if you don't want.
>>
>> Oh I want to believe; I just can't do that which I want, without
>> proper evidence.
>>
>> Do you have a reference manual for your C dialect, and is it covered by
>> tests? What programs and constructs are required to work in your C dialect?
>> What are required to be diagnosed? What is left undefined?
>
> So no one can claim to write a 'C' compiler unless it does everything as
> well as gcc which started in 1987, has had hordes of people working with
> it, and has had feedback from myriads of users?

Nope; unless it is documented so that there is a box, where it says what
is in the box, and some way to tell that what's on the box is in the
box.

> I had some particular aims with my project, most of which were achieved,
> boxes ticked.
>
>>> The NASM.EXE program is a bit larger at 1.3MB for example, that's 98.7%
>>> smaller than your giant program.
>>
>> That's amazingly large for an assembler. Is that stripped of debug info?
>
> The as.exe assembler for gcc/TDM 10.3 is 1.8MB. For mingw 13.2 it was 1.5MB.

"as" on Ubuntu 18, 32 bit.

$ size /usr/bin/i686-linux-gnu-as
text data bss dec hex filename
430315 12544 37836 480695 755b7 /usr/bin/i686-linux-gnu-as

Still pretty large. Always use the "size" utility, rather than raw
file size. This has 430315 bytes of code, 12544 of non-zero static data, 37836
bytes of zeroed data (not part of the executable size).

That's still large for an assembler, but at least it's not larger
than GNU Awk.

>>> Yes, probably. But the optimisation is overrated. Do you really need
>>> optimised code to test each of those 200 builds you're going to do today?
>>
>> Yes, because of the principle that you should test what you ship.
>
> Then you're being silly. You're not shipping build#139 of 200 that day,

If I make a certain change for build #139, and that part of the code
(function or entire source file) is not touched until build #1459 which
ships, that compiled code remains the same! So in fact, the #139 version of
that code is what build #1459 ships with. That code is being tested as part of
#140, #141, #142, ... even while some other things are changing.

You should not be doing all your development and developer testing with
unoptimized builds so that only QA people test optimized code before
shipping.

Every test, even of a private build, is a potential opportunity to find
something wrong with some optimized code that would end up shipping
otherwise.

Here is another reason to work with optimized code. If you have to debug
at the machine language level, optimized code is shorter and way more readable.
And it can help you understand logic bugs, because the compiler performs
logical analysis in doing optimizations. The optimized code shows you what your
calculation reduced to, and can even help you see a better way of writing the
code, like a tutor.

> not even #1000 that week. You're debugging a logic bug that is nothing
> to do with optimisation.

Though debugging logic bugs that have nothing to do with optimization can be
somewhat impeded by optimization, it's still better to prioritize working with
the code in the intended shipping state.

You can drop to an unoptimized build when necessary.

Pretty much that only happens when

1. It is just a logic bug, but you have to resort to a debugger, and
the optimizations are interfering with being able to see variable values.

2. You suspect it does have to do with optimization, so you see if
the issue goes away in the unoptimized build.

Kaz Kylheku
Feb 7, 2024, 10:07:24 PM

On 2024-02-08, bart <b...@freeuk.com> wrote:
> On 08/02/2024 01:13, Kaz Kylheku wrote:
>> On 2024-02-07, bart <b...@freeuk.com> wrote:
>
>>> This is to me is all a bit mixed up. Much as you dislike other languages
>>> being brought in, they can give an enlightening perspective.
>>
>> Right, nobody here knows anything outside of C, or can think outside of
>> the C box, except for you.
>
> Well, quite. AFAIK, nobody here HAS (1) used a comparable language to C;
> (2) over such a long term; (3) which they have invented themselves; (4)
> have implemented themselves; (5) is similar enough to C yet different
> enough in how it works to give that perspective.

You've taken a perspective that is not transferable to others.

If one can only see something after using your own invention for many
years, and other people don't have that same invention and
implementation experience, then they just cannot see what you see.

You cannot teach (2) through (4), just like a basketball coach cannot
teach a player to be seven foot tall.

> See, I gave an interesting comparison of how my module scheme works
> orthogonally across all kinds of entities, compared with the confusing
> mess of C, and you shut down that view.
>
> You're never in a million years going to admit that my language has some
> good points, are you? Exactly as I said in my OP.

I have no idea what it is; I've not seen the reference manual / spec,
and even if I did, I wouldn't have implemented it myself and used it for
a long time, so I don't have the right perspective.

Kaz Kylheku
Feb 7, 2024, 10:07:49 PM

On 2024-02-08, Malcolm McLean <malcolm.ar...@gmail.com> wrote:
> On 08/02/2024 01:38, Lawrence D'Oliveiro wrote:
>> On Thu, 8 Feb 2024 00:33:35 +0000, Malcolm McLean wrote:
>>
>>> We must be able to point to at least one other language where it is not
>>> the idiom, in order to say that it is an idiom.
>>
>> How about pointing to alternative ways it might be said in the same
>> language, and then proclaiming that “for some reason, nobody who uses the
>> language is supposed to do it that way”?
>
> So how do you say "My French is lousy" in idiomatic French?

Je parle Quebecois.

David Brown
Feb 8, 2024, 2:52:27 AM

On 07/02/2024 22:37, Michael S wrote:
> On Wed, 7 Feb 2024 21:49:52 +0100
> David Brown <david...@hesbynett.no> wrote:
>
>> On 07/02/2024 17:25, Scott Lurndal wrote:
>>> David Brown <david...@hesbynett.no> writes:
>>>> On 07/02/2024 00:24, Lawrence D'Oliveiro wrote:
>>>>> On Tue, 6 Feb 2024 09:50:02 +0100, David Brown wrote:
>>>>>
>>>>>> And of course there are those two or three unfortunate people
>>>>>> that have to work with embedded Windows.
>>>>>
>>>>> I thought this has pretty much gone away, pushed aside by Linux.
>>>>
>>>> It was never common in the first place, and yes, it is almost
>>>> entirely non-existent now. I'm sure there are a few legacy
>>>> products still produced that use some kind of embedded Windows,
>>>> but few more than that
>>>> - which is what I was hinting at in my post.
>>>
>>> Wind river is still popular, I believe, but the linux kernel +
>>> busybox is probably the most common.
>>
>> VxWorks, you mean? Yes, that is still used in what might be called
>> "big" embedded systems. There are other RTOS's that have been common
>> for embedded systems with screens (and no one would bother with
>> embedded Windows without a screen!),
>
> Then our company and me personally are no-ones 1.5 times.

You are just a rounding error :-)

But it is interesting to hear of exceptions to the general trend.

bart
Feb 8, 2024, 6:08:22 AM

On 08/02/2024 02:50, Kaz Kylheku wrote:
> On 2024-02-08, bart <b...@freeuk.com> wrote:

>> But your comments are simplistic. EXE formats can be as hard to generate
>> as OBJ files. You still have to resolve the dynamic imports into an EXE.
>
> Generating just the EXE format is objectively less work than generating
> OBJ files and linking them into that ... same EXE format, right?

That depends:

* Are you generating OBJ files, or just ASM files and using a 3rd party
assembler?

* Are you producing OBJ files and relying on a 3rd party linker, or also
writing the linker?

* If the latter, are you using an official binary OBJ format, or
devising your own? The latter can make it a lot simpler.

And also, what exactly do you mean by a whole-program compiler?

I don't mean a C compiler which takes N source files at the same time,
compiles them internally, and links them internally. That's just
wrapping up all the steps involved in independent compilation into one
package.

>> Plus, if the minimal compilation unit is now all N source modules of a
>> project rather than just 1 module, then you'd better have a pretty fast
>> compiler, and some strategies for dealing with scale.
>
> Easy; just drop language conformance, diagnostics, optimization.

You're sceptical about something, I'm not sure what. Maybe you're used
to compilers taking forever to turn source into binary, so that you're
suspicious of anything that figures out how to do it faster.

Have you considered that, when recompiling after a one-line change, you
don't really need the same in-depth analysis that you did 30 seconds ago?

/I/ am suspicious of compilers that produce a benchmark that completes
in 0.0 seconds, since it is most likely shirking the task that has been set.

> "as" on Ubuntu 18, 32 bit.
>
> $ size /usr/bin/i686-linux-gnu-as
> text data bss dec hex filename
> 430315 12544 37836 480695 755b7 /usr/bin/i686-linux-gnu-as
>
> Still pretty large. Always use the "size" utility, rather than raw
> file size. This has 430315 bytes of code, 12544 of non-zero static data, 37836
> bytes of zeroed data (not part of the executable size).

Raw file size is fine. The 'size' thing on my WSL shows that 'as' is
about 700KB for text and data.

BTW the same exercise on the 1.3MB NASM.EXE shows the code as 0.3MB,
the rest is data. On my mcc compiler:

Compiling cc.m---------- to cc.exe
Code size: 186,647 bytes
Idata size: 86,392
Zdata size: 1,333,240
EXE size: 277,504

> That's still large for an assembler, but at least it's not larger
> than GNU Awk.

So Awk is another product that is still smaller than 1MB. Maybe there
are more such programs than was thought!

But can you imagine if you're a developer of such a program, and being
told your product is a toy. Or being denied the use of a fast compiler,
because such a compiler will not scale to something that is 100x the size.

>>>> Yes, probably. But the optimisation is overrated. Do you really need
>>>> optimised code to test each of those 200 builds you're going to do today?
>>>
>>> Yes, because of the principle that you should test what you ship.
>>
>> Then you're being silly. You're not shipping build#139 of 200 that day,
>
> If I make a certain change for build #139, and that part of the code
> (function or entire source file) is not touched until build #1459 which
> ships, that compiled code remains the same! So in fact, the #139 version of
> that code is what build #1459 ships with. That code is being tested as part of
> #140, #141, #142, ... even while some other things are changing.
>
> You should not be doing all your development and developer testing with
> unoptimized builds so that only QA people test optimized code before
> shipping.
>
> Every test, even of a private build, is a potential opportunity to find
> something wrong with some optimized code that would end up shipping
> otherwise.

OK. This is entirely up to you. But then you can't complain when builds
routinely take minutes or even hours.

On the stuff I do, a whole-program build completes in about the time it
takes you to take your finger off the Enter key.

> Here is another reason to work with optimized code. If you have to debug
> at the machine language level, optimized code is shorter and way more readable.

I think the exact opposite is true, since optimised code can bear little
relation to the source code. The code may even have been elided.

> And it can help you understand logic bugs, because the compiler performs
> logical analysis in doing optimizations. The optimized code shows you what your
> calculation reduced to, and can even help you see a better way of writing the
> code, like a tutor.
>
>> not even #1000 that week. You're debugging a logic bug that is nothing
>> to do with optimisation.
>
> Though debugging logic bugs that have nothing to do with optimization can be
> somewhat impeded by optimization, it's still better to prioritize working with
> the code in the intended shipping state.
>
> You can drop to an unoptimized build when necessary.
>
> Pretty much that only happens when
>
> 1. It is just a logic bug, but you have to resort to a debugger, and
> the optimizations are interfering with being able to see variable values.
>
> 2. You suspect it does have to do with optimization, so you see if
> the issue goes away in the unoptimized build.

My method is to do 99.99% of builds unoptimised. The program may or may
not be in C.

If I want the finished program to be faster, then I can transpile to C
if necessary, and invoke an optimising C compiler. Or I can just supply
some C source to somebody and they can do what they like.

Millions of people code in scripting languages where there is no deep
analysis or optimisation at all. And yet they manage. Behind the scenes
there will be a fast bytecode compiler that does little other than
generate code that corresponds 99% to the source code.

But you're saying that as soon as you step over the line into AOT
compiled code, nothing less than full -O3 with a million other options
will do FOR EVERY SINGLE COMPILATION, even if the only change is to add
an extra space to a string literal because some message annoyingly
doesn't line up?

Okay...


Michael S
Feb 8, 2024, 6:10:28 AM

$ /mingw32/bin/as.exe --version
GNU assembler (GNU Binutils) 2.40
Copyright (C) 2023 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms
of the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `i686-w64-mingw32'.

$ size /mingw32/bin/as.exe
text data bss dec hex filename
2941952 10392 43416 2995760 2db630 C:/bin/msys64a/mingw32/bin/as.exe


$ /mingw64/bin/as.exe --version
GNU assembler (GNU Binutils) 2.39
Copyright (C) 2022 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms
of the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-w64-mingw32'.

$ size /mingw64/bin/as.exe
text data bss dec hex filename
2966156 14776 45216 3026148 2e2ce4 C:/bin/msys64a/mingw64/bin/as.exe



> That's still large for an assembler, but at least it's not larger
> than GNU Awk.
>

$ awk --version
GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0-p13, GNU MP 6.2.1)
Copyright (C) 1989, 1991-2021 Free Software Foundation.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.


$ size /usr/bin/awk.exe
text data bss dec hex filename
619497 15884 23104 658485 a0c35 C:/bin/msys64a/usr/bin/awk.exe


Looks like on msys2 GNU as is much larger than GNU awk.




Michael S
Feb 8, 2024, 6:27:17 AM

On Wed, 07 Feb 2024 16:21:25 GMT
sc...@slp53.sl.home (Scott Lurndal) wrote:

> Malcolm McLean <malcolm.ar...@gmail.com> writes:
> >On 07/02/2024 07:54, David Brown wrote:
> >> On 07/02/2024 00:23, Lawrence D'Oliveiro wrote:
> >>> On Tue, 6 Feb 2024 09:44:20 +0100, David Brown wrote:
> >>>
> >>>> They reuse "temp" variables instead of making new ones.
> >>>
> >>> I like to limit the scope of my temporary variables. In C, this
> >>> is as easy
> >>> as sticking a pair of braces around a few statements.
> >>
> >> Generally, you want to have the minimum practical scope for your
> >> local variables.  It's rare that you need to add braces just to
> >> make a scope for a variable - usually you have enough braces in
> >> loops or conditionals
> >> - but it happens.
> >>
> >The two common patterns are to give each variable the minimum scope,
> >or to decare all variables at the start of the function and give
> >them all function scope.
> >
> >The case for minimum scope is the same as the case for scope itself.
> >The variable is accessible where it is used and not elsewhere, which
> >makes it less likely it will be used in error, and means there are
> >fewer names to understand.
>
> And it means the compiler can re-use the local storage (if any was
> allocated) for subsequent minimal scope variables (or even same scope
> if the compiler knows the original variable is never used again),
> so long as the address of the variable isn't taken.

That's completely orthogonal to the scope of declaration, at least as
long as the compiler is not completely idiotic.


Ben Bacarisse
Feb 8, 2024, 6:37:55 AM

bart <b...@freeuk.com> writes:

> On 07/02/2024 15:36, Ben Bacarisse wrote:
>> bart <b...@freeuk.com> writes:
>>
>>> On 07/02/2024 10:47, Ben Bacarisse wrote:
>>>> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>>>
>>>>> However there are also strong arguments for function scope. A function is a
>>>>> natural unit. And all the variables used in that unit are listed together
>>>>> and, ideally, commented. So at a glance you can see what is in scope and
>>>>> what is being operated on. [typos fixed]
>>>> You should not need an inventory of what's being operated on. Any
>>>> function so complex that I can't tell immediately what declaration
>>>> corresponds to which name needs to be re-written.
>>>
>>> But if you keep functions small, eg. the whole body is visible at the same
>>> time, then there is less need for declarations to clutter up the code. They
>>> can go at the top, so that you can literally just glance there. They
>> Declarations don't clutter up the code, just as the code does not
>> clutter up the declarations. That's just your own spin on the matter.
>> They are both important parts of a C program.
>
> That sounds like your opinion against mine. It's nothing to do with spin,
> whatever that means.

It's spin, because the term is emotive. "Cluttering up" is how you feel
about it. The phrase is just a mildly pejorative one about appearances.
There's no substance there. To make a technical point you would have to
explain how, for example,

struct item *items;
...
n_elements = get_number_of_items(...);
items = malloc(n_elements * sizeof *items);
...

is technically better than

n_elements = get_number_of_items(...);
struct item *items = malloc(n_elements * sizeof *items);

I've explained (more than once) how I find reasoning about the direct
initialise at first use style easier with fewer distractions.

> I would argue however that it you take a clear, cleanly written
> language-neutral algorithm, and then introduce type annotations /within/
> that code rather than segregated, then it is no longer quite as clear or as
> clean looking.

I agree. That's one big win for languages like Haskell with
sophisticated type inference. But the discussion (here) should be about
C where the disagreement is only about where to put the declaration.

> As a related example, suppose you had this function:
>
> void F(int a, double* b) {...}
>
> All the parameters are specified with their names and types at the top. Now
> imagine if only the names were given, but the types specified only at their
> first usage within the body:
>
> void F(a, b) {...}

That's not a related example. No one is suggesting anything remotely
like this.

This is why I keep asking if you have some political (or PR) background.
There is no reason at all to present an example where type information
is removed from the function prototype because no one is suggesting
that. It's a straw-man that you can argue against where, presumably,
you don't want to argue in favour of splitting the declaration away from
the point of first use.

> I /like/ having a summary of both parameters and locals at the top. I
> /like/ code looking clean, and as aligned as possible (some decls will push
> code to the right). I /like/ knowing that there is only one instance of a
> variable /abc/, and it is the one at the top.

That's fine. I have other concerns that I feel trump rather subjective
notions of aesthetics.

>>>>> And there are only three levels of scope. A
>>>>> varibale is global, or it is file scope, or it is scoped to the
>>>>> function.
>>>
>>>> You are mixing up scope and lifetime. C has no "global scope". A name
>>>> may have external linkage (which is probably what you are referring to),
>>>> but that is not directly connected to its scope.
>>>
>>> Funny, I use the same definitions of scope:
>> You can use any definition you like, provided you don't insist that
>> others use your own terms. I was just pointing out the problems
>> associated with using the wrong terms in a public post.
>> I'll cut the text where you use the wrong terms, because there is
>> nothing to be gained from correcting your usage.
>
> That's a shame. I think there is something to be gained by not sticking
> slavishly to what the C standard says (which very few people will study)
and using more colloquial terms or ones that more people can relate to.

Avoiding incorrect use of technical terms never gets in the way of
writing clear and easy to understand explanations. Quite the reverse.
If you try to explain C's notions of scope and linkage by mixing them up
into terms like "global variables" you can only sow confusion.

But you rather like that:

> Apparently both 'typedef' and 'static' are forms of 'linkage'. But no
> identifiers declared with those will ever be linked to anything!

You /could/ explain what the term linkage means in relation to C
identifiers, but your preference is rarely to help people understand.
You'd rather just make a snide remark: "look, the C standard uses an
ordinary English word in a way that is not normal!".

--
Ben.

Ben Bacarisse
Feb 8, 2024, 6:45:21 AM

Malcolm McLean <malcolm.ar...@gmail.com> writes:

> On 07/02/2024 15:30, Ben Bacarisse wrote:
>> David Brown <david...@hesbynett.no> writes:
>>
>>> On 07/02/2024 11:04, Ben Bacarisse wrote:
>>>> David Brown <david...@hesbynett.no> writes:
>>>>
>>>>> Making some "temp" variables and re-using them was also common for some
>>>>> people in idiomatic C90 code, where all your variables are declared at the
>>>>> top of the function.
>>>> The comma suggests (I think) that it is C90 that mandates that all one's
>>>> variables are declared at the top of the function. But that's not the
>>>> case (as I am sure you know).
>>>
>>> Yes.
>>>
>>>> The other reading -- that this is done in
>>>> idiomatic C90 code -- is also something that I'd question, but not
>>>> something that I'd want to argue.
>>>
>>> "Idiomatic" is perhaps not the best word. (And "idiotic" is too strong!)
>>> I mean written in a way that is quite common in C90 code.
>> The most common meaning of "idiomatic", and the one I usually associate
>> with it in this context, is "containing expressions that are natural and
>> correct". That's not how I would describe eschewing declarations in
>> inner blocks.
>>
> No. It means writing the code in a way which is common in C and has certain
> advantages, but is not so in other languages.

Where do you get your superior knowledge of English from, and is there a
way anyone else can hope to achieve your level of competence?

--
Ben.

David Brown
Feb 8, 2024, 7:01:48 AM

So I was correct that you were mixing things up, and can't provide a
reference in the C standards?

You are correct that there are no lexical elements that explicitly
control linkage, and that storage-class specifiers can affect linking.
That does not mean linkage is determined solely by storage-class
specifiers, nor does it mean all storage-class specifiers affect
linkage. They cover related, but separate concepts. (It's a bit like
scope and lifetime - they are related, but they are not the same thing.)

>
> * The tokens include 'typedef extern static'

They also include _Thread_local, auto and register.

And pay attention to 6.7.1p5 :
"""
The typedef specifier is called a "storage-class specifier" for
syntactic convenience only;
"""

>
> 6.2.2p3 says: "If the declaration of a file scope identifier for an
> object or a function contains the storage-class specifier static, the
> identifier has internal linkage."
>
> So it talks about statics as having linkage of some kind. What did I
> say? I said statics will never be linked to anything.

They have "internal linkage". This means they are allocated space (in
RAM for objects, code space for functions) by the linker, but static
symbols from one translation unit do not link to the same names from
other units.

This is distinct from "external linkage", where space is allocated,
identical symbols from different units are linked together, you must
have no more than one definition (but you can have multiple
declarations), and the definition and declarations must match in type.

And it is distinct from "no linkage" - things that are not involved in
the linking process, and have no connection between the identifier and
an item in memory (code or data). Things like typedef names, non-static
local variables, struct tags, and macro names are amongst the things
that have no linkage.

So static objects, with internal linkage, /are/ linked - the use of the
identifier in the source code is linked to the linker-allocated memory
address (relative or absolute, depending on the kind of linking and kind
of target).

>
> 6.2.2p6 excludes typedefs (by omission). Or rather it says they have 'no
> linkage', which is one of the three kinds of linkage (external,
> internal, none).

It is not by omission - it is covered in 6.2.2p6.

>
> So as far as I can see, statics and typedef are still lumped in to the
> class of entities that have a form of linkage, and are part of the set
> of tokens that control linkage.

No, you are mistaken. But I can understand how you got it wrong, and I
hope my post here can help clear it up.

Your mixup stems from a limited view of "linking". You are viewing the
term to mean something like "linking identical global identifiers from
different units so that they refer to the same object". That is part of
the process, but it /also/ means "linking identifiers and references
with code or data memory areas". That applies equally to static data
(with C "internal linkage") as to "global" data (with "external
linkage") - but it does /not/ apply to things with "no linkage".

>
> ---------------------------------------------------
>
> This to me is all a bit mixed up. Much as you dislike other languages
> being brought in, they can give an enlightening perspective.

They can't give much help with terms unless they are established
languages, and even then the terms can vary significantly between languages.

This is about your misunderstanding of the term "linkage" - at most,
references to your language could illustrate what you have got wrong.
But I think that has already been established and does not need extra help.

>
> So for me, linking applies to all named entities that occupy memory,

Yes.

> and
> that have global/export scope.

No. It is that extra restriction that is wrong.

(And "global scope" is a /really/ bad term to use. Scope is about when
an identifier is visible in a program, and is not the same as linkage or
lifetime. I know what you are trying to say here, but it is not an
accurate term.)


Malcolm McLean
Feb 8, 2024, 7:10:22 AM

On 08/02/2024 11:37, Ben Bacarisse wrote:
> bart <b...@freeuk.com> writes:
>
>> That sounds like your opinion against mine. It's nothing to do with spin,
>> whatever that means.
>
> It's spin, because the term is emotive. "Cluttering up" is how you feel
> about it. The phrase is just a mildly pejorative one about appearances.
> There's no substance there. To make a technical point you would have to
> explain how, for example,
>
> struct item *items;
> ...
> n_elements = get_number_of_items(...);
> items = malloc(n_elements * sizeof *items);
> ...
>
> is technically better than
>
> n_elements = get_number_of_items(...);
> struct item *items = malloc(n_elements * sizeof *items);
>
> I've explained (more than once) how I find reasoning about the direct
> initialise at first use style easier with fewer distractions.
>
items = malloc(n_elements * sizeof *items);

is shorter than

struct item *items = malloc(n_elements * sizeof *items);

and that is an objective statement about which there can be no dispute.

Malcolm McLean
Feb 8, 2024, 7:15:22 AM

On 08/02/2024 11:45, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>
>> No. It means writing the code in a way which is common in C and has certain
>> advantages, but is not so in other languages.
>
> Where do you get your superior knowledge of English from, and is there a
> way anyone else can hope to achieve your level of competence?
>
Degree in English literature.
Places are difficult to obtain but not impossible. You need to convince
the dons that you deserve one as many other people will be after them.

David Brown
Feb 8, 2024, 7:24:50 AM

But that is not the comparison.

struct item *items = malloc(n_elements * sizeof *items);

is shorter than:

struct item *items;
items = malloc(n_elements * sizeof *items);

You have to define the variable somewhere. Defining it at the point where you
initialise it and first need it is, without doubt, objectively shorter.
Opinions may differ on whether it is clearer, or "cluttered", but which
is shorter is not in doubt. (What relevance that might have, is much
more in doubt.)

