What attributes of a programming language simplify its use?

gah4

Dec 1, 2022, 7:18:24 PM

We had the "What attributes of a programming language simplify its implementation?" discussion.

It seems, though, that languages are implemented a small number of
times, and used many times. So, designing for ease of use, instead of
ease of implementation makes more sense.

(Especially if you want a lot of people to want to use it.)

One feature that I find makes them easier to use, and harder to implement, is no reserved words.

For almost 50 years now, my favorite name for an otherwise unnamed
program is "this". (That is, where many people seem to use "foo", and
years before I knew about "foo".)

That worked fine, until Java came along with reserved word "this".
(Second choice, "that", fortunately isn't reserved in Java.)
[I take your point, but in PL/I you can say:

IF THEN = ELSE THEN BEGIN = IF; ELSE END = IF;

COBOL famously has too many reserved words but PL/I overreacted. -John]

gah4

Dec 2, 2022, 11:50:29 AM

On Thursday, December 1, 2022 at 4:18:24 PM UTC-8, gah4 wrote:

(snip)

> One feature that I find makes them easier to use, and harder to implement, is no reserved words.

(snip)

> [I take your point, but in PL/I you can say:
>
> IF THEN = ELSE THEN BEGIN = IF; ELSE END = IF;
>
> COBOL famously has too many reserved words but PL/I overreacted. -John]

COBOL reserved many useful English words.

As well as I know it, the idea for PL/I was that you shouldn't need to know
about the parts of the language that you weren't using, not even the words.

But also, PL/I more than other languages has simple rules, instead
of arbitrary restrictions. You can use any expression in any place where
an expression is allowed.

Fortran has always had, and still has, unobvious restrictions on how
expressions can be used. Some seem intended to make things harder
for the programmer, that is, to discourage practices that some people don't like.

My favorite complaint is about using REAL variables in DO loops.
Yes, there are good reasons not to do it, but it isn't up to the language
to decide that.

The actual reason I care about this one is that many (many!) years ago
I would sometimes translate BASIC programs to Fortran, where all
the variables end up as Fortran REAL.
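A minimal C sketch of one of those good reasons (C rather than Fortran, to keep all added examples in one language; the function names are hypothetical): a loop controlled by a floating-point variable can run a different number of times than the same loop controlled by an integer, because 0.1 has no exact binary representation and the rounding error accumulates with each step.

```c
/* Counts iterations of a C-style loop whose control variable is a double
   stepping from 0.0 toward 1.0 by 0.1 (the analogue of a REAL DO variable). */
int count_real_steps(void) {
    int count = 0;
    for (double x = 0.0; x < 1.0; x += 0.1)
        count++;
    return count;
}

/* The same loop with an integer control variable: the real value is
   recomputed from the integer each time, so no error accumulates. */
int count_integer_steps(void) {
    int count = 0;
    for (int i = 0; i < 10; i++) {
        double x = i * 0.1;   /* value the loop body would use */
        (void)x;
        count++;
    }
    return count;
}
```

On an IEEE 754 platform such as x86-64, ten additions of 0.1 give 0.9999999999999999, which is still below 1.0, so count_real_steps() returns 11 while count_integer_steps() returns 10.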

PL/I allowed array expressions from the beginning; Fortran added
them much later. PL/I has a simple rule: the subscripts have the
same value for all arrays in the expression.

Fortran has complicated rules, where some arrays index from one
even if they were declared with a different lower bound. You have to
be extremely careful about which one it is using. And it gets even worse
when you pass arrays to subroutines.
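A small C model of the two rules, using a hypothetical descriptor type rather than any real compiler's array layout: `add_by_subscript` pairs elements with equal subscript values, as described above for PL/I (so the operands must have identical bounds), while `add_by_position` pairs the first element with the first element and so on, which is how Fortran matches corresponding elements of arrays that merely have the same shape.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor for a one-dimensional array A(lower:lower+n-1). */
typedef struct {
    int lower;       /* declared lower bound */
    size_t n;        /* number of elements */
    double *data;    /* data[0] holds A(lower) */
} bounded_array;

/* Element A(i), addressed by its declared subscript i. */
static double elem(const bounded_array *a, int i) {
    return a->data[i - a->lower];
}

/* Subscript rule: the same subscript value selects the element in every
   operand, so the bounds must be identical. Returns false otherwise. */
bool add_by_subscript(const bounded_array *a, const bounded_array *b, double *out) {
    if (a->lower != b->lower || a->n != b->n)
        return false;
    for (int i = a->lower; i < a->lower + (int)a->n; i++)
        out[i - a->lower] = elem(a, i) + elem(b, i);
    return true;
}

/* Positional rule: only the shapes must agree; the k-th element of one
   operand is paired with the k-th element of the other, whatever the bounds. */
bool add_by_position(const bounded_array *a, const bounded_array *b, double *out) {
    if (a->n != b->n)
        return false;
    for (size_t k = 0; k < a->n; k++)
        out[k] = a->data[k] + b->data[k];
    return true;
}
```

With A(0:2) = {1, 2, 3} and B(1:3) = {10, 20, 30}, the positional rule pairs A(0) with B(1) and yields {11, 22, 33}, while the subscript rule refuses the operation because the bounds differ.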

But back to the reserved words. Just because PL/I allows them
doesn't mean that you should use them. In all cases, the person
writing the program should consider readability, and using IF-related
words as variable names in an IF statement is bound to be confusing.

Thomas Koenig

Dec 3, 2022, 12:13:47 PM

gah4 <ga...@u.washington.edu> schrieb:

> We had the "What attributes of a programming language simplify its implementation?" discussion.
>
> It seems, though, that languages are implemented a small number of
> times, and used many times. So, designing for ease of use, instead of
> ease of implementation makes more sense.

Very much so.

> (Especially if you want a lot of people to want to use it.)
>
> One feature that I find makes them easier to use, and harder to
> implement, is no reserved words.

I think this is more a matter of extensibility than of ease of use,
but both are somewhat intertwined.

Adding a new reserved word is a breaking change, especially if that
word is often used. See "new" in C++, which was something reasonable
to use in C, and is reserved in C++.
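To make the breakage concrete, a short sketch (the function name is hypothetical): the definition below is valid C, but a C++ compiler rejects it because C++ reserves both `new` and `this`.

```c
/* Valid C: "new" and "this" are ordinary identifiers here.
   Compiling the same file as C++ fails, since both are C++ keywords. */
int next_value(int new) {
    int this = new + 1;
    return this;
}
```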

The life cycle of a programming language will have many revisions (if
it is successful, that is), and not having reserved keywords certainly
helps in two ways: existing user programs will continue to work,
and new features can be added in a way that is easier to read than
adding special characters until a new feature looks like a cat
walked over the keyboard with caps lock on.

Yes, this is a bit more pain for compiler writers, but far less than,
let's say, having to deal with SIMD.
[There's also the Perl approach where you put a "use" at the top of the
program file saying which version of the language you want. -John]

Hans-Peter Diettrich

Dec 3, 2022, 5:52:17 PM

On 12/3/22 11:25 AM, Thomas Koenig wrote:
> gah4 <ga...@u.washington.edu> schrieb:

>> One feature that I find makes them easier to use, and harder to
>> implement, is no reserved words.
>
> I think this is more a matter of extensibility than of ease of use,
> but both are somewhat intertwined.
>
> Adding a new reserved word is a breaking change, especially if that
> word is often used. See "new" in C++, which was something reasonable
> to use in C, and is reserved in C++.

IMO C basic syntax is a bad base. As long as declarations and
expressions can be distinguished only by the type of an identifier (type
name or variable name) it's not a good idea to add new keywords that can
be confused with variable or type names. Instead weird constructs like
"long long" for int64_t have been introduced, while "int int" stays
equivalent to "int".
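A minimal C illustration of the problem (names hypothetical): the same tokens mean different things depending on whether `T` currently names a type, so the parser cannot choose between declaration, cast, and multiplication from the grammar alone and needs symbol-table information from the enclosing scopes.

```c
/* At file scope, T names a type. */
typedef int T;

/* "T * p" at the start of a statement is a declaration: p points to a T. */
int declare_pointer(int value) {
    T * p = &value;
    return *p;
}

/* "(T) * x" is a cast: the value *x converted to type T. */
int cast_of_deref(int *x) {
    return (T) * x;
}

/* A local variable now hides the typedef, so the same tokens "(T) * x"
   are a parenthesized variable multiplied by x. */
int multiply(int a, int x) {
    int T = a;
    return (T) * x;
}
```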

DoDi

Christopher F Clark

Dec 3, 2022, 7:27:51 PM

The discussion on reserved words versus keywords reminds me of
decisions we made while building Yacc++. It is worth noting that we
(both of its developers) worked at Pr1me Computer, where PL/I dialects
were the key programming languages used to build both the OS and the
compilers, so we were likely highly influenced by that.

As a result, Yacc++ has very few reserved words; I'm pretty sure the
number is 3 or less. There is only one that I can think of, "yy_eof",
which is reserved because it is used in the library in a place where
we have hand-written code that we don't want to modify.(*) And we
specifically reserve all yy_ words for use in the library, although
most can be used in grammars (and code) without any ill effect. In
doing so, we feel we haven't taken away any common words from the
users' vocabulary, and we have done so in a way that, when the words
have special meaning, it is generally the same meaning as in the traditional
lex/flex/yacc/bison variants.

However, we do have plenty of context-sensitive keywords, but we
structured their usage (as keywords) such that they are easily
disambiguated. Thus, left, right, and nonassoc don't have special
characters in them, as opposed to yacc, where they are %left et al.
Now, %prec we couldn't make unambiguous, so it retains the required %
spelling.

It is worth noting that, to make that possible, we had to require that
all productions have a terminating semicolon (;), rather than
depending upon name colon (:) to identify the start of the rule. That
also gets rid of the lexical hack required to make the grammar LALR(1)
rather than LALR(2). We could have handled LALR(2) grammars, but in our
opinion, it would have made the error recovery and messages less obvious.
Sometimes simplicity of implementation makes for a simpler and more regular
language.
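For readers who have not seen it, here is a minimal sketch of that lexical hack, assuming classic yacc input in which a rule has no terminator: the lexer peeks past blanks after an identifier, and if the next character is ':' it returns a distinct token (the POSIX yacc grammar calls it C_IDENTIFIER), so the parser recognizes the start of a new rule with one token of lookahead. The function and enum names are hypothetical, and a real lexer would also have to handle comments, literals, and actions.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_IDENTIFIER, TOK_C_IDENTIFIER, TOK_OTHER, TOK_EOF } token_kind;

/* Scans one token starting at *p and advances *p past it.
   An identifier followed (after optional blanks) by ':' is reported as
   TOK_C_IDENTIFIER, and the ':' is consumed along with it. */
token_kind next_token(const char **p) {
    const char *s = *p;
    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0') {
        *p = s;
        return TOK_EOF;
    }
    if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
        const char *look = s;               /* peek beyond the identifier */
        while (isspace((unsigned char)*look))
            look++;
        if (*look == ':') {
            *p = look + 1;
            return TOK_C_IDENTIFIER;        /* head of a new rule */
        }
        *p = s;
        return TOK_IDENTIFIER;              /* a symbol in the current rule body */
    }
    *p = s + 1;                             /* any other single character */
    return TOK_OTHER;
}
```

Given the input "expr : term term stmt : expr", the lexer reports expr and stmt as rule heads and the three other words as ordinary identifiers.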

But to continue this part of the explanation, words like fast, small, and
readable are keywords that describe different ways we lay out the
tables, and they have those meanings only in specific contexts. In those
contexts, normal identifiers cannot appear. In any context where
a normal identifier can appear, they are simply identifiers and don't
carry any significance; in the library code, where we need them to
have special meaning, we use yy_fast et al. So, in the declarations
within a grammar where we need them to have special meanings, we don't
need them to be spelled some "special" way (i.e., you don't say "yy_fast
yy_tables", you say "fast tables", and it is perfectly clear), but you
can also use "fast" and "tables" in your grammar as tokens or
non-terminals without worrying that you are using a "reserved" word.
In fact, we do so to describe the grammar of Yacc++ grammars.

Thus, we feel that we have mostly achieved a similar level of balance to
what PL/I had, without creating a write-only language. Yes, you can
probably use Yacc++ to write an extensible language that diverges into
a bunch of unique and incomprehensible variants where no two
programmers are using the same language. We haven't made that
impossible. However, the freedom we have allowed does not inherently
contribute to that or encourage it. It simply lets people write
things slightly more naturally without a lot of "line noise".

------

Now, given that, I want to dispel the illusion that it makes parsing
harder (beyond a very trivial amount). The "trick" (hack) that lets
the grammar deal with keywords that are not reserved is quite simple
(and we document how to do it for users who are designing their own
languages in our manual) and it should apply to most parser generators
and doesn't rely on any special feature of Yacc++, although we have
some features that make doing so easier.

So, for example, say you have a list of keywords (tokens) that you want
treated like identifiers, such as "if" "then" "else" à la PL/I or "left"
"right" "token" à la Yacc++, and a token identifier that matches all
other identifiers, as in:

token identifier, if, then, else, left, right, token;
identifier: "a" .. "z" ("a" .. "z" | "0" .. "9")*;
if: "if";
then: "then";
else: "else";
left: "left";
right: "right";
token: "token";

To get the desired property, you simply define a non-terminal (we'll
call it "ident") that you use to represent identifiers in contexts
where you want the keywords to be allowed, as in:

ident: identifier | if | then | else | left | right | token;

Now, simply use ident where you would have used identifier previously:

rule: ident ":" ident* ";" ;

And, use the keywords where they have their special meaning:

if_stmt: if expression then stmt (else stmt)?;
left_decl: left ident ("," ident)* ";" ;

As long as the uses are unambiguous (and the generator uses the
"prefer shift" method of resolving shift-reduce conflicts, or you can
force it to with "precedence" declarations), the grammar will
work as expected.

If it is ambiguous and you want to disallow certain keywords, simply
introduce other non-terminals, such as:

ident_not_if: identifier | then | else | left | right | token;

assignment : ident_not_if "=" expression; // if keyword not allowed before =
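The same trick carries over to a hand-written parser. Below is a minimal recursive-descent sketch in C for the `rule` and `left_decl` productions above (all names are hypothetical, and the input is assumed to be already split into words): any keyword is accepted wherever `ident` is, and one token of lookahead (is the second token ":"?) decides whether a leading `left` starts a rule or a declaration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    T_IDENT, T_IF, T_THEN, T_ELSE, T_LEFT, T_RIGHT, T_TOKEN, /* ident-like */
    T_COLON, T_COMMA, T_SEMI, T_END
} tok_kind;

typedef enum { PARSE_RULE, PARSE_LEFT_DECL, PARSE_ERROR } parse_result;

/* Maps a word to its token kind; anything not in the table is an identifier. */
static tok_kind classify(const char *w) {
    static const struct { const char *text; tok_kind kind; } table[] = {
        {"if", T_IF}, {"then", T_THEN}, {"else", T_ELSE}, {"left", T_LEFT},
        {"right", T_RIGHT}, {"token", T_TOKEN},
        {":", T_COLON}, {",", T_COMMA}, {";", T_SEMI},
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(w, table[i].text) == 0)
            return table[i].kind;
    return T_IDENT;
}

/* Token i of the input, or T_END past the last word. */
static tok_kind at(const char *const *w, size_t n, size_t i) {
    return i < n ? classify(w[i]) : T_END;
}

/* The "ident" non-terminal: a plain identifier or any keyword. */
static bool is_ident(tok_kind k) { return k <= T_TOKEN; }

parse_result parse_statement(const char *const *w, size_t n) {
    size_t i;
    if (is_ident(at(w, n, 0)) && at(w, n, 1) == T_COLON) {
        /* rule: ident ":" ident* ";" */
        for (i = 2; is_ident(at(w, n, i)); i++)
            ;
        return (at(w, n, i) == T_SEMI && i + 1 == n) ? PARSE_RULE : PARSE_ERROR;
    }
    if (at(w, n, 0) == T_LEFT) {
        /* left_decl: left ident ("," ident)* ";" */
        if (!is_ident(at(w, n, 1)))
            return PARSE_ERROR;
        for (i = 2; at(w, n, i) == T_COMMA; i += 2)
            if (!is_ident(at(w, n, i + 1)))
                return PARSE_ERROR;
        return (at(w, n, i) == T_SEMI && i + 1 == n) ? PARSE_LEFT_DECL : PARSE_ERROR;
    }
    return PARSE_ERROR;
}
```

For example, "left : if then ;" parses as a rule whose head and body are all keywords, while "left if , token ;" parses as a declaration.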

-----

Now, as I said we have features in Yacc++ that make this easier, but
the principle doesn't require our tool. And, there is much more you
can do. This is just one of the relevant grammar hacks.

*) yy_error might also be a reserved word used as part of error recovery.

--
******************************************************************************
Chris Clark email: christoph...@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

gah4

Dec 3, 2022, 11:00:21 PM

On Saturday, December 3, 2022 at 4:27:51 PM UTC-8, christoph...@compiler-resources.com wrote:
> The discussion on reserved words versus keywords reminds me of
> decisions we made while building Yacc++. It is worth noting that we
> (both of its developers) worked at Pr1me computer where PL/I dialects
> were the key programming language used in build both the OS and the
> compilers, so we were likely highly influenced by that.

This is reminding me of some cases in TeX where optional keywords can
arise in unexpected places. There is TeX glue that allows:

\hskip 1cm

or

\hskip 1cm plus 1cm

In normal use, you mix TeX commands and text to be formatted. If a macro
expands to

\hskip 1cm

and is followed by text starting with

plus

you get a surprising error message.

I believe that plus is only a "reserved word" in that specific context.

And a project I was working on some years ago just happened to run
into that case, however unlikely that might be.
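A deliberately simplified C model of why this happens (hypothetical names; real TeX scans character by character and also accepts `minus`, fil units, and much more): after reading the dimension for `\hskip`, the scanner checks whether the next word is `plus`, and if it is, it commits to reading a stretch dimension. Ordinary text that starts with "plus" is therefore taken as part of the glue.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { GLUE_OK, GLUE_MISSING_DIMEN } glue_status;

/* Simplified dimension: one or more digits followed by "cm" or "pt". */
static int is_dimen(const char *w) {
    size_t i = 0;
    while (isdigit((unsigned char)w[i]))
        i++;
    return i > 0 && (strcmp(w + i, "cm") == 0 || strcmp(w + i, "pt") == 0);
}

/* Scans the words after "\hskip". On success, *consumed says how many
   words belonged to the glue; the rest are left as text to typeset. */
glue_status scan_hskip(const char *const *words, size_t n, size_t *consumed) {
    if (n == 0 || !is_dimen(words[0]))
        return GLUE_MISSING_DIMEN;
    *consumed = 1;
    if (n > 1 && strcmp(words[1], "plus") == 0) {  /* optional keyword */
        if (n < 3 || !is_dimen(words[2]))
            return GLUE_MISSING_DIMEN;             /* text "plus ..." was taken as glue */
        *consumed = 3;
    }
    return GLUE_OK;
}
```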

Keith Thompson

Dec 6, 2022, 1:28:44 PM

Hans-Peter Diettrich <DrDiet...@netscape.net> writes:
> IMO C basic syntax is a bad base. As long as declarations and
> expressions can be distinguished only by the type of an identifier (type
> name or variable name) it's not a good idea to add new keywords that can
> be confused with variable or type names. Instead weird constructs like
> "long long" for int64_t have been introduced, while "int int" stays
> equivalent to "int".

long long and int64_t are not the same (though int64_t may be the same
type as long long in a given implementation). long long is *at least* 64
bits. int64_t is *exactly* 64 bits, and must have a 2's-complement
representation and no padding bits. "int int" is a syntax error.

(I'm not arguing that C's integer type system isn't overly complicated.)
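The distinction can be checked at compile time with C11 `_Static_assert`. A minimal sketch (the function name is hypothetical) that encodes only what the standard guarantees: long long has at least 64 bits of range, while int64_t is optional but, where it exists, has exactly 64 bits and a two's-complement representation.

```c
#include <limits.h>
#include <stdint.h>

/* Guaranteed: long long can represent at least -(2^63 - 1) .. 2^63 - 1.
   It is allowed to be wider. */
_Static_assert(LLONG_MAX >= 9223372036854775807LL, "long long has at least 64 bits");

/* int64_t is optional; INT64_MAX is defined exactly when the type exists. */
#ifdef INT64_MAX
_Static_assert(sizeof(int64_t) * CHAR_BIT == 64, "int64_t is exactly 64 bits");
_Static_assert(INT64_MIN == -INT64_MAX - 1, "int64_t is two's complement");
#endif

/* Storage size of long long in bits here (this count includes any padding). */
int long_long_storage_bits(void) {
    return (int)(sizeof(long long) * CHAR_BIT);
}
```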

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

gah4

Dec 6, 2022, 10:22:21 PM

On Tuesday, December 6, 2022 at 10:28:44 AM UTC-8, Keith Thompson wrote:

(snip)

> long long and int64_t are not the same (though int64_t may be the same
> type as long long in a given implementation). long long is *at least* 64
> bits. int64_t is *exactly* 64 bits, and must have a 2's-complement
> representation and no padding bits. "int int" is a syntax error.

> (I'm not arguing that C's integer type system isn't overly complicated.)

It seems that many Fortran programmers now assume that KIND=8
(for REAL) is a 64 bit IEEE floating point value, and I suspect for
INTEGER that it is a 64 bit integer.

Fortran makes no claim on the numerical values of KINDs.

It doesn't seem too surprising, then, that some would miss the
distinction between int64_t and long long.

In the early days of 64 bit computing, which I mostly remember from
the DEC Alpha, C compilers made long the 64 bit type.

That, then, broke too much software assuming long was 32 bits.

Much of IP networking evolved when C int was either 16 or 32 bits, but
you didn't really know which, while short was reliably 16 bits and long was
reliably 32 bits.

So, we have things like htonl() and ntohl() for converting 32 bit
values to/from network byte order. (The l stands for long.)

Since networking code, especially cross platform, depends more on
exact lengths than many others, that was one that had to get done
right pretty early. (Cross platform file formats, too.)
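POSIX now declares htonl() and ntohl() in <arpa/inet.h> with uint32_t parameters. A portable alternative that sidesteps both the host byte order and the width of long is to serialize byte by byte with shifts; the helper names below are hypothetical.

```c
#include <stdint.h>

/* Writes a 32-bit value into buf in network byte order (big-endian),
   independent of the host's own byte order. */
void put_be32(unsigned char buf[4], uint32_t v) {
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Reads a big-endian 32-bit value back; the inverse of put_be32. */
uint32_t get_be32(const unsigned char buf[4]) {
    return (uint32_t)buf[0] << 24 | (uint32_t)buf[1] << 16
         | (uint32_t)buf[2] << 8 | (uint32_t)buf[3];
}
```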

So then we got long long as the (close enough to) reliable 64 bit
type.

Maybe in a few years, we will have the long long long 128 bit type.

But C syntax has been confusing due to the reserved words and need for
additions in more than just data types.

There are stories, which I don't remember, about the different uses of the
word "static" in C.

Though maybe not quite as many as Fortran uses for *.

Anton Ertl

Dec 7, 2022, 11:44:53 AM

gah4 <ga...@u.washington.edu> writes:
>In the early days of 64 bit computing, which I mostly remember from
>the DEC Alpha, C compilers made long the 64 bit type.

The early days of 64-bit computing were on the CDC Star and Cray-1, but
C was a minor language for them.

Yes, we got the first mainstream 64-bit Unix with Digital OSF/1 on the
Alpha, and 64-bit APIs and ABIs on Unix had 64-bit long.

>That, then, broke too much software assuming long was 32 bits.

Obviously not, or other Unix vendors would not have also made longs
64-bit in their interfaces.

>So then we got long long as the (close enough to) reliable 64 bit
>type.

GCC introduced long long independently of any 64-bit port; this is easy
to see because the original GCC documentation specified that long long
int is twice as long as long int. Later, when the Alpha port (and
later 64-bit ports) came, the porter decided to make long long 64-bit,
i.e., the same size as long; I don't know if the Alpha API/ABI had a
requirement on the size of long long, or if the people responsible for
the Alpha port did deviate from the documentation for some other
reason. When we reported this as a bug, the fix was to change the
documentation to say that long long is twice as long as int.

Concerning IL32P64, i.e., 32-bit longs with 64-bit pointers, that
seems to be a specialty of 64-bit Windows. Fortunately, I don't have
to deal with this API (64-bit Cygwin supports the Unix API, i.e.,
64-bit long).
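A small C sketch (hypothetical function name) that names the common data models from the sizes of int, long, and pointers, which is exactly the distinction being drawn here between 64-bit Unix and 64-bit Windows:

```c
#include <limits.h>

/* Classifies the data model by the widths of int, long and pointers. */
const char *data_model(void) {
    const unsigned i = (unsigned)(sizeof(int) * CHAR_BIT);
    const unsigned l = (unsigned)(sizeof(long) * CHAR_BIT);
    const unsigned p = (unsigned)(sizeof(void *) * CHAR_BIT);
    if (i == 32 && l == 64 && p == 64) return "LP64";   /* 64-bit Unix APIs */
    if (i == 32 && l == 32 && p == 64) return "LLP64";  /* 64-bit Windows, a.k.a. IL32P64 */
    if (i == 32 && l == 32 && p == 32) return "ILP32";  /* typical 32-bit systems */
    return "other";
}
```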

>Maybe in a few years, we will have the long long long 128 bit type.

GCC has supported 128-bit integers for a while; originally we wrote,
e.g.:

typedef int int128_t __attribute__((__mode__(TI)));

(makes me wonder how the compiler sees the "TI"; it's not a keyword,
and it's not a defined name in any of the name spaces; gcc tends to
pass such things as literal strings (cf. extended asm), but here it
does not).

Nowadays it seems to (also) have __int128_t as an
implementation-specific keyword. I see no motions in the direction of
long long long (and, looking at history, it would only have 64 bits in
length:-).
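For comparison, a sketch of how the extension is commonly used today, assuming GCC or Clang: both predefine `__SIZEOF_INT128__` on targets where `__int128` is available, so code can use the 128-bit type where it exists and fall back to portable 64-bit arithmetic elsewhere. The function name is hypothetical, and the fallback is the usual schoolbook multiplication on 32-bit halves.

```c
#include <stdint.h>

/* High 64 bits of the full 128-bit product of two unsigned 64-bit values. */
uint64_t mul_high_u64(uint64_t a, uint64_t b) {
#ifdef __SIZEOF_INT128__
    /* GCC/Clang extension, available only on some targets. */
    unsigned __int128 p = (unsigned __int128)a * b;
    return (uint64_t)(p >> 64);
#else
    /* Portable fallback: multiply 32-bit halves and propagate carries. */
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;
    /* Middle column; this sum cannot overflow 64 bits. */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
#endif
}
```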

- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Hans-Peter Diettrich

Dec 7, 2022, 11:45:13 AM

On 12/6/22 6:56 PM, Keith Thompson wrote:
> Hans-Peter Diettrich <DrDiet...@netscape.net> writes:
>> IMO C basic syntax is a bad base. As long as declarations and
>> expressions can be distinguished only by the type of an identifier (type
>> name or variable name) it's not a good idea to add new keywords that can
>> be confused with variable or type names.

Nobody seems to disagree with my opinion?

> Instead weird constructs like
>> "long long" for int64_t have been introduced, while "int int" stays
>> equivalent to "int".
>
> long long and int64_t are not the same (though int64_t may be the same
> type as long long in a given implementation). long long is *at least* 64
> bits. int64_t is *exactly* 64 bits, and must have a 2's-complement
> representation and no padding bits.

You are right, my sloppy wording was not appropriate in this NG :-(

> "int int" is a syntax error.

I could not find in the (older) C++ grammar why "int int" should be a
*syntax* error. Aren't both "int" and "long" simple-type-specifier's
which can occur multiple times in a decl-specifier-seq?

It looks to me like additional rules apply which decide that
"long int"
"long long int"
"long int long" //what's that?
are all valid while
"long int long int"
throws a "two or more data types..." error.

In former times it was much easier to decide with a single basic type id
(int...) and type modifiers (long...).

DoDi

Keith Thompson

Dec 8, 2022, 1:57:55 PM

Hans-Peter Diettrich <DrDiet...@netscape.net> writes:
> On 12/6/22 6:56 PM, Keith Thompson wrote:

[...]

>> "int int" is a syntax error.
>
> I could not find in the (older) C++ grammar why "int int" should be a
> *syntax* error. Aren't both "int" and "long" simple-type-specifier's
> which can occur multiple times in a decl-specifier-seq?

No, there are specific rules that specify the way they can be used.
In the 2011 ISO C standard (I use the draft from
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), the valid
type specifiers are listed in section 6.7.2.

At least one type specifier shall be given in the declaration
specifiers in each declaration, and in the specifier-qualifier
list in each struct declaration and type name. Each list of type
specifiers shall be one of the following multisets (delimited
by commas, when there is more than one multiset per item);
the type specifiers may occur in any order, possibly intermixed
with the other declaration specifiers.

The list includes entries:
- int, signed, or signed int
- long long, signed long long, long long int, or signed long long int
among others. There are no entries in which the keyword "int" appears
more than once.

(The C99 standard incorrectly referred to these as "sets" rather than
"multisets", which preserve the number of times each element occurs.
The word "sets" was correct in C90, which didn't have long long.)

[...]

> In former times it was much easier to decide with a single basic type id
> (int...) and type modifiers (long...).

int and long are both keywords that are type specifiers, but they differ
in how they can be used. It's tempting to think that the "long" in
"long int" qualifies the type name "int", and that's probably how it
originated, but that's not how the language standard specifies it.

C++ has equivalent rules, stated a bit differently. (C introduced
long long in 1999, C++ in 2011.)

Hans-Peter Diettrich

Dec 8, 2022, 3:45:04 PM

On 12/8/22 2:53 AM, Keith Thompson wrote:
> Hans-Peter Diettrich <DrDiet...@netscape.net> writes:
>> On 12/6/22 6:56 PM, Keith Thompson wrote:
> [...]
>>> "int int" is a syntax error.
>>
>> I could not find in the (older) C++ grammar why "int int" should be a
>> *syntax* error. Aren't both "int" and "long" simple-type-specifier's
>> which can occur multiple times in a decl-specifier-seq?
>
> No, there are specific rules that specify the way they can be used.
> In the 2011 ISO C standard standard (I use the draft from
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), the valid
> type specifiers are listed in section 6.7.2.

Thanks for the link :-)

> At least one type specifier shall be given in the declaration
> specifiers in each declaration, and in the specifier-qualifier
> list in each struct declaration and type name. Each list of type
> specifiers shall be one of the following multisets (delimited
> by commas, when there is more than one multiset per item);
> the type specifiers may occur in any order, possibly intermixed
> with the other declaration specifiers.

So let me repeat my questions:

- Why is "int int" a syntax error? "At least one..." allows for more
than one type-specifier in declaration-specifiers (6.7).

- What's "long int long"? My current (Arduino) C++ compiler doesn't flag
it as an error.

DoDi
[This is getting close to comp.lang.c but I'm OK with a little more
discussion of the design decisions in C's very messy declarations. -John]

gah4

Dec 8, 2022, 5:53:47 PM

On Thursday, December 8, 2022 at 12:45:04 PM UTC-8, Hans-Peter Diettrich wrote:

(snip)

> So let me repeat my questions:

> - Why is "int int" a syntax error? "At least one..." allows for more
> than one type-specifier in declaration-specifiers (6.7).

> - What's "long int long"? My current (Arduino) C++ compiler doesn't flag
> it as an error.

> DoDi
> [This is getting close to comp.lang.c but I'm OK with a little more
> discussion of the design decisions in C's very messy declarations. -John]

Now we can get closer to compilers.

I suspect that it isn't a syntax error, though it will depend on how the
compiler is written.

The compiler (parser) can accept any combination of the specifiers,
and even more than one of them, and then later the compiler decides
that the ones given are not valid.

There was a story many years ago, about a compiler with only one error
message: "SYNTAX ERROR". (Likely in the days of upper case only.)

In any case, it is often easier to write the parser more general than
the actual language, and then flag them later.

But also, the same can be done for the language standard.

As well as I know it, in early C variables defaulted to int. Later, it
was required that they be declared, but the default type was still
int. You could declare:

auto i;

which declares i as automatic, and (by default) int.

It gets more interesting in Fortran, where you can give variables
attributes in separate statements:

INTEGER I
DIMENSION I(10,10)
PUBLIC I
ALLOCATABLE I
ASYNCHRONOUS I
CONTIGUOUS I
INTENT(IN) I
OPTIONAL I
POINTER I
PROTECTED I
SAVE I
TARGET I
VOLATILE I

All might be legal syntax separately, but not legal in all combinations.

Keith Thompson

Dec 10, 2022, 4:24:50 PM

Hans-Peter Diettrich <DrDiet...@netscape.net> writes:
> On 12/8/22 2:53 AM, Keith Thompson wrote:
>> Hans-Peter Diettrich <DrDiet...@netscape.net> writes:
>>> On 12/6/22 6:56 PM, Keith Thompson wrote:
>> [...]
>>>> "int int" is a syntax error.
>>>
>>> I could not find in the (older) C++ grammar why "int int" should be a
>>> *syntax* error. Aren't both "int" and "long" simple-type-specifier's
>>> which can occur multiple times in a decl-specifier-seq?
>>
>> No, there are specific rules that specify the way they can be used.
>> In the 2011 ISO C standard standard (I use the draft from
>> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), the valid
>> type specifiers are listed in section 6.7.2.
>
> Thanks for the link :-)
>
>> At least one type specifier shall be given in the declaration
>> specifiers in each declaration, and in the specifier-qualifier
>> list in each struct declaration and type name. Each list of type
>> specifiers shall be one of the following multisets (delimited
>> by commas, when there is more than one multiset per item);
>> the type specifiers may occur in any order, possibly intermixed
>> with the other declaration specifiers.
>
> So let me repeat my questions:
>
> - Why is "int int" a syntax error? "At least one..." allows for more
> than one type-specifier in declaration-specifiers (6.7).

Sorry, my mistake. It's not a syntax error (a violation of a syntax
rule). It's a constraint violation. "int int" and "long long" are both
permitted by the grammar. The "Each list of type specifiers shall be
one of the following ..." wording forbids "int int" while allowing "long
long".

The paragraph I quoted above is a *constraint*. The C standard requires
at least one diagnostic for any program that violates either a
constraint or a syntax rule.

C compilers aren't required to treat the two kinds of violations
differently. As gah4 points out, a compiler might use a more permissive
grammar and flag some syntax errors as if they were constraint
violations, particularly for constructs that it might accept with more
permissive options.

> - What's "long int long"? My current (Arduino) C++ compiler doesn't flag
> it as an error.

Because it isn't an error. "long int long" is equivalent to "long long
int", since "the type specifiers may occur in any order". (The
individual keywords "int" and "long" are type specifiers; "long long
int" is a list of 3 type specifiers, making up a *type name* (6.7.7).)

(Of course the fact that you *can* write "long int long" or "double
_Complex long" doesn't mean you *should*.)
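A short C11 sketch of the point about ordering (the variable and function names are hypothetical): `_Generic` selects on the type of its operand and shows that the reordered spellings name the same types as the canonical ones.

```c
/* Maps an expression's type to a small code: 1 for long long,
   2 for unsigned long long, 0 for anything else. */
#define TYPE_CODE(x) _Generic((x), long long: 1, unsigned long long: 2, default: 0)

long int long mixed_ll = 1;       /* same type as long long int */
long unsigned long mixed_ull = 1; /* same type as unsigned long long */

int type_code_mixed_ll(void) { return TYPE_CODE(mixed_ll); }
int type_code_mixed_ull(void) { return TYPE_CODE(mixed_ull); }

/* By contrast, "int int bad;" violates the constraint in 6.7.2,
   so a conforming compiler must issue a diagnostic for it. */
```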

> DoDi
> [This is getting close to comp.lang.c but I'm OK with a little more
> discussion of the design decisions in C's very messy declarations. -John]

Agreed. Early C was simpler, though the general principle that
declaration (more or less) follows usage made the declaration syntax a
bit unusual. As new features like typedef and long long were added and
wedged into the existing syntax, the rules had to be made more
elaborate. The standard has formal rules that describe features that
evolved rather informally (like the fact that "typedef" is syntactically
a storage class).

I'm sure the language would be different if backward compatibility were
not an issue.

David Brown

Dec 10, 2022, 4:25:26 PM

To my reading of the C standard (I'm looking at C11 at the moment, the
same version as Keith - if you don't have a copy, it is freely available
from Keith's link), "int int" is /not/ a syntax error. It is a
/constraint violation/. "int" and "long" are syntactically classified as
"type specifiers", and the syntax for declarations allows any number of
type specifiers. But the constraints given in 6.7.2 put limits on the
combinations that are allowed. "int int" is not on that list, so it is
a constraint error. "long long int" /is/ on the list, and the type
specifiers can be re-ordered, so "long int long" is fine (it's a 64-bit
signed integer type on the Arduino).

I believe the difference between syntax errors and constraint violations
in a case like this is purely for historical reasons.

C++ has the same rules, but the standards are not as explicit about
syntax and constraints.

(For an alternative way to consider handling of "long" and "short" in C,
you might enjoy this proposal:
<https://www.open-std.org/JTC1/sc22/wg21/docs/papers/2018/p0989r0.pdf>)

Hans-Peter Diettrich

Dec 10, 2022, 4:29:21 PM

On 12/8/22 11:44 PM, gah4 wrote:
> On Thursday, December 8, 2022 at 12:45:04 PM UTC-8, Hans-Peter Diettrich wrote:
>
> (snip)
>
>> So let me repeat my questions:
>
>> - Why is "int int" a syntax error? "At least one..." allows for more
>> than one type-specifier in declaration-specifiers (6.7).
>
>> - What's "long int long"? My current (Arduino) C++ compiler doesn't flag
>> it as an error.
>
>> DoDi
>> [This is getting close to comp.lang.c but I'm OK with a little more
>> discussion of the design decisions in C's very messy declarations. -John]
>
> Now we can get closer to compilers.
>
> I suspect that it isn't a syntax error, though it will depend on how the
> compiler is written.

IMO it's not an error WRT the formal (syntax) grammar, but may be a
*semantic* error.

> The compiler (parser) can accept any combination of the specifiers,
> and even more than one of them, and then later the compiler decides

> that the ones given are not valid.

Here I'd distinguish between definite and accidental freedom of
compilation, where *definite* is typically marked "implementation
specific". Consider the "short" and "long" modifiers, where a compiler
writer can check for either of them but may miss the case when both are
given at the same time. The result will be kind of a modifier precedence
defined by the compiler writer. If such a problem was ever recognized,
should it result in an error message or warning only?

> There was a story many years ago, about a compiler with only one error
> message: "SYNTAX ERROR". (Likely in the days of upper case only.)
>
> In any case, it is often easier to write the parser more general than
> the actual language, and then flag them later.

That's common practice, but do we have to live with it?

> But also, the same can be done for the language standard.

I'm not sure. Can constructs like the "dangling else" be resolved in a
*formal* grammar, without changing the *language*? In this case an
informal note "applies to closest ... if" can definitely fix the issue.
[You know what i mean]
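For readers who have not met it, a small C demonstration (the function name is hypothetical) of the informal rule that C compilers implement, that an else belongs to the closest unmatched if, even when the indentation suggests otherwise:

```c
/* The else is indented as if it belonged to the outer if, but it is
   actually the else of the inner "if (b > 0)". */
int dangling_else_demo(int a, int b) {
    if (a > 0)
        if (b > 0)
            return 1;
    else
        return 2;   /* reached only when a > 0 and b <= 0 */
    return 3;       /* reached when a <= 0 */
}
```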

> As well as I know it, in early C variables default to int. Later, it
> was required that they be declared, but the default type was still
> int.

These were the days of separate *basic types* and *type modifiers*. At
most one basic type was allowed, and if none was given it *defaulted* to
int. Like today "long" and "long int" mean the same type.
Similarly an int could be used as a pointer, and at least in one of my
old C compilers it *defaulted* to "char*".

DoDi
[Dangling else is not hard to fix in the grammar, but I would not want to
see the combinatorial explosion if you tried to put the C type constraints
into the grammar. I have often noted that if you make a grammar more
permissive and reject invalid combinations later, you generally end up
with a compiler that is easier to understand and has better error messages,
e.g. rather than SYNTAX ERROR, "too many `int' qualifiers." -John]
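A minimal sketch of that approach in C, covering only a subset of C's type specifiers (no double, float, _Bool, or typedef names; the function name and messages are hypothetical): the parser is assumed to have accepted any sequence of these keywords, and this later pass counts them, ignoring order as the standard's multiset wording does, and reports a specific problem instead of a bare SYNTAX ERROR.

```c
#include <stddef.h>
#include <string.h>

/* Returns NULL if the specifier list is valid, else a specific message. */
const char *check_type_specifiers(const char *const *words, size_t n) {
    int n_int = 0, n_long = 0, n_short = 0, n_char = 0, n_signed = 0, n_unsigned = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(words[i], "int") == 0) n_int++;
        else if (strcmp(words[i], "long") == 0) n_long++;
        else if (strcmp(words[i], "short") == 0) n_short++;
        else if (strcmp(words[i], "char") == 0) n_char++;
        else if (strcmp(words[i], "signed") == 0) n_signed++;
        else if (strcmp(words[i], "unsigned") == 0) n_unsigned++;
        else return "unknown type specifier";
    }
    if (n == 0) return "missing type specifier";
    if (n_int > 1) return "too many 'int' specifiers";
    if (n_char > 1) return "too many 'char' specifiers";
    if (n_long > 2) return "too many 'long' specifiers";
    if (n_short > 1) return "too many 'short' specifiers";
    if (n_signed + n_unsigned > 1) return "repeated or conflicting 'signed'/'unsigned'";
    if (n_short && n_long) return "both 'short' and 'long' given";
    if (n_char && (n_int || n_short || n_long)) return "'char' combined with another size";
    return NULL;  /* order never matters: "long int long" is accepted */
}
```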

marb...@yahoo.co.uk

Dec 10, 2022, 4:29:55 PM

On Saturday, 3 December 2022 at 22:52:17 UTC, Hans-Peter Diettrich wrote:
> Instead weird constructs like
> "long long" for int64_t have been introduced, while "int int" stays
> equivalent to "int".

(Sorry, not following this thread till I noticed "long long" :-) )
Another feature C's pinched off Algol 68?
(When I designed and partly-implemented a language in 2006, I called my types
"s8", "u8", "s16", and "u16". (That's as far as I got.) From nearly 40 years
of C programming, I've concluded that having "int" be the "natural" size of
integer is more of a liability than an asset.)
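In C99 and later, names like the ones in marb's language can be layered over <stdint.h>. A minimal sketch; the short aliases simply mirror the names from the post and are not part of any standard, and int_width_bits and u16_add are hypothetical helpers.

```c
#include <limits.h>
#include <stdint.h>

/* Fixed-width aliases in the style of the post: the width is part of the
   name, so it cannot silently change from one platform to another. */
typedef int8_t   s8;
typedef uint8_t  u8;
typedef int16_t  s16;
typedef uint16_t u16;

/* Compile-time checks (C11); they hold wherever the exact-width types exist. */
_Static_assert(sizeof(s16) * CHAR_BIT == 16, "s16 must be 16 bits");
_Static_assert(sizeof(u8)  * CHAR_BIT == 8,  "u8 must be 8 bits");

/* By contrast, "int" is only guaranteed to be at least 16 bits wide;
   its actual width is whatever is "natural" for the target. */
static inline int int_width_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Unsigned wraparound is well defined: a u16 always wraps at 65536. */
static inline u16 u16_add(u16 a, u16 b)
{
    return (u16)(a + b);
}
```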

Anton Ertl

Dec 11, 2022, 1:13:11 PM

"marb...@yahoo.co.uk" <marb...@yahoo.co.uk> writes:
> From nearly 40 years
>of C programming, I've concluded that having "int" be the "natural" size of
>integer is more of a liability than an asset.)

I have also been programming in C, but also in Forth since the 1980s.
C has the well-known integer type zoo, but evolved from B, which had
only a single type (the machine word, which became "int" in C). On
the integer side Forth has only cells (machine words) and
double-cells, and, in memory, characters (bytes), which are loaded and
processed as cells (like chars in C are promoted to ints).

My experience is that portability bugs are very rare in Forth code,
while they are much more common in C code.

Not only is there a zoo of integer types in C, there are also
additional integer types (off_t, uid_t, time_t, etc.) that are mapped
to different more basic integer types on different platforms; and
there are library functions like printf(), abs/labs/llabs() etc. where
you have to decide on one of a few integer types, but if the type at
hand is not among them, you need to use some cumbersome non-C
machinery like configure to select the code.
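For the printf() part of the problem, C99 does offer a way out without configure: convert to intmax_t and print with "%jd", or use the <inttypes.h> format macros for the exact-width types. A minimal sketch; format_time and format_i64 are hypothetical helpers, and the code assumes time_t is a signed integer type no wider than intmax_t, which holds on mainstream platforms but is not guaranteed by ISO C.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* time_t may be long, long long, or something else depending on the
   platform; converting to intmax_t gives one printf format for all of
   them (assuming time_t is a signed integer type). */
static int format_time(char *buf, size_t len, time_t t)
{
    return snprintf(buf, len, "%jd", (intmax_t)t);
}

/* For exact-width types, <inttypes.h> supplies the matching format
   string as a macro, so no per-platform choice of %ld/%lld is needed. */
static int format_i64(char *buf, size_t len, int64_t v)
{
    return snprintf(buf, len, "%" PRId64, v);
}
```

This covers printing, but not the abs/labs/llabs() kind of choice, which still has to be made per type.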

By contrast, in Forth most integers are passed as cells. When you pass
an integer as a double-cell, you also pass it as a double-cell on a
different platform (with a different cell width), so once you have
debugged the program on one machine, it almost always works on a
machine with a different cell size.

However, there is one discipline you have to observe: For address
computations you have to use the cell-size-agnostic scaling word CELLS
rather than making use of the knowledge of the cell size on the
platform you are testing on (Forth code from the 16-bit era, before
CELLS was introduced, uses 2* instead, and is not portable to systems
with wider cells).
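A rough C analogue of the CELLS discipline is to scale by sizeof rather than by a literal cell width. In this sketch "cell", cells() and fetch_cell() are hypothetical names chosen to mirror Forth, with intptr_t standing in for the machine word.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Forth-like cell: one machine word. */
typedef intptr_t cell;

/* Portable: like Forth's CELLS, scale a count by the real cell size. */
static size_t cells(size_t n)
{
    return n * sizeof(cell);
}

/* Non-portable: hard-codes the 16-bit cell size, like "2*" in old Forth
   code. Correct only where sizeof(cell) == 2. */
static size_t cells_16bit_only(size_t n)
{
    return n * 2;
}

/* Fetch the i-th cell of a block given only its byte address, doing the
   address arithmetic in bytes the way Forth does. */
static cell fetch_cell(const unsigned char *base, size_t i)
{
    return *(const cell *)(base + cells(i));
}
```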

Christopher F Clark

Dec 11, 2022, 1:16:20 PM

Trying to bring this back to compilers (and their implementation).
Over the years, I have noticed a couple of things in the various
languages I have used.
These are just my opinions and observations.

----------

LL(1) parsing is good for statements. And a good rule of thumb is
that every statement (except perhaps one) should start with a keyword.
The "except perhaps one" case is often the assignment statement, where you
have lhs-expression assign-op rhs-expression. But once you have that,
no other statement should start with an expression (without a
keyword). You can make the keywords reserved in that context without
undue burden to the user. PL/I's "decl" statement and Pascal's "var",
"function", "procedure", etc. statements are good examples of this.
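With that rule, choosing which statement to parse is a one-token decision. The C sketch below assumes a hypothetical Pascal-like keyword set and only classifies a statement by its first word; a real parser would go on to parse the rest.

```c
#include <string.h>

enum stmt_kind { STMT_VAR, STMT_IF, STMT_WHILE, STMT_PROCEDURE, STMT_ASSIGN };

/* Hypothetical statement keywords, reserved only in statement-initial
   position. */
static const struct {
    const char *word;
    enum stmt_kind kind;
} stmt_keywords[] = {
    { "var", STMT_VAR },     { "if", STMT_IF },
    { "while", STMT_WHILE }, { "procedure", STMT_PROCEDURE },
};

/* LL(1) dispatch: look only at the first token. A statement keyword
   decides the statement; anything else falls into the single exception,
   the assignment "lhs := rhs". */
static enum stmt_kind classify_statement(const char *first_token)
{
    for (size_t i = 0; i < sizeof stmt_keywords / sizeof stmt_keywords[0]; i++)
        if (strcmp(first_token, stmt_keywords[i].word) == 0)
            return stmt_keywords[i].kind;
    return STMT_ASSIGN;
}
```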

Curiously, if you want a series of keywords to begin a statement, you
should make the "reserved" keyword the last one in the list, or have
something else that separates the list of keywords from the normal
identifiers. In Yacc++ we have a variety of declarations that define
tokens that are keywords. The reserved word for those declarations is
"keyword", but we have a bunch of other, non-reserved words that can
modify keyword. Those words must all appear before keyword in the
declaration. That way you can distinguish them from their use as
identifiers. Doing that is easier with an LR grammar.

e.g.

case sensitive substring keyword keyword /* that keyword is an
identifier */, case /* so is case */, substring /* and substring */;

The first 4 words in the above declaration are all keywords, but after
the special keyword "keyword" the same words simply become identifiers,
and the LR grammar has no trouble telling them apart.
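The modifiers-before-"keyword" convention can be mimicked in a hand-written parser: every word before the first "keyword" is a modifier, and every word after it is an identifier, even if it spells a modifier or "keyword" itself. A simplified C sketch over pre-split words (comments, commas and the final semicolon assumed already removed); struct keyword_decl and parse_keyword_decl are hypothetical names, and this is not how Yacc++ is actually implemented.

```c
#include <string.h>

#define MAX_WORDS 8

/* Hypothetical result of parsing one keyword declaration. */
struct keyword_decl {
    int n_modifiers;
    const char *modifiers[MAX_WORDS];
    int n_idents;
    const char *idents[MAX_WORDS];
};

/* Returns 0 on success, -1 if "keyword" never appears or a list overflows. */
static int parse_keyword_decl(const char *words[], int n, struct keyword_decl *out)
{
    int i = 0;
    out->n_modifiers = out->n_idents = 0;

    /* Before the reserved word, any word is a (non-reserved) modifier. */
    while (i < n && strcmp(words[i], "keyword") != 0) {
        if (out->n_modifiers == MAX_WORDS) return -1;
        out->modifiers[out->n_modifiers++] = words[i++];
    }
    if (i == n) return -1;   /* no "keyword": not a keyword declaration */
    i++;                     /* consume the reserved word itself */

    /* After it, every word is an identifier, including "keyword" and "case". */
    while (i < n) {
        if (out->n_idents == MAX_WORDS) return -1;
        out->idents[out->n_idents++] = words[i++];
    }
    return 0;
}
```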

An alternative formation might look like this:

keyword keyword, case, sensitive : case sensitive substring;

The colon (a reserved token) separates the modifying keywords from the
list of identifiers.

Note: if I were doing a language like Pascal, I might do it like this:
("var"|"const") identifier (("," identifier)* (":" type-expr)? ("="
init-expr)? ("@" location-expr)?)+ ";"

Then in a type-expr, keywords like "int" and "float" become reserved,
but not elsewhere.
And after the "@", words like "static", "heap", or "stack" would be reserved.
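Reserving words only in context can be modelled by having the parser consult a different reserved-word list in each context. A C sketch; the context names and the function is_reserved are hypothetical, and the word lists are taken from the declaration syntax above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parser contexts for the Pascal-like declaration syntax. */
enum context { CTX_STATEMENT_START, CTX_TYPE_EXPR, CTX_LOCATION_EXPR, CTX_OTHER };

/* Linear search in a NULL-terminated word list. */
static bool in_list(const char *w, const char *const list[])
{
    for (size_t i = 0; list[i] != NULL; i++)
        if (strcmp(w, list[i]) == 0)
            return true;
    return false;
}

/* A word is reserved only in the context where it has a meaning;
   everywhere else it is an ordinary identifier. */
static bool is_reserved(const char *w, enum context ctx)
{
    static const char *const stmt_words[] = { "var", "const", NULL };
    static const char *const type_words[] = { "int", "float", NULL };
    static const char *const loc_words[]  = { "static", "heap", "stack", NULL };

    switch (ctx) {
    case CTX_STATEMENT_START: return in_list(w, stmt_words);
    case CTX_TYPE_EXPR:       return in_list(w, type_words);
    case CTX_LOCATION_EXPR:   return in_list(w, loc_words);
    default:                  return false;
    }
}
```

Under these rules a declaration such as "var heap : int @ heap;" would be legal: the first "heap" is an identifier, and only the one after "@" is reserved.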

---------

Languages with balanced "parenthesizing" keywords are generally less
ambiguous. In
  if expr then stmt (else stmt)? fi
the matching if and fi get rid of the dangling-else problem, and
variations like
  if expr then stmt (else if expr then stmt)* (else stmt)? fi
still don't have an issue. Note that in this case, you probably want
"then" and "else" to be reserved words in your grammar, or else do
something like
  if "(" expr ")" stmt (";" stmt)? fi
where ";" is a clearly reserved token, or
  if "(" expr ")" "{" stmt "}" ("{" stmt "}")? fi
where the balanced parens and braces also work.
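A small recursive-descent sketch shows why the if ... fi form has no dangling else: each else can only attach to the innermost if that has not yet seen its fi. The token-array interface, the one-word conditions and statements, and the names parse_program and else_owner are simplifying assumptions for illustration.

```c
#include <string.h>

/* Hypothetical parser for
     stmt := "if" COND "then" stmt ("else" stmt)? "fi" | SIMPLE
   where COND and SIMPLE are any single non-keyword word. */
struct parser {
    const char **toks;
    int pos, n;
    const char *else_owner[8];  /* condition of the if owning each else */
    int n_else;
};

static const char *peek(struct parser *p)
{
    return p->pos < p->n ? p->toks[p->pos] : "";  /* "" marks end of input */
}

static int expect(struct parser *p, const char *t)
{
    if (strcmp(peek(p), t) != 0) return 0;
    p->pos++;
    return 1;
}

static int is_kw(const char *t)
{
    return !strcmp(t, "if") || !strcmp(t, "then") || !strcmp(t, "else")
        || !strcmp(t, "fi") || !*t;
}

static int parse_stmt(struct parser *p)
{
    if (expect(p, "if")) {
        const char *cond = peek(p);
        if (is_kw(cond)) return 0;
        p->pos++;
        if (!expect(p, "then") || !parse_stmt(p)) return 0;
        if (expect(p, "else")) {        /* this if is still open: it owns the else */
            if (p->n_else == 8) return 0;
            p->else_owner[p->n_else++] = cond;
            if (!parse_stmt(p)) return 0;
        }
        return expect(p, "fi");         /* every if must be closed by its own fi */
    }
    if (is_kw(peek(p))) return 0;       /* missing simple statement */
    p->pos++;
    return 1;
}

/* Returns 1 only if the tokens form exactly one complete statement. */
static int parse_program(struct parser *p, const char **toks, int n)
{
    p->toks = toks; p->pos = 0; p->n = n; p->n_else = 0;
    return parse_stmt(p) && p->pos == n;
}
```

Where the inner fi appears decides ownership: in "if a then if b then x else y fi fi" the else belongs to the if on b, and in "if a then if b then x fi else y fi" it belongs to the if on a.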

Curiously, from C I learned that single-character parentheses have
their advantages. Thus () [] {}, but not really << >> or even
"begin" and "end". However, the convention of ''' (3 of the relevant
quote/paren) for multi-line bracketed items does seem to work well.
And 3 for that is better than 2. Backslash conventions may be a
necessary evil, but they are not very friendly. Quoted strings where
the same quote starts and ends the string also tend to be error prone,
but they are so much a part of the heritage that they are another
necessary evil.

In fact, in my experience the worst part of error detection and
recovery is single-character errors that radically change the
program. It is too easy for a single character to get inserted and
break the program in a way that is easy to overlook.
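Two C illustrations of this kind of single-character slip: both are valid C, so the language rules themselves produce no error, although some compilers or linters may still warn. The function names are hypothetical.

```c
/* Intended: month number 10. The inserted leading 0 makes the literal
   octal, so the value is 8. */
static int month_number(void)
{
    return 010;
}

/* Intended "x += 2". The transposed "=+" parses as "x = +2", which
   discards the old value of x. */
static int add_two(int x)
{
    x =+ 2;
    return x;
}
```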

-------

Another thing that works poorly is having both prefix and suffix
operators. If you have them, they should not be at the same level of
precedence; putting them at the same level almost always results in ambiguity.
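C is one example of keeping the two levels apart: postfix operators bind tighter than prefix ones, so an expression that mixes them has exactly one parse. A minimal sketch; the function names are hypothetical.

```c
/* *p++ parses as *(p++): the postfix ++ applies to p, and the unary *
   applies to its result (the old value of p). */
static int first_then_advance(const int *p, const int **next)
{
    int v = *p++;
    *next = p;
    return v;
}

/* ++a[i] parses as ++(a[i]): the subscript (postfix) binds first, so
   the element is incremented, not the pointer. */
static int bump_element(int a[], int i)
{
    return ++a[i];
}
```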

-------

gah4

Dec 12, 2022, 11:49:15 AM

On Tuesday, December 6, 2022 at 10:28:44 AM UTC-8, Keith Thompson wrote:

(big snip)

> (I'm not arguing that C's integer type system isn't overly complicated.)

One reason for that, as noted above, is reserved words.

Adding new reserved words risks invalidating existing programs.

I do notice that Java has a reserved word "goto" without a defined use.
Someone was planning ahead.

C could have reserved some words for future use, if someone thought about it.

So adding new types is complicated.
[I think this is where you use #pragma to say which new keywords you're
using. Yes, it's a kludge. -John]
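C99 took another route: new keywords were spelled with a leading underscore and a capital letter, a space of names already reserved to the implementation, and friendlier spellings were supplied as macros in opt-in headers. _Bool with <stdbool.h> defining "bool" is the standard example, so older code that used "bool" as an ordinary identifier kept compiling as long as it did not include that header (until C23 made "bool" a true keyword). A minimal sketch; the function name nonzero is hypothetical.

```c
/* _Bool is a C99 keyword and needs no header. Converting any nonzero
   value to _Bool yields 1, and zero yields 0. */
static _Bool nonzero(int x)
{
    _Bool b = x;
    return b;
}
```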