Octal escape sequences

Bob Kline Phoenix Contract

May 4, 1995, 3:00:00 AM

I notice in the most recent draft that a change has been made in
octal escape sequences, which used to be a backslash followed by
one, two, or three octal digits. The draft now appears to place
no limit on the number of digits which can make up an octal escape
sequence. The ANSI C standard introduced a similar change for
hex escape sequences, which broke quite a bit of existing code
(which relied on the assumption that the translator would stop
folding characters into the escape sequence after two hex digits
had been seen). So, for example, "\x1eFourier Series" has a
different interpretation with an ANSI C compiler than it did
with an older compiler. When the problem was created by the
new standard there were (at least) two solutions available.
One solution was to take advantage of the new requirement that
adjacent string literals be concatenated by the compiler; thus
the example could be re-written as "\x1e" "Fourier Series",
preventing the F of Fourier from being swallowed into the escape
sequence. The second solution was to use octal escape sequences,
which were guaranteed to contain no more than 3 octal digits.
This solution would write the example as "\036Fourier Series".
Of the two solutions, the second (octal) solution offers the
advantage that the result was correctly interpreted by all C
compilers (including older compilers which don't handle the
concatenation of adjacent strings). It is therefore disappointing
to learn that code which used this solution to the problem
introduced by ANSI C will now be broken. It is also puzzling,
since the change does not allow the programmer to do anything
which could not be accomplished with the ANSI C scheme. Since
one of the published goals is to break as little existing code
as possible, and only do so when there is no alternate means
of introducing a significant new needed feature to the language,
perhaps this change is inadvertent and will be corrected before
the standard is approved and published. The change is even more
drastic than that introduced by ANSI C, because it breaks code
which relies on an assumption which has been valid from the earliest
versions of C, whereas the hex escape sequences were not part of
K&R C, and there was therefore less existing code to break. Can
anyone give us the rationale for the change? Thanks.
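
The two workarounds described above can be checked side by side. This is a minimal sketch, assuming an ASCII execution character set and a C11 or later compiler (for `static_assert`); the names `pasted`, `octal`, and `same_bytes` are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* The troublesome form is left as a comment because an ANSI C compiler
   keeps consuming hex digits: the escape is read as \x1eF, a single
   value (0x1EF) that does not fit in an 8-bit char.

       "\x1eFourier Series"
*/

/* Workaround 1: end the escape by closing the literal and relying on
   concatenation of adjacent string literals. */
static const char pasted[] = "\x1e" "Fourier Series";

/* Workaround 2: an octal escape, which stops after at most three digits
   under the ANSI C rules. */
static const char octal[] = "\036Fourier Series";

/* Each array holds the byte 0x1E, the 14 characters of the text,
   and the terminating null. */
static_assert(sizeof pasted == 16, "1 + 14 + 1 bytes");
static_assert(sizeof octal == 16, "1 + 14 + 1 bytes");

/* Returns 1 when the two spellings produce identical bytes. */
int same_bytes(void)
{
    return memcmp(pasted, octal, sizeof pasted) == 0;
}
```

If octal escapes were allowed to run past three digits, the second spelling would only stay safe as long as the character following the escape is not an octal digit.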

--
/*----------------------------------------------------------------------*/
/* Bob Kline Stream International */
/* bob_...@csof.com formerly Corporate Software, Inc. */
/* voice: (703) 522-0820 x-311 fax: (703) 522-5407 */
/*----------------------------------------------------------------------*/

John Max Skaller

May 7, 1995, 3:00:00 AM

In article <1995May4.1...@nlm.nih.gov>,

Bob Kline Phoenix Contract <bkline%occs.nlm.nih.gov> wrote:
>I notice in the most recent draft that a change has been made in
>octal escape sequences, which used to be a backslash followed by
>one, two, or three octal digits. The draft now appears to place
>no limit on the number of digits which can make up an octal escape
>sequence.

I'd like ISO C (next standard) and ISO C++ (this standard
if at all possible, or an addendum for alignment with the next
ISO C Standard otherwise) to fix this and other related nasty problems.

My conception is as follows.

1) There are 4 numerical bases of importance:

a) Decimal
b) Hexadecimal
c) Binary
d) Octal

-- in order of importance IMHO :-)

2) The following letters are associated with each base:

d -- decimal
x -- hex
y -- binary
o -- octal

The reason for choosing these becomes apparent below.

3) integral constants of the forms

0d9999
0xFFFF
0y1010101
0o77777

are permitted. The extremely bad defaulting of

07777

to octal is deprecated.

4) Optionally an underscore can separate digits or
be placed between the base letter and preceding or subsequent digit:

0d_999_999_88
123_456_88
0x_FFFF_FFFF
0y_1010_1111_1110_0001
0_12377

5) as usual a long value or unsigned value can be denoted with
a suffix.
12324L
0xFFFFUL

6) A character or wide constant may use any base letter:

'\d99999'
'\x_FF_FF'
'\o124'
'\y11111111111_0000000'

in which a slosh replaces the leading 0, no underscore
is permitted before the slosh and base character. Now you know why
for BINARy the letter 'y' is chosen -- 'b' isn't available
because '\b' is already used. Also, 'y' is close to 'x'
in the English alphabet, reminding us the two forms
are closely related. Using 'o' for octal is cute -- it is
not only the first letter, but

'\0123' and '\o123'

mean the same despite the visual difficulty distinguishing
them. 0o123 also means the same as the deprecated 0123 --
or 00123 or 0_123 or 0_0123 or 0_o123 or 0o_123 or 0_o_123.

The extremely bad defaulting of

'\123'

to octal is deprecated.

7) Strings. This is the hard one. The existing
situation is untenable. A numeric escape sequence in a string
should _have_ to be terminated. I have no ready made
clear solution. However something like

"abc\x[ FF FF FF ]Fred"

suggests itself. This means the same as

"abc\xFF\xFF\xFFFred"

except that there is no ambiguity of the last F in the first form.
Note also any white spaces can be used in the first form:

"Start\x[
12 34 fA 25 79
12 23 7F e8 99
]end"

is perfectly acceptable.

8) The point of this suggestion is to regularise
the lexicology. All the changes suggested -- except
deprecation of octal defaulting -- are "upwards
compatible" extensions AFAIK.

Suggestions along these lines were rejected by the
C++ committee in part because it was thought that this
was a C issue.
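
None of the proposed spellings (0d, 0y, 0o, or underscores between digits) is accepted by a C compiler of this era, so the following is only a sketch of how the suggested rules might be read, written as an ordinary run-time parser. The function name `parse_proposed_literal` is hypothetical. It handles a subset of the proposal: a base letter directly after the leading 0, and underscores after the base letter or between digits. Forms such as 0_o123, and the deprecated leading-zero octal form, are rejected here as a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of any standard. Parses the PROPOSED
   spellings 0d..., 0x..., 0y..., 0o... with optional underscores, and
   plain decimal digits. Returns 1 and stores the value in *out on
   success, 0 on malformed input. Overflow is not checked. */
int parse_proposed_literal(const char *s, unsigned long *out)
{
    unsigned base = 10;
    unsigned long value = 0;
    int seen_digit = 0;

    assert(s != NULL && out != NULL);

    if (s[0] == '0' && s[1] != '\0') {
        switch (s[1]) {
        case 'd': base = 10; break;
        case 'x': base = 16; break;
        case 'y': base = 2;  break;
        case 'o': base = 8;  break;
        default:  return 0;  /* e.g. 0777 or 0_777: rejected in this sketch */
        }
        s += 2;
    }
    for (; *s != '\0'; ++s) {
        unsigned digit;

        if (*s == '_')
            continue;                     /* separators carry no value */
        if (*s >= '0' && *s <= '9')
            digit = (unsigned)(*s - '0');
        else if (*s >= 'a' && *s <= 'f')
            digit = (unsigned)(*s - 'a') + 10;
        else if (*s >= 'A' && *s <= 'F')
            digit = (unsigned)(*s - 'A') + 10;
        else
            return 0;
        if (digit >= base)
            return 0;                     /* e.g. the digit 2 in a binary literal */
        value = value * base + digit;
        seen_digit = 1;
    }
    if (!seen_digit)
        return 0;
    *out = value;
    return 1;
}
```

With this reading, "0y_1010_1111_1110_0001" and "0x_AFE1" give the same value, which is the kind of regularity the proposal is after.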

--
JOHN (MAX) SKALLER, INTERNET:max...@suphys.physics.su.oz.au
Maxtal Pty Ltd,
81A Glebe Point Rd, GLEBE Mem: SA IT/9/22,SC22/WG21
NSW 2037, AUSTRALIA Phone: 61-2-566-2189

Stephen Baynes

May 9, 1995, 3:00:00 AM

John Max Skaller (max...@Physics.usyd.edu.au) wrote:
: In article <1995May4.1...@nlm.nih.gov>,

: Bob Kline Phoenix Contract <bkline%occs.nlm.nih.gov> wrote:
: >I notice in the most recent draft that a change has been made in
: >octal escape sequences, which used to be a backslash followed by
: >one, two, or three octal digits. The draft now appears to place
: >no limit on the number of digits which can make up an octal escape
: >sequence.

: I'd like ISO C (next standard) and ISO C++ (this standard
: if at all possible, or an addendum for alignment with the next
: ISO C Standard otherwise) to fix this and other related nasty problems.

: My conception is as follows.

: 1) There are 4 numerical bases of importance:

: a) Decimal
: b) Hexadecimal
: c) Binary
: d) Octal

: -- in order of importance IMHO :-)

: 2) The following letters are associated with each base:

: d -- decimal
: x -- hex
: y -- binary
: o -- octal

-snip-
Looks good to me. I suggested something like a subset of this in a recent
issue of C Vu.

: 6) A character or wide constant may use any base letter:
-snip-

: The extremely bad defaulting of

: '\123'

: to octal is deprecated.

I assume that you intend it now to default to decimal - so that '\0' which is
used quite a bit stays the same. If it does not default to decimal then a
standard way of indicating the zero character as used to terminate strings is
useful (I find x[3] = '\0'; tells one something that x[3] = 0; does not.)
I would suggest one of:
1: Keep '\0' as a specific escape
2: Use '\z'
3: #define NIL as (char)0 [ or should that be (int)0 ? ]
[Do not suggest using NULL - there are too many programmers
out there that don't know the difference between
*p = 0; and p = 0; for char *p; ]
Can anyone think of a better name?
1 Is backwards compatible which is an advantage.
2 Would give a possible future ability to implementations to change the
terminating character. Also means that most programs can avoid using
numeric escapes completely.
3 As 2. The disadvantage is it can't be put in a string (but then should
the string terminator occur within a string anyway?).
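
A small sketch of the distinctions raised here, assuming a C11 or later compiler; the function names are made up for illustration. On the bracketed question about (char)0 versus (int)0: in C a character constant such as '\0' already has type int (in C++ it has type char).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* In C, a character constant has type int. */
static_assert(sizeof '\0' == sizeof(int), "character constants are int in C");

/* Truncates s to at most n characters by storing a null character.
   Writing '\0' rather than 0 documents that a string terminator is meant;
   the stored value is the same. */
void truncate_at(char *s, size_t n)
{
    assert(s != NULL);
    if (strlen(s) > n)
        s[n] = '\0';
}

/* Contrast: *p = 0 stores a null character THROUGH p, while p = 0 makes
   p itself a null pointer and leaves the characters alone.
   Returns 1 if the text was left unchanged by the pointer assignment. */
int null_pointer_leaves_text_alone(void)
{
    char buf[] = "abc";
    char *p = buf;

    p = 0;                                  /* p is now a null pointer */
    return p == NULL && strcmp(buf, "abc") == 0;
}
```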

--
Stephen Baynes bay...@mulsoc2.serigate.philips.nl
Philips Semiconductors Ltd
Southampton My views are my own.
United Kingdom

Jonathan de Boyne Pollard

May 9, 1995, 3:00:00 AM

Stephen Baynes (bay...@ukpsshp1.serigate.philips.nl) wrote:
: 3 As 2. The disadvantage is it can't be put in a string (but then should

: the string terminator occur within a string anyway?).

YES!!!!!

There are APIs that I call (and I know that other people on other platforms
will have equivalents) which demand an array of consecutive NUL-terminated
strings, terminated by a zero-length string.

The easiest way to hard-code this is :

"blah blah\0more blah\0yet more blah\0\0"

Now if you seriously want to `#define NUL (char)0' then you'll have to
come up with some idiom for the above that is equally easy to code ...
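
For readers who have not met this layout, here is a minimal sketch of how such a block of consecutive strings is typically walked. The function names `count_strings` and `nth_string` are hypothetical and do not belong to any particular API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the strings in a block of consecutive null-terminated strings
   that ends with a zero-length string. */
size_t count_strings(const char *block)
{
    size_t n = 0;

    assert(block != NULL);
    while (*block != '\0') {            /* an empty string ends the list */
        block += strlen(block) + 1;     /* skip the string and its null */
        ++n;
    }
    return n;
}

/* Returns the i-th string (0-based), or NULL if the list is shorter. */
const char *nth_string(const char *block, size_t i)
{
    assert(block != NULL);
    while (*block != '\0') {
        if (i-- == 0)
            return block;
        block += strlen(block) + 1;
    }
    return NULL;
}
```

Because the embedded nulls are part of one string literal, the whole list can live in a single initializer, which is what a `(char)0` macro cannot express.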

Bob Kline Phoenix Contract

May 12, 1995, 3:00:00 AM

John Max Skaller (max...@Physics.usyd.edu.au) wrote:

: [...]

: I'd like ISO C (next standard) and ISO C++ (this standard
: if at all possible, or an addendum for alignment with the next
: ISO C Standard otherwise) to fix this and other related nasty problems.

: My conception is as follows.

: [...]

Interesting suggestions! I don't share your aversion to the existing
octal notation, and I'd say I lean more heavily in the direction of
breaking as little existing code as possible than you might, but I
would certainly welcome some support for binary notation.

: [...] All the changes suggested -- except deprecation of octal

: defaulting -- are "upwards compatible" extensions AFAIK.

: [...]

I believe the deprecation of octal defaulting is not the only suggested
change which would break existing code. Your suggestion that "numeric
escape sequences in strings MUST be terminated" would break far more
code than the latest draft does with its expansion of the number of
octal digits which can appear in an escape sequence (which was probably
an inadvertent slip-up -- or so I was informed after my original post by
one of the committee members).

Clive D.W. Feather

May 12, 1995, 3:00:00 AM

In article <D86zw...@ucc.su.OZ.AU>,

John Max Skaller <max...@Physics.usyd.edu.au> wrote:
>> I notice in the most recent draft that a change has been made in
>> octal escape sequences, which used to be a backslash followed by
>> one, two, or three octal digits. The draft now appears to place
>> no limit on the number of digits which can make up an octal escape
>> sequence.

Argh. Does this mean C++ is going to be subtly incompatible with C
*again* ?

> 3) integral constants of the forms
> 0d9999
> 0xFFFF
> 0y1010101
> 0o77777
> are permitted.

Instead, why not use a more generic notation (stolen from Algol 68 and
HP's SDL):

Decimal: 10r9999
Hex: 16rFFFF
Binary: 2r1010101
Octal: 8r77777
Duodecimal: 12R47b96

This copes with bases up to 36 very simply, and larger ones by making an
arbitrary decision as to what the representations of the digits should be.
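
The "up to 36" limit matches what the C library already does: `strtoul` accepts bases 2 through 36, using the letters a to z (either case) for digit values 10 to 35. A hedged sketch of a run-time reader for the radix spelling follows; `parse_radix_literal` is a hypothetical name and the notation itself is not C.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical parser for the "<base>r<digits>" spelling, e.g. 16rFFFF.
   Returns 1 and stores the value in *out on success, 0 otherwise.
   Note: in base 16, strtoul also tolerates a leading 0x; a stricter
   parser would reject that. */
int parse_radix_literal(const char *s, unsigned long *out)
{
    char *end;
    unsigned long base, value;

    assert(s != NULL && out != NULL);

    if (!isdigit((unsigned char)*s))
        return 0;                         /* the base must start with a digit */
    base = strtoul(s, &end, 10);          /* the base itself is decimal */
    if ((*end != 'r' && *end != 'R') || base < 2 || base > 36)
        return 0;

    s = end + 1;
    if (!isalnum((unsigned char)*s))
        return 0;                         /* no blanks, signs, or empty digits */
    errno = 0;
    value = strtoul(s, &end, (int)base);
    if (*end != '\0' || errno == ERANGE)
        return 0;                         /* stray character or overflow */
    *out = value;
    return 1;
}
```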

> The extremely bad defaulting of

> 07777
> to octal is deprecated.

I don't think you could ever safely remove this in the future, so don't
deprecate it. Similarly, keep 0x for backwards compatibility.

> 4) Optionally an underscore can separate digits or
> be placed between the base letter and preceding or subsequent digit:

[without affecting the value]
Good idea.

> 5) as usual a long value or unsigned value can be denoted with
> a suffix.
> 12324L
> FFFFUL

Um, my scheme does have problems with that. But they're fixable
(e.g. 8ur77777, 16lrFFFF).

> "abc\x[ FF FF FF ]Fred"
> suggests itself. This means the same as

Also quite a good idea.

> Note also any white spaces can be used in the first form:
> "Start\x[
> 12 34 fA 25 79
> 12 23 7F e8 99
> ]end"

Less so. Currently strings can't extend over a logical line; this is a
simple and useful sanity check.

--
Clive D.W. Feather | If you lie to the compiler, it will get its revenge.
cl...@stdc.demon.co.uk | - Henry Spencer

Ian T Zimmerman

May 16, 1995, 3:00:00 AM

In article <3oo2ok$c...@silver.jba.co.uk>,

Jonathan de Boyne Pollard <JdeBP%uto...@jba.co.uk> wrote:
>Stephen Baynes (bay...@ukpsshp1.serigate.philips.nl) wrote:
>: 3 As 2. The disadvantage is it can't be put in a string (but then should
>: the string terminator occur within a string anyway?).
>
>YES!!!!!
>
>There are APIs that I call (and I know that other people on other platforms
>will have equivalents) which demand an array of consecutive NUL-terminated
>strings, terminated by a zero-length string.
>

Hmmm...let me guess...Windoze? :-) Seriously, without denying your
point, those APIs are _evil_. Their designers obviously never heard of
execv().

>The easiest way to hard-code this is :
>
> "blah blah\0more blah\0yet more blah\0\0"
>

That will of course give you _three_ terminating nulls :-)
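
The count can be confirmed at compile time. A minimal sketch, assuming a C11 or later compiler; `trailing_nulls` is a made-up helper.

```c
#include <assert.h>
#include <stddef.h>

/* 9 + 1 + 9 + 1 + 13 + 1 + 1 = 35 characters are written out, and the
   compiler appends one more null to every string literal. */
static const char list[] = "blah blah\0more blah\0yet more blah\0\0";

static_assert(sizeof list == 36, "35 written characters plus the implicit null");

/* Counts the zero bytes at the end of an array of the given size. */
size_t trailing_nulls(const char *a, size_t size)
{
    size_t n = 0;

    assert(a != NULL);
    while (size > 0 && a[size - 1] == '\0') {
        --size;
        ++n;
    }
    return n;
}
```

Writing a single explicit `\0` at the end would be enough for the zero-length terminating string, since the implicit null supplies the second one.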

>Now if you seriously want to `#define NUL (char)0' then you'll have to
>come up with some idiom for the above that is equally easy to code ...

--
Ian T Zimmerman +-------------------------------------------+
P.O. Box 13445 I With so many executioners available, I
Berkeley, California 94712 I suicide is a really foolish thing to do. I
USA <i...@rahul.net> +-------------------------------------------+

John Max Skaller

May 17, 1995, 3:00:00 AM

In article <D8HGp...@stdc.demon.co.uk>,

Clive D.W. Feather <cl...@stdc.demon.co.uk> wrote:
>In article <D86zw...@ucc.su.OZ.AU>,
>John Max Skaller <max...@Physics.usyd.edu.au> wrote:
>>> I notice in the most recent draft that a change has been made in
>>> octal escape sequences, which used to be a backslash followed by
>>> one, two, or three octal digits. The draft now appears to place
>>> no limit on the number of digits which can make up an octal escape
>>> sequence.
>
>Argh. Does this mean C++ is going to be subtly incompatible with C
>*again* ?

I hope not -- there seems no particularly strong
reason for them to differ here.

>
>> 3) integral constants of the forms
>> 0d9999
>> 0xFFFF
>> 0y1010101
>> 0o77777
>> are permitted.
>
>Instead, why not use a more generic notation (stolen from Algol 68 and
>HP's SDL):
>
> Decimal: 10r9999
> Hex: 16rFFFF
> Binary: 2r1010101
> Octal: 8r77777
> Duodecimal: 12R47b96
>
>This copes with bases up to 36 very simply, and larger ones by making an
>arbitrary decision as to what the representations of the digits should be.

Two reasons. One -- it's completely new, what I'm suggesting
follows the old pattern. The second reason is that 36 bases aren't
useful. 0,1,2,3,8,16 are the only ones I think are worth bothering with.

However the generality of your suggestion has appeal.
I'd not oppose it.

>> The extremely bad defaulting of
>> 07777
>> to octal is deprecated.
>
>I don't think you could ever safely remove this in the future, so don't
>depreciate it. Similarly, keep 0x for backwards compatibility.

0x is OK. But 0 leading octal is a BAD idea because
people used to more modern languages like COBOL <grin> might
expect C to be sensible and permit columns of left zero filled
decimal numbers: a common technique from the days of punch cards.
Were this illegal or a warning generated one could fix a bug --
but if 0777 is silently taken as octal when decimal was intended
it would be a hard bug to track down -- not something a non-C
programmer would think of looking for.
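
A short illustration of the silent case, assuming a C11 or later compiler. The table contents are invented for the example. Note that only values whose digits all happen to be valid octal digits slip through; a padded 08 or 09 is an invalid constant and is diagnosed.

```c
#include <assert.h>

/* A column of zero-padded values, as someone used to fixed-width
   decimal fields might write it. The padded entries are octal. */
static const int padded[] = {
    100,
    077,    /* 63, not 77 */
    010     /*  8, not 10 */
};

static_assert(077 == 63, "octal 77 is decimal 63");
static_assert(010 == 8, "octal 10 is decimal 8");
static_assert(0777 == 511, "octal 777 is decimal 511");

/* Sum of the table: 100 + 63 + 8, not 100 + 77 + 10. */
int padded_sum(void)
{
    return padded[0] + padded[1] + padded[2];
}
```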

Deprecation gives people at least 5 more years to give up
a bad habit -- and the committee a chance to review the issue
even then.

>> 4) Optionally an underscore can separate digits or
>> be placed between the base letter and preceding or subsequent digit:
>
>[without affecting the value]

Yes.

>Good idea.

Very useful for binary machine interfacing -- reading

0y10101010111100110110101111011101
or
2r10101010111100110110101111011101

is kind of hard (are there really 32 bits there?)

>> 5) as usual a long value or unsigned value can be denoted with
>> a suffix.
>> 12324L
>> FFFFUL
>
>Um, my scheme does have problems with that. But they're fixable
>(e.g. 8ur77777, 16lrFFFF).

Yes. But it's a bit more foreign again. Which is not
to say bad, but something closer to what we have now --
a "completion" rather than something new -- might be the go
politically. The issue isn't important enough to spend much
time on, but worth doing one way or the other -- IMHO.

>> "abc\x[ FF FF FF ]Fred"
>> suggests itself. This means the same as
>
>Also quite a good idea.

Hm. I'd like to see more suggestions on this.

>
>> Note also any white spaces can be used in the first form:
>> "Start\x[
>> 12 34 fA 25 79
>> 12 23 7F e8 99
>> ]end"
>
>Less so. Currently strings can't extend over a logical line; this is a
>simple and useful sanity check.

OK. Writing

"Start\x[ 12 34 fa 25 79 ]"
"\x[ 12 23 7F e8 99 ] end"

isn't so bad.
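
To make the bracket idea concrete, here is a sketch of a decoder for the text between the brackets. Everything about it is an assumption made for illustration: the proposal does not say how many digits form a byte, so this version requires exactly two hex digits per byte, and the names `hex_value` and `decode_bracket_body` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical decoder for the body of a proposed \x[ ... ] escape.
   text holds what appears between the brackets, e.g. " 12 34 fA ".
   Each byte is exactly two hex digits; white space is skipped.
   Returns the number of bytes written to out, or (size_t)-1 on
   malformed input or if more than cap bytes would be needed. */
size_t decode_bracket_body(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;

    assert(text != NULL && out != NULL);
    while (*text != '\0') {
        int hi, lo;

        if (isspace((unsigned char)*text)) {
            ++text;
            continue;
        }
        hi = hex_value((unsigned char)text[0]);
        if (hi < 0)
            return (size_t)-1;
        lo = hex_value((unsigned char)text[1]);   /* text[1] may be the null */
        if (lo < 0)
            return (size_t)-1;
        if (n == cap)
            return (size_t)-1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return n;
}
```

Because the closing bracket ends the escape, a following letter such as the F of "Fred" can never be absorbed, which is the point of the notation.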

John Max Skaller

May 17, 1995, 3:00:00 AM

In article <1995May12.1...@nlm.nih.gov>,

Bob Kline Phoenix Contract <bkline%occs.nlm.nih.gov> wrote:

[binary literals etc]

>
>I believe the deprecation of octal defaulting is not the only suggested
>change which would break existing code. Your suggestion that "numeric
>escape sequences in strings MUST be terminated" would break far more
>code than the latest draft does with its expansion of the number of
>octal digits which can appear in an escape sequence

Yes .. whoa!! I think I must have meant that in order to
write PORTABLE code there MUST be a way of writing arbitrary strings
in which implementor defined limits on the number of digits escaped
didn't change the meaning: actually

"AB" "\x1a" "\x20" "CDEFG"

seems to do this already despite being ugly so perhaps I'm wrong
that a change is necessary here. The format I suggested is shorter,
but perhaps not enough to bother with a change.

Kevin Lentin

May 18, 1995, 3:00:00 AM

John Max Skaller (max...@Physics.usyd.edu.au) wrote:

> >> Note also any white spaces can be used in the first form:
> >> "Start\x[
> >> 12 34 fA 25 79
> >> 12 23 7F e8 99
> >> ]end"
> >
> >Less so. Currently strings can't extend over a logical line; this is a
> >simple and useful sanity check.
>
> OK. Writing
>
> "Start\x[ 12 34 fa 25 79 ]"
> "\x[ 12 23 7F e8 99 ] end"
>
> isn't so bad.

And...

"Start\x[ 12 34 fa 25 79 "

" 12 23 7F e8 99 ] end"

might even be preferred. If my reading of the phases of compilation is
correct, would the compiler not see one string of characters anyway?

--
[==================================================================]
[ Kevin Lentin |___/~\__/~\___/~~~~\__/~\__/~\_| ]
[ kev...@cs.monash.edu.au |___/~\/~\_____/~\______/~\/~\__| ]
[ Macintrash: 'Just say NO!' |___/~\__/~\___/~~~~\____/~~\___| ]
[==================================================================]

Stephen Baynes

May 18, 1995, 3:00:00 AM

John Max Skaller (max...@Physics.usyd.edu.au) wrote:

: In article <D8HGp...@stdc.demon.co.uk>,

: Clive D.W. Feather <cl...@stdc.demon.co.uk> wrote:
: >In article <D86zw...@ucc.su.OZ.AU>,
: >John Max Skaller <max...@Physics.usyd.edu.au> wrote:

: >> 3) integral constants of the forms

: >> 0d9999
: >> 0xFFFF
: >> 0y1010101
: >> 0o77777
: >> are permitted.
: >
: >Instead, why not use a more generic notation (stolen from Algol 68 and
: >HP's SDL):
: >
: > Decimal: 10r9999
: > Hex: 16rFFFF
: > Binary: 2r1010101
: > Octal: 8r77777
: > Duodecimal: 12R47b96
: >
: >This copes with bases up to 36 very simply, and larger ones by making an
: >arbitrary decision as to what the representations of the digits should be.

: Two reasons. One -- it's completely new, what I'm suggesting
: follows the old pattern. The second reason is that 36 bases aren't
: useful. 0,1,2,3,8,16 are the only ones I think are worth bothering with.

The imp language (Edinburgh University) used a similar scheme to allow
numbers in different bases and permitted up to base 36. In a new release
of the compiler it was decided to limit it to up to base 16. It was then
found out that someone had used base 32 (it was to do a clever scheme
for coding assembler opcodes.) On the other hand that's rather obscure and
they might have been better off doing it in binary with '_' every 7th bit.
That would have been more understandable to the average programmer. So
despite the counter example I actually agree with you that only a limited
number of bases are worth bothering with. However I would change your
list. I would add bases 4, 10 and perhaps base 12. I would also like to know
if (and how) you intend to make practical use of bases 0 and 1 :-)

Jeremy Fitzhardinge

May 18, 1995, 3:00:00 AM

In <3pelmk$3...@harbinger.cc.monash.edu.au> kev...@fangorn.cs.monash.edu.au (Kevin Lentin) writes:
>And...
> "Start\x[ 12 34 fa 25 79 "
> " 12 23 7F e8 99 ] end"
>
>might even be prefered. If my reading of the phases of compilation is
>correct, would the compiler not see one string of characters anyway?

The "\x[ xx xx xx ]" looks like one long multi-character token which
evaluates to a number of bytes. Splitting a token between multiple
strings would be interesting to deal with, since if consistently
applied, it suggests that any multicharacter token could be split by
constant string concatenation. It would break the ("\012" "123abc")
method of delimiting long escape sequences, as this would be seen as
"\012123abc" before the escapes are dealt with.

Note, I haven't looked through ISO 9899 to see exactly how
multicharacter tokens are currently dealt with in strings, in
particular what happens to them when they are subject to
concatenation.
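
On the ordering question: in ISO C the translation phases convert escape sequences (phase 5) before adjacent string literals are concatenated (phase 6), so an escape is settled before pasting and cannot pick up digits from the next literal. A minimal check, assuming a C11 or later compiler and an ASCII execution character set; the names are made up for illustration.

```c
#include <assert.h>

/* "\012" becomes one character before the two literals are joined,
   so the digits of "123abc" stay separate characters. */
static const char pasted_octal[] = "\012" "123abc";

/* The same holds for hex: two characters, 0x12 and '3', then the null. */
static const char pasted_hex[] = "\x12" "3";

static_assert(sizeof pasted_octal == 8, "1 escape + 6 characters + null");
static_assert(sizeof pasted_hex == 3, "1 escape + 1 character + null");

/* Returns 1 if both escapes kept their values and the following
   digit survived as an ordinary character. */
int escapes_survive_pasting(void)
{
    return pasted_octal[0] == 012 && pasted_octal[1] == '1'
        && pasted_hex[0] == 0x12 && pasted_hex[1] == '3';
}
```

A bracketed escape split across two literals would need the opposite order, which is why it does not fit the existing phases.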

J

Magao

May 19, 1995, 3:00:00 AM

In <D8qKu...@ucc.su.OZ.AU> max...@Physics.usyd.edu.au (John Max Skaller) writes:

>>> Note also any white spaces can be used in the first form:
>>> "Start\x[

>>> 12 34 fA 25 79

>>> 12 23 7F e8 99
>>> ]end"
>>

>>Less so. Currently strings can't extend over a logical line; this is a
>>simple and useful sanity check.

> OK. Writing

> "Start\x[ 12 34 fa 25 79 ]"
> "\x[ 12 23 7F e8 99 ] end"

>isn't so bad.

Other than this it would be possible to write

"start\x[ 12 34 FA 25 79"
"12 23 7F E8 99] end"

just as easily. However, when I look at it, your way actually looks
better ;)

Tim Delaney (TCD Software)
ct...@uow.edu.au

Ulf Schuenemann

May 24, 1995, 3:00:00 AM

In article <D86zw...@ucc.su.OZ.AU>,
John Max Skaller <max...@Physics.usyd.edu.au> wrote:

[..]

> 3) integral constants of the forms
> 0d9999
> 0xFFFF
> 0y1010101

^^^^^^^^^
(1) binary numbers
Yes, yes! At least! When programming bitwise operations, it's always a loss
of brain-resources to concentrate on the bitpatterns of 0xDB etc.
[ I can't find your original post any more, Max - What was your reason
for 0y.. instead of 0b..? ]

(2) octal numbers
> 0o77777
and '\o77' (== '\077')
[..]

> The extremely bad defaulting of
> 07777
> to octal is deprecated.

I and many of my colleagues could never understand how one can come to
the idea that a leading zero should denote octal numbers. This rule
causes repeated confusion as in everyday life leading zeros have
NO meaning at all. Reading code with octal numbers I have to be very
careful not to mix them up with decimal numbers. 0o77777 would
surely be a much better alternative.

As there is supposedly much (old ?) code around using octal numbers,
it would be a good compromise to deprecate the leading-zero-is-octal rule.

(3) grouping digits

> 4) Optionally an underscore can separate digits or
> be placed between the base letter and preceding or subsequent digit:

Good idea. Increases readability of long numbers (most necessary for
binary numbers but also good for all the others).

(4)
[..]

> "abc\x[ FF FF FF ]Fred"
> suggests itself.

It would be a good idea, but I'm afraid it's too 'complicated' to have
a chance. [ Disclaimer: I've no multibyte-characters experience, so
maybe I'm not competent enough to make a judgement ].

Ulf Schuenemann

--------------------------------------------------------------------
Ulf Schünemann
Fakultät für Informatik, Technische Universität München, Germany.
email: schu...@informatik.tu-muenchen.de

Bob Kline Phoenix Contract

May 26, 1995, 3:00:00 AM

John Max Skaller (max...@Physics.usyd.edu.au) wrote:

: In article <1995May12.1...@nlm.nih.gov>,

: Bob Kline Phoenix Contract <bkline%occs.nlm.nih.gov> wrote:

: [binary literals etc]

: >
: >I believe the deprecation of octal defaulting is not the only suggested
: >change which would break existing code. Your suggestion that "numeric
: >escape sequences in strings MUST be terminated" would break far more
: >code than the latest draft does with its expansion of the number of
: >octal digits which can appear in an escape sequence

: Yes .. whoa!! I think I must have meant that in order to
: write PORTABLE code there MUST be a way of writing arbitrary strings
: in which implementor defined limits on the number of digits escaped
: didn't change the meaning: actually

: "AB" "\x1a" "\x20" "CDEFG"

: seems to do this already despite being ugly so perhaps I'm wrong
: that a change is necessary here. The format I suggested is shorter,
: but perhaps not enough to bother with a change.

You are correct. This notation succeeds in removing the ambiguity.
However, as I pointed out at the start of this thread, the only
solution which works with *every* compiler (including older ones
which don't know about string splicing) is the octal notation, using
the rules which existed before the most recent draft. Fortunately,
I am assured that the change to the rule for octal escape sequences
was an inadvertent slip-up, and will be fixed this summer.

--
/*----------------------------------------------------------------------*/
/* Bob Kline Stream International */

/* bob_...@stream.com formerly Corporate Software, Inc. */

Karl Heuer

May 26, 1995, 3:00:00 AM

I missed the beginning of this thread, but it sounds like my old article from
the days of the second public review may be of interest. (X3J11 rejected
this due to lack of prior art.)

Karl W. Z. Heuer (ka...@kelp.boston.ma.us), The Walking Lint

Proposal #1

Add new escape sequence \c.

Summary

This proposal cleans up two warts in the language: initializing a
character array without adding a null character, and terminating
a hexadecimal escape which might be followed by a valid hexadecimal
digit. It also allows the user to explicitly document when
a null character is unnecessary, e.g. write(1,"\n\c",1).

Justification

I presume the Committee is already aware of the need for
non-null-terminated character arrays, since the January Draft makes a
special case for them in 3.5.7. However, the mechanism requires
the user to count the characters himself in order to make sure
that he doesn't leave room for the null characters; this is a
maintenance nightmare. My proposal is a cleaner way to accomplish
this.

It has been suggested that although an escape to suppress the
null character is useful, the termination of hex escapes is not
an issue because it is handled by string literal pasting.

String pasting is useful for line continuation without
backslash-newline, and for constructing string literals in macros,
but using it to indicate the end of a hex escape is a botch.
This is nearly as bad as suggesting that the whole string be
written in hex.

Moreover, it's very C-specific; one could not advertise a program
that `accepts all the C escapes' as input, without first solving
the hex-termination problem all over again.

Also, it doesn't handle character constants. The example in
3.1.3.4 is clearly a kludge--it suggests replacing the hex escape
with octal. This won't always be possible on an architecture
with 12-bit bytes, for example.

Finally, if the \c escape is added anyway for the null-suppression
feature, the additional change of insisting that it
be a no-op in other contexts is minor.

Specific changes

In 3.1.3.4, page 29, line 10, add \c to the list of escapes. Add
the description: `The \c escape at the end of a string literal
suppresses the trailing null character that would normally be appended.
If \c appears in a character constant, or anywhere in a
string literal other than at the end, then it is ignored, but may
serve to separate an octal or hexadecimal escape from a following
digit.'

In 3.1.3.4, page 30, line 35, change '\0223' to '\x12\c3'.

In 3.1.4, page 31, line 29, after `A null character is then appended'
add `unless the string literal ended with \c'. Make a
similar change to line 31. Add the sentence `If a character
string literal or a wide string literal has zero length, the
behavior is undefined'. Add to footnote 16 the text `or it may
lack a trailing null character because of \c'.

In 3.1.4, page 31, line 41, add `This string may also be denoted
by "\x12\c3"'.

In 3.5.7, page 73, line 23, replace `if there is room or if the
array is of unknown size' with `if it has one'. (The ability to
initialize a non-null-terminated array without using \c may be
listed as a Common Extension.)

[An afterthought not mentioned in the original proposal: the spelling
`\c' was chosen because USG echo already uses that escape to mean
`suppress the terminator', and it seemed a reasonable analogy even
though the terminator is a null character rather than a newline.]
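
For context, the existing mechanism the proposal wants to improve on looks like this in C (it is not valid C++). A minimal sketch, assuming a C11 or later compiler; the array contents and the names `tag`, `with_null`, and `tag_matches` are invented for illustration. Some compilers may warn about the exact-fit initializer.

```c
#include <assert.h>
#include <string.h>

/* C allows a char array to be initialized by a string literal that
   fills it exactly, leaving no room for the null character. The size
   has to be counted by hand, which is the maintenance hazard: change
   the text and the 4 must change with it. */
static const char tag[4] = "abcd";

/* Without the explicit size the null is always appended. */
static const char with_null[] = "abcd";

static_assert(sizeof tag == 4, "no terminating null stored");
static_assert(sizeof with_null == 5, "terminating null stored");

/* Compares the first sizeof tag bytes at p with the tag. memcmp is
   used because tag is not a null-terminated string. */
int tag_matches(const void *p)
{
    assert(p != NULL);
    return memcmp(p, tag, sizeof tag) == 0;
}
```

Under the proposal, the same array could be written with an unsized declarator and a trailing \c, letting the compiler do the counting.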

John Max Skaller

May 28, 1995, 3:00:00 AM

In article <3pvoff$9...@hpsystem1.informatik.tu-muenchen.de>,

Ulf Schuenemann <schu...@informatik.tu-muenchen.de> wrote:
>
>In article <D86zw...@ucc.su.OZ.AU>,
>John Max Skaller <max...@Physics.usyd.edu.au> wrote:
>[..]
>> 3) integral constants of the forms
>> 0d9999
>> 0xFFFF
>> 0y1010101
> ^^^^^^^^^
>(1) binary numbers
>Yes, yes! At least! When programming bitwise operations, it's always a loss
>of brain-resources to concentrate on the bitpatterns of 0xDB etc.
>[ I can't find your original post any more, Max - What was your reason
>for 0y.. instead of 0b..? ]

'\b' is already specified (backspace)

>> to octal is deprecated.
>
>I and many of my collegues could never understand how one can come to
>the idea that a leading zero should denote octal numbers.

You aren't old enough to have used PDP-11 assembler?
In the days of punch cards and paper tape, it was easier to
extend the lexical analyser of the assembler -- which already
detected that numbers started with digits -- to handle octal
_within_ the "number decoder" by checking for a leading zero.

>This rule
>causes repeated confusion as in everyday live leading zeros have
>NO meaning at all.

Oh yes they do -- in COBOL and some Fortran
it is conventional to fill fixed width numeric fields with digits
rather than blanks. Again, this is related to punch card technology,
but the opposite meaning of leading 0 ensues.

[underscores]

>Good idea. Increases readability of long numbers (most necessarry for
>binary numbers but also good for all the others).

Including hex for 64 and 128 bit integers and beyond. Try

0x12345678 vs 0x_1224_5678

Apparently the human brain can cope with 5 objects at once, some
people can handle 7. I doubt many people would recognize visually
a missing digit in an 8 digit number. I often check this sort of
thing by moving my fingers across the screen :-)

>
>(4)
>[..]
>> "abc\x[ FF FF FF ]Fred"
>> suggests itself.
>
>It would be a good idea, but I'm afraid it's too 'complicated' to have
>a chance. [ Disclaimer: I've no multibyte-characters experiance, so
>maybe I'm not competent enough to make a judgement ].

OK. Probably right.