
Why's of C++ -- Part 3 (string discussion)

Greg Brewer

Aug 16, 1999
One of the things that impressed me most when I first started using C was
the power of the standard C library. It occurred to me that a lot of these
functions were useful in writing the C compiler. I found the idea of
writing a C compiler in C somewhat amusing. Over the years, I have gotten
a lot of use out of functions such as stpcpy, strcspn, etc.

A few years ago, I had to modify a DOS application to allow the user to
enter a printer initialization/deinitialization string and save that as a
default. I was surprised by two shortcomings: that the string functions
do not include a function to translate strings from their code
representation (e.g. "abc\t\a\n") to their internal representation, and
that there is no backslash sequence for representing the escape character
(i.e. '\e'). I know there is a hex, octal, etc. sequence that represents
escape; the same is true for tab, bell, and the rest.

Once upon a time, escape was so common for controlling printers and remote
terminals that I don't understand why it is not included in the \ character
set. Since the function for translating strings from their "visual" form is
necessary for the compiler to operate, why isn't the function in the
standard C library?

Are escape sequences so passe that this code is unnecessary? Any comments?
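
For reference, the kind of translation function being asked about can be sketched in a few lines of portable C. This is only an illustration: `unescape` is a hypothetical name, not a function from any standard library, and it handles just the standard simple, octal, and hex escapes (no `\e`, trigraphs, or universal character names).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of the standard library: decodes the
   backslash escapes of a C string literal from src into dst.
   dst must have room for strlen(src) + 1 bytes.
   Returns the number of decoded characters, excluding the final NUL. */
size_t unescape(char *dst, const char *src)
{
    size_t n = 0;

    while (*src != '\0') {
        unsigned value = 0;
        int digits = 0;

        if (*src != '\\') {             /* ordinary character: copy it */
            dst[n++] = *src++;
            continue;
        }
        src++;                          /* skip the backslash */
        switch (*src) {
        case 'a': dst[n++] = '\a'; src++; break;
        case 'b': dst[n++] = '\b'; src++; break;
        case 'f': dst[n++] = '\f'; src++; break;
        case 'n': dst[n++] = '\n'; src++; break;
        case 'r': dst[n++] = '\r'; src++; break;
        case 't': dst[n++] = '\t'; src++; break;
        case 'v': dst[n++] = '\v'; src++; break;
        case 'x':                       /* hex escape: any number of digits */
            src++;
            while (isxdigit((unsigned char)*src)) {
                int c = tolower((unsigned char)*src++);
                value = value * 16
                      + (unsigned)(isdigit(c) ? c - '0' : c - 'a' + 10);
            }
            dst[n++] = (char)value;
            break;
        case '0': case '1': case '2': case '3':
        case '4': case '5': case '6': case '7':
            /* octal escape: at most three digits */
            while (digits < 3 && *src >= '0' && *src <= '7') {
                value = value * 8 + (unsigned)(*src++ - '0');
                digits++;
            }
            dst[n++] = (char)value;
            break;
        case '\0':                      /* trailing backslash: keep it */
            dst[n++] = '\\';
            break;
        default:                        /* \\ \" \' \? and unknown escapes */
            dst[n++] = *src++;
            break;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Because every escape shrinks or preserves the length, a destination buffer the size of the input is always sufficient. A user-entered printer string such as `\033E` (typed as five characters) would decode to ESC followed by `E`.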
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std...@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html ]

Francis Glassborow

Aug 17, 1999
In article <7p1u5o$ipi$1...@news.hal-pc.org>, Greg Brewer
<nospam...@hal-pc.org> writes

>Once upon a time, escape was so common for controlling printers and remote
>terminals that I don't understand why it is not included in the \ character
>set. Since the function for translating strings from their "visual" form is
>necessary for the compiler to operate, why isn't the function in the
>standard C library.
>
>Are escape sequences so passe that this code is unnecessary? Any comments?

What has any of this to do with C++? In addition I think that the
escape character is a specific code in a particular character set. OTOH
\t, etc. refer to a representation of specific control characters
regardless as to the character set in use.


Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Barry Margolin

Aug 17, 1999
In article <7p1u5o$ipi$1...@news.hal-pc.org>,

Greg Brewer <nospam...@hal-pc.org> wrote:
>Once upon a time, escape was so common for controlling printers and remote
>terminals that I don't understand why it is not included in the \ character
>set. Since the function for translating strings from their "visual" form is
>necessary for the compiler to operate, why isn't the function in the
>standard C library.

These days such things are almost always done using libraries like curses,
so it's not necessary to hard-code escape sequences into applications.
Therefore, they don't generally need to mention the ESC character
explicitly. Even the library doesn't usually need to refer to it, because
they're generally table-driven; the ESC character will only appear in the
terminal control database files (e.g. termcap or terminfo).
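
The table-driven idea can be sketched roughly as follows. This is not how termcap or terminfo is actually implemented; the `capability` struct, the `ansi_caps` table, and `lookup_cap` are hypothetical names used only to show that the ESC byte lives in data rather than in program logic.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a hypothetical capability table: a name the program asks
   for, and the byte sequence the terminal expects. */
struct capability {
    const char *name;
    const char *sequence;
};

/* Illustrative entries for an ANSI/VT100-style terminal.
   \033 is ESC in ASCII; a real database would be loaded from a file. */
static const struct capability ansi_caps[] = {
    { "home",  "\033[H" },
    { "clear", "\033[H\033[J" },
};

/* Returns the control sequence for a capability, or NULL if unknown. */
const char *lookup_cap(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof ansi_caps / sizeof ansi_caps[0]; i++) {
        if (strcmp(ansi_caps[i].name, name) == 0)
            return ansi_caps[i].sequence;
    }
    return NULL;
}
```

The application then writes something like `fputs(lookup_cap("clear"), stdout)` after checking for NULL, and never mentions ESC itself.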

--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Greg Brewer

Aug 17, 1999

Francis Glassborow <fra...@robinton.demon.co.uk> wrote in message
news:KrOnmwAF...@robinton.demon.co.uk...


> In article <7p1u5o$ipi$1...@news.hal-pc.org>, Greg Brewer

> What has any of this to do with C++? In addition I think that the
> escape character is a specific code in a particular character set. OTOH
> \t, etc. refer to a representation of specific control characters
> regardless as to the character set in use.

What does it have to do with C++? Well, the programming language I was
using was C++.

I'm trying to remember my IBM days dealing with EBCDIC. If I remember
correctly, that character set has no tab, newline, carriage return, or alarm
character. I could easily be wrong, it has been almost 10 years.

Greg Brewer

Max TenEyck Woodbury

Aug 17, 1999

Greg Brewer wrote:
>
> I'm trying to remember my IBM days dealing with EBCDIC. If I remember
> correctly, that character set has no tab, newline, carriage return, or alarm
> character. I could easily be wrong, it has been almost 10 years.
>

This is getting off subject...

EBCDIC has HT, LF, CR and BEL defined. I'm not absolutely positive but I
think FF was also defined. I'd have to dig out an old green card to get
you their values. It's in a box with a bunch of other stuff from the '60s
and '70s so it would be a real tough hunt...

mt...@cds.duke.edu

Pete Becker

Aug 17, 1999
Francis Glassborow wrote:
>
> In addition I think that the
> escape character is a specific code in a particular character set. OTOH
> \t, etc. refer to a representation of specific control characters
> regardless as to the character set in use.
>

Let me put that a bit more strongly. The effect of using \t, etc. is
described in the language definition: \t produces a tab, \a produces
some sort of alert, etc. That is, they have observable consequences, and
those consequences are what the standard describes. What are the
observable consequences of escape?

--
Pete Becker
Dinkumware, Ltd.
http://www.dinkumware.com
---

David R Tribble

Aug 24, 1999
Pete Becker wrote:
>
> Francis Glassborow wrote:
> >
> > In addition I think that the
> > escape character is a specific code in a particular character set.
> > OTOH \t, etc. refer to a representation of specific control
> > characters regardless as to the character set in use.
>
> Let me put that a bit more strongly. The effect of using \t, etc. is
> described in the language definition: \t produces a tab, \a produces
> some sort of alert, etc. That is, they have observable consequences,
> and those consequences are what the standard describes. What are the
> observable consequences of escape?

Yeah, that was more or less the committee response when I suggested
that we needed an '\e' character sequence years ago. Apparently,
no one has been clever enough to come up with a written description
of the semantics of the control character ESC, even though everyone
damn well knows what it means.

Some of the ISO multibyte character sets are defined in terms of
"shift in" and "shift out" sequences, and VT-100 terminal protocol
(which uses ESC extensively) is something of a de facto standard, so
these could probably serve as a guiding example of how to define the
semantics of ESC on paper. It really isn't that mystical, is it?

And don't get me started on "newline" versus "linefeed"...

-- David R. Tribble, da...@tribble.com --

David R Tribble

Aug 24, 1999
Greg Brewer wrote:
>
> Francis Glassborow <fra...@robinton.demon.co.uk> wrote

> > In article <7p1u5o$ipi$1...@news.hal-pc.org>, Greg Brewer
> > What has any of this to do with C++? In addition I think that the

> > escape character is a specific code in a particular character set.
> > OTOH \t, etc. refer to a representation of specific control
> > characters regardless as to the character set in use.
>
> What does it have to do with C++? Well, the programming language I
> was using was C++.
>
> I'm trying to remember my IBM days dealing with EBCDIC. If I remember
> correctly, that character set has no tab, newline, carriage return,
> or alarm character. I could easily be wrong, it has been almost 10
> years.

You're wrong. Perhaps you're remembering that your 3270 terminal
didn't have keys for those characters, which is partially correct,
but EBCDIC does indeed have them. EBCDIC has all of the control
characters that ASCII has, in fact (and about 30 more, to boot).

    Char   ASCII   EBCDIC   (hex)
    NUL    00      00
    BEL    07      2F
    BS     08      16
    HT     09      05
    LF     0A      25
    VT     0B      0B
    FF     0C      0C
    CR     0D      0D
    NL     -       15
    DEL    7F      07
    etc.

(Much as I hate EBCDIC, I am forced to admit that it did a slightly
better job by providing an explicit "newline" (NL) character;
C/C++ grabbed "linefeed" (LF) for this purpose, which is logically
only half of the motion of a printer head. Oh, well.)
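
The mapping listed in the table above can be expressed as a small lookup, sketched here. The struct and function names are hypothetical, and only the control codes given in this post are included; -1 marks "no such code".

```c
#include <stddef.h>

/* One row of the control-code table from this post (values in hex). */
struct ctl_code {
    const char *name;
    int ascii;    /* -1 if ASCII has no separate code */
    int ebcdic;
};

static const struct ctl_code ctl_codes[] = {
    { "NUL", 0x00, 0x00 },
    { "BEL", 0x07, 0x2F },
    { "BS",  0x08, 0x16 },
    { "HT",  0x09, 0x05 },
    { "LF",  0x0A, 0x25 },
    { "VT",  0x0B, 0x0B },
    { "FF",  0x0C, 0x0C },
    { "CR",  0x0D, 0x0D },
    { "NL",  -1,   0x15 },
    { "DEL", 0x7F, 0x07 },
};

/* Returns the EBCDIC code for an ASCII control code listed in the table,
   or -1 if the code is not one of the listed entries. */
int ascii_ctl_to_ebcdic(int ascii)
{
    size_t i;

    if (ascii < 0)
        return -1;
    for (i = 0; i < sizeof ctl_codes / sizeof ctl_codes[0]; i++) {
        if (ctl_codes[i].ascii == ascii)
            return ctl_codes[i].ebcdic;
    }
    return -1;
}
```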

John Hauser

Aug 25, 1999

David R Tribble wrote:
>
> Apparently,
> no one has been clever enough to come up with a written description
> of the semantics of the control character ESC, even though everyone
> damn well knows what it means.

> [...]


> VT-100 terminal protocol
> (which uses ESC extensively) is something of a de facto standard, so
> these could probably serve as a guiding example of how to define the
> semantics of ESC on paper.

For the record, the VT-100 use of ESC was based on official ANSI
standards, now known as ISO 2022 and ISO 6429.

- John Hauser

Antoine Leca

Aug 25, 1999

David R Tribble wrote:

>
> Pete Becker wrote:
> >
> > What are the observable consequences of escape?
>
> Yeah, that was more or less the committee response when I suggested
> that we needed an '\e' character sequence years ago. Apparently,

> no one has been clever enough to come up with a written description
> of the semantics of the control character ESC, even though everyone
> damn well knows what it means.
>
> Some of the ISO multibyte character sets are defined in terms of
> "shift in" and "shift out" sequences, and VT-100 terminal protocol

> (which uses ESC extensively) is something of a de facto standard, so
> these could probably serve as a guiding example of how to define the
> semantics of ESC on paper. It really isn't that mystical, is it?

I wonder whether the VT-100 terminal protocol really uses plain ESC.
My impression was that it uses CSI and other characters that
happen to be defined in the C1 set, and thus needs ESC as a
fall-back mechanism to cooperate with 7-bit media (the C1 set
normally uses '\200' to '\237', but can also be obtained by
ESC followed by '\100' to '\137').


Anyway, the real definition of ESC is in ISO 2022 a.k.a. ECMA-35
(available online at <URL:http://www.ecma.ch/stand/ECMA-035.HTM>),
and its main use is to switch character sets when using that standard.
Whether this is an observable consequence or not is outside my realm.
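
The 7-bit fall-back can be sketched as below, on the understanding that ISO 2022 / ECMA-35 represents a C1 control (0x80 to 0x9F) in a 7-bit environment as ESC followed by the byte 0x40 lower, so CSI (0x9B) becomes ESC `[`. The function name is hypothetical, and the byte values are the wire-level ASCII codes, independent of the host's character set.

```c
#include <stddef.h>

#define ESC_BYTE 0x1B  /* ESC as transmitted to an ASCII-based device */

/* Writes the 7-bit form of byte c into out and returns how many bytes
   were written: 2 for a C1 control (ESC + final byte), otherwise 1. */
size_t c1_to_7bit(unsigned char c, unsigned char out[2])
{
    if (c >= 0x80 && c <= 0x9F) {
        out[0] = ESC_BYTE;
        out[1] = (unsigned char)(c - 0x40);  /* 0x80..0x9F -> 0x40..0x5F */
        return 2;
    }
    out[0] = c;
    return 1;
}
```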


Antoine

Ken Hagan

Aug 26, 1999
Greg Brewer wrote in message <7p1u5o$ipi$1...@news.hal-pc.org>...
>...that the string functions do not include a function to translate

>strings from their code representation (eg "abc\t\a\n") to their
> internal representation...

In K&R's original compiler, this was almost certainly done by the
lexer, which was almost certainly generated by lex. Consequently,
they would have had no reason to add such a function to their library.
---

Jerry Leichter

Aug 26, 1999

| (Much as I hate EBCDIC, I am forced to admit that it did a slightly
| better job by providing an explicit "newline" (NL) character;
| C/C++ grabbed "linefeed" (LF) for this purpose, which is logically
| only half of the motion of a printer head. Oh, well.)

Getting far away from C++, but ... C (and Unix) had a reasonable basis
in the existing standards for doing this. ASCII(10) always had two
interpretations: As LF (Linefeed) or as NL (New Line). I'm pretty sure
the VT100 allowed you to determine how ASCII(10) would be treated when
received by the terminal; I think the setting was one of those defined
in the ANSI terminal standard, X3.64 or something like that.

Why the ambiguity? When ASCII was created, there were two kinds of
"printing" devices in the world: Teletype-like things and line
printers. For Teletype-like things - which included the early video
terminals, then widely described as "glass TTY's" - LF made sense, was
easy to implement, and could even be useful. Not only that, but it made
the mechanicals simpler if each character only did one thing: LF moved
the platen, CR moved the print element back to the left edge.

For line printers, LF was often too expensive to implement - the line
buffering logic was very simple-minded, of necessity (discrete
transistors and such). New Line meant dump the current buffer and start
filling a new one from the left hand edge. Line printers didn't
implement BACKSPACE either, for the same reason. (Line printers did
generally allow you to control the platen advance separately, so you
could over-print an entire line - software could do things like
underlining or fake bolding by overprinting appropriately in only some
positions - or double space, or skip to the top of a page.) The whole
FORTRAN printing model - with the magic first position - was based on
line printers (and was always a pain to match to terminals).

ASCII tried to cater to both styles of devices with its double meaning
for ASCII(10). In practice, dumb line printers vanished - but their
interpretation of ASCII(10), as chosen for Unix, lives on.

One of the oddities of the evolution of this business....

-- Jerry

Ron Natalie

Aug 26, 1999
Ken Hagan wrote:

> In K&R's original compiler, this was almost certainly done by the
> lexer, which was almost certainly generated by lex. Consequently,
> they would have had no reason to add such a function to their library.

Certainly it was handled by the lexical analyzer, but the K&R
compiler predates YACC and YACC predates LEX.

-Ron
---

Douglas A. Gwyn

Aug 30, 1999
David R Tribble wrote:
> Yeah, that was more or less the committee response when I suggested
> that we needed an '\e' character sequence years ago. Apparently,
> no one has been clever enough ...

I don't know about the C++ group, but the main reason against \e
standardization in C is that ESC is specific to certain codesets
and need not have a representation in others. Odds are that what
you *really* want in all environments where you plan to use ESC
sequences is the numeric value, e.g. #define ESC 27. You can do
that already in Standard C.

There are already standards (e.g. ANSI X3.64) for the semantics of
escape sequences. Note that an actual VT100 has several weird
quirks (in its microcode) that make it less than an ideal model.

Larry Jones

Aug 31, 1999
Douglas A. Gwyn (DAG...@null.net) wrote:
>
> I don't know about the C++ group, but the main reason against \e
> standardization in C is that ESC is specific to certain codesets
> and need not have a representation in others.

Which, it has been pointed out, is a completely specious argument. The
fact is that all popular, extant codesets have an ESC character, as do
most, if not all, of the unpopular ones. It is at least as common as
BEL (\a), if not more so.

> Odds are that what
> you *really* want in all environments where you plan to use ESC
> sequences is the numeric value, e.g. #define ESC 27. You can do
> that already in Standard C.

In some contexts that's true, but in at least as large a number of
contexts you want the ESC character in the native codeset, not the ASCII
ESC character regardless of the native codeset.

> There are already standards (e.g. X.64) for the semantics of
> escape sequences. Note that an actual VT100 has several weird
> quirks (in its microcode) that make it less than an ideal model.

In my experience, an actual VT100 has many fewer quirks than most of the
devices that purport to emulate it. ;-)

-Larry Jones

The real fun of living wisely is that you get to be smug about it. -- Hobbes

Barry Margolin

Aug 31, 1999

In article <7qeoku$j...@nfs0.sdrc.com>,

Larry Jones <larry...@sdrc.com> wrote:
>Douglas A. Gwyn (DAG...@null.net) wrote:
>>
>> I don't know about the C++ group, but the main reason against \e
>> standardization in C is that ESC is specific to certain codesets
>> and need not have a representation in others.
>
>Which, it has been pointed out, is a completely specious argument. The
>fact is that all popular, extant codesets have an ESC character, as do
>most, if not all, of the unpopular ones. It is at least as common as
>BEL (\a), if not more so.

But what can you do *portably* with \e? ESC is normally used as the first
character of an escape sequence, and the rest of the escape sequence is
very dependent on the codeset. Since the rest of it will have to be
conditionalized (probably table driven, as in Unix termcap or terminfo), is
it really necessary to have the first character part of the standard?

--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Paul D. DeRocco

Sep 1, 1999
Barry Margolin wrote:
>
> But what can you do *portably* with \e? ESC is normally used as the first
> character of an escape sequence, and the rest of the escape sequence is
> very dependent on the codeset. Since the rest of it will have to be
> conditionalized (probably table driven, as in Unix termcap or terminfo), is
> it really necessary to have the first character part of the standard?

It isn't necessary, but it is such a commonly used character that one gets
tired of typing \33 or \x1B. What's more, those numeric forms don't work in
strings if followed by valid digits, so you wind up having to break strings,
e.g., "\x1B" "A" instead of "\eA". Ugly and hard to read.
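
The digit-absorption problem can be seen in a short sketch. The array names are made up for illustration; the sizes follow from the rules that an octal escape takes at most three digits, a hex escape takes as many hex digits as follow it, and adjacent string literals are concatenated.

```c
/* Three octal digits are written, so the escape ends there:
   ESC, '1', and the terminating NUL. */
const char esc_then_1[] = "\0331";

/* Only two digits of the escape were intended ("\33" then "1"), but the
   following digit is absorbed: this is ONE character with octal value 331. */
const char absorbed[] = "\331";

/* A hex escape would swallow the 'A' as a hex digit, so the literal is
   split; the compiler joins the two pieces: ESC, 'A', NUL. */
const char esc_then_A[] = "\x1B" "A";
```

Here `sizeof esc_then_1` and `sizeof esc_then_A` are both 3, while `sizeof absorbed` is 2, which is the silent change of meaning being complained about.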

--

Ciao, Paul D. DeRocco
Paul mailto:pder...@ix.netcom.com
---

Larry Jones

Sep 1, 1999
Barry Margolin (bar...@bbnplanet.com) wrote:
>
> But what can you do *portably* with \e? ESC is normally used as the first
> character of an escape sequence, and the rest of the escape sequence is
> very dependent on the codeset. Since the rest of it will have to be
> conditionalized (probably table driven, as in Unix termcap or terminfo), is
> it really necessary to have the first character part of the standard?

The rest of the escape sequence is very dependent on the codeset of the
*device*, not the codeset of the *program*. If I could write:

printf("\e[H\e[JHello, world!\n");

I would be confident that running that program with the output directed
to an ANSI terminal would clear the screen and write "Hello, world!" in
the top left corner, regardless of the native codeset of the processor
running the program. That is, the program could well be running on an
EBCDIC system with the output directed to an ASCII device with some
hardware and/or software handling the ASCII/EBCDIC translation. As it
is, I can't write such code since I have to use \033 in ASCII and \047
in EBCDIC.
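
Without `\e`, the usual workaround is a conditional definition along these lines. This is a sketch under two assumptions: that the execution character set is either ASCII-based or EBCDIC, and that the preprocessor evaluates `'A'` with the same value the compiler uses at run time (the standard leaves that implementation-defined). `ESC_STR` and `clear_and_greet` are hypothetical names.

```c
#include <stdio.h>

/* 'A' is 0x41 in ASCII and 0xC1 in EBCDIC; use it to pick the native ESC. */
#if 'A' == 0x41
#define ESC_STR "\033"      /* ESC in ASCII */
#elif 'A' == 0xC1
#define ESC_STR "\047"      /* ESC in EBCDIC (0x27) */
#else
#error "unrecognized execution character set"
#endif

/* Homes the cursor, clears to end of screen, and prints a greeting on an
   ANSI-style terminal. Returns fprintf's result. */
int clear_and_greet(FILE *out)
{
    return fprintf(out, ESC_STR "[H" ESC_STR "[J" "Hello, world!\n");
}
```

The rest of the string (`[H`, `[J`) is written in the native codeset automatically, which is exactly what a standard `\e` would have provided for the first character as well.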

-Larry Jones

He's just jealous because I accomplish so much more than he does. -- Calvin

Barry Margolin

Sep 1, 1999

In article <37CCE33D...@ix.netcom.com>,

Paul D. DeRocco <pder...@ix.netcom.com> wrote:
>It isn't necessary, but it is such a commonly used character that one gets
>tired of typing \33 or \x1B. What's more, those numeric forms don't work in
>strings if followed by valid digits, so you wind up having to break strings,
>e.g., "\x1B" "A" instead of "\eA". Ugly and hard to read.

Why are you hard-coding escape sequences into C programs, rather than using
libraries like termcap? Why should the C standard be extended to support
something that's only useful for bad programming style? I realize that C
has a number of features that promote poor programming (e.g. gets()) but
these are legacies that we're stuck with for compatibility, not things that
have been added.

--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Douglas A. Gwyn

Sep 3, 1999
Larry Jones wrote:
> printf("\e[H\e[JHello, world!\n");

Since this code is already very device-dependent, what is the beef
about it being codeset dependent as well? I.e. the termcap library
should already be responsible for mapping device properties to the
proper strings for output, and it is therefore a perfect place to
translate an abstract notion of "ESC character" to whatever code
is actually needed.

There are a *lot* of special characters, commonly supported in
popular codesets, that are not in the C basic character set and
that have no \-escape sequence defined for them in the C standard.
Some of them I find more useful than ESC, these days. As of C9x,
there is even a standard way to denote them in source code (UCN
notation). I don't think we need to add individual kludges when
there is a sufficient general mechanism.
---

Paul D. DeRocco

Sep 3, 1999

Barry Margolin wrote:
>
> Why are you hard-coding escape sequences into C programs, rather than using
> libraries like termcap? Why should the C standard be extended to support
> something that's only useful for bad programming style? I realize that C
> has a number of features that promote poor programming (e.g. gets()) but
> these are legacies that we're stuck with for compatibility, not things that
> have been added.

Because I'm not writing for Unix, so I have no "termcap" anywhere in sight.
When I'm using escape sequences, I'm writing mostly MS-DOS command line
utilities.

Now that computers can generate music, and not merely "beep", it seems to me
that the alert character is no more peculiar than an escape. And I don't think
I've _ever_ used a piece of hardware that understood vertical tab.

--

Ciao, Paul D. DeRocco
Paul mailto:pder...@ix.netcom.com

David R Tribble

Sep 3, 1999

"Douglas A. Gwyn" wrote:
>
> Larry Jones wrote:
> > printf("\e[H\e[JHello, world!\n");
>
> Since this code is already very device-dependent, what is the beef
> about it being codeset dependent as well? I.e. the termcap library
> should already be responsible for mapping device properties to the
> proper strings for output, and it is therefore a perfect place to
> translate an abstract notions of "ESC character" to whatever code
> is actually needed.

Assuming you're even using the termcap library.

> There are a *lot* of special characters, commonly supported in
> popular codesets, that are not in the C basic character set and
> that have no \-escape sequence defined for them in the C standard.
> Some of them I find more useful than ESC, these days. As of C9x,
> there is even a standard way to denote them in source code (UCN
> notation). I don't think we need to add individual kludges when
> there is a sufficient general mechanism.

I suppose I can live with having to type "...\u001B..." instead of
"...\e...". The only possible drawback I can see is that \u001B is
specifically the ASCII/Unicode ESC character, which could
conceivably translate into a character that is not the appropriate
"escape" character on some non-ASCII/Unicode systems (whatever that
means). QoI?

The standard, BTW, does not prohibit implementations from supporting
extensions such as "\E" or even "\ESC\DEL\NUL". They're just not
portable.

-- David R. Tribble, da...@tribble.com --

Larry Jones

Sep 6, 1999
David R Tribble (da...@tribble.com) wrote:
>
> I suppose I can live with having to type "...\u001B..." instead of
> "...\e...". The only possible drawback I can see is that \u001B is
> specifically the ASCII/Unicode ESC character, which could
> concievably translate into a character that is not the appropriate
> "escape" character on some non-ASCII/Unicode systems (whatever that
> means). QoI?

A bigger drawback is the constraint in 6.4.3 that "A universal character
name shall not specify a character whose short identifier is less than
00A0...".

-Larry Jones

Hmph. -- Calvin
---

Douglas A. Gwyn

Sep 6, 1999

Larry Jones wrote:
> A bigger drawback is the constraint in 6.4.3 that "A universal
> character name shall not specify a character whose short identifier
> is less than 00A0...".

I had temporarily forgotten about that.
It makes sense to limit the range of short-identifiers that can be
used in identifiers, but remind me again why some punctuation,
space, and control characters can be expressed via UCN and others
cannot. If we meant to exclude *only* the C basic source
characters then that's what we should have said.

Adam Spragg

Sep 7, 1999

Larry Jones wrote:
>
> A bigger drawback is the constraint in 6.4.3 that "A universal character
> name shall not specify a character whose short identifier is less than
> 00A0...".

Huh? What's the rationale behind that one?

Adam

--
Why is it that the smaller and easier a bug is to fix, the less I want
to actually fix it?

----------------
The opinions expressed in this email are mine alone, and do not
necessarily represent those of my employer, my parents, or the people
who wrote the email software I use.

Clive D.W. Feather

Sep 7, 1999

In article <37D3FC7D...@null.net>, Douglas A. Gwyn
<DAG...@null.net> writes

>> A bigger drawback is the constraint in 6.4.3 that "A universal
>> character name shall not specify a character whose short identifier
>> is less than 00A0...".

>If we meant to exclude *only* the C basic source


>characters then that's what we should have said.

We meant to exclude:
- the C basic source characters
- the C0 and C1 control spaces (0 to 31, 127 to 159).
The cited words are the minimum that achieve that goal.

--
Clive D.W. Feather | Internet Expert | Work: <cl...@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd. | Home: <cl...@davros.org>
Fax: +44 20 8371 1037 | | Web: <http://www.davros.org>
Written on my laptop; please observe the Reply-To address

Max TenEyck Woodbury

Sep 8, 1999
Larry Jones wrote:
>
> Barry Margolin (bar...@bbnplanet.com) wrote:
>>
>> But what can you do *portably* with \e? ESC is normally used as the first
>> character of an escape sequence, and the rest of the escape sequence is
>> very dependent on the codeset. Since the rest of it will have to be
>> conditionalized (probably table driven, as in Unix termcap or terminfo), is
>> it really necessary to have the first character part of the standard?
>
> The rest of the escape sequence is very dependent on the codeset of the
> *device*, not the codeset of the *program*. If I could write:
>
> printf("\e[H\e[JHello, world!\n");
>
> I would be confident that running that program with the output directed
> to an ANSI terminal would clear the screen and write "Hello, world!" in
> the top left corner, reguardless of the native codeset of the processor
> running the program. That is, the program could well be running on an
> EBCDIC system with the output directed to an ASCII device with some
> hardware and/or software handling the ASCII/EBCDIC translation. As it
> is, I can't write such code since I have to use \033 in ASCII and \047
> in EBCDIC.
>
> -Larry Jones

The best way I have found to handle this kind of thing is to build
a header file with the appropriate character sequences defined as
macros. For example in ASCII.H define:

#define STR_NUL "\000"
#define CHAR_NUL ('\000')
...
#define STR_BEL "\007"
#define CHAR_BEL ('\007')
#define STR_BS "\010"
#define CHAR_BS ('\010')
#define STR_HT "\011"
#define CHAR_HT ('\011')
#define STR_LF "\012"
#define CHAR_LF ('\012')
#define STR_VT "\013"
#define CHAR_VT ('\013')
#define STR_FF "\014"
#define CHAR_FF ('\014')
#define STR_CR "\015"
#define CHAR_CR ('\015')
...
#define STR_ESC "\033"
#define CHAR_ESC ('\033')
...
#define STR_H "\110"
#define CHAR_H ('\110')
...
#define STR_J "\112"
#define CHAR_J ('\112')
...
/* OSB = open square bracket '[', which is octal 133 in ASCII
   (octal 134 is the backslash) */
#define STR_OSB "\133"
#define CHAR_OSB ('\133')

and in VT100.H define something like:

...
#if ('\n' == CHAR_LF)
#define TERMCAP_NL "\n"
#else
#define TERMCAP_NL STR_CR STR_LF
#endif
...
#define TERMCAP_INTRO STR_ESC STR_OSB
...
#define TERMCAP_HOME TERMCAP_INTRO STR_H
...
#define TERMCAP_CLEAR_EOS TERMCAP_INTRO STR_J
...
#define TERMCAP_CLS TERMCAP_HOME TERMCAP_CLEAR_EOS
...

then the code would look like:

#include "ASCII.H"
#include "VT100.H"

...

printf( TERMCAP_CLS "Hello, world!" TERMCAP_NL);

Given this approach, it becomes clear that very few of the
output formatting characters belong in the 'C' standard. In
fact, only the escaped characters needed to define the input
language are really needed. \a and \b are included mainly
because a number of old programs use them.

Note that the contents of files like ASCII.H and VT100.H
are 'bindings' of other standards on 'C' and could be required
as part of those other standard, but do NOT belong in the 'C'
standard.

In any case there is no NEED to add \e. Including it would
encourage writing non-portable code.

mt...@cds.duke.edu
---

John Hauser

Sep 8, 1999

Larry Jones:


> A bigger drawback is the constraint in 6.4.3 that "A universal
> character name shall not specify a character whose short identifier
> is less than 00A0...".

Douglas Gwyn:


> If we meant to exclude *only* the C basic source
> characters then that's what we should have said.

Clive Feather:


> We meant to exclude:
> - the C basic source characters
> - the C0 and C1 control spaces (0 to 31, 127 to 159).
> The cited words are the minimum that achieve that goal.

What about \u0024 ($), \u0040 (@), and \u0060 (`)?

- John Hauser

Paul D. DeRocco

Sep 9, 1999

David R Tribble wrote:
>
> I would wager that >50% of all C/C++ code is not intended to be
> portable.

And 99% of all C/C++ code is not intended to be portable to machines that
don't speak ASCII.

--

Ciao, Paul D. DeRocco
Paul mailto:pder...@ix.netcom.com

Clive D.W. Feather

Sep 9, 1999

In article <37D5A7F6...@cs.berkeley.edu>, John Hauser
<jha...@cs.berkeley.edu> writes

>Larry Jones:
>> A bigger drawback is the constraint in 6.4.3 that "A universal
>> character name shall not specify a character whose short identifier
>> is less than 00A0...".

>Clive Feather:


>> We meant to exclude:
>> - the C basic source characters
>> - the C0 and C1 control spaces (0 to 31, 127 to 159).
>> The cited words are the minimum that achieve that goal.
>
>What about \u0024 ($), \u0040 (@), and \u0060 (`)?

The *cited* words (as opposed to the quoted ones) continue with "...
other than 0024 ($), 0040 (@), or 0060 (`),".

They then add: "nor one in the range D800 through DFFF inclusive.",
which excludes the surrogate values used for multi-unit Unicode encoding.
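
Putting the pieces of the constraint as quoted in this thread together gives a check like the following sketch. The function name is hypothetical, the code uses C99's `<stdbool.h>`, and it deliberately tests only the conditions cited above (it does not check any upper bound on the code point).

```c
#include <stdbool.h>

/* Returns true if code point cp may be written as a universal character
   name under the 6.4.3 constraint as quoted in this thread. */
bool ucn_allowed(unsigned long cp)
{
    /* "nor one in the range D800 through DFFF inclusive" */
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)
        return false;

    /* "less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`)" */
    if (cp < 0x00A0UL)
        return cp == 0x0024UL || cp == 0x0040UL || cp == 0x0060UL;

    return true;
}
```

Under this rule `ucn_allowed(0x001B)` is false, which is why `\u001B` cannot stand in for the missing `\e`.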

--
Clive D.W. Feather | Internet Expert | Work: <cl...@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd. | Home: <cl...@davros.org>
Fax: +44 20 8371 1037 | | Web: <http://www.davros.org>
Written on my laptop; please observe the Reply-To address

Douglas A. Gwyn

Sep 9, 1999
David R Tribble wrote:

> Max TenEyck Woodbury wrote:
> > In any case there is no NEED to add \e. Including it would
> > encourage writing non-portable code.
> So? What if I need to?

Lack of \e as a standard escape doesn't prevent you from writing
a non-portable equivalent. I don't see how to make \e portable
anyway, without restricting the run-time codeset choices more
than we ever have before.
---

Paul D. DeRocco

Sep 9, 1999

"Douglas A. Gwyn" wrote:
>
> Lack of \e as a standard escape doesn't prevent you from writing
> a non-portable equivalent. I don't see how to make \e portable
> anyway, without restricting the run-time codeset choices more
> than we ever have before.

If the standard requires BEL (which is meaningless in printed output) or VT
(which I've never found support for in any printer I've used since the
70's), why can't it require ESC? The only codesets I'm aware of are ASCII
and EBCDIC. They both have codes for CR, LF, HT, VT, FF, BS and BEL. And
they both have codes for ESC (0x27 in EBCDIC).

Is it any more likely that some obscure codeset out there doesn't have an
ESC character, than that it doesn't have a BEL or VT character? Are there
any real-world codesets out there that have CR, LF, HT, VT, FF, BS and BEL, but
don't have ESC? If there exists a codeset that doesn't have, say, BEL, and
the compiler deals with \a in some nonstandard manner (perhaps with an
error message), it could do the same for \e if it doesn't have ESC.

The objections to \e seem awfully farfetched, compared to its obvious
advantages in typability and readability.

--

Ciao, Paul D. DeRocco
Paul mailto:pder...@ix.netcom.com

Max TenEyck Woodbury

Sep 10, 1999, 3:00:00 AM
to
"Paul D. DeRocco" wrote:
>
> David R Tribble wrote:
> >
> > I would wager that >50% of all C/C++ code is not intended to be
> > portable.
>
> And 99% of all C/C++ code is not intended to be portable to machines that
> don't speak ASCII.

And 90%+ of all portability problems arise because a program that was
never intended to be ported has to be ported. The whole Y2K mess has
come up for similar reasons. This kind of thinking should NOT be
encouraged!

mt...@cds.duke.edu
---

Pete Becker

Sep 10, 1999, 3:00:00 AM
to

"Paul D. DeRocco" wrote:
>
> "Douglas A. Gwyn" wrote:
> >
> > Lack of \e as a standard escape doesn't prevent you from writing
> > a non-portable equivalent. I don't see how to make \e portable
> > anyway, without restricting the run-time codeset choices more
> > than we ever have before.
>
> If the standard requires BEL (which is meaningless in printed output) or VT

The standard doesn't require BEL or VT. It requires '\a' and '\v' and
describes what effects they have.

--
Pete Becker
Dinkumware, Ltd.
http://www.dinkumware.com

Ken Hagan

Sep 10, 1999, 3:00:00 AM
to ncar.UCAR.EDU

Max TenEyck Woodbury wrote in message

>And 90%+ of all portability problems arise because a program that was
>never intended to be ported has to be ported. The whole Y2K mess has
>come up for similar reasons. This kind of thinking should NOT be
>encouraged!


I think the Y2K mess is a result of "not thinking", rather than "thinking
badly". C++ is not one of those fascist languages which teaches you
"good programming".

Constructs which produce unportable code are already quite common.
(reinterpret_cast?) "\e" is being advocated as a convenience for those
(many) people working on systems which have such a beast. I think I
could make a stronger case for \e than for \a.

Steve Clamage

Sep 10, 1999, 3:00:00 AM
to

"Ken Hagan" <K.H...@thermoteknix.co.uk> writes:

>Max TenEyck Woodbury wrote in message
>>And 90%+ of all portability problems arise because a program that was
>>never intended to be ported has to be ported. The whole Y2K mess has
>>come up for similar reasons. This kind of thinking should NOT be
>>encouraged!

>I think the Y2K mess is a result of "not thinking", rather than "thinking
>badly".

You have to put yourself in the position of a programmer in
the late 1960's. Mainframe computers had a few hundred Kbytes of
memory. Minicomputers had maybe 64K of memory. Disk drives were
physically large and expensive, used only for system programs,
swap space, and spooling. (A 10 MB drive was considered big.)
Computers used banks of tape drives for data.

You are writing a program to handle millions of records of
data, each containing several dates. If you can save 2 bytes
per date, it can make the difference between acceptable and
unacceptable overall performance.

You're not stupid. You know that a 2-digit date interface will
fail after 1999. But you know for sure that your program will
no longer be in use that far in the future. Why give up
performance to avoid a problem that will never happen?

That isn't "not thinking" or "thinking badly". It's sound
engineering.

Now put yourself in the position of a DP manager in the 1970's
or 1980's. Assume you know about the 2-digit date problem.
You've got a set of programs that are reliable enough. Replacing
them would cost several large fortunes, and you don't have the
budget for it. Year 2000 is still a long way off, and the
cost in money, time, and possible lost reliability while a
new set of programs mature has no short-term or medium-term
payoff. You could never sell it to top management even if you
thought it was a good idea. Well, you'll be retired before
1999, and it will be somebody else's problem.

I think Max's evaluation is correct.

--
Steve Clamage, stephen...@sun.com

Max TenEyck Woodbury

Sep 10, 1999, 3:00:00 AM
to

"Paul D. DeRocco" wrote:
>
> If the standard requires BEL (which is meaningless in printed output) or VT
> (which I've never found support for in any printer I've used since the
> 70's), why can't it require ESC? The only codesets I'm aware of are ASCII
> and EBCDIC. They both have codes for CR, LF, HT, VT, FF, BS and BEL. And
> they both have codes for ESC (0x27 in EBCDIC).
>
> Is it any more likely that some obscure codeset out there doesn't have an
> ESC character, than that it doesn't have a BEL or VT character? Are there
> any real-world codesets out there that have CR, LF, HT, VT, FF, BS and BEL, but
> don't have ESC? If there exists a codeset that doesn't have, say, BEL, and
> the compiler deals with \a in some nonstandard manner (perhaps with an
> error message), it could do the same for \e if it doesn't have ESC.
>
> The objections to \e seem awfully farfetched, compared to its obvious
> advantages in typability and readability.

At best you have argued here that \v and \a should be dropped from the
standard, not that \e should be added. The old values are kept because
old programs use them and because \v is part of the input language. A
compiler that refused to compile a program with an \a in it would not
be a conforming compiler. You will gain nothing with this line of argument.

If you want to make your case, you'll need to explain why \e has to be added,
not why other escaped characters are useless. The only positive argument I've
seen so far is that you would find it convenient and you don't want to use
existing portable methods. That's not enough to mandate such a change and
require all implementations to comply.

mt...@cds.duke.edu

Clive D.W. Feather

Sep 10, 1999, 3:00:00 AM
to

In article <7rbbgb$sem$1...@engnews1.eng.sun.com>, Steve Clamage
<cla...@eng.sun.com> writes

>>I think the Y2K mess is a result of "not thinking", rather than "thinking
>>badly".

It's neither, as Steve partially explained:

>You have to put yourself in the position of a programmer in
>the late 1960's. Mainframe computers had a few hundred Kbytes of
>memory.

[...]


>You are writing a program to handle millions of records of
>data, each containing several dates. If you can save 2 bytes
>per date, it can make the difference between acceptable and
>unacceptable overall performance.

Furthermore, I seem to recall reading (in _The_Mythical_Man_Month,
perhaps) that mainframe memory was *rented* for sums of around $1000 per
kilobyte per month. Suppose your software handled 500 dates in memory at
a time (not unreasonable when merging data) and was responsible for 20%
of machine use (also not unreasonable for a major application). Then the
2 bytes for the century cost *two hundred dollars per month*. Prices
have gone up by at least a factor of 10 since then.
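
The arithmetic behind that figure can be spelled out as follows. This is only a sketch using the recollected estimates above; the constant names are made up for illustration, and 1000 bytes is treated as one kilobyte to keep the numbers round.

```cpp
// All inputs are the rough estimates from the paragraph above.
constexpr int dollars_per_kb_month = 1000;  // rental price of memory
constexpr int dates_in_memory      = 500;   // dates held at one time
constexpr int bytes_per_century    = 2;     // two extra digits per date
constexpr int machine_share_pct    = 20;    // the application's share of machine use

// 500 dates x 2 bytes = 1000 bytes, taken here as one kilobyte.
constexpr int extra_bytes = dates_in_memory * bytes_per_century;

// Cost of that kilobyte, scaled by the application's share of the machine.
constexpr int monthly_cost =
    dollars_per_kb_month * extra_bytes / 1000 * machine_share_pct / 100;

static_assert(extra_bytes == 1000, "about one kilobyte of century digits");
static_assert(monthly_cost == 200, "two hundred dollars per month");
```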

>You're not stupid. You know that a 2-digit date interface will
>fail after 1999. But you know for sure that your program will
>no longer be in use that far in the future.

Even if it is, you can afford to wait several years and let those $200
per month build up to pay the changeover cost in 1995.

>Now put yourself in the position of a DP manager in the 1970's
>or 1980's. Assume you know about the 2-digit date problem.
>You've got a set of programs that are reliable enough. Replacing
>them would cost several large fortunes,

[...]

And it's not unlikely that you'll be moving to a new machine or even
architecture before then. So you can put the rewrite off until the
porting work happens. Again, sound business practice.

--
Clive D.W. Feather | Internet Expert | Work: <cl...@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd. | Home: <cl...@davros.org>
Fax: +44 20 8371 1037 | | Web: <http://www.davros.org>
Written on my laptop; please observe the Reply-To address

Paul D. DeRocco

Sep 10, 1999, 3:00:00 AM
to

Pete Becker wrote:
>
> "Paul D. DeRocco" wrote:
> >
> > "Douglas A. Gwyn" wrote:
> > >
> > > Lack of \e as a standard escape doesn't prevent you from writing
> > > a non-portable equivalent. I don't see how to make \e portable
> > > anyway, without restricting the run-time codeset choices more
> > > > than we ever have before.
> >
> > If the standard requires BEL (which is meaningless in printed output)
> > or VT
>
> The standard doesn't require BEL or VT. It requires '\a' and '\v' and
> describes what effects they have.

The standard requires "alert" and "vertical tab", though. (2.2 paras 1 and
3) I was merely using their common ASCII (and EBCDIC) abbreviations. Who's
to say that a compiler writer is more likely to encounter some obscure
execution character set that has no "escape", than that has no "vertical
tab" or "alert"?

What's more, it's obvious what a compiler writer should do when presented
with an execution character set that doesn't contain "escape", or for that
matter "vertical tab" or "alert". If the program uses \e, or \v or \a,
issue an error message, because the attempt to use _any_ character not
actually in the execution character set is a logical error.
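
To make the idea concrete, here is a minimal sketch of how a translator might map these escapes onto an execution character set and signal when the set has no such character. Everything in it is hypothetical: the Charset structure, the translate_escape function, and the kNoEscape set (invented purely to show the error path) are not taken from any real compiler.

```cpp
// Hypothetical description of an execution character set: the code for each
// control function, or -1 when the set has no such character.
struct Charset {
    int alert;
    int vertical_tab;
    int escape;
};

constexpr Charset kAscii    = {0x07, 0x0B, 0x1B};
constexpr Charset kEbcdic   = {0x2F, 0x0B, 0x27};
constexpr Charset kNoEscape = {0x07, 0x0B, -1};   // invented set without ESC

// Returns the execution-set code for \a, \v or the proposed \e, or -1 to
// tell the caller that an error message should be issued.
constexpr int translate_escape(char letter, const Charset& cs)
{
    return letter == 'a' ? cs.alert
         : letter == 'v' ? cs.vertical_tab
         : letter == 'e' ? cs.escape
         : -1;
}

static_assert(translate_escape('e', kAscii) == 0x1B, "ESC in ASCII");
static_assert(translate_escape('e', kEbcdic) == 0x27, "ESC in EBCDIC");
static_assert(translate_escape('e', kNoEscape) == -1, "no ESC: diagnose");
```

The same -1 path would serve for \a or \v on a set that lacked those characters, which is the symmetry being argued for here.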

--

Ciao, Paul D. DeRocco
Paul mailto:pder...@ix.netcom.com

Paul D. DeRocco

Sep 11, 1999, 3:00:00 AM
to
Max TenEyck Woodbury wrote:
>
> "Paul D. DeRocco" wrote:
> >
> > And 99% of all C/C++ code is not intended to be portable to machines
> > that don't speak ASCII.
>
> And 90%+ of all portability problems arise because a program that was
> never intended to be ported has to be ported. The whole Y2K mess has
> come up for similar reasons. This kind of thinking should NOT be
> encouraged!

I think the two questions are quite different. If \e were added to the
legal list of C++ escapes, its use could be cleanly diagnosed as an error
should someone try to port a program containing it to a character set other
than ASCII or EBCDIC that has no ESC character. And the compiler error
message would correctly reflect an underlying logical problem that would
need to be fixed. Y2K bugs don't have anything to do with porting; they
cannot be caught by a compiler, and wind up being bugs in the code, which is
quite another thing.
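
A tiny sketch of the contrast, assuming a program that stores only two-digit years (the function name years_between is hypothetical): the code below is perfectly well-formed, so no compiler has anything to diagnose, yet the second result is wrong.

```cpp
// Hypothetical helper: elapsed years computed from two-digit years, the way
// a program that stores only "yy" would do it.
constexpr int years_between(int from_yy, int to_yy)
{
    return to_yy - from_yy;
}

static_assert(years_between(65, 99) == 34, "1965 to 1999: correct");
static_assert(years_between(65, 0) == -65, "1965 to 2000: compiles, wrong answer");
```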

--

Ciao, Paul D. DeRocco
Paul mailto:pder...@ix.netcom.com

---

Douglas A. Gwyn

Sep 11, 1999, 3:00:00 AM
to

"Paul D. DeRocco" wrote:
> The standard requires "alert" and "vertical tab", though. ... I was
> merely using their common ASCII (and EBCDIC) abbreviations.

Here is a challenge for you: Write a functional specification for
the character which would probably be mapped to ESC in an ASCII
implementation, along the lines of the specifications for other
device control functions. (If we *were* to add \e, we'd have to
come up with such wording.) Then maybe you'll see why this is
not likely to be standardized *at this level*.

Douglas A. Gwyn

Sep 11, 1999, 3:00:00 AM
to

Max TenEyck Woodbury wrote:
> At best you have argued here that \v and \a should be dropped from the
> standard, not that \e should be added. The old values are kept because
> old programs use them and because \v is part of the input language.

Indeed, I would probably support changes along those lines,
even though some minuscule portion of existing source code
might contain them. In environments where they have a use,
I would expect C implementations to continue to support them
as an extension. (This would be allowed if such a change to
the spec were made properly.)

However, since nobody seems to be beating down our door
complaining about these, common sense says that they should
be left alone. There are more important things to worry about.

Al Stevens

Sep 11, 1999, 3:00:00 AM
to

>You have to put yourself in the position of a programmer in
>the late 1960's.

That's what I was doing then. Started in the late 1950s actually.

>Disk drives were
>physically large and expensive, used only for system programs,
>swap space, and spooling.

That's not entirely true. We were using disk drives for random access file
applications starting as early as 1959. As I remember, the software
technology (hashing, indexed-sequential, etc.) for doing that was well
understood even then, so someone must have been doing it even before we did.
Many commercial applications used sequential master files and were indeed
stored on tape, although a typical system might load them to disk to process
them, but we had interactive database queries and updates against permanent
disk files in the late 1950s.

>You're not stupid. You know that a 2-digit date interface will
>fail after 1999. But you know for sure that your program will
>no longer be in use that far in the future.

I have specific memories of thinking exactly that at the time.

>That isn't "not thinking" or "thinking badly". It's sound
>engineering.

Thanks. I've been feeling guilty for the past couple of years.

>Well, you'll be retired before
>1999, and it will be somebody else's problem.

Well, I didn't.

Paul D. Smith

Sep 11, 1999, 3:00:00 AM
to

[ moderator's note: This discussion has gone completely outside the
charter for comp.std.c++, however fascinating the history of
application program development may be. Further discussion that
does not in some small way relate to standard C++ should not
include comp.std.c++ in the list of newsgroups. -sdc ]

%% "Clive D.W. Feather" <cl...@on-the-train.demon.co.uk> writes:

cdwf> In article <7rbbgb$sem$1...@engnews1.eng.sun.com>, Steve Clamage
cdwf> <cla...@eng.sun.com> writes

>>> I think the Y2K mess is a result of "not thinking", rather than "thinking
>>> badly".

cdwf> It's neither, as Steve partially explained:

>> You have to put yourself in the position of a programmer in
>> the late 1960's. Mainframe computers had a few hundred Kbytes of
>> memory.

cdwf> [...]


>> You are writing a program to handle millions of records of
>> data, each containing several dates. If you can save 2 bytes
>> per date, it can make the difference between acceptable and
>> unacceptable overall performance.

cdwf> Furthermore, I seem to recall reading (in
cdwf> _The_Mythical_Man_Month, perhaps) that mainframe memory was
cdwf> *rented* for sums of around $1000 per kilobyte per
cdwf> month. Suppose your software handled 500 dates in memory at a
cdwf> time (not unreasonable when merging data) and was responsible
cdwf> for 20% of machine use (also not unreasonable for a major
cdwf> application). Then the 2 bytes for the century cost *two hundred
cdwf> dollars per month*. Prices have gone up by at least a factor of
cdwf> 10 since then.

I can't believe this is the real reason.

If it really were a case of data storage constraints and money, why
weren&#