A few years ago, I had to modify a DOS application to allow the user to
enter a printer initialization/deinitialization string and save that as a
default. I was surprised by two shortcomings: that the string functions
do not include a function to translate strings from their code representation (e.g.
"abc\t\a\n") to their internal representation, and that there is no in-line code
sequence for representing the escape character (i.e. '\e'). I know there is a
hex, octal, etc. sequence that represents escape; the same is true for
tab, bell, and the rest.
Once upon a time, escape was so common for controlling printers and remote
terminals that I don't understand why it is not included in the \ character
set. Since the function for translating strings from their "visual" form is
necessary for the compiler to operate, why isn't the function in the
standard C library?
Are escape sequences so passe that this code is unnecessary? Any comments?
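For what it's worth, the translation function asked about above is not hard to write by hand. The following is only a minimal sketch, not anything from the standard library: the names unescape, hex_value and ESC_CHAR are made up for illustration, '\e' is treated as a non-standard extension, and ESC_CHAR assumes an ASCII execution character set.

```c
#include <assert.h>
#include <stddef.h>

#define ESC_CHAR '\033'   /* assumption: ASCII execution character set */

/* Hypothetical helper: value of a hex digit, or -1 if c is not one.
 * Relies on a-f and A-F being contiguous (true in ASCII and EBCDIC). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Translate the "visual" form in src (backslash sequences spelled out as
 * separate characters) into dst. dst needs room for strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the terminating NUL;
 * the result may contain embedded NULs.
 */
size_t unescape(char *dst, const char *src)
{
    size_t n = 0;

    assert(dst != NULL && src != NULL);
    while (*src != '\0') {
        int c = (unsigned char)*src++;
        if (c == '\\' && *src != '\0') {
            c = (unsigned char)*src++;
            switch (c) {
            case 'a': c = '\a'; break;
            case 'b': c = '\b'; break;
            case 'f': c = '\f'; break;
            case 'n': c = '\n'; break;
            case 'r': c = '\r'; break;
            case 't': c = '\t'; break;
            case 'v': c = '\v'; break;
            case 'e': c = ESC_CHAR; break;      /* non-standard extension */
            case 'x': {
                int v = 0, d, digits = 0;
                while ((d = hex_value((unsigned char)*src)) >= 0) {
                    v = (v * 16 + d) & 0xFF;    /* keep only the low 8 bits */
                    src++;
                    digits++;
                }
                if (digits > 0)
                    c = v;                      /* "\x" with no digits stays 'x' */
                break;
            }
            default:
                if (c >= '0' && c <= '7') {     /* octal: at most three digits */
                    int v = c - '0', k;
                    for (k = 0; k < 2 && *src >= '0' && *src <= '7'; k++)
                        v = v * 8 + (*src++ - '0');
                    c = v & 0xFF;
                }
                /* anything else (\\, \", \', unknown) keeps the character itself */
                break;
            }
        }
        dst[n++] = (char)c;
    }
    dst[n] = '\0';
    return n;
}
```

Called as unescape(buf, user_input), it turns the nine typed characters of abc\t\a\n into six bytes. A real compiler diagnoses out-of-range values rather than masking them, so treat the masking here as a simplification.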
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std...@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html ]
What has any of this to do with C++? In addition I think that the
escape character is a specific code in a particular character set. OTOH
\t, etc. refer to a representation of specific control characters
regardless as to the character set in use.
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
These days such things are almost always done using libraries like curses,
so it's not necessary to hard-code escape sequences into applications.
Therefore, they don't generally need to mention the ESC character
explicitly. Even the libraries don't usually need to refer to it, because
they're generally table-driven; the ESC character will only appear in the
terminal control database files (e.g. termcap or terminfo).
--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
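To illustrate what "table-driven" means here, the sketch below keeps the control strings in a small table keyed by terminal type, so the calling code never mentions ESC. It is a hand-written stand-in, not the real termcap or terminfo interface: struct term_entry, term_table and clear_string are hypothetical names, and the octal escapes assume ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one record of a terminal database. */
struct term_entry {
    const char *name;    /* terminal type, e.g. the value of TERM */
    const char *clear;   /* string that homes the cursor and clears the screen */
};

static const struct term_entry term_table[] = {
    { "vt100", "\033[H\033[J" },   /* ANSI: cursor home, erase to end of display */
    { "vt52",  "\033H\033J"   },   /* VT52: cursor home, erase to end of screen */
    { "dumb",  ""             },   /* no such capability */
};

/* Look up the clear-screen string for a terminal type; NULL if unknown. */
const char *clear_string(const char *term)
{
    size_t i;

    assert(term != NULL);
    for (i = 0; i < sizeof term_table / sizeof term_table[0]; i++)
        if (strcmp(term_table[i].name, term) == 0)
            return term_table[i].clear;
    return NULL;
}
```

An application would write fputs(clear_string(term), stdout) after checking for NULL; the real libraries read the equivalent table from the database files instead of compiling it in.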
Francis Glassborow <fra...@robinton.demon.co.uk> wrote in message
news:KrOnmwAF...@robinton.demon.co.uk...
> In article <7p1u5o$ipi$1...@news.hal-pc.org>, Greg Brewer
> What has any of this to do with C++? In addition I think that the
> escape character is a specific code in a particular character set. OTOH
> \t, etc. refer to a representation of specific control characters
> regardless as to the character set in use.
What does it have to do with C++? Well, the programming language I was
using was C++.
I'm trying to remember my IBM days dealing with EBCDIC. If I remember
correctly, that character set has no tab, newline, carriage return, or alarm
character. I could easily be wrong, it has been almost 10 years.
Greg Brewer
This is getting off subject...
EBCDIC has HT, LF, CR and BEL defined. I'm not absolutely positive but I
think FF was also defined. I'd have to dig out an old green card to get
you their values. It's in a box with a bunch of other stuff from the '60s
and '70s so it would be a real tough hunt...
Let me put that a bit more strongly. The effect of using \t, etc. is
described in the language definition: \t produces a tab, \a produces
some sort of alert, etc. That is, they have observable consequences, and
those consequences are what the standard describes. What are the
observable consequences of escape?
--
Pete Becker
Dinkumware, Ltd.
http://www.dinkumware.com
---
Yeah, that was more or less the committee response when I suggested
that we needed an '\e' character sequence years ago. Apparently,
no one has been clever enough to come up with a written description
of the semantics of the control character ESC, even though everyone
damn well knows what it means.
Some of the ISO multibyte character sets are defined in terms of
"shift in" and "shift out" sequences, and VT-100 terminal protocol
(which uses ESC extensively) is something of a de facto standard, so
these could probably serve as a guiding example of how to define the
semantics of ESC on paper. It really isn't that mystical, is it?
And don't get me started on "newline" versus "linefeed"...
-- David R. Tribble, da...@tribble.com --
You're wrong. Perhaps you're remembering that your 3270 terminal
didn't have keys for those characters, which is partially correct,
but EBCDIC does indeed have them. EBCDIC has all of the control
characters that ASCII has, in fact (and about 30 more, to boot).
Char   ASCII   EBCDIC    (values in hex)
NUL    00      00
BEL    07      2F
BS     08      16
HT     09      05
LF     0A      25
VT     0B      0B
FF     0C      0C
CR     0D      0D
NL     -       15
DEL    7F      07
etc.
(Much as I hate EBCDIC, I am forced to admit that it did a slightly
better job by providing an explicit "newline" (NL) character;
C/C++ grabbed "linefeed" (LF) for this purpose, which is logically
only half of the motion of a printer head. Oh, well.)
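The table above translates directly into a lookup table. This is just a sketch using the values as posted; struct ctrl_map, ctrl_table and ascii_ctrl_to_ebcdic are hypothetical names, and -1 is used here to mean "no such code".

```c
#include <stddef.h>

/* One row of the posted table; -1 marks a code the set does not have. */
struct ctrl_map {
    const char *name;
    int ascii;
    int ebcdic;
};

static const struct ctrl_map ctrl_table[] = {
    { "NUL", 0x00, 0x00 },
    { "BEL", 0x07, 0x2F },
    { "BS",  0x08, 0x16 },
    { "HT",  0x09, 0x05 },
    { "LF",  0x0A, 0x25 },
    { "VT",  0x0B, 0x0B },
    { "FF",  0x0C, 0x0C },
    { "CR",  0x0D, 0x0D },
    { "NL",  -1,   0x15 },   /* EBCDIC only */
    { "DEL", 0x7F, 0x07 },
};

/* Map an ASCII control code to its EBCDIC code, or -1 if it is not listed. */
int ascii_ctrl_to_ebcdic(int ascii)
{
    size_t i;

    if (ascii < 0)
        return -1;           /* never match the "no ASCII code" marker */
    for (i = 0; i < sizeof ctrl_table / sizeof ctrl_table[0]; i++)
        if (ctrl_table[i].ascii == ascii)
            return ctrl_table[i].ebcdic;
    return -1;
}
```

Only the rows actually listed are covered; the "etc." is left as it is, so a code such as 0x1B comes back as -1.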
David R Tribble wrote:
>
> Apparently,
> no one has been clever enough to come up with a written description
> of the semantics of the control character ESC, even though everyone
> damn well knows what it means.
> [...]
> VT-100 terminal protocol
> (which uses ESC extensively) is something of a de facto standard, so
> these could probably serve as a guiding example of how to define the
> semantics of ESC on paper.
For the record, the VT-100 use of ESC was based on official ANSI
standards, now known as ISO 2022 and ISO 6429.
- John Hauser
I wonder if the VT-100 terminal protocol does use plain ESC.
My impression was that it uses CSI and other characters that
happen to be defined in the C1 set, and thus needs to use ESC as a
fall-back mechanism to cooperate with 7-bit media (the C1 set
normally uses '\200' to '\237', but can also be obtained by
ESC followed by '\100' to '\137').
Anyway, the real definition of ESC is in ISO 2022 a.k.a. ECMA-35
(available online at <URL:http://www.ecma.ch/stand/ECMA-035.HTM>),
and its main use is to switch character sets when using that standard.
If this is an observable consequence or not is outside my realm.
Antoine
In K&R's original compiler, this was almost certainly done by the
lexer, which was almost certainly generated by lex. Consequently,
they would have had no reason to add such a function to their library.
---
Getting far away from C++, but ... C (and Unix) had a reasonable basis
in the existing standards for doing this. ASCII(10) always had two
interpretations: As LF (Linefeed) or as NL (New Line). I'm pretty sure
the VT100 allowed you to determine how ASCII(10) would be treated when
received by the terminal; I think the setting was one of those defined
in the ANSI terminal standard, X.34 or something like that.
Why the ambiguity? When ASCII was created, there were two kinds of
"printing" devices in the world: Teletype-like things and line
printers. For Teletype-like things - which included the early video
terminals, then widely described as "glass TTY's" - LF made sense, was
easy to implement, and could even be useful. Not only that, but it made
the mechanicals simpler if each character only did one thing: LF moved
the platen, CR moved the print element back to the left edge.
For line printers, LF was often too expensive to implement - the line
buffering logic was very simple-minded, of necessity (discrete
transistors and such). New Line meant dump the current buffer and start
filling a new one from the left hand edge. Line printers didn't
implement BACKSPACE either, for the same reason. (Line printers did
generally allow you to control the platen advance separately, so you
could over-print an entire line - software could do things like
underlining or fake bolding by overprinting appropriately in only some
positions - or double space, or skip to the top of a page.) The whole
FORTRAN printing model - with the magic first position - was based on
line printers (and was always a pain to match to terminals).
ASCII tried to cater to both styles of devices with its double meaning
for ASCII(10). In practice, dumb line printers vanished - but their
interpretation of ASCII(10), as chosen for Unix, lives on.
One of the oddities of the evolution of this business....
-- Jerry
> In K&R's original compiler, this was almost certainly done by the
> lexer, which was almost certainly generated by lex. Consequently,
> they would have had no reason to add such a function to their library.
Certainly it was handled by the lexical analyzer, but the K&R
compiler predates YACC and YACC predates LEX.
-Ron
---
I don't know about the C++ group, but the main reason against \e
standardization in C is that ESC is specific to certain codesets
and need not have a representation in others. Odds are that what
you *really* want in all environments where you plan to use ESC
sequences is the numeric value, e.g. #define ESC 27. You can do
that already in Standard C.
There are already standards (e.g. ANSI X3.64) for the semantics of
escape sequences. Note that an actual VT100 has several weird
quirks (in its microcode) that make it less than an ideal model.
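As a rough illustration of the numeric-value approach, the sketch below passes ESC as a number through %c. The function name format_cleared is made up, snprintf assumes a C99 library, and the value 27 is the ASCII code; the remaining characters of the sequence are still ordinary literals in the program's own codeset.

```c
#include <assert.h>
#include <stdio.h>

#define ESC 27   /* numeric value of ESC in ASCII */

/*
 * Write "ESC [ H ESC [ J" followed by msg into buf.
 * Returns what snprintf returns: the length the full string needs.
 */
int format_cleared(char *buf, size_t size, const char *msg)
{
    assert(buf != NULL && msg != NULL);
    return snprintf(buf, size, "%c[H%c[J%s", ESC, ESC, msg);
}
```

For example, format_cleared(buf, sizeof buf, "hi") produces eight bytes, the first of which is 27 whatever the native codeset is.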
Which, it has been pointed out, is a completely specious argument. The
fact is that all popular, extant codesets have an ESC character, as do
most, if not all, of the unpopular ones. It is at least as common as
BEL (\a), if not more so.
> Odds are that what
> you *really* want in all environments where you plan to use ESC
> sequences is the numeric value, e.g. #define ESC 27. You can do
> that already in Standard C.
In some contexts that's true, but in at least as large a number of
contexts you want the ESC character in the native codeset, not the ASCII
ESC character regardless of the native codeset.
> There are already standards (e.g. ANSI X3.64) for the semantics of
> escape sequences. Note that an actual VT100 has several weird
> quirks (in its microcode) that make it less than an ideal model.
In my experience, an actual VT100 has many fewer quirks than most of the
devices that purport to emulate it. ;-)
-Larry Jones
The real fun of living wisely is that you get to be smug about it. -- Hobbes
But what can you do *portably* with \e? ESC is normally used as the first
character of an escape sequence, and the rest of the escape sequence is
very dependent on the codeset. Since the rest of it will have to be
conditionalized (probably table driven, as in Unix termcap or terminfo), is
it really necessary to have the first character part of the standard?
--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
It isn't necessary, but it is such a commonly used character that one gets
tired of typing \33 or \x1B. What's more, those numeric forms don't work in
strings if followed by valid digits, so you wind up having to break strings,
e.g., "\x1B" "A" instead of "\eA". Ugly and hard to read.
--
Ciao, Paul D. DeRocco
Paul mailto:pder...@ix.netcom.com
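The digit problem described above can be seen in a few declarations. This is only an illustration (the variable and function names are invented): a hex escape swallows every following hex digit, so the literal has to be split, while an octal escape stops after three digits.

```c
#include <string.h>

/* Split literal: the hex escape ends at the closing quote, then 'A' follows. */
static const char esc_a_hex[] = "\x1B" "A";

/* Octal escape: 'A' is not an octal digit, so no split is needed. */
static const char esc_a_oct[] = "\033A";

/* Three-digit octal form: safe even when the next character is a digit. */
static const char esc_then_one[] = "\0331";

/* Returns 1 if the hex and octal spellings give the same two characters. */
int same_sequence(void)
{
    return strcmp(esc_a_hex, esc_a_oct) == 0;
}
```

Written without the split, "\x1BA" would be read as the single hex escape 1BA, and the short form "\331" would be a single octal escape rather than ESC followed by 1.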
---
The rest of the escape sequence is very dependent on the codeset of the
*device*, not the codeset of the *program*. If I could write:
printf("\e[H\e[JHello, world!\n");
I would be confident that running that program with the output directed
to an ANSI terminal would clear the screen and write "Hello, world!" in
the top left corner, regardless of the native codeset of the processor
running the program. That is, the program could well be running on an
EBCDIC system with the output directed to an ASCII device with some
hardware and/or software handling the ASCII/EBCDIC translation. As it
is, I can't write such code since I have to use \033 in ASCII and \047
in EBCDIC.
-Larry Jones
He's just jealous because I accomplish so much more than he does. -- Calvin
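Without '\e', something close to this can be approximated with the preprocessor. The sketch below is a workaround under assumptions, not a substitute for language support: NATIVE_ESC, ANSI_HOME, ANSI_CLEAR, greeting and clear_and_greet are invented names, only ASCII and EBCDIC are recognized, and it assumes the preprocessor evaluates 'A' the same way the execution character set does (which the standard leaves implementation-defined).

```c
#include <assert.h>
#include <stdio.h>

/* Pick the native ESC by probing the value of 'A'. */
#if 'A' == 0x41
#define NATIVE_ESC "\033"   /* ASCII ESC */
#elif 'A' == 0xC1
#define NATIVE_ESC "\047"   /* EBCDIC ESC */
#else
#error "unknown character set; define NATIVE_ESC by hand"
#endif

#define ANSI_HOME  NATIVE_ESC "[H"   /* cursor to top left */
#define ANSI_CLEAR NATIVE_ESC "[J"   /* erase to end of display */

/* The whole message is built in the native codeset at compile time. */
static const char greeting[] = ANSI_HOME ANSI_CLEAR "Hello, world!\n";

/* Write the greeting; returns the result of fputs. */
int clear_and_greet(FILE *out)
{
    assert(out != NULL);
    return fputs(greeting, out);
}
```

Calling clear_and_greet(stdout) then sends the same logical sequence on either kind of host, leaving any ASCII/EBCDIC translation to whatever sits between the program and the device.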
Why are you hard-coding escape sequences into C programs, rather than using
libraries like termcap? Why should the C standard be extended to support
something that's only useful for bad programming style? I realize that C
has a number of features that promote poor programming (e.g. gets()) but
these are legacies that we're stuck with for compatibility, not things that
have been added.
--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
Since this code is already very device-dependent, what is the beef
about it being codeset dependent as well? I.e. the termcap library
should already be responsible for mapping device properties to the
proper strings for output, and it is therefore a perfect place to
translate an abstract notion of "ESC character" to whatever code
is actually needed.
There are a *lot* of special characters, commonly supported in
popular codesets, that are not in the C basic character set and
that have no \-escape sequence defined for them in the C standard.
Some of them I find more useful than ESC, these days. As of C9x,
there is even a standard way to denote them in source code (UCN
notation). I don't think we need to add individual kludges when
there is a sufficient general mechanism.
---
Because I'm not writing for Unix, so I have no "termcap" anywhere in sight.
When I'm using escape sequences, I'm writing mostly MS-DOS command line
utilities.
Now that computers can generate music, and not merely "beep", it seems to me
that the alert character is no more peculiar than an escape. And I don't think
I've _ever_ used a piece of hardware that understood vertical tab.
--
Ciao, Paul D. DeRocco
Paul mailto:pder...@ix.netcom.com
Assuming you're even using the termcap library.
> There are a *lot* of special characters, commonly supported in
> popular codesets, that are not in the C basic character set and
> that have no \-escape sequence defined for them in the C standard.
> Some of them I find more useful than ESC, these days. As of C9x,
> there is even a standard way to denote them in source code (UCN
> notation). I don't think we need to add individual kludges when
> there is a sufficient general mechanism.
I suppose I can live with having to type "...\u001B..." instead of
"...\e...". The only possible drawback I can see is that \u001B is
specifically the ASCII/Unicode ESC character, which could
conceivably translate into a character that is not the appropriate
"escape" character on some non-ASCII/Unicode systems (whatever that
means). QoI?
The standard, BTW, does not prohibit implementations from supporting
extensions such as "\E" or even "\ESC\DEL\NUL". They're just not
portable.
-- David R. Tribble, da...@tribble.com --
A bigger drawback is the constraint in 6.4.3 that "A universal character
name shall not specify a character whose short identifier is less than
00A0...".
-Larry Jones
Hmph. -- Calvin
---
I had temporarily forgotten about that.
It makes sense to limit the range of short-identifiers that can be
used in identifiers, but remind me again why some punctuation,
space, and control characters can be expressed via UCN and others
cannot. If we meant to exclude *only* the C basic source
characters then that's what we should have said.
Huh? What's the rationale behind that one?
Adam
--
Why is it that the smaller and easier a bug is to fix, the less I want
to actually fix it?
----------------
The opinions expressed in this email are mine alone, and do not
necessarily represent those of my employer, my parents, or the people
who wrote the email software I use.
>If we meant to exclude *only* the C basic source
>characters then that's what we should have said.
We meant to exclude:
- the C basic source characters
- the C0 and C1 control spaces (0 to 31, 127 to 159).
The cited words are the minimum that achieve that goal.
--
Clive D.W. Feather | Internet Expert | Work: <cl...@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd. | Home: <cl...@davros.org>
Fax: +44 20 8371 1037 | | Web: <http://www.davros.org>
Written on my laptop; please observe the Reply-To address
The best way I have found to handle this kind of thing is to build
a header file with the appropriate character sequences defined as
macros. For example in ASCII.H define:
#define STR_NUL "\000"
#define CHAR_NUL ('\000')
...
#define STR_BEL "\007"
#define CHAR_BEL ('\007')
#define STR_BS "\010"
#define CHAR_BS ('\010')
#define STR_HT "\011"
#define CHAR_HT ('\011')
#define STR_LF "\012"
#define CHAR_LF ('\012')
#define STR_VT "\013"
#define CHAR_VT ('\013')
#define STR_FF "\014"
#define CHAR_FF ('\014')
#define STR_CR "\015"
#define CHAR_CR ('\015')
...
#define STR_ESC "\033"
#define CHAR_ESC ('\033')
...
#define STR_H "\110"
#define CHAR_H ('\110')
...
#define STR_J "\112"
#define CHAR_J ('\112')
...
#define STR_OSB "\133"
#define CHAR_OSB ('\133')
and in VT100.H define something like:
...
#if ('\n' == CHAR_LF)
#define TERMCAP_NL "\n"
#else
#define TERMCAP_NL STR_CR STR_LF
#endif
...
#define TERMCAP_INTRO STR_ESC STR_OSB
...
#define TERMCAP_HOME TERMCAP_INTRO STR_H
...
#define TERMCAP_CLEAR_EOS TERMCAP_INTRO STR_J
...
#define TERMCAP_CLS TERMCAP_HOME TERMCAP_CLEAR_EOS
...
then the code would look like:
#include "ASCII.H"
#include "VT100.H"
...
printf( TERMCAP_CLS "Hello, world!" TERMCAP_NL);
Given this approach, it becomes clear that very few of the
output formatting characters belong in the 'C' standard. In
fact, only the escaped characters needed to define the input
language are really needed. \a and \b are included mainly
because a number of old programs use them.
Note that the contents of files like ASCII.H and VT100.H
are 'bindings' of other standards on 'C' and could be required
as part of those other standards, but do NOT belong in the 'C'
standard.
In any case there is no NEED to add \e. Including it would
encourage writing non-portable code.
Larry Jones:
> A bigger drawback is the constraint in 6.4.3 that "A universal
> character name shall not specify a character whose short identifier
> is less than 00A0...".
Douglas Gwyn:
> If we meant to exclude *only* the C basic source
> characters then that's what we should have said.
Clive Feather:
> We meant to exclude:
> - the C basic source characters
> - the C0 and C1 control spaces (0 to 31, 127 to 159).
> The cited words are the minimum that achieve that goal.
What about \u0024 ($), \u0040 (@), and \u0060 (`)?
- John Hauser
And 99% of all C/C++ code is not intended to be portable to machines that
don't speak ASCII.
--
Ciao, Paul D. DeRocco
Paul mailto:pder...@ix.netcom.com
>Clive Feather:
>> We meant to exclude:
>> - the C basic source characters
>> - the C0 and C1 control spaces (0 to 31, 127 to 159).
>> The cited words are the minimum that achieve that goal.
>
>What about \u0024 ($), \u0040 (@), and \u0060 (`)?
The *cited* words (as opposed to the quoted ones) continue with "...
other than 0024 ($), 0040 (@), or 0060 (`),".
They then add: "nor one in the range D800 through DFFF inclusive.",
which excludes the surrogate values used in multi-unit Unicode encodings.
--
Clive D.W. Feather | Internet Expert | Work: <cl...@demon.net>
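Putting the quoted pieces of 6.4.3 together, the constraint can be written as a small predicate. This is only a sketch of the wording as cited in this thread; the function name ucn_allowed is made up, and it makes no attempt to check anything else (such as an upper bound on the value).

```c
#include <stdbool.h>

/*
 * True if a universal character name may specify code point cp under the
 * constraint quoted above: nothing below 00A0 except 0024 ($), 0040 (@)
 * and 0060 (`), and nothing in D800 through DFFF inclusive.
 */
bool ucn_allowed(unsigned long cp)
{
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)
        return false;                       /* excluded range D800-DFFF */
    if (cp < 0x00A0UL)
        return cp == 0x0024UL || cp == 0x0040UL || cp == 0x0060UL;
    return true;
}
```

So ucn_allowed(0x001B) is false, which is why \u001B cannot stand in for '\e', while ucn_allowed(0x0024) is true.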