WorkS Digest Monday, 22 November 1982 Volume 2 : Issue 88
Notes - Communications Breakthrough,
Hardware - The Corvus Concept,
Programming - Extending the ASCII character set (6 msgs)
Mail-from: ARPANET site PARC-MAXC rcvd at 10-Nov-82 2233-EST
Date: 10-Nov-82 19:36:21 PST (Wednesday)
From: Hamilton.es at PARC-MAXC
Subject: Communications Breakthrough
To: Human-nets @ Rutgers
Mail-from: Arpanet host CMU-10A rcvd at 10-NOV-82 0826-PST
Date: 10 November 1982 1126-EST (Wednesday)
From: James.Morris at CMU-10A
To: csl^ at PARC-MAXC, isl^ at PARC-MAXC, junk^ at PARC-MAXC
Subject: Communications Breakthrough
Message-Id: <10Nov82 112614 JM90@CMU-10A>
Because you can't see the person who is sending you electronic
mail you are sometimes uncertain whether they are serious or
joking. Recently, Scott Fahlman at CMU devised a scheme for
annotating one's messages to overcome this problem. If you turn
your head sideways to look at the three characters :-) they look
sort of like a smiling face. Thus, if someone sends you a
message that says "Have you stopped beating your wife?:-)" you
know they are joking. If they say "I need to talk to you :-(",
be prepared for trouble.
Since Scott's original proposal, many further symbols have been proposed:
(:-) for messages dealing with bicycle helmets
@= for messages dealing with nuclear war
<:-) for dumb questions
oo for somebody's head-lights are on messages
o>-<|= for messages of interest to women
~= a candle, to annotate flaming messages
So you see, bit-map displays are really quite unnecessary :->
Date: 21 November 1982 05:51-EST
From: "James Lewis Bean, Jr." <BEAN at MIT-MC>
Subject: The Corvus Concept
Does anyone have anything good to say about one of these machines?
I went to see one at a "Computer Store" and I seemed to know more
about the machine than the salespeople did. Oh well. It looks
great! The price is great. The problem seems to be there is no
documentation on the software that exists. And there doesn't appear
to be much software for it.
For those of you who do not know what a Corvus Concept is...
The Concept is an under $5K workstation.
68000 based with 256K standard.
15 inch 35 MHz video
Bit mapped 720x560 display.
120x56 in landscape mode
90x72 in portrait mode
Two serial ports.
One Omninet interface (1 Mbps serial RS-422)
Detached keyboard with lots of extra keys.
Anyone know where I can get Unix for it?
bean at mit-mc
Date: 13 Nov 1982 09:12:38-PST
From: mo at LBL-UNIX (Mike O'Dell [system])
Subject: SUPER ASCII
The folks who are grossly misspeaking about what ASCII is and is not
should look at the full specification. A while back there was a
series of articles in IEEE Computer written by one of the prime
movers in the ASCII effort talking about just such things. All the
(serious?) extension people have mentioned [mathematics, other
(non-English!!) languages, some graphics symbology] are included
in the ISO code structure, of which 7-bit ASCII is only a TINY
subset. There are defined codes for escaping to a great many
alphabets, and if you layer NAPLPS on top (which does so quite
cleanly as it was designed with full knowledge of the ISO code
structure) you can even define the glyphs represented by new codes.
There are in fact TWO issues here - do you seriously want to define
a discrete character code for every character in every font in every
point size?? If so, 32 bits will barely suffice. The ISO code
structure provides "escape" mechanisms for getting
to other protocols. One of these other protocols would be a
presentation protocol like NAPLPS or something similar which will
specify glyph representation, font, pointsize, etc. The base
alphabet would only represent the character which is presented in
the form "current" in the presentation protocol. This means there
is a tight coupling between the layers when handling real data, but
a cross-product structure is surely more desirable than a flat
enumeration of "all symbols needed for man's knowledge!" (excuse
A good friend of mine has a quote which I dearly love:
"Creativity is no substitute for knowing what you're doing."
Date: 13 November 1982 15:22-EST
From: "Marvin A. Sirbu, Jr." <SIRBU at MIT-MC>
Subject: Character codes
ISO standard 222 (?) specifies a standard method for extending the 8
bit Teletex code by swapping in an alternate 128 character set.
Already 45 different alternate character sets have been registered
with the ISO including Greek, Russian, Arabic, etc. Efforts are now
underway within the CCITT Study Group VIII to define a new extension
technique which would allow for two bytes to specify a single
character -- i.e. a 16-bit code. The primary reason for going to 16
bits is to encode Kanji (Japanese and Chinese) characters.
Date: 15 Nov 1982 at 1100-PST
Subject: Re: Supercodes
From: chesley.tsca at SRI-Unix
One very simple way to allow extended codes is to define the
8th bit as a "select character set" bit. If on, the lower 7 bits
select a new character set (a sort of super shift-out). The default
could be ASCII, thus keeping compatibility with existing standards.
A couple of additional features allow even more expansion:
(1) Reserve one of the character set numbers to mean "select
second level character set"; i.e., next byte has character set
number. This can be repeated indefinitely to allow an arbitrary
number of character sets.
(2) Divide the character sets into groups, each with a
different number of bits per character: 8, 16, 24, 32. This allows
arbitrary size character sets.
Bucky bits can be handled by having "ASCII", "Meta-ASCII",
etc. And it's quite simple at the keyboard interface to translate
top bit set (i.e., meta) into "change-set,character,change-set".
So this could be added to an existing system with very
little work, and people can go off and start defining new character
sets right away...
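
The basic scheme in this message (top bit set means "select character set", top bit clear means "character in the current set") can be sketched roughly as follows. This is only an illustration under stated assumptions: the set number 0 for ASCII and the names `encode` and `decode` are hypothetical, and the second-level sets and wider character groups of features (1) and (2) are left out.

```python
SELECT = 0x80   # top bit set: the lower 7 bits name a character set
ASCII_SET = 0   # assumed number for the default (ASCII) set

def encode(chars):
    """Encode (charset_number, code) pairs, both in 0..127, into bytes.

    A select byte is emitted only when the character set changes, so
    text that stays in the default set is byte-for-byte plain ASCII.
    """
    out = bytearray()
    current = ASCII_SET
    for charset, code in chars:
        if not (0 <= charset < 128 and 0 <= code < 128):
            raise ValueError("charset and code must each fit in 7 bits")
        if charset != current:
            out.append(SELECT | charset)
            current = charset
        out.append(code)
    return bytes(out)

def decode(data):
    """Decode bytes back into (charset_number, code) pairs."""
    current = ASCII_SET
    result = []
    for byte in data:
        if byte & SELECT:
            current = byte & 0x7F      # switch sets, emit nothing
        else:
            result.append((current, byte))
    return result
```

For example, `encode([(0, 0x41), (5, 0x41)])` yields the three bytes `41 85 41` (hex): an ASCII "A", a switch to the hypothetical set 5, then code 0x41 in that set.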
Date: 16 Nov 82 8:35:40-PST (Tue)
To: works at Rutgers
From: hplabs!hao!csu-cs!bentson at Ucb-C70
Subject: Re: Supercodes
See ANSI standard X3.41 "...Code Extension Techniques..." for a
general structure upon which to hang an extensible character set.
There's also a proposed ISO/DP 6937 "Coded Character Sets for Text
I would like to see more of the "technocrats" in the standards
committees; at the last meeting I attended, there was a strong push
towards restricting a "page image format" character set to a great
deal LESS than the 128 char ASCII set because of the number of dumb
terminals in existence! Fortunately it was finally recognized that
the standard wouldn't be published for years so we would be able to
(and were obliged to) look ahead to that time.
If you have a real interest in all this, contact:
Thomas N. Hastings
Character Sets and Coding
Digital Equipment Corp
Merrimack, N.H. 03054
He should be able to point you to the proper working group. I have
been out of the "establishment of standards" activity for some time
now, so I hope the above address is still correct.
Colo State U - Comp Sci
Date: 17 Nov 1982 0657-EST
From: Marc Shapiro <Shapiro at MIT-XX>
A recent message to WorkS discussed the CCITT/Teletex extensions to
the Ascii alphabet, correctly noting that it allows all diacritic
combinations (i.e. accents and pronunciation marks on letters) plus
a lot of new symbols (arrows, greek letters, math symbols), colors,
- the standard does *not* require a code per diacritic combination.
Rather, there is a code for the letter + a code for the accent
mark. The same is of course true for underlined text, colored text,
- the message stated that the standard needs 8bits/char., and hence
is unsuitable for systems where the 8th bit is reserved for parity
or doesn't exist (DEC equipment). Not quite true. The standard
provides both for 8-bit and 7-bit encoding, the latter with longer
The standard is compatible with Ascii. Reference:
CCITT yellow book
Volume VII - Fascicule VII.2
Telegraph and Telematic Services Terminal Equipment
Recommendations of the S and T series.
Presentation Level Protocol
Bell System, May 1981
PS The standard also defines mosaic graphics "a la" bit-map.
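
The "code for the letter plus code for the accent mark" idea in this message can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the accent code values in `ACCENTS`, the order (letter first, accent second), and the function name `decode_composed` are hypothetical and are not taken from the CCITT text. The point is only that the table needs one entry per accent, not one per letter/accent pair.

```python
import unicodedata

# Hypothetical accent codes (illustration only), mapped to Unicode
# combining marks: grave, acute, diaeresis.
ACCENTS = {0xC1: "\u0300", 0xC2: "\u0301", 0xC8: "\u0308"}

def decode_composed(data):
    """Decode bytes in which an accent code modifies the preceding letter.

    Assumes the letter comes first and the accent code second; the
    actual order in the standard is not stated in the message above.
    """
    pieces = []
    for byte in data:
        if byte in ACCENTS:
            pieces.append(ACCENTS[byte])   # attach mark to previous letter
        else:
            pieces.append(chr(byte))
    # NFC folds each letter + combining mark into one precomposed character
    return unicodedata.normalize("NFC", "".join(pieces))
```

With these assumed values, the two bytes 0x65 0xC2 ("e" followed by the acute code) decode to a single accented "e".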
Date: 18 Nov 1982 1603-PST
From: Pierre MacKay <MACKAY at WASHINGTON>
Subject: Range of ASCII, alias ISO 646-1973
To: LES at SU-AI
cc: Furuta at WASHINGTON, Binding at WASHINGTON,
Your 8-bit ASCII message of 10 Nov 1982 found its way to me by a
somewhat roundabout route, since I am not on the WorkS list, and,
given the size of my mail file as it is, I am hesitant to get there.
You underestimate the range of even 7-bit ASCII. In conjunction
with the appropriate escape sequences from ISO 2022-1973, alias (for
all practical purposes) ANSI X3.41-1974, the good old 7-bit table
speaks several languages. For instance:
Greek---ISO 5428-1980 (I haven't actually seen this yet.
Japanese---National standard C6220-1969 (katakana only, of
course, and this, in the form JISCII is a true 8-bit code,
with ASCII residing in columns 0..7 and katakana in columns
Russian---GOST 13052-67, a dreadful aberration set up for
the use of SO and SI coding, with the Cyrillic alphabet
scrambled to match the visually similar Latin letters. Why
even a Commissar would want to do that to his own language
is beyond me, but it is AUTHORITATIVE, under the
The Arabic case is chaos. There is no reason why a good, efficient
Arabic script coding table cannot be included in a 7-bit range. I
am working with one now, but it is rather my own invention. It
resembles some of the work done by ISO TC-46 and similar work done
at the Library of Congress. There was a fine suggestion put forward
at Riyadh, Saudi Arabia, about two and a half years ago, but it came
to nothing, and a dreadful Moroccan notion, cobbled up out of a set
of linotype matrices, now has a certain currency, in that it has been
registered, whatever that means, as Number 59, dated June 1, 1982
with ISO. It includes 4 ISO 2022 escape sequences to identify G0,
G1, G2, and G3 graphic sets, but does not say what is to be done
with all these alternatives. ECMA has plunged into the same waters
with an entirely different proposal, which may even be worse. They
all seem to assume that all Arabic ligature forms must be shown in
the coding table, rather as if Don Knuth's TeX were to require the
elimination of the open and close brace character positions so that
you could code the double-f ligatures directly. The implications of
microprocessor technology have not yet got through.
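
For readers unfamiliar with the shift mechanism behind the G0 and G1 sets mentioned above, here is a minimal sketch of 7-bit decoding with the SO and SI control characters (0x0E and 0x0F). It assumes both sets have already been designated and skips escape-sequence parsing entirely. The table `G1_DEMO` is purely illustrative and is not the layout of any registered set; the name `decode_7bit` is likewise hypothetical.

```python
SO = 0x0E   # Shift Out: following bytes are read from the G1 set
SI = 0x0F   # Shift In: following bytes are read from the G0 set

# Illustrative G1 table only; NOT the layout of any registered set.
G1_DEMO = {0x61: "\u03b1", 0x62: "\u03b2", 0x67: "\u03b3"}

def decode_7bit(data, g1=G1_DEMO):
    """Decode 7-bit bytes, treating G0 as ASCII and G1 as the given table."""
    shifted = False
    out = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError("input must be 7-bit")
        if byte == SO:
            shifted = True
        elif byte == SI:
            shifted = False
        elif shifted and byte in g1:
            out.append(g1[byte])
        else:
            # Simplification: anything not in the demo table stays ASCII
            out.append(chr(byte))
    return "".join(out)
```

The same byte value 0x61 therefore reads as Latin "a" before SO and as the demo table's alpha after it, which is how one 7-bit table can serve more than one alphabet.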
Urdu, Pashto and Sindhi would probably overload a 7-bit table, since
you are really dealing with two incompatible alphabets mashed into
one in those cases. Malay and Chinese-Turkish (as seen on the lower
right corner of PRC banknotes) will fit. Persian, of course, will
fit easily, as will Ottoman Turkish, a language for which I have a
bizarre atavistic affection. Western Europe and Hungary have
national versions of ISO 646 to account for heavily used
diacriticals. I don't know about Czech, which is a bit overloaded.
Modern Turkish is a nice problem too.
I believe the Sanskrit-derived Indian languages would fit, and the
Tamil family would certainly fit in a 7-bit table. Chinese, and
Japanese Kanji would not. The Japanese use a manageable subset of
Chinese ideographs, and have already established a multi-bit code.
One proposal for Chinese uses the 94 cells available in the Graphic
area of ISO 646 in a three level code. There are 94 books of 94
pages each of 94 characters each, or 94 to the third power possible
characters. That should suffice even for Chinese.
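
The arithmetic of the three-level proposal can be checked with a short sketch. It assumes the 94 graphic cells are the codes 0x21 through 0x7E and that a character is numbered book, then page, then cell; the function names `to_code` and `from_code` are hypothetical.

```python
CELLS = 94     # graphic positions available in a 7-bit table
FIRST = 0x21   # assumed first graphic code; the last is 0x7E

def to_code(index):
    """Map a character number (0 .. 94**3 - 1) to three 7-bit bytes."""
    if not 0 <= index < CELLS ** 3:
        raise ValueError("index out of range")
    book, rest = divmod(index, CELLS * CELLS)
    page, cell = divmod(rest, CELLS)
    return bytes([FIRST + book, FIRST + page, FIRST + cell])

def from_code(code):
    """Map three graphic bytes back to the character number."""
    if len(code) != 3 or any(not FIRST <= b < FIRST + CELLS for b in code):
        raise ValueError("expected three bytes in the graphic range")
    book, page, cell = (b - FIRST for b in code)
    return (book * CELLS + page) * CELLS + cell
```

Here `CELLS ** 3` evaluates to 830584, the "94 to the third power" capacity mentioned above.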
Date: 18 Nov 82 15:30:53-PST (Thu)
From: hplabs!intelqa!omsvax!bc at Ucb-C70
Subject: Re: Supercodes
The address given for Tom Hastings of the ANSI X3L2 character codes
committee was out of date. The correct address is:
Thomas N. Hastings
Character Sets and Codings
Digital Equipment Corp.
146 Main St.
Maynard, Mass. 01754
As a member of the standards community (X3H3, computer graphics, and
liaison to X3L2), I implore anyone planning on extending the ASCII
(or any other) character set to contact Tom and get information from
him. There are far too many partly or wholly incompatible
"standards" in the world now; more are not needed. Besides which,
someone out there may have already solved your problem, and saved
you a lot of work. I think it was Robert Heinlein who said that a
good engineer is adept at recognizing good work, and using it, with
the serial numbers filed off as appropriate.
End of WorkS Digest