Apostrophe

Markus Kuhn

Mar 17, 2000, 3:00:00 AM

Uriel Wittenberg <uri...@tiac.net> wrote in <37A7B129...@tiac.net>:
> > http://www.htmlhelp.com/reference/charset/iso032-063.html shows ISO
> > 8859-1 character 39 to be a nice-looking, curvy apostrophe. That
> > page displays it using a .GIF file. But Netscape 4.61 displays that
> > character as a vertical mark. Why the difference?

This is not a proper table of ISO 8859-1. Better look instead at

http://www.cl.cam.ac.uk/~mgk25/ucs/Unicode-ASCII.gif

which shows what the ASCII etc. tables *really* look like in all the
recent ANSI/ECMA/ISO/Unicode standards.

Henry Churchyard wrote:
> In the original ASCII-1968 character set standard, 96 and 39 were
> intended as a corresponding open-quote/close-quote pair, but in the
> early 1980's or thereabouts, some ANSI standard or other suggested
> that 39 should be a vertical (or "neuter") single quote, while 96 was
> redefined as a "spacing grave accent" (whatever that means).

A new web page discussing in detail the confusion between the
apostrophe/grave accent and the opening and closing quotation marks is
available at

http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

The most important bit is that authors should *not* use the ASCII
characters 0x27 and 0x60 as pairs of opening and closing
quotation marks, because that's what they stopped looking like
on most systems. Even in X11 fonts the straight ANSI/ISO/Unicode
apostrophe on 0x27 is now being introduced. If you want to have a
proper curly apostrophe or right quotation mark, then you have to
use the Unicode character &#x2019; which is intended for exactly
that purpose.
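
As a rough illustration of the distinction drawn above, the following Python sketch prints the standard Unicode names of the two ASCII characters and of the directional single quotes. The helper name describe is made up for this example; the names come from Python's unicodedata module.

```python
import unicodedata


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME' for a code point (hypothetical helper)."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


# 0x27 and 0x60 are the ASCII characters discussed in the post;
# U+2018 and U+2019 are the directional single quotation marks.
for cp in (0x27, 0x60, 0x2018, 0x2019):
    print(describe(cp))
```

The names alone suggest why 0x27/0x60 make a poor opening/closing pair: one is an apostrophe, the other a grave accent.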

A demonstration of Unicode's curly/curved quotation marks is on

http://www.cl.cam.ac.uk/~mgk25/ucs/CP1252.html

By the way, the new Unicode 3.0 standard, which among many other
things discusses the history and semantics of the curly and
straight apostrophe in detail in section 6.1, has just
been published:

http://www.amazon.com/exec/obidos/ASIN/0201616335/mgk25

It is a very useful reference if you want to have good charset
tables and lots of background information.

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>

Thierry Bouche

Mar 17, 2000, 3:00:00 AM

mg...@cl.cam.ac.uk (Markus Kuhn) writes:

> This is not a proper table of ISO 8859-1. Better look instead at
>
> http://www.cl.cam.ac.uk/~mgk25/ucs/Unicode-ASCII.gif
>
> which shows what the ASCII etc. tables *really* look like in all the
> recent ANSI/ECMA/ISO/Unicode standards.

Well, since Unicode deals only with characters, it's up to the font
designer to assign whatever glyph she wants to whatever slot! As the
"straight ascii single quote" has no known usage, I don't see the point
in putting that glyph anywhere into an actual font (except as a nice way
for pseudo-typographers to demonstrate their incompetence when using
some dumb wysiwyg software...).

Th. B.

Jukka Korpela

Mar 17, 2000, 3:00:00 AM

On 17 Mar 2000 14:18:59 GMT, mg...@cl.cam.ac.uk (Markus Kuhn) wrote:

>The most important bit is that authors should *not* use the ASCII
>characters 0x27 and 0x60 as pairs of opening and closing
>quotation marks, because that's what they stopped looking like
>on most systems.

Agreed, but on different grounds: the _meaning_ of 0x60 (U+0060) as
defined in character set standards is 'grave accent'. It is true that
the character has little if any use in its original meaning, but it
has then been taken to secondary uses (e.g. as a "backquote" in some
command or programming languages). It just adds to the confusion if it
is arbitrarily used for other purposes. Some notes on the grave
accent: http://www.hut.fi/u/jkorpela/latin1/3.html#60

>Even in X11 fonts the straight ANSI/ISO/Unicode
>apostrophe on 0x27 is now being introduced.

Interestingly, it seems to me that IE displays or prints the
apostrophe sometimes as vertical, sometimes as curly, with no apparent
logic. The vertical display is the correct one. Admittedly the curly
one is often nice and what the author really wanted, but it's still
incorrect: the apostrophe is defined as a character with a vertical
glyph (yes, the Unicode standard says that - it usually does not
comment on glyph appearance, but here it does).

>If you want to have a
>proper curly apostrophe or right quotation mark, then you have to
>use the Unicode character &#x2019; which is intended for exactly
>that purpose.

Correct, but in practical HTML authoring, &#8217; is certainly to be
preferred at present - it has a fair chance of getting displayed
correctly under favorable circumstances. (I was surprised to see
that IE 5 actually supports the hexadecimal notation too. But Netscape
4.5, for example, doesn't.)
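
To make the decimal versus hexadecimal point concrete, here is a small Python sketch. It only shows that both notations name the same character; it says nothing about what any particular browser supports. The helper names decimal_ref and hex_ref are invented for this example.

```python
import html


def decimal_ref(ch: str) -> str:
    """Decimal numeric character reference, e.g. &#8217;"""
    return f"&#{ord(ch)};"


def hex_ref(ch: str) -> str:
    """Hexadecimal numeric character reference, e.g. &#x2019;"""
    return f"&#x{ord(ch):X};"


right_quote = "\u2019"
print(decimal_ref(right_quote))  # the decimal form
print(hex_ref(right_quote))      # the hexadecimal form

# Both references decode to the same single character.
print(html.unescape(decimal_ref(right_quote)) == html.unescape(hex_ref(right_quote)))
```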

Usual caveats apply - one needs to weigh the better typographic
appearance (and more logical use of characters) against the practical
risk of making an HTML document messy in this respect. Using an
apostrophe is _safe_. My advice is to use characters like &#8217; only
if there is some _other_, more compelling reason to use &#bignumber;
references. It might be acceptable to reduce universal readability if
you need to be able to present a wide repertoire of characters (say,
mathematical and other special symbols), but _merely_ typographic
reasons are not good enough, IMHO. (Unless your document is _about_
typography or the use of punctuation characters.)

(On the other hand, I think it was a very unfortunate decision by the
Unicode consortium to define the punctuation apostrophe to be the same
character as the right single quotation mark. They are logically quite
distinct characters and occur in similar contexts too.)
--
Yucca, http://www.hut.fi/u/jkorpela/
Brevis esse laboro, obscurus fio.

Alan J. Flavell

Mar 17, 2000, 3:00:00 AM

On Fri, 17 Mar 2000, Jukka Korpela wrote:

[much with which I agreed]

> (On the other hand, I think was a very unfortunate decision by the
> Unicode consortium to define the punctuation apostrophe to be the same
> character as the right single quotation mark. They are logically quite
> distinct characters and occur in similar contexts too.)

I think so too.

And as an HTML-specific issue (f'ups narrowed), it's a great pity that
browser makers didn't take up the HTML+/HTML3.0 idea of having a
<Q>...</Q> markup, which the client agent would render with the best
pair of quotation characters that it had at its disposal and according
to the appropriate language/locale rules.

But at this late stage, where <Q> markup is just as likely to
disappear without trace, the markup is as good as unusable in a WWW
authoring context, unfortunately.

Stefan Zingg

Mar 17, 2000, 3:00:00 AM

Thierry Bouche wrote:
>
> As the "straight
> ascii single quote" has no known usage, i don't see the point in
> putting that glyph anywhere into an actual font (except as a nice way
> for pseudo-typographers

I'm using it for inch and second (and the single straight quote for foot
and minute). Well, I guess I'll have to live with the attribute "pseudo".

Stefan

Char

Mar 18, 2000, 3:00:00 AM

Thierry Bouche wrote:
... the "straight ascii single quote" has no known usage ...

On the contrary, the straight single and double quotes are the ONLY
characters that are proper for indicating the units feet and inches
or minutes and seconds of arc.

Dennis

Mar 18, 2000, 3:00:00 AM

"Char" <char@cter.s> wrote in message news:38D2D8D3.F5B728E8@cter.s...

Per usual I may be missing the thrust of this, but I was told that
for a quote *within* a quote, you use apostrophes.
--
----------------------------------------------------------------------------
Note: Change NoSpam to wildtrumpet for e-mail replies
----------------------------------------------------------------------------

Markus Kuhn

Mar 18, 2000, 3:00:00 AM

Thierry Bouche <Thierry...@ujf-grenoble.fr> writes:
>Well, Unicode only dealing with charachters, it's up to the font
>designer to affect what glyph she wants to what slot! As the "straight
>ascii single quote" has no known usage, i don't see the point in
>putting that glyph anywhere into an actual font (except as a nice way
>for pseudo-typographers to demonstrate their incompetence when using
>some dumb wysiwyg softwares...).

It is not true that the U+0027 straight single quotation mark has no use.
It is the proper glyph shape to use in environments like typewriter
text where you have no directional quotation marks and therefore use the
same glyph on both the left and the right side of a quotation.
Using two right curly quotation marks to open and close a quotation
looks rather ugly to me (even though I understand that this is
proper typography for Swedish, as Unicode 3.0 claims on page 152).

For example in computer programming languages such as C, I am forced
by the language to use the same single quotation mark character to open
and close a character constant, and there I personally think that the
straight one looks considerably better than the right curly one.

Similarly, if you write heights in archaic English units (e.g., 6'3")
and the single quote is curly while the double quote is straight, this
also looks tremendously ugly to my eyes, while Unicode's ASCII
glyphs go very well next to each other here.

I agree that it is somewhat debatable why ANSI X3.4 changed 0x27 to
be the single quote, but I guess it is far too late now to change this
back and I strongly disagree that the character as such is useless.
It is certainly a good thing that Unicode provides us now with single
and double quotes in left, right, and neutral shapes, such
that we can pick whatever is most appropriate. For printing programming
language source code and other typewriter-style output, the neutral
forms are definitely more appropriate, and ASCII certainly was
historically intended to control electric typewriters and not
fancy typesetters.

http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

Kai Henningsen

Mar 18, 2000, 3:00:00 AM

mg...@cl.cam.ac.uk (Markus Kuhn) wrote on 17.03.00 in <8atesj$i6l$3...@pegasus.csx.cam.ac.uk>:

> Uriel Wittenberg <uri...@tiac.net> wrote in <37A7B129...@tiac.net>:
> > > http://www.htmlhelp.com/reference/charset/iso032-063.html shows ISO
> > > 8859-1 character 39 to be a nice-looking, curvy apostrophe. That
> > > page displays it using a .GIF file. But Netscape 4.61 displays that
> > > character as a vertical mark. Why the difference?
>
> This is not a proper table of ISO 8859-1. Better look instead at
>
> http://www.cl.cam.ac.uk/~mgk25/ucs/Unicode-ASCII.gif
>
> which shows what the ASCII etc. tables *really* look like in all the
> recent ANSI/ECMA/ISO/Unicode standards.
>
> Henry Churchyard wrote:
> > In the original ASCII-1968 character set standard, 96 and 39 were
> > intended as a corresponding open-quote/close-quote pair, but in the
> > early 1980's or thereabouts, some ANSI standard or other suggested
> > that 39 should be a vertical (or "neuter") single quote, while 96 was
> > redefined as a "spacing grave accent" (whatever that means).

I don't think I have *ever* seen a table for ASCII or related sets where '
was *not* supposed to be straight.

Kai
--
http://www.westfalen.de/private/khms/
"... by God I *KNOW* what this network is for, and you can't have it."
- Russ Allbery (r...@stanford.edu)

Kai Henningsen

Mar 18, 2000, 3:00:00 AM

Jukka....@hut.fi (Jukka Korpela) wrote on 17.03.00 in <hlr4dsg9n7u9m64nr...@4ax.com>:

> (On the other hand, I think was a very unfortunate decision by the
> Unicode consortium to define the punctuation apostrophe to be the same
> character as the right single quotation mark. They are logically quite
> distinct characters and occur in similar contexts too.)

Historical accident. Unicode and even ISO 10646 *MUST* be as compatible
with ASCII as possible, for a number of technical and political reasons.

It might have been a nicer charset without that requirement, but it would
probably also have been dead as a doornail.

Alan

Mar 18, 2000, 3:00:00 AM

On 17 Mar 2000 14:18:59 GMT, mg...@cl.cam.ac.uk (Markus Kuhn) wrote:

>
> http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
>
>The most important bit is that authors should *not* use the ASCII
>characters 0x27 and 0x60 as pairs of opening and closing
>quotation marks, because that's what they stopped looking like
>on most systems. Even in X11 fonts the straight ANSI/ISO/Unicode
>apostrophe on 0x27 is now being introduced. If you want to have a
>proper curly apostrophe or right quotation mark, then you have to
>use the Unicode character &#x2019; which is intended for exactly
>that purpose.
>

Problem is that I don't have a key on my keyboard that generates
&#x2019; whereas I do have a '. Even more so the `, which is another 8
character code for the lquote.

Which idiots decided that we need access to the grave accent and foot
mark more than we need normal punctuation used in almost every
sentence of text written?

Why on earth didn't they put the grave and foot mark on the weird
placing and allow people to use a convenient key for a normal
character?

This is not a trivial matter*. Has anyone noticed that in more and
more typesetting you now see ' instead of rquote, and even " instead
of l & r doublequote? Even on expensive large colour print ads, and
book covers? It's because the quotation marks needed for normal text
aren't where people expect them to be. A related degradation is the
use of hyphens instead of dashes. Why do I have `'[]{}<>~^_|\ on my
keyboard? Because they're used by computer programmers. That's
convenient for the 1% of the population using computers who program.
The rest can't find the normal characters used in writing English
because some geeks don't know or care about typography, but had the
arrogance to design the character sets that have to be used for it.

*If you care about type

Andreas Prilop

Mar 18, 2000, 3:00:00 AM

In article <8avhhs$mgf$2...@pegasus.csx.cam.ac.uk>,
mg...@cl.cam.ac.uk (Markus Kuhn) wrote:

> Similarly, if you write heights in archaic English units (e.g., 6'3")
> and the single quote is curly while the double quote is straight, this
> also looks tremendously ugly to my eyes, while Unicode's ASCII
> glyphs go here very well next to each other.

If you mean a length, it should read
6 ft + 3 in
If you mean an angle, it should read
6&#8242; + 3&#8243;

<http://www.unics.uni-hannover.de/ntr/russisch/andere_einheiten.html>
<http://www.unics.uni-hannover.de/ntr/russisch/alte_einheiten.html>

--
Change "invalid" to "de" in e-mail address.

Dennis

Mar 18, 2000, 3:00:00 AM

"Alan" <sargent*@*iohk.com> wrote

> Why do I have `'[]{}<>~^_|\ on my
> keyboard? Because they're used by computer programmers. That's
> convenient for the 1% of the population using computers who program.
> The rest can't find the normal characters used in writing English
> because some geeks don't know or care about typography, but had the
> arrogance to design the character sets that have to be used for it.

Sounds like you've designed a market niche. Why don't you develop
and patent a keyboard for type geeks. I vote to keep the *greater than*,
*less than*, because I often use them to convey a lack of seriousness.
<grin>

Andreas Prilop

Mar 18, 2000, 3:00:00 AM

In article <38D2D8D3.F5B728E8@cter.s>,
Char <char@cter.s> wrote:

> On the contrary,the straight single and double quotes are the ONLY
> characters that
> are proper for indicating the units feet and inches or minutes and
> seconds of arc.

No: foot = ft, inch = in, minute = &#8242;, second = &#8243;

Jan Roland Eriksson

Mar 18, 2000, 3:00:00 AM

On Sat, 18 Mar 2000 23:46:29 +0800, Alan <sargent*@*iohk.com> wrote:

>Which idiots decided that we need access to the grave accent and foot
>mark more than we need normal punctuation used in almost every
>sentence of text written?

Maybe the same "idiots" that recognized the fact that
people of anglo-saxon origin and native language are
a minority in this world :-)

I can promise you that in everyday Swedish typing
accents are used quite often.

We have words like idé (meaning "idea")
just to pick one out of a bunch.

If you were to write just ide it would mean
a place where bears sleep through the winter ;-)

...

>Why do I have `'[]{}<>~^_|\ on my keyboard?
>Because they're used by computer programmers.

...

>The rest can't find the normal characters used
>in writing English because some geeks don't
>know or care about typography,

Take this free tip for your Win machine at least.
Get hold of a keyboard with a Swedish key layout,
install the appropriate driver for that and I think
you will be happy to find your typographical keys
to be in quite comfortable places.

You would still have full and direct access to all of
the "programmers" keys of course since on our keyboards
there is a separate key to the right of the space bar
named "Alt Gr" and several of our keys have a third
legend on them that gets activated by "Alt Gr"

Alt Gr keys:         |  @  £  $  {  [  ]  }  \  ~
As third legend on:  <  2  3  4  7  8  9  0  +  ¨   (non shift)
                     >  "  #  ¤  /  (  )  =  ?  ^   (shifted)

--
Jan Roland Eriksson <jre...@newsguy.com>

Andreas Prilop

Mar 18, 2000, 3:00:00 AM

In article <ev97ds411bpkadqba...@4ax.com>,

Jan Roland Eriksson <jre...@newsguy.com> wrote:

> We have words like idé (meaning "idea")
> just to pick one out of a bunch.

This is OK but the stupid German, Spanish, Swedish ... keyboard layouts
tempt people to type ´ instead of ' . A much better idea would be
to have ' as a dead key and to type ' + e -> é instead of
´ + e -> é.

Char

Mar 18, 2000, 3:00:00 AM

OK, what are &#8242 and &#8243?
(Are they the "prime" and "double prime" symbols? That is, the slanted –
not curly – single and double quotes?)

In which case you're right, and in the US, they would also be the proper
symbols for ft. and in.

Andreas Prilop wrote:
>
> In article <38D2D8D3.F5B728E8@cter.s>,
> Char <char@cter.s> wrote:
>
> > On the contrary,the straight single and double quotes are the ONLY
> > characters that
> > are proper for indicating the units feet and inches or minutes and
> > seconds of arc.
>
> No: foot = ft, inch = in, minute = &#8242;, second = &#8243;
>

Eric Fischer

Mar 18, 2000, 3:00:00 AM

Kai Henningsen <kaih=7a71d...@khms.westfalen.de> wrote:

> I don't think I have *ever* seen a table for ASCII or related sets where '
> was *not* supposed to be straight.

Well, then, you haven't seen the tables beginning October 29, 1963,
when ISO TC 97/SC 2 altered character 2/7 so that it could serve
either as an apostrophe or as an acute accent.

eric

Andreas Prilop

Mar 18, 2000, 3:00:00 AM

In article <8b0i0s$30q3$1...@news.enteract.com>,
Eric Fischer <e...@pobox.com> wrote:

> Well, then, you haven't seen the tables beginning October 29, 1963,
                                                                ^^^^

Was that A.D. or B.C.? ;-)

Andreas Prilop

Mar 18, 2000, 3:00:00 AM

In article <38D3CB02.834F7C14@cter.s>,
Char <char@cter.s> wrote:

> OK, what are &#8242 and &#8243?
> (Are they the "prime" and "double prime" symbols?

Exactly. Have a look at <http://charts.unicode.org/Web/U2000.html>.
8242 = x2032
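
The decimal-to-hexadecimal conversion above is easy to check. The following Python sketch does so and looks up the character names. The function name code_point_label is made up for this example.

```python
import unicodedata


def code_point_label(decimal: int) -> str:
    """Format a decimal character number as 'U+XXXX NAME' (hypothetical helper)."""
    return f"U+{decimal:04X} {unicodedata.name(chr(decimal))}"


# The two numbers from the numeric character references in question.
for decimal in (8242, 8243):
    print(decimal, "=", code_point_label(decimal))
```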

Jukka Korpela

Mar 18, 2000, 3:00:00 AM

On 18 Mar 2000 13:30:00 +0200, kaih=7a71f...@khms.westfalen.de (Kai
Henningsen) wrote:

>Jukka....@hut.fi (Jukka Korpela) wrote on 17.03.00 in <hlr4dsg9n7u9m64nr...@4ax.com>:
>
>> (On the other hand, I think was a very unfortunate decision by the
>> Unicode consortium to define the punctuation apostrophe to be the same
>> character as the right single quotation mark. They are logically quite
>> distinct characters and occur in similar contexts too.)
>
>Historical accident. Unicode and even ISO 10646 *MUST* be as compatible
>with ASCII as possible, for a number of technical and political reasons.

So? They are compatible with ASCII in this issue simply because the
apostrophe character has its traditional position in them. This does
not prevent defining other characters in other positions (outside
the ASCII range of course) for less ambiguous usage. In particular,
the punctuation character is already there. Whether it's distinct from
some other character (outside ASCII) does not affect compatibility
with ASCII.

Eric Fischer

Mar 18, 2000, 3:00:00 AM

Markus Kuhn <mg...@cl.cam.ac.uk> wrote:

> I agree that it is somewhat debatable why ANSI X3.4 changed 0x27 to
> be the single quote, but I guess it is far too late now to change this
> back and I strongly disagree that the character as such is useless.

As far as I can tell, ANSI X3.4 has never defined character 2/7 as
simply a single quote. In the various revisions of the standard,
I see the character with the following names:

X3.4-1963 APOS.
X3.4-1965 Apostrophe (closing single quotation mark; acute accent)
X3.4-1967 Apostrophe (Closing Single Quotation Mark; Acute Accent)
X3.4-1968 Apostrophe (Closing Single Quotation Mark; Acute Accent)
X3.4-1977 Apostrophe (Closing Single Quotation Mark; Acute Accent)
X3.4-1986 APOSTROPHE, RIGHT SINGLE QUOTATION MARK, ACUTE ACCENT

Granted this is a lot closer to calling it a single quote than the
current revisions of ISO 646, CCITT International Reference Alphabet
(formerly International Alphabet No. 5), and ISO 8859, all of which
call it just an apostrophe.
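
For readers unfamiliar with the "2/7" notation used above: it gives the column and row of the 16-row code table, so the code value is column * 16 + row. A minimal Python sketch follows. The function name from_column_row is invented for this example.

```python
def from_column_row(position: str) -> int:
    """Convert a 'column/row' table position such as '2/7' to its code value."""
    column, row = (int(part) for part in position.split("/"))
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError(f"not a 7-bit code table position: {position!r}")
    return column * 16 + row


# 2/7 is the apostrophe position, 6/0 the grave accent position.
for position in ("2/7", "6/0"):
    value = from_column_row(position)
    print(position, "=", value, "=", hex(value))
```

This ties the 2/7 notation to the decimal 39 and hexadecimal 0x27 used elsewhere in the thread.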

eric

Dave Fawthrop

Mar 18, 2000, 3:00:00 AM

Markus Kuhn <mg...@cl.cam.ac.uk> wrote in message
news:8atesj$i6l$3...@pegasus.csx.cam.ac.uk...

>
>
> By the way, the new Unicode 3.0 standard, which discusses among
> many other things also the history and semantics of the curly
> and straight apostrophe in detail in section 6.1, has just
> been published:
>
> http://www.amazon.com/exec/obidos/ASIN/0201616335/mgk25
>

For people in England like Markus it is also on
Amazon.co.uk:

http://www.amazon.co.uk/exec/obidos/ASIN/0201616335/qid=953415008/sr=1-2/026-2942823-7630214

Using Amazon in America costs the earth and takes forever :-(

--
Dave Fawthrop <hyp...@hyphenologist.co.uk> <http://www.hyphenologist.co.uk>
Computer Hyphenation Ltd, Hyphen House, 8 Cooper Grove, Halifax HX3 7RF,
UK, Tel/Fax/Answer +44 (0)1274 691092. **2000, 15th Anniversary Year**
Hyphenologist is sold as C source code and splits 50 languages.
Also on site: VDU Glasses, Wordlists FAQ, Celtic Spiral Font

Daniel R. Tobias

Mar 18, 2000, 3:00:00 AM

Andreas Prilop wrote:
>
> In article <8b0i0s$30q3$1...@news.enteract.com>,
> Eric Fischer <e...@pobox.com> wrote:
>
> > Well, then, you haven't seen the tables beginning October 29, 1963,
>                                                                 ^^^^
>
> Was that A.D. or B.C.? ;-)

That should be "C.E." or "B.C.E.", to be sensitive to the non-Christians
in the audience...

--
--Dan
Dan's Web Tips: http://www.dantobias.com/webtips/

John Jenkins

Mar 18, 2000, 3:00:00 AM

in article aqPTOI0J143XUQ...@4ax.com, Alan at sargent*@*iohk.com
wrote on 3/18/00 7:46 AM:

> Why do I have `'[]{}<>~^_|\ on my
> keyboard? Because they're used by computer programmers. That's
> convenient for the 1% of the population using computers who program.

And they were 100% of the population using computers when ASCII was
developed in the early 1960's.

=====
John H. Jenkins
jen...@apple.com
ts...@blueneptune.com
http://www.blueneptune.com/~tseng

Kai Henningsen

Mar 19, 2000, 3:00:00 AM

e...@pobox.com (Eric Fischer) wrote on 18.03.00 in <8b0i0s$30q3$1...@news.enteract.com>:

> Kai Henningsen <kaih=7a71d...@khms.westfalen.de> wrote:
>
> > I don't think I have *ever* seen a table for ASCII or related sets where '
> > was *not* supposed to be straight.
>
> Well, then, you haven't seen the tables beginning October 29, 1963,
> when ISO TC 97/SC 2 altered character 2/7 so that it could serve
> either as an apostrophe or as an acute accent.

Maybe - but I've seen a lot of such tables.

Kai Henningsen

Mar 19, 2000, 3:00:00 AM

nhtc...@rrzn-user.uni-hannover.invalid (Andreas Prilop) wrote on 18.03.00 in <nhtcapri-ya0240800...@newsserver.rrzn.uni-hannover.de>:

> In article <ev97ds411bpkadqba...@4ax.com>,
> Jan Roland Eriksson <jre...@newsguy.com> wrote:
>
> > We have words like idé (meaning "idea")
> > just to pick one out of a bunch.
>
> This is OK but the stupid German, Spanish, Swedish ... keyboard layouts
> tempt people to type ´ instead of ' . A much better idea would be
> to have ' as a dead key and to type ' + e -> é instead of
> ´ + e -> é.

That's how it was before the non-dead version got a position of its own.
I lived with that for quite a while; it was bloody awful. Never had so
many accents that should have been apostrophes before (no dead keys) or
after (separate apostrophe key).

IMO, this change was one of the few that had *only* good points; I
certainly was never tempted to misuse the accent key afterwards.

Of course, my keyboard *drivers* do not currently produce U+00B4 from that
particular dead key, but instead produce the same U+0027 that the non-dead
version gives ...

Kai Henningsen

Mar 19, 2000, 3:00:00 AM

jen...@apple.com (John Jenkins) wrote on 18.03.00 in <B4F98B8D.B047%jen...@apple.com>:

> in article aqPTOI0J143XUQ...@4ax.com, Alan at sargent*@*iohk.com
> wrote on 3/18/00 7:46 AM:
>
> > Why do I have `'[]{}<>~^_|\ on my
> > keyboard? Because they're used by computer programmers. That's
> > convenient for the 1% of the population using computers who program.
>
> And they were 100% of the population using computers when ASCII was
> developed in the early 1960's.

And when the German (or Scandinavian, or whatever) variant was developed,
it cared for non-geeks over geeks and replaced []{}\| with accented
letters. "aäiü = bäjü ö FLAG_42;" anyone?
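
The joke line above can be reproduced in Python. This is only a sketch: the mapping below assumes the German national variant of ISO 646 (DIN 66003) as commonly documented, and the thread itself does not spell it out. The names GERMAN_646 and as_seen_on_german_terminal are invented for this example.

```python
# Assumed DIN 66003 replacements for the ASCII positions in question
# (the thread does not list them; treat this table as an assumption).
GERMAN_646 = str.maketrans({
    "@": "§",
    "[": "Ä",
    "\\": "Ö",
    "]": "Ü",
    "{": "ä",
    "|": "ö",
    "}": "ü",
    "~": "ß",
})


def as_seen_on_german_terminal(source: str) -> str:
    """Show ASCII source text as a terminal using the national variant would."""
    return source.translate(GERMAN_646)


# A line of C as the programmer typed it, and as it would have been displayed.
print(as_seen_on_german_terminal("a{i} = b{j} | FLAG_42;"))
```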

Alan

Mar 19, 2000, 3:00:00 AM

On Sat, 18 Mar 2000 17:33:11 +0100, Jan Roland Eriksson
<jre...@newsguy.com> wrote:

>On Sat, 18 Mar 2000 23:46:29 +0800, Alan <sargent*@*iohk.com> wrote:
>
>>Which idiots decided that we need access to the grave accent and foot
>>mark more than we need normal punctuation used in almost every
>>sentence of text written?
>
>Maybe the same "idiots" that recognized the fact that
>people of anglo-saxon origin and native language are
>a minority in this world :-)

I don't see how the change of expression of the ` and ' keys has
anything to do with being either Swedish or Anglo-Saxon. Or do you use
different quote marks in Swedish? (I know that German and French do,
for instance, but from what I remember from Ikea stores, the quotes in
Swedish are the same as English).

>I can promise you that in everyday Swedish typing
>accent's are used quite often.
>
>We have words like idé (meaning "idea")
>just to pick one out of a bunch.
>

First, I didn't mention accented characters at all in my post. They're
used in English quite frequently (mostly French-derived words).

Second, that's an acute, not the ` now aka grave accent. And é is a
single character, not a combination of accent + e.
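
In Unicode terms both views exist side by side: é has a precomposed code point and an equivalent decomposed form (e followed by a combining acute accent). The following Python sketch uses the standard unicodedata.normalize function to show that. Note that the combining accent is yet another character, distinct from the spacing ´ and ` discussed in this thread.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
decomposed = unicodedata.normalize("NFD", precomposed)

# The decomposed form is two code points: the base letter and a combining accent.
for ch in decomposed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Normalizing back to NFC yields the single precomposed character again.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```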

>>Why do I have `'[]{}<>~^_|\ on my keyboard?
>>Because they're used by computer programmers.
>...
>>The rest can't find the normal characters used
>>in writing English because some geeks don't
>>know or care about typography,
>
>Take this free tip for your Win machine at least.
>Get hold of a keyboard with a Swedish key layout,
>install the appropriate driver for that and I think
>you will be happy to find your typographical keys
>to be in quite comfortable places.

Really? It has typographic quotes you can get with a single keystroke?

Alan

Mar 19, 2000, 3:00:00 AM

On Sat, 18 Mar 2000 19:37:49 -0800, John Jenkins <jen...@apple.com>
wrote:

>in article aqPTOI0J143XUQ...@4ax.com, Alan at sargent*@*iohk.com
>wrote on 3/18/00 7:46 AM:
>
>> Why do I have `'[]{}<>~^_|\ on my
>> keyboard? Because they're used by computer programmers. That's
>> convenient for the 1% of the population using computers who program.
>
>And they were 100% of the population using computers when ASCII was
>developed in the early 1960's.

Yes, but my peeve is not with 1960s ASCII (see quote below from
earlier in the thread), but with the redefinition that came in with
ANSI in the 80s, when DTP and office software were well and truly
established. "Old ASCII" (e.g. most DOS) software usually does give you
curly left and right quotes for the ` and ' keys.

Personally, I do most of my DTP with Ventura 3, ca. 1988 vintage, and
that's what it does, and I'm very happy with that. When I get files in
WinWord or PageMaker, the first thing I do is go through it and fix
whatever left, right and straight quotes (and dashes and hyphens and
double quotes) the user has put in to what they should be, and this
can be VERY tedious. And it's all because of this wonderful clever
ANSI, or perhaps failing to remap keyboard definitions. However
logical, as I mentioned, it has had the demonstrated result of causing
incorrect quotemarks to be used in a large amount of printed matter
since; you can see examples every day on books, billboards,
advertisements...

Char

Mar 19, 2000, 3:00:00 AM

Alan wrote:
(other good stuff snipped)

>When I get files in WinWord or PageMaker, the first thing I do is go
>through it and fix whatever left, right and straight quotes (and dashes
>and hyphens and double quotes) the user has put in to what they should
>be, and this can be VERY tedious.

When I was happily using Ami Pro (now Lotus Word Pro) for publishing a
professional journal, there was either a built-in function or a macro (I
forget which) that did a very reasonable job of scanning a document and
making typographic corrections, including changing typists' two spaces
to single spaces, pairing quotes, converting double-hyphens to n-dashes,
etc. Is there possibly an equivalent available in Word (to run before
you pass it on to Ventura) or in Ventura itself?
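
For what it's worth, the kind of clean-up pass described above can be sketched in a few lines of Python. This is not the Ami Pro function or any Word or Ventura feature, just a naive illustration. The function name tidy_typography and its rules are assumptions made for this example, and it will wrongly curl foot and inch marks.

```python
import re


def tidy_typography(text: str) -> str:
    """Naive typographic clean-up (hypothetical helper, illustration only)."""
    # Typists' runs of spaces become a single space.
    text = re.sub(r" {2,}", " ", text)
    # Double hyphens become an en dash, as in the description above.
    text = text.replace("--", "\u2013")
    # A double quote at the start, or after whitespace or an opening
    # bracket, is taken to be an opening quote; the rest are closing.
    text = re.sub(r'(^|[\s(\[{])"', "\\1\u201c", text)
    text = text.replace('"', "\u201d")
    # Same rule for single quotes; what remains is treated as an
    # apostrophe or closing quote (both U+2019).
    text = re.sub(r"(^|[\s(\[{\u201c])'", "\\1\u2018", text)
    text = text.replace("'", "\u2019")
    return text


print(tidy_typography('He said  "don\'t" -- twice.'))
```

A real tool would need exceptions for measurements such as 6'3", for code listings, and for language-specific quoting rules.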

Andreas Prilop

Mar 20, 2000, 3:00:00 AM

In article <7a85v...@khms.westfalen.de>,
kaih=7a85v...@khms.westfalen.de (Kai Henningsen) wrote:

> That's how it was before the non-dead version got a position of it's own.

My impression is that you completely misunderstood me.

> Of course, my keyboard *drivers* do not currently produce U+00B4 from that
> particular dead key, but instead produce the same U+0027 that the non-dead
> version gives ...

This is exactly what I wrote and what I prefer.

Antoine Leca

Mar 20, 2000, 3:00:00 AM

to Alan

[ groups snipped, mailed copy to the author if he did not read them. ]

> >If you want to have a
> >proper curly apostrophe or right quotation mark, then you have to
> >use the Unicode character &#x2019; which is intended for exactly
> >that purpose.
>
> Problem is that I don't have a key on my keyboard that generates
> &#x2019; whereas I do have a '.

One of the tricks I learned for my Mac a very long time ago was to
slightly edit my KCHR resource to have the (Mac version of) curly
quote instead of the straight one. And even though I am a programmer,
I was always very happy with that change.

BTW, I know this is easy to do with Windows also (I did it to easily
obtain the French oe and quotes).

> Which idiots decided that we need access to the grave accent and foot
> mark more than we need normal punctuation used in almost every
> sentence of text written?

The problem when using the curly quote is that you are going outside
of ASCII. Which in practical terms means loss of interoperability
(for example, you cannot post to Usenet ;-)).
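
The interoperability problem can be shown directly: the curly quote simply has no ASCII encoding. The Python sketch below checks this and falls back to the straight characters. The names survives_ascii, CURLY_TO_ASCII and to_ascii_fallback are made up for this example.

```python
def survives_ascii(text: str) -> bool:
    """True if the text can be encoded as plain ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical fallback table: curly quotes to their straight ASCII stand-ins.
CURLY_TO_ASCII = str.maketrans({
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
})


def to_ascii_fallback(text: str) -> str:
    """Replace curly quotes by straight ones for ASCII-only channels."""
    return text.translate(CURLY_TO_ASCII)


message = "It\u2019s a \u201cquote\u201d"
print(survives_ascii(message))                     # curly quotes are outside ASCII
print(survives_ascii(to_ascii_fallback(message)))  # the fallback is ASCII-safe
```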

> This is not a trivial matter*. Has anyone noticed that in more and
> more typesetting you now see ' instead of rquote, and even " instead
> of l & r doublequote?

[ Fortunately, SuperWord comes to our help with its automatic
replacement of straight quotes by smart ones... What a pity that it
is allowed to deactivate such a wonderful mechanism that prevents
any normal user from actually typing in a macro... ;-) ]

> Why do I have `'[]{}<>~^_|\ on my keyboard?

Lucky you.
I am a programmer, and I do not have `[]{}~|\ on my keyboard,
even though ` is required to type in my languages (accented caps),
and even though \ occurs so frequently when I type filenames that I
can't count them in an hour...

By the way: are you sure _ was for programmers (I thought it was to
underline text)? are you sure | was for programmers (for tables)?
and | (simple math, IIRC)? and [] (legal stuff? or were they <>?)

> Because they're used by computer programmers. That's
> convenient for the 1% of the population using computers who program.
> The rest can't find the normal characters used in writing English
> because some geeks don't know or care about typography, but had the
> arrogance to design the character sets that have to be used for it.

Are you writing that 99% of computer users care about typography?

Even 1% is a big number, when you deal with users. ;-)

Antoine

Eric Fischer

Mar 20, 2000, 3:00:00 AM3/20/00

to

Antoine Leca <Antoin...@renault.fr> wrote:

> > Why do I have `'[]{}<>~^_|\ on my keyboard?
>

> By the way: are you sure _ was for programmers (I thought it was to
> underline text)? are you sure | was for programmers (for tables)?
> and | (simple math, IIRC)? and [] (legal stuff? or were they <>?)

Here are the rationales for why these characters were originally
included in ASCII or ISO 646:

' originally included for use as the apostrophe; later overloaded to
also mean the acute accent

` originally included for use as an accent; overloaded in the US to
also mean an opening single quotation mark

^ for use as an accent and as a mostly-compatible replacement for the
up-arrow (programming symbol) present in early versions of the code

~ for some reason, a second-class citizen among accents. Originally
proposed as an alternate graphic for "#"; then made an alternate
graphic for "^"; then an alternate graphic for overline (programming
symbol); only later the standard graphic.

_ originally included to serve a dual purpose: to underline text in
ordinary writing, and to serve as an "acknowledge" character in
automatic teleprinter communications. Acknowledge was reassigned
so the character became exclusively for underlining.

[ ] "high usage in ALGOL" and "useful for human-to-human communication"

< > needed for mathematics and for COBOL programming

{ } "consistency of meaning when the last two columns are folded over
the previous two columns; high utility factor in general use
(human communication)"

| originally proposed for inclusion as a mathematical symbol; missing
from the code from late 1961 to late 1963; added back in with the
same rationale as for the curly braces; later explained to refer to
the logical OR operation; then disassociated from logical OR to
appease the SHARE PL/I language committee.

\ "reverse division is a useful device in programming, particularly
for continued fractions" and "adjoining the two types of slashes
in either permutation gives a reasonable representation of the
logical symbols AND and OR in ALGOL. (/\ \/)"

eric

Char

Mar 20, 2000, 3:00:00 AM3/20/00

to

Antoine Leca wrote:
>
> By the way: are you sure _ was for programmers (I thought it was to
> underline text)? are you sure | was for programmers (for tables)?
> and | (simple math, IIRC)? and [] (legal stuff? or were they <>?)
>

The underscore is there because typewriters had it. Early (1945-1970
or so) computerized printing didn't have different modes (bold,
underscore, or even easily used lower case; in fact, 6-bit characters
didn't allow for much of anything), and the way to underline something
was to print the line, NOT do a line feed, and print the underscores.
Needless to say, this wasn't used much.
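
The overstrike technique can be sketched in a few lines of Python. This
is only an illustration: the function name `underline_by_overstrike` is
made up here, and it assumes an output device where a bare carriage
return moves back to column one without advancing the paper.

```python
def underline_by_overstrike(text: str) -> str:
    """Print the line, return the carriage WITHOUT a line feed,
    then overprint underscores beneath every non-space character."""
    underscores = "".join(" " if ch == " " else "_" for ch in text)
    # "\r" alone is the carriage return; the line feed comes only at the end.
    return text + "\r" + underscores + "\n"


print(repr(underline_by_overstrike("NOT USED MUCH")))
```

On a modern terminal the second pass simply overwrites the first, which
is one reason the trick only made sense on hard-copy devices.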

Parentheses, brackets, and braces have all been used in text and
mathematics for a long time.

The vertical bar was introduced to computers, I believe, with the
Backus-Naur syntax for describing programming languages; it was used in
APL ("A Programming Language") by Ken Iverson (1960's), and was
incorporated into PL/I at about the same time as the OR operator and,
doubled, as the concatenation operator.

We might ask why, on your keyboard, the vertical bar appears as TWO
shorter verticals, when almost all fonts produce a single vertical bar
...

Uwe Waldmann

Mar 20, 2000, 3:00:00 AM3/20/00

to

Markus Kuhn <mg...@cl.cam.ac.uk> writes:
> I agree that it is somewhat debatable why ANSI X3.4 changed 0x27 to
> be the single quote, but I guess it is far too late now to change this
> back and I strongly disagree that the character as such is useless.

The character 0x27 is ubiquitous in computing and thus it is clearly
not useless. Neither is a straight vertical glyph useless: it's the
stress mark in IPA. The question is, whether it is a good idea to
associate 0x27 with a (straight) vertical glyph, and whether it's
a good idea to make such an association mandatory. After all, 0x27
is not the only multi-purpose and multi-glyph character in ASCII
(others: 0x2A, 0x2D, 0x5E, 0x60, 0x7E), and at least in one case
(002D: HYPHEN-MINUS), Unicode has explicitly acknowledged this.
In my mind, this would be appropriate also for 0x27.
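
The names Unicode assigns to these multi-purpose characters can be
listed with a short Python sketch. It relies only on the `unicodedata`
module of the standard library; the variable `names` is just for this
example.

```python
import unicodedata

# The multi-purpose ASCII code points mentioned above.
CODE_POINTS = (0x27, 0x2A, 0x2D, 0x5E, 0x60, 0x7E)

# Look up the character name recorded in the Unicode character database.
names = {cp: unicodedata.name(chr(cp)) for cp in CODE_POINTS}

for cp, name in names.items():
    print(f"U+{cp:04X}  {chr(cp)}  {name}")
```

Only U+002D carries a name that admits the double use (HYPHEN-MINUS);
U+0027 is listed simply as APOSTROPHE.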

> It is certainly a good thing that Unicode provides us now with single
> and double quotes in both left, right, and neutral shapes, such
> that we can pick whatever is most appropriate. For printing programming
> language source code and other typewriter-style output, the neutral
> forms are definitely more appropriate,

It seems that the inventors and developers of Ada 95, C, C++, Common
Lisp, Eiffel, GNU Emacs, Java, Perl, Smalltalk, the Bourne shell, and
TeX did not share this opinion.

In (Kernighan and Ritchie 1988), (Stroustrup 1991/1997), (Steele 1990),
(Wall et al. 1991/1996), (Goldberg and Robson 1989) and (Bourne 1983),
the apostrophe character 0x27 is represented by a (straight) slanted
glyph, sometimes similar to an acute accent, sometimes to a prime or
minute sign. Meyer (1992), Stallman (1997), Arnold and Gosling (1997),
and Knuth (1986) use a curly ("9"-like) quote; in (Taft and Duff 1997),
both glyphs are used in alternation. A vertical glyph shaped like an
inverted drop can be found in (Jensen and Wirth 1985) and (Wirth 1980).

References:

K. Arnold, J. Gosling: The Java Programming Language, 2nd ed.,
Addison-Wesley, 1997.

S. R. Bourne: The UNIX System, Addison-Wesley, 1983.

A. Goldberg, D. Robson: Smalltalk-80, The Language, Addison-Wesley,
1989.

K. Jensen, N. Wirth: Pascal User Manual and Report, 3rd ed.,
Springer-Verlag, 1985.

B. W. Kernighan, D. W. Ritchie: The C Programming Language, 2nd ed.,
Prentice-Hall, 1988.

D. E. Knuth: The TeXbook, Addison-Wesley, 1986.

B. Meyer: Eiffel, The Language, Prentice-Hall, 1992.

R. M. Stallman: GNU Emacs manual, 13th ed., Free Software Foundation,
1997.

G. L. Steele Jr.: Common Lisp, The Language, 2nd ed., Digital Press,
1990.

B. Stroustrup: The C++ Programming Language, 2nd ed., Addison-Wesley,
1991; 3rd ed., Addison-Wesley, 1997.

S. T. Taft, R. A. Duff, eds.: Ada 95 Reference Manual, Language and
Standard Libraries, Springer-Verlag, 1997.

L. Wall, T. Christiansen, R. L. Schwartz: Programming Perl, 1st ed.,
O'Reilly, 1991; 2nd ed., O'Reilly, 1996.

N. Wirth: MODULA-2, Institut für Informatik, ETH Zürich, 1980.

[Followup-To: comp.std.internat]

--
Uwe Waldmann, Max-Planck-Institut fuer Informatik
Im Stadtwald, D-66123 Saarbruecken, Germany
Phone: +49 681 9325 227, Fax: +49 681 9325 299, E-Mail: u...@mpi-sb.mpg.de

Eric Fischer

Mar 20, 2000, 3:00:00 AM3/20/00

to

Char <Char@cter.s> wrote:

> We might ask why, on your keyboard, the vertical bar appears as TWO
> shorter verticals, when almost all fonts produce a single vertical bar

The SHARE IBM users' group opposed the adoption of ASCII-1967
because the vertical bar, which, as you say, is the PL/I logical
OR operator, appears in a section of the code reserved for "national
use" characters, and, moreover, in the lower case section of the
code which might not be supported by all devices. SHARE insisted
that there be a vertical bar in the uppercase, international section
of the code, so the X3.2 subcommittee made it acceptable to substitute
a vertical bar symbol for the exclamation point, and broke the
character that was supposed to be a vertical bar in half so it
could not be mistaken for a logical OR symbol. The damage was
repaired in ASCII-1977, but by this time large numbers of devices
were using the broken-bar symbol and the transition back to the
real, solid, international, vertical bar is still not complete.

eric

Char

Mar 20, 2000, 3:00:00 AM3/20/00

to

Thanks for jogging my memory. I was heavily involved in the PL/I project
during its initial development (when it was technically a subcommittee
of the Share Fortran project). I think it was the late Phil Dorn who
chaired the PL/I committee when this particular imbroglio was stirred
up. (And, as I remember, the passions were much greater than in this
apostrophe thread!)

Eric Fischer wrote:
>
> Char <Char@cter.s> wrote:
>
> > We might ask why, on your keyboard, the vertical bar appears as TWO

> > shorter verticals, when almost all fonts produce a single vertical bar.

Alan

Mar 21, 2000, 3:00:00 AM3/21/00

to

Ventura (the version I use, anyway) has basically no WP capability,
but it keeps text in external files in various formats (e.g. Word,
WordStar). Anyway, I do have a bunch of macros for fixing common
problems. However, as Tolstoy might have said, "Good mss are all alike;
every bad ms is bad in its own way". I have to read each one and work
out just what weird way the user's ignorance and the program's defaults
have interacted in each case.

Generally with double quotes, I convert all varieties to ", and then
Ventura makes them into the appropriate typographic ones on importing.
(PageMaker and many other DTP apps can do this too.) Single quotes are
a little harder. Dashes I convert to --, which Ventura turns into real
dashes. Recently I've been having difficulties with Word, which has a
horrible autocorrect function that seemingly randomly confuses
hyphens, en and em dashes. I have to search for and inspect every one
of these to decide which it is really supposed to be.
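
A rough Python sketch of that kind of pre-import clean-up, for
illustration only. It is not one of the actual macros, which are not
shown in this thread; the function name `normalize_for_import` and the
choice of which characters count as "all varieties" are assumptions.

```python
import re

# Assumed set of double-quote varieties: curly left/right and low-9.
DOUBLE_QUOTES = "\u201c\u201d\u201e"
# Assumed set of dash varieties: en dash and em dash.
DASHES = "\u2013\u2014"


def normalize_for_import(text: str) -> str:
    """Flatten typographic double quotes to " and dashes to --,
    leaving the DTP program to re-create the typographic forms."""
    text = re.sub(f"[{DOUBLE_QUOTES}]", '"', text)
    text = re.sub(f"[{DASHES}]", "--", text)
    return text


print(normalize_for_import("\u201cWar\u201d \u2014 and peace"))
```

Hyphens are deliberately left alone, since deciding whether a given -
was meant as a hyphen or a dash is the part that needs a human reader.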

The real problem, however, is that many people don't realise there's a
problem, and think that ' is an apostrophe and - is a dash, and these
characters are more and more often going to print.

Alan

Mar 21, 2000, 3:00:00 AM3/21/00

to

On 20 Mar 2000 19:15:22 GMT, Eric Fischer <e...@pobox.com> wrote:

>Antoine Leca <Antoin...@renault.fr> wrote:
>
>> > Why do I have `'[]{}<>~^_|\ on my keyboard?
>>

>> By the way: are you sure _ was for programmers (I thought it was to
>> underline text)? are you sure | was for programmers (for tables)?
>> and | (simple math, IIRC)? and [] (legal stuff? or were they <>?)
>

>Here are the rationales for why these characters were originally
>included in ASCII or ISO 646:
>
> ' originally included for use as the apostrophe; later overloaded to
> also mean the acute accent
>
> ` originally included for use as an accent; overloaded in the US to
> also mean an opening single quotation mark

Yes. So why was their use as opening/closing quote marks removed
later, with no obvious way to enter them on the keyboard substituted?

Alan

Mar 21, 2000, 3:00:00 AM3/21/00

to

On Mon, 20 Mar 2000 18:34:49 +0100, Antoine Leca
<Antoin...@renault.fr> wrote:
>
>> >If you want to have a
>> >proper curly apostrophe or right quotation mark, then you have to
>> >use the Unicode character ’ which is intended for exactly
>> >that purpose.
>>
>> Problem is that I don't have a key on my keyboard that generates
>> ’ whereas I do have a '.
>
>One of the trick I learned for my Mac very long time ago was to
>slightly edit my KCHR resource to have the (Mac version of) curly
>quote instead of the straight one. And even if I am a programmer,
>I always was very happy of that change.
>
>BTW, I know this is easy to do with Windows also (I did it to obtain
>easily French oe and quotes).
>
>
>> Which idiots decided that we need access to the grave accent and foot
>> mark more than we need normal punctuation used in almost every
>> sentence of text written?
>
>The problem when using the curly quote is that you are going outside
>of ASCII. Which in practical term means loss of interoperability
>(for example, you cannot post to Usenet ;-)).

Yes, I know. The problem is that no matter what system we hack on our
individual desktops, the standard remains, and most people will use it
as-is, and the result is more and more bad typography being printed.

>> Why do I have `'[]{}<>~^_|\ on my keyboard?
>

>You lucky.
>I am a programmer, and I do not have `[]{}~|\ on my keyboard,
>even if ` is required to type in my languages (accentuated caps),
>even if \ occurs so frequentely when I type filenames that I can't
>count them in an hour...
>

>By the way: are you sure _ was for programmers (I thought it was to
>underline text)? are you sure | was for programmers (for tables)?
>and | (simple math, IIRC)? and [] (legal stuff? or were they <>?)
>

[] is used for editor's insertions in quoted text. Don't know of any use
for < and > outside math or programming.

Sure, these all have legitimate uses; it's just that quote marks and
dashes are much more common in normal writing, but aren't as easily
accessible as these characters.

>
>> Because they're used by computer programmers. That's
>> convenient for the 1% of the population using computers who program.
>> The rest can't find the normal characters used in writing English
>> because some geeks don't know or care about typography, but had the
>> arrogance to design the character sets that have to be used for it.
>
>Are you writing that 99% of computer users care about typography?

No, 99% of people are just writing correspondence or the like. But
there's no reason that that can't be set correctly. These days it's
trivial to use proportional type, beautifully-designed fonts,
justified text; but this is cursed with typewriter punctuation.

Also, more people are typesetting on their computers, and sending
files to be printed, using these incorrect characters.

>
>Even 1% is a big number, when you deal with users. ;-)

I did some programming myself; and even some math. But still I use my
keyboard much more for writing prose.

Antoine Leca

Mar 21, 2000, 3:00:00 AM3/21/00

to

On 2000-03-20 08:20:48 +0800, Alan <sargent*@*iohk.com> wrote:
>
> On Mon, 20 Mar 2000 18:34:49 +0100, Antoine Leca
> <Antoin...@renault.fr> wrote:
> >
> >The problem when using the curly quote is that you are going outside
> >of ASCII. Which in practical term means loss of interoperability
> >(for example, you cannot post to Usenet ;-)).
>
> Yes, I know. The problem is that no matter what system we hack on our
> individual desktops, the standard remains, and most people will use it
> as-is, and the result is more and more bad typography being printed,

You are correct (and that is sad).

> Sure, these all have legitimate uses; it's just that quote marks and
> dashes are much more common in normal writing, but aren't as easily
> accessible as these characters.

As I said, it is quite easy to remap the existing keyboards
to achieve this.

With the introduction of the Euro here in Europe, we got a new symbol
that needed a new key: at the beginning it was (and still is)
a shifted key (using Option, AltGr, Ctrl+Alt or whatever), but
the ultimate goal of the European Union is to have it as a first-class
key, much like the $ or pound symbols are (which seems quite normal).

So if sufficient pressure is applied, things can change.

Bottom line: there is not sufficient pressure.

> >> Because they're used by computer programmers. That's
> >> convenient for the 1% of the population using computers who program.
> >> The rest can't find the normal characters used in writing English
> >> because some geeks don't know or care about typography, but had the
> >> arrogance to design the character sets that have to be used for it.
> >
> >Are you writing that 99% of computer users care about typography?
>
> No, 99% of people are just writing correspondence or the like. But
> there's no reason that that can't be set correctly. These days it's
> trivial to use proportional type, beautifully-designed fonts,
> justified text; but this is cursed with typewriter punctuation.

Well, in France even the letters are a problem (the oe ligature
is absent from our keyboards, and from most exchange codes;
uppercase E cannot easily be given an acute accent,
unless you use a Mac).

I would say that it is the written language that is evolving.
I recall linguists explaining that languages tend to simplify
themselves: it seems like this is the case. I do not believe
this is something we can easily change.

Antoine

Thierry Bouche

Mar 21, 2000, 3:00:00 AM3/21/00

to

Stefan Zingg <ste...@stefan.imp.com.BOUNCE.COM> writes:

> I'm using it for inch and second (and the single straight quote for foot
> and minute). Well, I guess I'll have to live with the attribute "pseudo".

yep ;-)
minutes, seconds, etc. should be rendered with a `prime' glyph from a
symbol font (you may use an obliqued `straight' quote for the
purpose, but it's a pseudo-prime). Never vertical as in roman fonts.
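
For text that can go beyond ASCII, Unicode has dedicated prime
characters. A small Python sketch follows; the function name
`format_angle` is made up for this example, and whether a given font
draws these glyphs well is a separate question.

```python
import unicodedata

DEGREE = "\u00b0"        # DEGREE SIGN
PRIME = "\u2032"         # PRIME: minutes, feet
DOUBLE_PRIME = "\u2033"  # DOUBLE PRIME: seconds, inches


def format_angle(degrees: int, minutes: int, seconds: int) -> str:
    """Format an angle with real prime marks instead of ' and "."""
    return f"{degrees}{DEGREE}{minutes}{PRIME}{seconds}{DOUBLE_PRIME}"


print(format_angle(48, 51, 24))
print(unicodedata.name(PRIME), "/", unicodedata.name(DOUBLE_PRIME))
```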

Thierry Bouche

Mar 21, 2000, 3:00:00 AM3/21/00

to

Alan <sargent*@*iohk.com> writes:

> I don't see how the change of expression of the ` and ' keys is
> anything to do with being either Swedish or Anglo-Saxon. Or do you use
> different quote marks in Swedish? (I know that German and French do,

Sure, but they need apostrophes everywhere in a text, never the
straight ascii quote...

John Jenkins

Mar 21, 2000, 3:00:00 AM3/21/00

to

in article rrzWOGF0K7ygzY=7b8Y4j...@4ax.com, Alan at sargent*@*iohk.com
wrote on 3/20/00 4:20 PM:

>
> Yes, I know. The problem is that no matter what system we hack on our
> individual desktops, the standard remains, and most people will use it
> as-is, and the result is more and more bad typography being printed,

The Mac has always had curly quotes available on the keyboard, but most
people don't use them. I think this is one of those cases where we're
dealing ultimately with the legacy of people who learned how to type on
typewriters, and with a lot of inertia behind them that makes them reluctant
to change to anything better.

A pity, it is.

Jukka Korpela

Mar 21, 2000, 3:00:00 AM3/21/00

to

On Tue, 21 Mar 2000 08:20:48 +0800, Alan <sargent*@*iohk.com> wrote:

>[] is used for editor's insertions in quoted text. Don't kow any use
>for < and > outside math or programming.

How about HTML? After all, < and > are part of the reference concrete
syntax of SGML, therefore widely used in markup languages. (Obviously
they were taken into such use "because they were there", on keyboards
and in ASCII, the mother of most character codes used nowadays.)

>Sure, these all have legitimate uses; it's just that quote marks and
>dashes are much more common in normal writing, but aren't as easily
>accessible as these characters.

Or, effectively, not at all, as is the case in HTML if you wish to
write pages that are accessible to all. (It's not a fault in HTML
specifications of course; it's a browser problem, but serious enough
to affect HTML authoring for the WWW.)

Your notes are correct, and give insight to the problems we live with.
In the early days of computing, character repertoires were limited,
since anything related to computers was very expensive and since
computers were used, well, for computing.

But what can we do? Is this an HTML problem, or a font problem, or a
standardization problem? I was about to set followups to
comp.std.internat, which might be the most adequate group, but it
seems to me that the standards are not the problem here - rather, the
solutions to be used. So despite finding this discussion personally
interesting, I'm afraid it's getting more and more off-topic for _all_
the groups involved. I'd be happy to continue on some other forum.

Antoine Leca

Mar 21, 2000, 3:00:00 AM3/21/00

to

John Jenkins wrote:
>
> The Mac has always had curly quotes available on the keyboard,

You mean, "using some obscure Option+key combinations"? (OK, not so
obscure, since it is usually on the same key as the straight quote.)

Or is there something peculiar to the English keyboard?

Because here in France, there are no curly quotes engraved on
any Mac keyboard I have ever seen.

> A pity, it is.

Certainly.

Antoine

Keith Thompson

Mar 21, 2000, 3:00:00 AM3/21/00

to

Jukka Korpela <Jukka....@hut.fi> writes:
> On Tue, 21 Mar 2000 08:20:48 +0800, Alan <sargent*@*iohk.com> wrote:
>
> >[] is used for editor's insertions in quoted text. Don't kow any use
> >for < and > outside math or programming.
>

> How about HTML? [...]

That would be programming, yes?

--
Keith Thompson (The_Other_Keith) k...@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Welcome to the last year of the 20th century.

gr...@apple2.com

Mar 21, 2000, 3:00:00 AM3/21/00

to

In article <yeczorr...@king.cts.com>,
Keith Thompson <k...@cts.com> wrote:
>Jukka Korpela <Jukka....@hut.fi> writes:
>>Alan <sar...@iohk.com> wrote:

>>> [] is used for editor's insertions in quoted text. Don't kow any
>>> use for < and > outside math or programming.

>> How about HTML? [...]

> That would be programming, yes?

No, HTML is not programming. It's marking up.

Let's not start this thread up again, please.

--
-- --- <gr...@apple2.com>
-- -- -- ------------------------------------------------------------------
-- -- --- <http://www.war-of-the-worlds.org/>
---

H. Peter Anvin

Mar 21, 2000, 3:00:00 AM3/21/00

to

Followup to: <7=XUOHz69GgZvxS...@4ax.com>
By author: Alan <sargent*@*iohk.com>
In newsgroup: comp.std.internat

>
> I don't see how the change of expression of the ` and ' keys is
> anything to do with being either Swedish or Anglo-Saxon. Or do you use
> different quote marks in Swedish? (I know that German and French do,

> for instance, but from what I remember from Ikea stores, the quotes in
> Swedish are the same as English).
>

Sometimes. In Swedish typography it is common to use only the upper-right
quotation marks at all times.

-hpa
--
<h...@transmeta.com> at work, <h...@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."

Antoine Leca

Mar 22, 2000, 3:00:00 AM3/22/00

to

gr...@apple2.com wrote:
>
> In article <yeczorr...@king.cts.com>,
> Keith Thompson <k...@cts.com> wrote:
> >Jukka Korpela <Jukka....@hut.fi> writes:
> >>Alan <sar...@iohk.com> wrote:
>
> >>> [] is used for editor's insertions in quoted text. Don't kow any
> >>> use for < and > outside math or programming.
>
> >> How about HTML? [...]
>
> > That would be programming, yes?
>
> No, HTML is not programming. It's marking up.

I am sorry: it is plain evidence that, in the context of Alan's post,
HTML writing was covered under the term "programming".

I agree HTML cannot be covered by neither "math" nor "programming"
if used in their most restrictive senses (so don't argue about that).
But Alan is not writing an academic thesis on formal languages, so
one should allow for a pinch of mutual understanding, particularly since
there are three cross-posted newsgroups covering not-so-close fields.

Antoine

gr...@apple2.com

Mar 22, 2000, 3:00:00 AM3/22/00

to

In article <38D8E5A3...@renault.fr>,
Antoine Leca <Antoin...@renault.fr> wrote:

>gr...@apple2.com wrote:
>>Keith Thompson <k...@cts.com> wrote:
>>>Jukka Korpela <Jukka....@hut.fi> writes:
>>>>Alan <sar...@iohk.com> wrote:

>>>>> [] is used for editor's insertions in quoted text. Don't kow any
>>>>> use for < and > outside math or programming.

>>>> How about HTML?

>>> That would be programming, yes?

>> No, HTML is not programming. It's marking up.

> I am sorry: it is plain evidence that, in the context of Alan's post,
> HTML writing was covered under the term "programming".

(1) It is not in evidence that he knew he was crossposting to ciwah, and
(2) HTML is not programming, so regardless of Alan's categorization, it
still stands that no, "That would" NOT "be programming".

And it was Keith who mistakenly thought HTML was programming.

> I agree HTML cannot be covered by neither "math" nor "programming"

Watch those double negatives. :-)

It also appears that someone's (Markus Kuhn's) newsreader (xrn 9.02)
dropped the References header on his followup, causing the loss of the
real root article of the thread. I've restored it.

Keith Thompson

Mar 22, 2000, 3:00:00 AM3/22/00

to

gr...@apple2.com writes:
[...]

> And it was Keith who mistakenly thought HTML was programming.

I hereby drop the subject. HTML may or may not be programming, but I
don't care enough to argue about it -- and I suspect most of the
readers of the three newsgroups this is cross-posted to don't either.

Daniel R. Tobias

Mar 23, 2000, 3:00:00 AM3/23/00

to

Alan wrote:
>
> On 22 Mar 2000 23:44:46 -0800, Keith Thompson <k...@cts.com> wrote:
> I don't care why Greg thinks HTML isn't a programming language, my
> point was that it certainly isn't prose and more particularly that the
> <> characters (and other keyboard characters) are not seen in more
> than a tiny percentage of printed works, whereas correct curly quotes
> are needed in EVERY book, almost every sentence, but cannot be easily
> typed. (Yes, it can be hacked, but most people don't know how to do
> that, and shouldn't have to either.)

To make a contrary point, however, the prevalence of "smart-quote"
features in word processing programs frequently causes the "opposite"
problem: people end up with platform- or vendor-specific "curly
quotes" in documents without even knowing that's what they have
(Microsoft software, in particular, tends to do this), and these then
get screwed up royally when transferred to different software or
platforms or put on the Internet. Whenever you get an e-mail message
with an AE ligature where an apostrophe should be, you've been
victimized by this.

It's much safer to just use the plain ASCII symbols (' and "), as
typographically ugly as they may be.
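
One plausible route to the AE-ligature symptom can be reproduced in
Python. Which encodings were actually involved in any given message is
an assumption here: the sketch supposes the sender's software stored the
curly apostrophe in Windows code page 1252 and the reader's software
interpreted the byte as DOS code page 437.

```python
curly_apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Sender side (assumed): a smart quote saved in Windows code page 1252.
raw = curly_apostrophe.encode("cp1252")

# Reader side (assumed): the same byte interpreted as code page 437.
misread = raw.decode("cp437")

print(raw)      # the single byte that travels over the wire
print(misread)  # what the reader sees instead of an apostrophe

# The plain ASCII apostrophe is the same byte in both code pages.
print("'".encode("cp1252").decode("cp437"))
```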

--
--Dan
Dan's Web Tips: http://www.dantobias.com/webtips/

Alan

Mar 24, 2000, 3:00:00 AM3/24/00

to

On 21 Mar 2000 18:30:10 -0800, h...@cesium.transmeta.com (H. Peter
Anvin) wrote:

>Followup to: <7=XUOHz69GgZvxS...@4ax.com>
>By author: Alan <sargent*@*iohk.com>
>In newsgroup: comp.std.internat
>>
>> I don't see how the change of expression of the ` and ' keys is
>> anything to do with being either Swedish or Anglo-Saxon. Or do you use
>> different quote marks in Swedish? (I know that German and French do,
>> for instance, but from what I remember from Ikea stores, the quotes in
>> Swedish are the same as English).
>>
>
>Sometimes. In Swedish typography it is common to use only the upper-right
>quotation marks at all times.

For both opening and closing? Still, that isn't what we now get when
typing ' (on a US keyboard, anyway).

Alan

Mar 24, 2000, 3:00:00 AM3/24/00

to

On Tue, 21 Mar 2000 19:31:30 +0200, Jukka Korpela
<Jukka....@hut.fi> wrote:

>On Tue, 21 Mar 2000 08:20:48 +0800, Alan <sargent*@*iohk.com> wrote:
>
>>[] is used for editor's insertions in quoted text. Don't kow any use
>>for < and > outside math or programming.
>

>How about HTML? After all, < and > are part of the reference concrete

As I said, used in programming.

>syntax of SGML, therefore widely used in markup languages. (Obviously
>they were taken into such use "because they were there", on keyboards
>and in ASCII, the mother of most character codes used nowadays.)
>
>>Sure, these all have legitimate uses; it's just that quote marks and
>>dashes are much more common in normal writing, but aren't as easily
>>accessible as these characters.
>
>Or, effectively, not at all, as is the case in HTML if you wish to
>write pages that are accessible to all. (It's not a fault in HTML
>specifications of course; it's a browser problem, but serious enough
>to affect HTML authoring for the WWW.)

Dashes only got an & code (&mdash;) quite recently; I don't think
quote marks have one at all, not counting a Unicode number ref, which
works on very few browsers.
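
For reference, what the named and numeric character references resolve
to can be checked with Python's standard `html` module. This sketch only
shows the mapping from reference to code point; it says nothing about
which browsers render the result, and the variable `resolved` is just
for this example.

```python
import html

REFERENCES = ("&mdash;", "&ndash;", "&lsquo;", "&rsquo;",
              "&ldquo;", "&rdquo;", "&#8217;")

# Map each character reference to the character it stands for.
resolved = {ref: html.unescape(ref) for ref in REFERENCES}

for ref, ch in resolved.items():
    print(f"{ref:<8} U+{ord(ch):04X}")
```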

>
>Your notes are correct, and give insight to the problems we live with.
>In the early days of computing, character repertoires were limited,
>since anything related to computers was very expensive and since
>computers were used, well, for computing.
>
>But what can we do? Is this an HTML problem, or a font problem, or a
>standardization problem? I was about to set followups to
>comp.std.internat, which might be the most adequate group, but it
>seems to me that the standards are not the problem here - rather, the
>solutions to be used. So despite finding this discussion personally
>interesting, I'm afraid it's getting more and more off-topic for _all_
>the groups involved. I'd be happy to continue on some other forum.

Well I was commenting mainly in the DTP context. (I picked this up in
comp.fonts) I can live with the limitations of HTML, stupid and
unnecessary as they are, but that "improvements" in standards have led
to the disappearance of real quote marks in much PRINTED text, a
degradation of typography from what it has been for hundreds of years,
is what I'm angry about.

Alan

Mar 24, 2000, 3:00:00 AM3/24/00

to

On 22 Mar 2000 23:44:46 -0800, Keith Thompson <k...@cts.com> wrote:

>gr...@apple2.com writes:
>[...]
>> And it was Keith who mistakenly thought HTML was programming.
>
>I hereby drop the subject. HTML may or may not be programming, but I
>don't care enough to argue about it -- and I suspect most of the
>readers of the three newsgroups this is cross-posted to don't either.

I don't care why Greg thinks HTML isn't a programming language, my

He Comes As No Surprise

Mar 24, 2000, 3:00:00 AM3/24/00

to

In <6jvaOPEKASCbqoMMEN7wI=AD8...@4ax.com>, sargent*@*iohk.com wrote:
> >
> > Sometimes. In Swedish typography it is common to use only the upper-right
> > quotation marks at all times.
>
> For both opening and closing? Still, that isn't what we now get when
> typing ' (on a US keyboard, anyway).

It depends on what device you are using. Typing is younger than
printing, and a typewriter keyboard is full of compromises. On a
standard manual U.S. typewriter, the non-directional apostrophe is used
(correctly) as (1) apostrophe, (2) closing single quote, (3) opening
single quote, (4) prime/foot/minute-mark, (5) the upper portion of an
exclamation point, (6) acute accent, and (7) grave accent. It probably
has other standard uses that I don't know about.

Most electric typewriter keyboards have an exclamation-point key, so
you don't need to overstrike. Some have acute and grave accents too.

ASCII is based on typewriter keyboards. In fact, the characters "@",
"#" (number-sign, in case you're reading this in BritSCII), "$", and
"%" occupy contiguous positions in ASCII just as they do on an electric
typewriter keyboard. The accents were a concession to foreign alphabets.
Obviously a 7-bit character set had no room for all the accented letters.
If you wanted e-grave, you would overstrike e with grave. Using the
same character for apostrophe and acute accent was a stretch!
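
The overstrike idea survives in Unicode as combining characters. A
short Python sketch, assuming a backspace-capable printer for the first
form; the variable names are made up for this example.

```python
import unicodedata

# Teleprinter style: the letter, a BACKSPACE, then the grave accent.
overstruck = "e\b`"

# Unicode style: the letter followed by COMBINING GRAVE ACCENT.
combining = "e\u0300"

# NFC normalization folds the pair into the precomposed letter.
composed = unicodedata.normalize("NFC", combining)

print(repr(overstruck), len(overstruck))
print(repr(combining), len(combining))
print(repr(composed), len(composed), unicodedata.name(composed))
```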

The trouble with computer typesetting is that the input is an ASCII
keyboard and the output is print. Appendix F of the _TeXbook_ illustrates
some of the mismatches: Computer Modern Roman has no backslash because
backslashes don't occur in print (that's what ASCII backslashes are
for); its slot is occupied by an em-dash, which doesn't occur in
ASCII. Computer Modern Typewriter Text has the full ASCII set, except
that in place of the apostrophe and grave accent it has closing and
opening single quotes. It also has a typewriter apostrophe in position
0x0D. The double quote is non-directional. (One other discrepancy--
the vertical bar 0x7C is unbroken, as in the ISO 8859 series.)

Troff, one of the oldest typesetting programs, distinguishes similar
keys with backslash escapes:

    '   apostrophe/closing single quote    \'   acute accent
    `   opening single quote               \`   grave accent
    -   hyphen                             \-   minus sign

For printed double quotes you originally had to use two single quotes.
With Groff you can use \(lq and \(rq instead. As for dashes, the original
Troff offered only a 3/4-em dash, as a compromise between an em-dash and
an en-dash. Most users just used the minus sign. The C/A/T fonts used
with Troff had no non-directional apostrophe--like most conventional faces.

As you see, Troff, like other well-designed typesetting utilities,
interprets an ASCII apostrophe as a printed apostrophe, which is normally what
the user intends. If you use M*******t W**d, this will not happen unless
you specify it explicitly in one of the menus or use one of the bucky
keys (CTRL or ALT--I forget which). This characteristically M*******tian
misdesign accounts for the prevalence of ugly-looking non-directional
apostrophes in advertisements, signs, and corporate logos.
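
The mapping described above, where a typed ASCII mark becomes a directional printed mark, can be sketched in a few lines of Python. This is not Troff's implementation; `typeset_quotes` is a hypothetical helper that only follows the conventions listed in this post (grave accent opens, apostrophe closes, doubled marks stand for double quotes):

```python
def typeset_quotes(text: str) -> str:
    """Map typewriter-style ASCII quoting to directional quotation marks."""
    # Doubled marks first, so `` and '' become double quotation marks.
    text = text.replace("``", "\u201c").replace("''", "\u201d")
    # Remaining single marks: ` opens; ' closes or serves as an apostrophe.
    return text.replace("`", "\u2018").replace("'", "\u2019")


print(typeset_quotes("``Don't panic,'' he said, `quoting' freely."))
```

Because the apostrophe and the closing single quote share one printed shape, the function needs no context to decide between them. That is exactly why the convention works.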

Of course, a non-directional apostrophe is the appropriate choice for
typefaces that imitate typewriting or ASCIIoid text. But few processors
and browsers observe this distinction.

Somebody suggested <Q>...</Q> tags for HTML. This would elegantly
solve the quoting problem (the user could select American double
quotes, British single quotes, Swedish right-right quotes, lower-upper
quotes . . .) but would do nothing for apostrophes.

My website is optimized for Lynx, so its pages' apostrophes are simple
ASCII apostrophes. If you read them with a graphics browser and a Roman
font, the apostrophes are rendered non-directional and look funny.
What we need is an HTML entity for an application-level apostrophe!

-:-
"To what do I owe the honor of this unexpected visit, Lord
Ruthven? ... alias Lyford Pemberton!"

H. C. Artmann, "Tom Parker, International Detective"
--
G. L. Sicherman
work: sich...@lucent.com
home: col...@mail.monmouth.com

Jukka Korpela

Mar 25, 2000, 3:00:00 AM

On Fri, 24 Mar 2000 01:06:53 +0800, Alan <sargent*@*iohk.com> wrote:

>>How about HTML? After all, < and > are part of the reference concrete
>
>As I said, used in programming.
>
>>syntax of SGML, therefore widely used in markup languages.

If you wish to demonstrate your ignorance of the nature of markup
languages, nobody can prevent you, but you should then expect people
to point out your mistakes, especially when a group discussing HTML is
involved. Incidentally, < and > are used in linguistics and character
code discussions too (for lack of better delimiting symbols, but
anyway) - perhaps that makes them programming to you.

>Dashes only got an & code (&emdash;) quite recently;

In the 1980s, yes. But how does that relate to anything? There seems
to be a widespread myth that it's somehow relevant to have an entity.
Esse est entitatem habere? Anyway, the _correct_ entity reference, as
defined in the SGML standard, taken into HTML in HTML 4.0, is &mdash;
while &emdash; is undefined. And &mdash; is _only_ a reference that
denotes &#8212;, comparable to a constant definition in programming
languages.
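
The point that an entity reference is only a name for a character can be checked with Python's standard `html` module. A small sketch, with the caveat that it shows how one library resolves the names, not how any particular browser behaves:

```python
import html
from html.entities import name2codepoint

# "mdash" is in the HTML 4 entity set and stands for code point 8212 (U+2014).
print(name2codepoint["mdash"])
print(html.unescape("&mdash;") == html.unescape("&#8212;"))

# "emdash" is not a defined entity name, so the reference is left untouched.
print("emdash" in name2codepoint)
print(html.unescape("&emdash;"))
```

The named reference and the numeric reference resolve to the same single character, much as a named constant and its literal value do in a program.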

>I can live with the limitations of HTML, stupid and unnecessary as they are

What limitations? You are referring to limitations caused by browsers,
not HTML.

>"improvements" in standards have led
>to the disappearance of real quote marks in much PRINTED text, a
>degradation of typography from what it has been for hundreds of years,
>is what I'm angry about.

I think I understand your feelings, but you are barking up the wrong
tree. With computers, we started from using A - Z, 0 - 9, and a few
symbols like +-$. When Ascii was defined it was a courageous
extension. Later, various _system and program specific_ methods were
developed for writing and processing texts with a richer character
repertoire, and even with different fonts, advanced typography like
the TeX based tools, etc.

It might be disappointing to note that on the Internet, you are almost
(and in many contexts literally) taken back to Ascii, or even a subset
thereof (think of URL syntax, or Usenet group names). Of course you
_can_ use the Internet and the Web just for distribution of data in
application-specific formats (PDF, PostScript, TeX, whatever), and
that can be very useful. But for _universal_ and smooth accessibility,
you need something that isn't application or device dependent, you
need something different. Plain Ascii text is where we started, more
or less, in that area, and it too has its uses still - plain Ascii
text is still the normative format of RFCs, for example. HTML then has
gradually tried to provide better tools. It's not the problem but part
of the solution.

Incidentally, I recently re-read the tutorial at the start of the SGML
Handbook, and noticed that it uses, in the examples (not discussing
any particular markup language but a hypothetical, exemplary one), a
<Q> element for quotations, and mentions that the idea is that when
logical markup is used, quotations can be presented using different
levels of typographic quality. So when "smart quotes" are available, a
system could use them to present a <Q> element while a system with a
limited character repertoire would use Ascii apostrophes. (Presumably
some system could use a different font face like italics, or a
different tone of voice.) For some odd reason, <Q> was not taken into
HTML from the beginning; HTML 4.0 tries to introduce it, but this
seems to be a lost cause - authors who use <Q> will be punished by
browsers that don't support it. The interesting thing is that in HTML
4.0 the <Q> element is motivated as being somehow language
sensitive, without mentioning the idea of fallback to simplistic
presentation (with apostrophes). We don't really need <Q> for getting
language-specific punctuation itself - we can actually _use_ the
correct punctuation characters - but we could use it if we knew that
it would make the contained text indicated as a quotation _somehow_.
And this isn't the case at all.
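
The fallback idea behind logical quotation markup can be illustrated with a short Python sketch. `render_quotation` is a hypothetical function, and "can the target encoding represent the marks" is used as a stand-in for "the system has smart quotes available"; no browser is claimed to work this way:

```python
def render_quotation(text: str, encoding: str) -> str:
    """Render a quotation with the best marks the target repertoire allows."""
    try:
        # Raises UnicodeEncodeError if the curly marks are not in the repertoire.
        "\u2018\u2019".encode(encoding)
    except UnicodeEncodeError:
        return "'" + text + "'"  # simplistic fallback: Ascii apostrophes
    return "\u2018" + text + "\u2019"


print(render_quotation("logical markup", "utf-8"))
print(render_quotation("logical markup", "ascii"))
```

The author marks up only the fact that something is a quotation. The choice of glyphs is made at presentation time.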

tobias b koehler

Mar 25, 2000, 3:00:00 AM

In comp.fonts Thierry Bouche <Thierry...@ujf-grenoble.fr> wrote:

> As the "straight ascii single quote" has no known usage,

Except for representing computer code. Which is the only known usage
for the ^ as well (I wouldn't know where to say ^ in normal text) :)

--
tobias benjamin köhler _______________ t...@rcs.urz.tu-dresden.de
__________ ___________<__ ______________ ______________
,-''0=========||0============0||0============0||=======0======|
`-oo--------oo-'`-oo--------oo-'`-oo--------oo-'`-oo--------oo-'
the ICE/ICT pages - http://mercurio.iet.unipi.it/ice/

tobias b koehler

Mar 25, 2000, 3:00:00 AM

On Sat, 18 Mar 2000 01:14:45 GMT, Char <char@cter.s> wrote in comp.fonts:

> On the contrary, the straight single and double quotes are the ONLY
> characters that are proper for indicating the units feet and inches
> or minutes and seconds of arc.

Well .. if you look at the pdf files at http://charts.unicode.org/
you will find:

0022 " QUOTATION MARK
neutral (vertical), used as opening or closing quotation mark
preferred characters in English for paired quotation marks are 201C
and 201D
-> 02BA modifier letter double prime
-> 030B combining double acute accent
-> 030E combining double vertical line above
-> 2033 double prime
-> 3003 ditto mark

0027 ' APOSTROPHE
= APOSTROPHE-QUOTE
= APL-QUOTE
neutral (vertical) glyph having mixed usage
preferred character for apostrophe is 2019
preferred characters in English for paired quotation marks are
2018 and 2019
-> 02B9 modifier letter prime
-> 02BC modifier letter apostrophe
-> 02C8 modifier letter vertical line
-> 0301 combining acute accent
-> 2032 prime

Many fonts have separate characters for minutes and seconds
(unicode 2032 and 2033). Often these are the same design as 0027
and 0022, however.
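
The character names quoted from the charts can also be looked up in the Unicode character database that ships with Python. A minimal sketch (the list of code points is just the ones discussed in this thread):

```python
import unicodedata

CODE_POINTS = (0x0022, 0x0027, 0x0060, 0x2018, 0x2019,
               0x201C, 0x201D, 0x2032, 0x2033)

# Print each code point with its formal Unicode character name.
for cp in CODE_POINTS:
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}")
```

Whether a given font draws 2032 differently from 0027 is a separate question that this lookup cannot answer. It only shows that the standard treats them as distinct characters.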

--
tobias benjamin köhler _______________ t...@rcs.urz.tu-dresden.de

______<_ .--+--------+--. ______________ ______________
/'...<+>`\|H +--------+ H||H ========== H||H ========== H|
______`o-o--o-o'`-oo--------oo-'`-oo--------oo-'`-oo--------oo-'

Alan J. Flavell

Mar 25, 2000, 3:00:00 AM

On Sat, 25 Mar 2000, Yummy wrote:

> HTML is a subset of programming,

I don't agree.

> there is no doubt about it.

I say you are wrong. There _is_ enough doubt about it for people to
honestly and justifiably come to opposite opinions.

HTML is markup. It no more qualifies as "programming", than printer's
markup qualifies as a subset of poetry.

> Ask anyone
> familiar with English language but not with coding

How would that help? We are, as I suppose, talking about a
specialised terminology related to the use of computer systems.

Certainly in other fields of endeavour, "programming" might also refer
to setting up a knitting machine, or to arranging segments of
television to fill an evening, or suchlike, but it would be perverse
to drag that terminology in here.

Perhaps we should ask an arbitrary shopper in the supermarket to tell
us what general relativity means. Would that appeal to your sense of
fairness in defining the language?

And if you're going to complain about threads dragging-on across
several groups, you might ask yourself why you are participating in
the process. f'ups now narrowed.

Dennis

Mar 25, 2000, 3:00:00 AM

"Yummy" <Yumm...@SPAM.skuz.net> wrote
> For weeks I was staring at this ridiculous thread in comp.fonts,
> wondering why does it keep going, then realized it is cross-posted to HTML
> group. Little wonder then.

Thanks for pointing that out. I thought they had really gone off the deep end.
--
----------------------------------------------------------------------------
Note: Change NoSpam to wildtrumpet for e-mail replies
----------------------------------------------------------------------------

gr...@apple2.com

Mar 25, 2000, 3:00:00 AM

In article <8bim0...@enews4.newsguy.com>,
Yu...@skuz.net (Yummy) wrote:

> HTML is a subset of programming, there is no doubt about it. Ask anyone

Ask anyone who doesn't know what programming is and they may say that.
Particularly those who think any language used with computers is a
programming language.

> HTML is an artificial language that employs a set of arbitrary
> conventions to make up some message that is only meaningful and
> useful through the use of interpreter - just like any computer
> language.

Not all artificial languages used by computers are programming
languages. HTML is a markup language: HyperText Markup Language. Last
time I looked the adjectives "markup" and "programming" were not
synonyms.

And your definition fails for HTML anyway. An HTML document is just
tags added to an existing document that clue a computer in to what
structure exists in the prose. The markup is not necessary for people to
understand the document, and adding it still doesn't give
the computer an understanding of the prose.

> (You don't talk to your wife in HTML, do you?).

Executives have spoken to their secretaries in a markup language when
dictating a letter with verbal punctuation period. Such a markup method
can be represented with text comma, and interpreted by a computer comma,
but it still does not make it a programming language period paragraph.

The superset is computer languages. Programming languages and markup
languages are disjoint sets which, if not proper subsets of, are at
least intersecting sets with the set of computer languages (as shown,
markup languages exist outside of computers, and one could think of such
things as recipes as a form of programming languages which are executed
by humans).

Oh, and before someone brings it up, compilation is not a useful metric to
determine the nature of a language. BASIC was originally a compiled
language though in most implementations it is an interpreted language,
sometimes and sometimes not tokenized, both running on computers.

C. A. Upsdell

Mar 25, 2000, 3:00:00 AM

<gr...@apple2.com> wrote in message
news:greg-E76BAC.1...@news.binary.net...

> In article <8bim0...@enews4.newsguy.com>,
> Yu...@skuz.net (Yummy) wrote:
>
> > HTML is a subset of programming, there is no doubt about it. Ask anyone
>
> Ask anyone who doesn't know what programming is and they may say that.
> Particularly those who think any language used with computers is a
> programming language.

Actually, I was a computer programmer for 27 years before ill health forced
me to switch to website development: and I consider HTML to be a programming
language. A simple programming language, true, but I have worked with even
simpler languages.

> Oh, and before someone brings it up, compilation is not a useful metric to
> determine the nature of a language. BASIC was originally a compiled
> language though in most implementations it is an interpreted language,
> sometimes and sometimes not tokenized, both running on computers.

I agree that the compilation issue is irrelevant: but I must point out that
the first versions of Basic *were* interpreted.

Erland Sommarskog

Mar 25, 2000, 3:00:00 AM

G. L. Sicherman (sich...@lucent.com) writes:
>As you see, Troff, like other well-designed typesetting utilities,
>interprets an ASCII apostrophe as a printed apostrophe, which is normally what
>the user intends. If you use M*******t W**d, this will not happen unless
>you specify it explicitly in one of the menus or use one of the bucky
>keys (CTRL or ALT--I forget which). This characteristically M*******tian
>misdesign accounts for the prevalence of ugly-looking non-directional
>apostrophes in advertisements, signs, and corporate logos.

Don't know exactly what you are talking about, but there is an option
in Microsoft Word "Use smart quotes" or some such. I usually turn it
off, because I'm Swedish and we don't use different quotes for opening
and closing. More importantly, the documents I write tend to include
code snippets, and opening and closing quotes in code samples are
only confusing. (Just as all those ill-designed man pages in Unix
which use grave accents and apostrophes to emulate opening and closing
quotes.)

--
Erland Sommarskog, Stockholm, som...@algonet.se
This is an incomplete mess.

John Hauser

Mar 25, 2000, 3:00:00 AM

Erland Sommarskog wrote:
> (Just as all those ill-designed man pages in Unix
> which use grave accents and apostrophes to emulate opening and closing
> quotes.)

Let's be clear about one thing. Before it got taken over by the ISO,
official ASCII encouraged the use of codes 0x60 and 0x27 as opening and
closing single quotation marks. Seems to me about half the software
and fonts here in the States had converted over to that convention---
with _real_ quotation marks in those positions---before the official
pronouncements started coming down that that was somehow immoral. I for
one wish the ISO could have found it in their hearts to adopt the change
in their International Reference Version of ISO 646, but you know how
standards committees like to treat past blunders as immutable holy writ.

As for Unicode: With so much software still around---old and new---that
couldn't identify a Unicode if it ran over one on the road, I find the
``Let them eat Unicode'' solution to the lack of proper quotation marks
a bit premature.

Unfortunately, Unicode is such a perfect example of the type of artwork
that can only be created by committee: a union of most of the worst
features of every character set ever invented. And it just gets uglier
with every Technical Report coming out of the Consortium. Not everybody
is looking forward to the day when we'll all be Unicodified.

Lastly, anybody who thinks there's a typographic difference between a
proper apostrophe and a closing single quotation mark is hallucinating.
There's no more difference between the two than between a period that
ends a sentence and one used to mark an abbreviation.

- John Hauser

tobias b koehler

Mar 26, 2000, 3:00:00 AM

In comp.fonts Alan <sargent*@*iohk.com> wrote:

> Why do I have `'[]{}<>~^_|\ on my keyboard?

I don't, {[]}\ is AltGr+7890ß, though for <> there is a key which
causes the left shift key to go further left (a great problem for
those who come from typewriters and whose texts are originally
strewn with < ....)

--
tobias benjamin köhler t...@rcs.urz.tu-dresden.de
____________ ______________ ______________ ____________
,''=0==========||===0=========0||==========0===||==========0=``.
`-oo--------oo-'`-oo--------oo-'`-oo--------oo-'`-oo--------oo-'

Erland Sommarskog

Mar 26, 2000, 3:00:00 AM

John Hauser (jha...@cs.berkeley.edu) writes:
>Let's be clear about one thing. Before it got taken over by the ISO,
>official ASCII encouraged the use of codes 0x60 and 0x27 as opening and
>closing single quotation marks. Seems to me about half the software
>and fonts here in the States had converted over to that convention---
>with _real_ quotation marks in those positions---before the official
>pronouncements started coming down that that was somehow immoral. I for
>one wish the ISO could have found it in their hearts to adopt the change
>in their International Reference Version of ISO 646, but you know how
>standards committees like to treat past blunders as immutable holy writ.

No matter what, the stupidities in Unix like

ééquotation''

could have been avoided by using simple double quotation marks. (And
typeset variants of the text could still have gotten it right, if they
wished to.) Other operating systems could get it right in their docs, why
not Unix? (Why there was an accented e is left as an exercise to the reader
to find out, but a hint is that the grave accent is not a grave accent
in all national varieties of ISO-646.)
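
For readers who want to see the effect rather than work it out, here is a hedged Python sketch. Python has no built-in codec for the national variants of ISO 646, so the table below is a hypothetical, partial one. It assumes that the Swedish variant in question assigns é to position 0x60, and it covers only that single position:

```python
# Hypothetical, partial table: only the one position relevant to the example.
# Assumption: the Swedish variant of ISO 646 in use assigns e-acute to 0x60.
SWEDISH_VARIANT_OVERRIDES = {0x60: "\u00e9"}


def decode_iso646(data: bytes, overrides: dict) -> str:
    """Decode 7-bit bytes as ASCII, except where a national variant differs."""
    return "".join(overrides.get(b, chr(b)) for b in data)


raw = b"``quotation''"
print(decode_iso646(raw, {}))                         # read as US-ASCII
print(decode_iso646(raw, SWEDISH_VARIANT_OVERRIDES))  # read as the Swedish variant
```

The bytes are identical in both calls. Only the interpretation of position 0x60 changes.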

>As for Unicode: With so much software still around---old and new---that
>couldn't identify a Unicode if it ran over one on the road, I find the
>``Let them eat Unicode'' solution to the lack of proper quotation marks
>a bit premature.

And in your quotation, there are two opening faint falling accents,
and two closing strong straight accents, in the font I'm reading
this text in (Courier New). Pretty ugly, I'd say.

>Unfortunately, Unicode is such a perfect example of the type of artwork
>that can only be created by committee: a union of most of the worst
>features of every character set ever invented. And it just gets uglier
>with every Technical Report coming out of the Consortium. Not everybody
>is looking forward to the day when we'll all be Unicodified.

Of course, if you can convince people all over the world to abandon
their current writing system in favour of the English alphabet,
there is a lot of work saved.

gr...@apple2.com

Mar 26, 2000, 3:00:00 AM

In article <WbaD4.3203$xz1....@cac1.rdr.news.psi.ca>,
"C. A. Upsdell" <cups...@upsdell.com> wrote:

><gr...@apple2.com> wrote:
>>Yu...@skuz.net (Yummy) wrote:

>>> HTML is a subset of programming, there is no doubt about it. Ask
>>> anyone

>> Ask anyone who doesn't know what programming is and they may say that.
>> Particularly those who think any language used with computers is a
>> programming language.

> Actually, I was a computer programmer for 27 years before ill health
> forced me to switch to website development: and I consider HTML to be
> a programming language.

Which only goes to show that experience isn't everything.

> A simple programming language, true, but I have worked with even
> simpler languages.

Were any of these simpler languages _programming_ languages?

If HTML is a programming language, where are the programs written in it?

A web page is not a program, therefore HTML is not a programming
language. It has no control logic. It has no conditional statements.
It has no memory, no variables, no registers. It is static, not dynamic.

A web page can _contain_ programs, such as Javascript, or embed external
programs such as Java or ActiveX, which can transform a static web page
into a dynamic document, but the web page is not a program in and of
itself any more than a robot controlled by a program is a program itself.

At most HTML is the transformation of a document into a series of data
fields, like a database file except more free-form, allowing programs to
manipulate it. Database files aren't programs either.

John Hauser

Mar 26, 2000, 3:00:00 AM

Myself:

>Let's be clear about one thing. Before it got taken over by the ISO,
>official ASCII encouraged the use of codes 0x60 and 0x27 as opening and
>closing single quotation marks.

Erland Sommarskog:

> No matter what, the stupidities in Unix like
> ééquotation''
> could have been avoided by using simple double quotation marks.

> [...] (Why there was an accented e is left as an exercise to the

> reader to find out, but a hint is that the grave accent is not a grave
> accent in all national varieties of ISO-646.)

What law of physics forces 0x60 to be a national variant character?
That decision was made by the ISO, not handed down by some deity from
above. The ISO chose to keep the less than and greater than signs ("<"
and ">") and lose the quotation marks. Was that really the best choice,
you think?

(The first person to say it was because "<" and ">" were needed for HTML
gets slapped.)

The UNIX documentation probably uses all of the other national variant
characters, "#", "$", "@", "[", "\", "]", "^", "{", "|", "}", and "~",
in one place or another. Don't you want to bitch about them, too?

> And in your quotation, there are two opening faint falling accents,
> and two closing strong straight accents, in the font I'm reading
> this text in (Courier New). Pretty ugly, I'd say.

It is ugly, although it's not much worse than what 'quotation' looks
like when 0x27 appears as a proper apostrophe/closing-single-quotation-mark.

I don't know about you, but I don't love the stand-alone grave accent
enough to prefer it over a proper opening quotation mark. I have every
intention of continuing to use 0x60 and 0x27 as matching quotation marks
in 8-bit text. They work with much of the software I have, and nobody's
offering me an alternative outside of Unicode.

Hell, the keys on the 5-year-old keyboard I'm typing on are labeled
with opening and closing single quotation marks, not grave accent and
straight line. This isn't some special keyboard I ordered; it's the
one that came with the PC. I suppose the same higher authorities will
start cracking down on keyboard manufacturers soon, since we all know
the world would much rather have a useless grave accent key and a
typewriter-style straight quote.

Me:

> Not everybody
> is looking forward to the day when we'll all be Unicodified.

Erland:

> Of course, if you can convince people all over the world to abandon
> their current writing system in favour of the English alphabet,
> there is a lot of work saved.

Unicode isn't the only possible solution, it's just the one that's being
standardized.

- John Hauser

Keith Thompson

Mar 26, 2000, 3:00:00 AM

gr...@apple2.com writes:
[...]

> If HTML is a programming language, where are the programs written in it?
>
> A web page is not a program, therefore HTML is not a programming
> language. It has no control logic. It has no conditional statements.
> It has no memory, no variables, no registers. It is static, not dynamic.

I think the disagreement isn't about what HTML is, it's about what the
phrase "programming language" means. By some perfectly reasonable
definitions of the phrase, HTML is a programming language; by other
perfectly reasonable definitions, it isn't. (Turing-completeness
might be a good place to define the boundary, but it's not the only
possible criterion.)

There's probably some ISO or IEEE standard that defines the term
"programming language" precisely enough to decide the question.
Looking it up is left as an exercise for anyone who's sufficiently
interested.

Daniel R. Tobias

Mar 26, 2000, 3:00:00 AM

Keith Thompson wrote:
>
> I think the disagreement isn't about what HTML is, it's about what the
> phrase "programming language" means. By some perfectly reasonable

We could do like President Clinton, and disagree about what "is" is!

Kai Henningsen

Mar 26, 2000, 3:00:00 AM

sich...@lucent.com (He Comes As No Surprise) wrote on 24.03.00 in <8bgktr$9...@nntpa.cb.lucent.com>:

> It depends on what device you are using. Typing is younger than
> printing, and a typewriter keyboard is full of compromises. On a
> standard manual U.S. typewriter, the non-directional apostrophe is used
> (correctly) as (1) apostrophe, (2) closing single quote, (3) opening
> single quote, (4) prime/foot/minute-mark, (5) the upper portion of an
> exclamation point, (6) acute accent, and (7) grave accent. It probably
> has other standard uses that I don't know about.
>
> Most electric typewriter keyboards have an exclamation-point key, so
> you don't need to overstrike. Some have acute and grave accents too.

SAY WHAT?! US typewriters did not have exclamation-point keys?!

You already don't need umlauted vowels, what did you *do*?! Because I've
never seen a German mechanical typewriter without an exclamation key.

Kai
--
http://www.westfalen.de/private/khms/
"... by God I *KNOW* what this network is for, and you can't have it."
- Russ Allbery (r...@stanford.edu)

John Hauser

Mar 26, 2000, 3:00:00 AM

I wrote:
> What law of physics forces 0x60 to be a national variant character?
> That decision was made by the ISO, not handed down by some deity from
> above. The ISO chose to keep the less than and greater than signs ("<"
> and ">") and lose the quotation marks. Was that really the best choice,
> you think?

> [...]

I'm not sure that sounds the way I wanted, so let me clarify:

ISO 646 made what I feel are some flawed decisions to try to squeeze
all of Europe's accented letters alongside the basic Roman alphabet and
punctuation in only 94 character codes. I understand the motivation for
this effort, but I reject attempts to make me bound by it. The national
variant character sets were a temporary hack, and the sooner we put them
behind us the better.

One of the apparent consequences of ISO 646 was to kill off the option
in ASCII for using 0x60 as an opening single quotation mark to go with
0x27 which could already be used as a closing single quotation mark.
ISO 8859 (Latin-1, etc.) didn't repair things either, although it
did find space for such useful stuff as superscripts 1, 2, and 3, and
fractions 1/4, 1/2, and 3/4.

Official standards notwithstanding, there was until recently a trend
toward adopting code 0x60 as the opening single quotation mark, since
there weren't many other options. I know that PostScript, TeX, and
various fonts on UNIX and Windows systems have placed a true opening
quotation mark in that position. As I've already mentioned, some PC
keyboards are labeled that way, too.

Unfortunately, this solution is in conflict with Unicode because codes
0x0060 and 0x0027 are _neither_ the opening nor the closing single quotation
marks in Unicode. Those symbols have been assigned to other codes.
Moreover, Unicode backers refuse to accept any 8-bit solution to the
lack of proper quotation marks. Unicode is perfect, after all, (cough)
so why cause trouble with yet more ``variants'' of 8-bit character
codes?

It will be another 15 to 20 years, at least, before Unicode is so fully
entrenched that few people have reason to work with anything else.
Seems to me Unicode could stand to be a little more flexible and allow
quotation marks to appear at 0x0060 and 0x0027 so that they could be
available within 8-bit Latin-1 and other local character sets that have
ASCII in their lower 128 codes.
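
Which 8-bit sets can carry the directional marks at all can be checked directly. A small Python sketch, where `encode_or_none` is a hypothetical helper and Windows code page 1252 is brought in only for comparison as a vendor extension of Latin-1:

```python
from typing import Optional


def encode_or_none(ch: str, encoding: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the character is not in the set."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


# Grave accent, apostrophe, and the two directional single quotation marks.
for ch in ("`", "'", "\u2018", "\u2019"):
    for encoding in ("ascii", "latin-1", "cp1252"):
        encoded = encode_or_none(ch, encoding)
        shown = encoded.hex() if encoded is not None else "--"
        print(f"U+{ord(ch):04X}  {encoding:8}  {shown}")
```

ISO 8859-1 has no position for either directional mark, which is the gap complained about above. The code page extension places them in the 0x80 to 0x9F range that Latin-1 reserves for control codes.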

- John Hauser

John Hauser

Mar 26, 2000, 3:00:00 AM

Kai Henningsen wrote:
> SAY WHAT?! US typewriters did not have exclamation-point keys?!

Not originally. (I'm pretty sure IBM Selectrics did.)

- John Hauser

John Hauser

Mar 26, 2000, 3:00:00 AM

Erland Sommarskog:
> No matter what, the stupidities in Unix like
> ééquotation''
> could have been avoided by using simple double quotation marks.

> [...](Why there was an accented e is left as an exercise to the reader

> to find out, but a hint is that the grave accent is not a grave accent
> in all national varieties of ISO-646.)

I wrote:

> What law of physics forces 0x60 to be a national variant character?
> That decision was made by the ISO, not handed down by some deity from
> above. The ISO chose to keep the less than and greater than signs ("<"
> and ">") and lose the quotation marks. Was that really the best choice,
> you think?
> [...]

I'm not sure that sounds the way I wanted, so let me clarify:

ISO 646 made what I feel are some flawed decisions to try to squeeze
all of Europe's accented letters alongside the basic Roman alphabet and
punctuation in only 94 character codes. I understand the motivation for
this effort, but I reject attempts to make me bound by it. The national
variant character sets were a temporary hack, and the sooner we put them
behind us the better.

One of the apparent consequences of ISO 646 was to kill off the option
in ASCII for using 0x60 as an opening single quotation mark to go with
0x27 which could already be used as a closing single quotation mark.
ISO 8859 (Latin-1, etc.) didn't repair things either, although it
did find space for such useful stuff as superscripts 1, 2, and 3, and
fractions 1/4, 1/2, and 3/4.

Official standards notwithstanding, there was until recently a trend
toward adopting code 0x60 as the opening single quotation mark, since
there weren't many other options. I know that PostScript, TeX, and
various fonts on UNIX and Windows systems have placed a true opening
quotation mark in that position. As I've already mentioned, some PC
keyboards are labeled that way, too.

Unfortunately, this solution is in conflict with Unicode because codes
0x0060 and 0x0027 are _neither_ the opening nor the closing single quotation
marks in Unicode. Those symbols have been assigned to other codes.
Moreover, Unicode backers refuse to accept any 8-bit solution to the
lack of proper quotation marks. Unicode is perfect, after all, (cough)
so why cause trouble with yet more ``variants'' of 8-bit character
codes?

It will be another 15 to 20 years, at least, before Unicode is so fully
entrenched that few people have reason to work with anything else.
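Seems to me Unicode could stand to be a little more flexible and allow
quotation marks to appear at 0x0060 and 0x0027 so that they could be
available within 8-bit Latin-1 and other local character sets that have
ASCII in their lower 128 codes.

- John Hauser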

Alan

Mar 27, 2000, 3:00:00 AM

On Sat, 25 Mar 2000 15:29:19 GMT, Yumm...@SPAM.skuz.net (Yummy)
wrote:

>On the other hand, while I feel Alan's pain, I do not see why correct
>typography would or should supersede programmers' convenience on
>computer keyboard.

Because the vast majority of people who use computer keyboards are
using them to generate text that will be printed, or at least read on
screen. And this could easily have been done without inconveniencing
programmers, and in fact the character sets used in the 1970s served
both well (until the ` and ' stopped being left and right quote
marks).

>99% of computer users these days have next to
>zero understanding of either of them, so they cannot care less either
>way. Historically, it was because people made computers to suit their
>own needs (and the needs involved math and programming, not DTP),
>and it gets carried on because keeping things the same is simple and
>because there is no real pressure to change.

As above, there was a working solution, and it was broken.

>As long as a publisher of
>kindergarten's weekly flyer feels nothing's wrong with the absence of
>curly quotes and ligatures, it's OK. After all, typography is a specialized
>discipline too.

That people not only print kindergarten flyers, but hardback books,
advertising posters, product labels (look on your supermarket shelf),
with typewriter characters is not at all okay by me. What makes it so
absurd is that they have, usually without realising what they're
doing, proportional type, kerning, justification; but marred by
typewriter punctuation.

And it's not just amateurs. I work with professional layout and design
people. Many are very technically proficient, but I have to take great
pains and check every line of layout to prevent them from printing
typewriter punctuation on book covers and the like.

I've also lost days of work when I get an ms, begin work on it only to
realise that, eg, all the dashes have disappeared because they were
encoded in some application/version-specific way. (After the first
time, of course, I now check this before I start; but sometimes a late
addition comes through and I skip the checks inadvertently.)

So my anger on these points is due to the time and trouble it has cost
me, and will continue to cost me for years to come.

Alan

Mar 27, 2000, 3:00:00 AM

On Sat, 25 Mar 2000 06:47:53 +0200, Jukka Korpela
<Jukka....@hut.fi> wrote:

>On Fri, 24 Mar 2000 01:06:53 +0800, Alan <sargent*@*iohk.com> wrote:
>
>>>How about HTML? After all, < and > are part of the reference concrete
>>
>>As I said, used in programming.
>>
>>>syntax of SGML, therefore widely used in markup languages.
>
>If you wish to demonstrate your ignorance of the nature of markup
>languages, nobody can prevent you, but you should then expect people
>to point out your mistakes, especially when a group discussing HTML is
>involved. Incidentally, < and > are used in linguistics and character
>code discussions too (in lack of better delimiting symbols, but
>anyway) - perhaps that makes them programming to you.

Try not to be so patronising. I studied maths, comp science and
physics, and am currently working in DTP and web design.
My point was that the characters I mentioned are not used in prose,
except for various specialised technical uses, one of these being
HTML. If I failed to list your favorite hobbyhorse, I'm sorry.

>
>>Dashes only got an & code (&emdash;) quite recently;
>
>In the 1980s, yes. But how does that relate to anything? There seems
>to be a widespread myth that it's somehow relevant to have an entity.
>Esse est entitatem habere? Anyway, the _correct_ entity reference, as
>defined in the SGML standard, taken into HTML in HTML 4.0, is &mdash;
>while &emdash; is undefined. And &mdash; is _only_ a reference that
>denotes &#8212;, comparable to a constant definition in programming
>languages.

Sorry I mixed up the name here. My point was that the em dash, as you
finally said, only came into HTML in version 4.0, and thus many extant
browsers don't recognise it. I am unable to understand why such a
basic character was ignored for so many years. As for the 4-digit
numeric references, these also only work in recent browsers that are
Unicode compliant (if that's wrong, I'm sure you'll correct me, I
speak merely from experience here.)

>
>>I can live with the limitations of HTML, stupid and unnecessary as they are
>
>What limitations? You are referring to limitations caused by browsers,
>not HTML.

As above, dashes were absent from HTML until 4.0.

>
>>"improvements" in standards have led
>>to the disappearance of real quote marks in much PRINTED text, a
>>degradation of typography from what it has been for hundreds of years,
>>is what I'm angry about.
>
>I think I understand your feelings, but you are barking at the wrong
>tree. With computers, we started from using A - Z, 0 - 9, and a few
>symbols like +-$. When Ascii was defined it was a courageous
>extension. Later, various _system and program specific_ methods were
>developed for writing and processing texts with a richer character
>repertoire, and even with different fonts, advanced typography like
>the TeX based tools, etc.

Which tree should I be barking up then? It seems very clear from my
experience in DTP that the lack of a standard method for using normal
printing characters led, as you said, to various _system and program
specific_ methods being introduced, with the result that most users,
even experienced designers, who just type or paste in text and don't
pore through the manuals, are ignorant of how to activate them, or do
so inappropriately.

Alan J. Flavell

Mar 27, 2000, 3:00:00 AM

On Mon, 27 Mar 2000, Alan wrote:

> On Sat, 25 Mar 2000 06:47:53 +0200, Jukka Korpela
> <Jukka....@hut.fi> wrote:

> >If you wish to demonstrate your ignorance of the nature of markup
> >languages, nobody can prevent you, but you should then expect people
> >to point out your mistakes, especially when a group discussing HTML is
> >involved. Incidentally, < and > are used in linguistics and character
> >code discussions too (in lack of better delimiting symbols, but
> >anyway) - perhaps that makes them programming to you.
>
> Try not to be so patronising.

The comment seems well-measured in the circumstances.

> I studied maths, comp science and
> physics, and am currently working in DTP and web design.

Then you should know better. There's even less reason to cut the
degree of slack that we usually allow for those who are evidently
newbies to the field.

As an ex-student of several specialised subjects, you of all people
should not be surprised to find that terms that may have wooly
everyday meanings also have rather precise technical significance.

> My point was that the characters I mentioned are not used in prose,

I'm afraid that doesn't get us very much further.

--

It is generally accepted that followups should try to _reduce_
confusion, not the other way around. - Tad McClellan on c.l.p.misc

Jukka Korpela

Mar 27, 2000, 3:00:00 AM

On Mon, 27 Mar 2000 01:42:43 +0800, Alan <sargent*@*iohk.com> wrote:

>My point was that the characters I mentioned are not used in prose,
>except for various specialised technical uses, one of these being
>HTML.

Well, then you should have written _that_. Your initial claim was much
more restrictive, not only as regards HTML. Even the new formulation
needs to be interpreted liberally.

>If I failed to list your favorite hobbyhorse, I'm sorry.

Your attitude is not very positive. It was not about my horses but
about a factually wrong claim about the very language we discuss in
one of the groups where this discussion goes.

>Sorry I mixed up the name here.

Well the confusion is understandable, since some browsers mixed it up,
and you really cannot know the correct entity name without consulting
the specifications.

> My point was that the em dash, as you
>finally said, only came into HTML in version 4.0

No, I did _not_ say that. I wrote that the _entity_ &mdash; was taken
(copied from SGML) to HTML 4.0. Do you _still_ think it's somehow
relevant to have an entity? If a programming language hasn't got a
predefined name for, say, the number 42, does it mean that you cannot
use the number 42 in that language?

>and thus many extant browsers don't recognise it.

You've got the causality backwards. It was evident as early as in the
HTML 2.0 specification that a richer character repertoire is needed.
The HTML versions and their specifications were really not the
problem, in this issue; the browsers were, and are.

>I am unable to understand why such a
>basic character was ignored for so many years.

It's not a matter of picking up characters from here and there and
including them into a language. Language design doesn't work that way.

>As above, dashes were absent from HTML until 4.0.

Have you actually checked what the HTML 3.2 specification says about
the character repertoire?

>Which tree should I be barking up then?

Software (and vendors producing it) that do not a) process character
encoding information adequately, b) support such important encodings
as UTF-8 and UCS-2, c) support processing of rich enough character
repertoires, d) come with sufficiently extensive fonts.

It's foolish that Netscape 4 doesn't support &mdash;. It's nice that
it supports &#8212; however. But the real problem with Netscape in
this area is that it generally fails to support &#bignumber; unless a
Unicode encoding is specified.
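
The point that an entity is only a name for a character, like a named constant, can be illustrated with a modern parser. This is a hedged sketch using Python's standard `html` module, which follows the HTML5 entity table and says nothing about how browsers of 2000 behaved; the helper name `resolve` is invented for the example.

```python
import html

def resolve(reference):
    """Expand a character or entity reference the way an HTML5 parser would."""
    return html.unescape(reference)

EM_DASH = "\u2014"

# The named entity and both numeric forms denote the same character.
print(resolve("&mdash;") == EM_DASH)
print(resolve("&#8212;") == EM_DASH)
print(resolve("&#x2014;") == EM_DASH)

# An undefined name such as &emdash; is left untouched.
print(resolve("&emdash;"))
```

The numeric references work without any predefined name, just as the number 42 is usable in a language that has no constant for it.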

Jukka Korpela

Mar 27, 2000, 3:00:00 AM

On Sat, 25 Mar 2000 19:53:01 -0600, Mike Morrow <mi...@micratek.com>
wrote:

>When ASCII came along, there was an already existent code set in
>widespread use [namely EBCDIC]

The early history of character codes is probably not on-topic for
other groups than comp.std.internat (at most), so I've set Followup-To
to point there only.

According to
http://www.terena.nl/projects/multiling/euroml/section05.html
EBCDIC was announced around 1965. It is, and has always been,
vendor-specific - more exactly, for certain types of IBM systems. And
if you ask me, it wasn't logically designed at all - e.g. letters were
not consecutive.
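
The remark about non-consecutive letters is easy to check. A minimal sketch, assuming Python's bundled `cp037` codec (one common EBCDIC code page, US/Canada) is representative enough; the function name `ebcdic_gaps` is hypothetical.

```python
import string

def ebcdic_gaps(codec="cp037"):
    """List (prev_letter, next_letter, step) wherever a-z is not contiguous."""
    letters = string.ascii_lowercase
    codes = [ch.encode(codec)[0] for ch in letters]
    gaps = []
    for i in range(1, len(letters)):
        step = codes[i] - codes[i - 1]
        if step != 1:
            gaps.append((letters[i - 1], letters[i], step))
    return gaps

print(ebcdic_gaps())         # gaps in EBCDIC code page 037
print(ebcdic_gaps("ascii"))  # ASCII for comparison: no gaps
```

In this code page the alphabet breaks after "i" and after "r", so a loop that simply increments a character code from "a" to "z" passes through non-letter positions.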

According to
http://tronweb.super-nova.co.jp/characcodehist.html
ANSI (called ASA at that time) announced the first version of ASCII in
1963. It had a smaller character repertoire than the final version
approved in 1968, but on the other hand the development had started in
the late 1950s.

The references above are not authoritative, but they appear to be
generally reliable, so I think we can say that EBCDIC and ASCII were
designed at roughly the same time.

>It had 8 bits and, therefore, 256 possible combinations.

And wasted many of them, if I remember correctly, by not assigning
characters or control codes to them.

>ASCII, as introduced, was a 7 bit code using the eighth
>bit for parity.

Yes, and there were technical reasons for that, partly related to
communications protocols intended for more heterogeneous environments
than the implied environment of EBCDIC. And in fact, bits were so
precious at that time that some vendors even designed architectures which
actually used 7-bit bytes (Digital's 36-bit systems, packing five
characters into one word - and the "spare" bit was taken into a
special use too!).
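
The packing of five 7-bit characters into a 36-bit word can be sketched as follows. This is an illustration only: the layout assumed here (characters left-justified, spare bit in the least significant position and left at zero) and the function names `pack36` and `unpack36` are assumptions made for the example, not a description of any particular system's software.

```python
def pack36(text):
    """Pack up to five 7-bit characters into one 36-bit word."""
    if len(text) > 5:
        raise ValueError("at most five characters fit in one word")
    word = 0
    for ch in text.ljust(5, "\0"):       # pad short strings with NUL
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not a 7-bit character: %r" % ch)
        word = (word << 7) | code        # 5 * 7 = 35 bits of text
    return word << 1                     # assumed: spare bit is the low bit

def unpack36(word):
    """Recover the characters from a word produced by pack36."""
    word >>= 1                           # drop the spare bit
    codes = [(word >> shift) & 0x7F for shift in (28, 21, 14, 7, 0)]
    return "".join(chr(c) for c in codes).rstrip("\0")

w = pack36("HELLO")
print(oct(w), unpack36(w))
```

Five characters use 35 of the 36 bits, which is why one bit per word was left over for other purposes.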

>So extension, it was NOT with only 1/2 the code space
>of EBCDIC. Courageous? NOT! Stupid! Definitely!

ASCII was definitely a courageous extension, and it has become a great
success. Especially in the 1968 version it extended the character
repertoire from what had been available and used at that time. It was at
that time a reasonable compromise between conflicting needs and
restrictions. Remember that computers were expensive equipment for
calculations, and hardly anyone dreamed of having a computer at home.

The courage is illustrated by the fact that even today, even the ASCII
set of characters isn't completely "safe". If I send data in ASCII
encoding to someone and he views it on screen, prints it, or processes
it with some program, it may _still_ happen that some characters get
processed wrong - mainly due to "national variants" of ASCII. This
used to be a big problem not that long ago. We have mostly got over it
(though it still happened last year that my E-mail message, when
quoted in a reply, had a tilde character converted to u umlaut, and
the messages did not even cross state boundaries). Now we have a mess
with ISO Latin 1 characters. They mostly work well on Web pages, but
not that well universally.

>These days, it has taken over due to the proliferation of PCs and Un*x
>boxes which use the new, 8 bit ASCII.

There is no 8 bit ASCII. You can check that no such code has been
defined by any standards body. Or you can see a short explanation (in
a long document) at http://www.hut.fi/u/jkorpela/chars.html#ascii8

Simon Brooke

Mar 27, 2000, 3:00:00 AM

"C. A. Upsdell" <cups...@upsdell.com> writes:

> <gr...@apple2.com> wrote in message
> news:greg-E76BAC.1...@news.binary.net...
> > In article <8bim0...@enews4.newsguy.com>,
> > Yu...@skuz.net (Yummy) wrote:
> >
> > > HTML is a subset of programming, there is no doubt about it. Ask anyone
> >
> > Ask anyone who doesn't know what programming is and they may say that.
> > Particularly those who think any language used with computers is a
> > programming language.
>
> Actually, I was a computer programmer for 27 years before ill health forced
> me to switch to website development: and I consider HTML to be a programming
> language. A simple programming language, true, but I have worked with even
> simpler languages.

OK, explain how you would calculate the factorial of one thousand in
HTML alone.

If you can't, it isn't Turing equivalent; if it isn't Turing
equivalent, it isn't a programming language.

Hint: in computing, there's such a thing as 'data', and even, for the
very advanced, there are 'data formats'.

--
si...@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

Due to financial constraints, the light at the end of the tunnel
has been switched off.

Simon Brooke

Mar 27, 2000, 3:00:00 AM

Keith Thompson <k...@cts.com> writes:

> gr...@apple2.com writes:
> [...]
> > If HTML is a programming language, where are the programs written in it?
> >
> > A web page is not a program, therefore HTML is not a programming
> > language. It has no control logic. It has no conditional statements.
> > It has no memory, no variables, no registers. It is static, not dynamic.
>
> I think the disagreement isn't about what HTML is, it's about what the
> phrase "programming language" means. By some perfectly reasonable
> definitions of the phrase, HTML is a programming language; by other
> perfectly reasonable definitions, it isn't. (Turing-completeness
> might be a good place to define the boundary, but it's not the only
> possible criterion.)

Since Alan Turing is the person who applied the word 'programme' to a
sequence of instructions to be interpreted by a machine, I think his
definition rules. However, of course, 'programme' != 'program'.

Alan J. Flavell

Mar 27, 2000, 3:00:00 AM

On Sat, 25 Mar 2000, Mike Morrow wrote:

> When ASCII came along, there was an already existent code set in
> widespread use. It had 8 bits and, therefore, 256 possible
> combinations.

However, only about half of the 256 possible code points were actually
assigned to characters, so the situation wasn't fundamentally
different from, say, ASCII with parity. It was an 8-bit code, but
only conveyed 7 bits of information.
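
For readers unfamiliar with the parity arrangement, here is a minimal sketch of a 7-bit code carried in an 8-bit unit with the top bit used for even parity. The function names are invented for the example, and even parity is just one of the conventions that were in use.

```python
def add_even_parity(code):
    """Return the 8-bit value for a 7-bit code with an even-parity top bit."""
    if not 0 <= code < 128:
        raise ValueError("not a 7-bit code")
    parity = bin(code).count("1") & 1    # 1 if the number of set bits is odd
    return code | (parity << 7)

def strip_parity(byte):
    """Return the 7 information bits, discarding the parity bit."""
    return byte & 0x7F

for ch in "AC":
    print(ch, hex(add_even_parity(ord(ch))))
```

Eight bits travel on the wire, but only the low seven distinguish one character from another.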

> ASCII, as introduced, was a 7 bit code using the eighth
> bit for parity. So extension, it was NOT with only 1/2 the code space
> of EBCDIC. Courageous? NOT! Stupid! Definitely!

Well, of course when you make up history for yourself, you can make it
as stupid as you want to. I don't remember it being like that.

> These days, it has taken over due to the proliferation of PCs and Un*x
> boxes which use the new, 8 bit ASCII.

ASCII is an old 7-bit code[1]. Whenever I see someone referring to an
8 bit ASCII code, the bogosity alarm rings.

[1]well, actually several, due to national variants.

> Finally, it has the same code space as EBCDIC

Actually, the unassigned half of the EBCDIC code space was still being
shuffled around in the early 1990's when Pirard's well-known paper was
written, and SHARE was begging IBM to define an EBCDIC counterpart to
the already existing iso-8859-1 code. Which finally became CECP-1047.
See ftp://ftp.ulg.ac.be/pub/docs/iso8859/iso8859.networking or a copy
at ftp://watsun.cc.columbia.edu/kermit/charsets/iso8859.networking

> but still the translation problem for a major segment
> of the professional industry.

That doesn't describe any situation that I'm familiar with nowadays.

Aaron Priven

Mar 27, 2000, 3:00:00 AM

In article <7ackX...@khms.westfalen.de>,
kaih=7ackX...@khms.westfalen.de (Kai Henningsen) wrote:
>sich...@lucent.com (He Comes As No Surprise) wrote on 24.03.00 in <8bgktr$9...@nntpa.cb.lucent.com>:
>
>> It depends on what device you are using. Typing is younger than
>> printing, and a typewriter keyboard is full of compromises. On a
>> standard manual U.S. typewriter, the non-directional apostrophe is used
>> (correctly) as (1) apostrophe, (2) closing single quote, (3) opening
>> single quote, (4) prime/foot/minute-mark, (5) the upper portion of an
>> exclamation point, (6) acute accent, and (7) grave accent. It probably
>> has other standard uses that I don't know about.
>>
>> Most electric typewriter keyboards have an exclamation-point key, so
>> you don't need to overstrike. Some have acute and grave accents too.
>
>SAY WHAT?! US typewriters did not have exclamation-point keys?!
>
>You already don't need umlauted vowels, what did you *do*?! Because I've
>never seen a German mechanical typewriter without an exclamation key.

As the original poster mentioned, most US manual typewriters used a
single quote/apostrophe ' followed by a backspace and a period .

You might also be intrigued to learn that the typewriters also had no
number 1 -- the lower case L (er, l) was used instead. I occasionally
see references to years like "l970" in typeset text where people
obviously didn't bother to change their typing style when they ended
up typing on the computer.

I'm waiting for someone to come out with a computer keyboard in the
Linotype pattern.
--
Aaron Priven, Oakland, California, USA
aa...@priven.com, http://www.priven.com/

gr...@apple2.com

Mar 27, 2000, 3:00:00 AM

In article <m21z4wh...@gododdin.internal.jasmine.org.uk>,
Simon Brooke <si...@jasmine.org.uk> wrote:
>Keith Thompson <k...@cts.com> writes:

>> I think the disagreement isn't about what HTML is, it's about what the
>> phrase "programming language" means. By some perfectly reasonable
>> definitions of the phrase, HTML is a programming language; by other
>> perfectly reasonable definitions, it isn't. (Turing-completeness
>> might be a good place to define the boundary, but it's not the only
>> possible criterion.)

> Since Alan Turing is the person who applied the word 'programme' to a
> sequence of instructions to be interpreted by a machine, I think his
> definition rules. However, of course, 'programme' != 'program'.

And, since HTML is not a sequence of instructions but rather of data
descriptors with which a browser can do whatever it pleases....

celigne

Mar 27, 2000, 3:00:00 AM

"Alan J. Flavell" wrote:
>
> ASCII is an old 7-bit code[1]. Whenever I see someone referring to an
> 8 bit ASCII code, the bogosity alarm rings.

I'd better throw away my copy of draft X3.134.1 "8-Bit ASCII Structure
and Rules" then!

Alan

Mar 28, 2000, 3:00:00 AM

On Mon, 27 Mar 2000 09:34:09 GMT, Simon Brooke <si...@jasmine.org.uk>
wrote:

>> > Yu...@skuz.net (Yummy) wrote:
>> >
>> > > HTML is a subset of programming, there is no doubt about it. Ask anyone

>OK, explain how you would calculate the factorial of one thousand in
>HTML alone.

>If you can't, it isn't Turing equivalent; if it isn't Turing
>equivalent, it isn't a programming language.
>

Really? Since when did this become the definition of a programming
language? Can you cite an authority? This does not appear, for
instance, in the Oxford definition or Britannica article on the
subject. If Knuth says that, I would defer, though.

Anyway, you could do it using a cgi. Now you'll assert that's not
HTML. Easy to win arguments when you can set the definitions, isn't
it?

>Hint: in computing, there's such a thing as 'data', and even, for the
>very advanced, there are 'data formats'.

So what? Anyway, in HTML, the text is the data. HTML also includes
forms, allowing variable data. But again, so what? No one claimed that
HTML was a general purpose programming language.

Alan

Mar 28, 2000, 3:00:00 AM

On Mon, 27 Mar 2000 02:36:00 +0200, "Alan J. Flavell"
<fla...@mail.cern.ch> wrote:

>On Mon, 27 Mar 2000, Alan wrote:
>
>> On Sat, 25 Mar 2000 06:47:53 +0200, Jukka Korpela
>> <Jukka....@hut.fi> wrote:
>
>> >If you wish to demonstrate your ignorance of the nature of markup
>> >languages, nobody can prevent you, but you should then expect people
>> >to point out your mistakes, especially when a group discussing HTML is
>> >involved. Incidentally, < and > are used in linguistics and character
>> >code discussions too (in lack of better delimiting symbols, but
>> >anyway) - perhaps that makes them programming to you.
>>
>> Try not to be so patronising.
>
>The comment seems well-measured in the circumstances.

Pompous and patronising seems to fit better from where I sit.

>> I studied maths, comp science and
>> physics, and am currently working in DTP and web design.
>
>Then you should know better. There's even less reason to cut the
>degree of slack that we usually allow for those who are evidently
>newbies to the field.

>As an ex-student of several specialised subjects, you of all people
>should not be surprised to find that terms that may have wooly
>everyday meanings also have rather precise technical significance.

And you should understand that these words were used in an everyday
sense, not a precise technical context. And even if the latter, you
have preferred to make more disparaging personal remarks rather than
making any reasoned argument or authoritative citation.

>
>> My point was that the characters I mentioned are not used in prose,
>

>I'm afraid that doesn't get us very much further.

You don't understand the word "prose"? Didn't you read anything in the
original post except the few words that triggered your flame?

Antoine Leca

Mar 28, 2000, 3:00:00 AM

Erland Sommarskog wrote:
>
> Of course, if you can convince people all over the world to abandon
> their current writing system in favour of the English alphabet,
> there is a lot work saved.

No.
Quite a number of languages in the world use sounds unknown to English,
some of them (ayn, stod, znak) are conventionally written with
apostrophes or similar glyphs adorning the usual English letters.
So you are still back to the topic.

Sorry.

Antoine

Alan J. Flavell

Mar 28, 2000, 3:00:00 AM

Well, bogosity alarms have been known to go off at the wrong moment
;-)

I'd say this is a somewhat dubious nickname for an ISO 8-bit code.

Google finds this:
http://sunsite.org.uk/packages/dbperl/refinfo/fips/fip127-2.txt

which references this:
s. ISO 4873, Information Processing - ISO 8-bit code for information
interchange - Structure and rules for implementation, Third Edition,
1991. Replaces ANSI X3.134.1, 8-bit ASCII.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

That tends to confirm my interpretation that the _draft_ for an 8-bit
ASCII was abandoned due to the introduction of ISO standards.

Google also found this somewhat turgid document:
http://sw-eng.falls-church.va.us/itsg/P14V31.htm
which cites an ANSI standard 4873:1991 "ISO 8-Bit Code for
Information Interchange - Structure and Rules for Implementation"
described as "Revision and redesignation of ANSI X3.134.1"

cheers

Michael Berbner

Mar 28, 2000, 3:00:00 AM

On 26 Mar 2000 20:34:00 +0200, kaih=7ackX...@khms.westfalen.de (Kai
Henningsen) wrote:

>sich...@lucent.com (He Comes As No Surprise) wrote on 24.03.00 in <8bgktr$9...@nntpa.cb.lucent.com>:
>
>> It depends on what device you are using. Typing is younger than
>> printing, and a typewriter keyboard is full of compromises. On a
>> standard manual U.S. typewriter, the non-directional apostrophe is used
>> (correctly) as (1) apostrophe, (2) closing single quote, (3) opening
>> single quote, (4) prime/foot/minute-mark, (5) the upper portion of an
>> exclamation point, (6) acute accent, and (7) grave accent. It probably
>> has other standard uses that I don't know about.
>>
>> Most electric typewriter keyboards have an exclamation-point key, so
>> you don't need to overstrike. Some have acute and grave accents too.
>
>SAY WHAT?! US typewriters did not have exclamation-point keys?!
>
>You already don't need umlauted vowels, what did you *do*?! Because I've
>never seen a German mechanical typewriter without an exclamation key.
>

Like the German mechanical typewriters that had no 1(one) and 0 (zero)
and you had to use l(small letter L) and O instead.

Michael

Alan

Mar 29, 2000, 3:00:00 AM

On 27 Mar 2000 17:30:14 GMT, aa...@priven.com (Aaron Priven) wrote:

>You might also be intrigued to learn that the typewriters also had no
>number 1 -- the lower case L (er, l) was used instead. I occasionally
>see references to years like "l970" in typeset text where people
>obviously didn't bother to change their typing style when they ended
>up typing on the computer.

I came close to doing that, an author with typewriter habits. It looks
particularly bad if you're using old-style figures. I worked out a grep
regular expression to search for "L" followed or preceded by a figure
to find them all.
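
A search of that kind might look like the sketch below. It is not the poster's actual expression, which is not given; the pattern and the helper name `find_suspect_ls` are assumptions. It flags a lower-case "l" that directly follows or precedes a digit and returns the positions for manual review, since some hits (product names, for instance) will be legitimate.

```python
import re

# A lower-case "l" immediately after a digit, or immediately before one.
SUSPECT_L = re.compile(r"(?<=\d)l|l(?=\d)")

def find_suspect_ls(text):
    """Return the offsets of every 'l' that sits next to a digit."""
    return [m.start() for m in SUSPECT_L.finditer(text)]

sample = "in l970 and 19l4"
print(find_suspect_ls(sample))
```

The same approach extends to a capital "O" typed in place of zero by adding a second alternative to the pattern.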

Eric Fischer

Mar 29, 2000, 3:00:00 AM

John Hauser <jha...@cs.berkeley.edu> wrote:

> Let's be clear about one thing. Before it got taken over by the ISO,
> official ASCII encouraged the use of codes 0x60 and 0x27 as opening and
> closing single quotation marks.

The chronology here is all wrong. The American Standards Association
began investigating the possibility of a standard character code before
the ISO did, it is true, but the original (1963) published version
of ASCII was almost identical to what was then the ISO proposal
for an international standard, and in both national and international
versions, the ` character was then undefined. A proposed revision
of ASCII gained the character only because it had also been added to
the revised proposal for the ISO code. The difference is that the
ASCII committee referred to the character as "opening single quotation
mark" while the ISO called it "grave accent." The current revision
of ASCII *still* gives the opening single quotation mark as one of the
interpretations of the character, while the ISO code has never specified
the character to be a quotation mark. In both cases, this is something
that has not changed since the essentially simultaneous introduction
of the character into the respective standards proposals.

eric

The Pied Typer

Mar 29, 2000, 3:00:00 AM

In <8F02E799...@194.213.69.148>, som...@algonet.se wrote:
>
> Don't know exactly what you are talking about, but there is an option
> in Microsoft Word "Use smart quotes" or some such. I usually turn it
> off, because I'm Swedish and we don't use different quotes for opening
> and closing.

That makes sense. But W**d tries to show off its scintillating
intelligence by guessing whether to turn ' and " into an opening quote
or a closing quote, and it often guesses wrong. Anybody who has tried
to use W**d to type the year '99 (for 1999) knows what I mean.
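
The '99 failure falls out of the simplest possible guessing rule. The sketch below is a made-up heuristic for illustration, not the word processor's actual algorithm: a quote that follows whitespace or an opening bracket is treated as opening, and anything else as closing.

```python
def naive_smart_quotes(text):
    """Replace straight quotes using only the preceding character as a hint."""
    out = []
    prev = " "                           # treat start of text like whitespace
    for ch in text:
        opening = prev.isspace() or prev in "([{"
        if ch == "'":
            out.append("\u2018" if opening else "\u2019")
        elif ch == '"':
            out.append("\u201c" if opening else "\u201d")
        else:
            out.append(ch)
        prev = ch
    return "".join(out)

print(naive_smart_quotes("'quoted' and don't"))
print(naive_smart_quotes("the class of '99"))
```

The first line comes out right, but in the second the apostrophe that marks the dropped "19" follows a space, so the rule emits an opening quote (U+2018) where an apostrophe (U+2019) is wanted.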

> More importantly, the documents I write tend to include
> code snippets, and opening and closing quotes in code samples are
> only confusing.

My project's manuals in FrameMaker have this problem. Not only
do some text passages have non-directional quotation marks, but
some ASCII computer dialogues have directional quotation marks!

> (Just as all those ill-designed man pages in Unix
> which uses grave accent and apostrophes to emulate opening and closing
> quotes.)

Unix man pages are seldom written correctly. There is a man page for
the man pages, but few people read it. Fewer still have read the
Troff manual.

Man pages may be printed or typed. Nroff and Troff are reasonably
compatible; for instance, Nroff types both the hyphen (code "-")
and the minus sign (code "\-") as "-". But quotation marks cannot
be made compatible, because a printer has two of them and a typist
has only one. Best practice for a man-page writer is to use non-
directional quotes for code and dialogue, and a variant macro for
text:

    .ie t \{\
    .\" Troff (printing): directional quotes
    .ds LQ ``
    .ds RQ ''
    .\}
    .el \{\
    .\" Nroff (typing): non-directional quotes
    .ds LQ ""
    .ds RQ ""
    .\}
    If you run
    .I miff
    as a filter, the output will include a lot of \*(LQcheese\*(RQ.

The quotation mark must be doubled because the first one denotes
the start of a quoted Nroff string.

-:-
If you understand this, and stay with confusion, confusion will sort
itself out by itself. If you try to sort it out, compute how to do
it, if you ask me for a prescription how to do it, you only add more
confusion to your productions.

--F. Perls (1969)
--
G. L. Sicherman
work: sich...@lucent.com
home: col...@mail.monmouth.com

The Dangling Conversationalist

Mar 29, 2000, 3:00:00 AM

In <38e09ff5.7942992@news>, michael...@ffm2.siemens.de wrote:
>
> Like the German mechanical typewriters that had no 1(one) and 0 (zero)
> and you had to use l(small letter L) and O instead.

And Russian mechanical typewriters had no "3" key--they used "Z" instead.

-:-
"Hay, be seedy! He-effigy, hate-shy jaky yellow man, oh
peek, you are rusty, you've edible, you ex-wise he!"

--Harry Mathews

Erland Sommarskog

Mar 29, 2000, 3:00:00 AM

John Hauser (jha...@cs.berkeley.edu) writes:
>Official standards notwithstanding, there was until recently a trend
>toward adopting code 0x60 as the opening single quotation mark, since
>there weren't many other options. I know that PostScript, TeX, and
>various fonts on UNIX and Windows systems have placed a true opening
>quotation mark in that position. As I've already mentioned, some PC
>keyboards are labeled that way, too.

Other fonts have not a symmetry in shape between the grave accent
and the apostrophe.

>Unfortunately, this solution is in conflict with Unicode because codes
>0x0060 and 0x0027 are _neither_ the opening or closing single quotation
>marks in Unicode.

Good. Let it stay that way.

--
Erland Sommarskog, Stockholm, som...@algonet.se
This is an incomplete mess.

John Hauser

Mar 29, 2000, 3:00:00 AM

Myself:

> Official standards notwithstanding, there was until recently a trend
> toward adopting code 0x60 as the opening single quotation mark, since
> there weren't many other options. I know that PostScript, TeX, and
> various fonts on UNIX and Windows systems have placed a true opening
> quotation mark in that position. As I've already mentioned, some PC
> keyboards are labeled that way, too.

Erland Sommarskog:

> Other fonts have not a symmetry in shape between the grave accent
> and the apostrophe.

So I guess that makes them better.

Me:

> Unfortunately, this solution is in conflict with Unicode because codes
> 0x0060 and 0x0027 are _neither_ the opening or closing single quotation
> marks in Unicode.

Erland Sommarskog:

> Good. Let it stay that way.

If you prefer conflict, okay. I won't change to suit you either.

- John Hauser

Eric Fischer

Mar 30, 2000, 3:00:00 AM

C. A. Upsdell <cups...@upsdell.com> wrote:

> I agree that the compilation issue is irrelevant: but I must point out that
> the first versions of Basic *were* interpreted.

No they weren't. All BASICs were compiled for years before there
were any BASIC interpreters. See the paper on BASIC from the ACM
History of Programming Languages conference for details.

eric