VSECS & SECS: Small Extended Character Sets

Markus Kuhn

Aug 14, 1998
Small European Character Sets
-----------------------------

I have recently spent quite some time working out a proposal for two
Unicode/ISO 10646 subsets that are so small that I hope they will become
widely implemented in Europe and America. Both are specifically designed
to be suitable for systems where characters are represented in
low-resolution fixed-width fonts. This includes, for instance, your xterm
and Emacs window under Unix (or, more generally, VT100 emulators and source
code editors), but also applications such as portable LCD devices
(pagers, mobile phones), where it only makes sense to implement a small
subset of Unicode and where no single 8-bit set can cover a reasonable
number of languages. These subsets are not really intended for
applications such as the publishing industry, where these display
restrictions do not exist and larger Unicode subsets or even full
implementations might be adequate.

The two subsets are:

- Very Simple European Character Set (VSECS)
345 characters, essentially a superset of ISO 8859-1 to -4, -9, -10, -15
and CP1252, plus a few ISO 6937 characters

Rows Positions (Cells)
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 C6-C7 D8-DD
20 13-15 18-1A 1C-1E 20-22 26 30 39-3A AC
21 22 26 5B-5E 90-93
26 6A
FF FD

- Simple European Character Set (SECS)
683 characters; in addition to VSECS, it covers Cyrillic, Greek,
MS-DOS block graphics, and a moderate set of mathematical characters
likely to be used in academic email and source code comments.

Rows Positions (Cells)
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 BC-BD C6-C7 D8-DD
03 84-86 88-8A 8C 8E-A1 A3-CE D1 D5-D6 F1
04 01-0C 0E-4F 51-5C 5E-5F 90-91
20 13-15 17-1A 1C-1E 20-22 26 30 32-34 39-3A 70 7F-83 A7 AC
21 02 15-16 1A 1D 22 24 26 5B-5E 90-95 A4-A7 D0-D5
22 00-09 0B-0C 12-13 18-1A 1D-1E 24-2A 3C 43 45 48-49 58 5F-62 64-65
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 10 15 20-21 29-2A
25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 B2
25 BA BC C4 CB
26 10-12 3A-3C 40 42 6A-6B 6D-6F
27 13 17
FF FD
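The "Rows / Positions (Cells)" notation is compact but easy to misread: the row is the high byte and each cell or cell range is the low byte of a 16-bit code. As a minimal sketch (Python, not part of the uniset tool mentioned later; `parse_table` is a hypothetical helper name), the two tables above can be expanded into sets of code points to confirm the stated sizes and that VSECS is contained in SECS:

```python
def parse_table(table):
    """Expand lines of 'row cell cell-cell ...' (all hex) into code points."""
    points = set()
    for line in table.strip().splitlines():
        row, *cells = line.split()
        base = int(row, 16) << 8  # the row is the high byte
        for cell in cells:
            lo, _, hi = cell.partition("-")  # a single cell has no '-'
            for c in range(int(lo, 16), int(hi or lo, 16) + 1):
                points.add(base | c)
    return points

VSECS = parse_table("""
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 C6-C7 D8-DD
20 13-15 18-1A 1C-1E 20-22 26 30 39-3A AC
21 22 26 5B-5E 90-93
26 6A
FF FD
""")

SECS = parse_table("""
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 BC-BD C6-C7 D8-DD
03 84-86 88-8A 8C 8E-A1 A3-CE D1 D5-D6 F1
04 01-0C 0E-4F 51-5C 5E-5F 90-91
20 13-15 17-1A 1C-1E 20-22 26 30 32-34 39-3A 70 7F-83 A7 AC
21 02 15-16 1A 1D 22 24 26 5B-5E 90-95 A4-A7 D0-D5
22 00-09 0B-0C 12-13 18-1A 1D-1E 24-2A 3C 43 45 48-49 58 5F-62 64-65
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 10 15 20-21 29-2A
25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 B2
25 BA BC C4 CB
26 10-12 3A-3C 40 42 6A-6B 6D-6F
27 13 17
FF FD
""")

print(len(VSECS), len(SECS), VSECS <= SECS)  # 345 683 True
```

A row that continues on a second line (rows 22 and 25 in SECS) needs no special handling, because every line simply adds to the same set.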

VSECS is somewhat similar to ISO 6937 with some bugs fixed (e.g., the
Euro symbol is included, as are the directed quotation marks).

SECS is somewhat similar to Microsoft/Adobe WGL4. I think SECS is much
better than WGL4, because WGL4 contains many letters for which I could
not find out where they are used (for at least three I am sure they
never existed). SECS contains the following 91 characters that are not
part of WGL4:

Rows Positions (Cells)
02 BC-BD
03 D1 D5-D6 F1
20 34 70 80-83
21 02 15 1A 1D 24 A4-A7 D0-D5
22 00-01 03-05 07-09 0B-0C 13 18 1D 24-28 2A 3C 43 45 49 58 5F 62
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 15 29-2A
26 10-12 6D-6F
27 13 17
FF FD
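Under the same assumptions (Python, hypothetical `parse_table` helper), a sketch that expands this table too, checks that it has 91 entries (U+FFFD included) which all lie inside SECS, and counts how many fall in the Mathematical Operators block U+2200..U+22FF:

```python
def parse_table(table):
    """Expand lines of 'row cell cell-cell ...' (all hex) into code points."""
    points = set()
    for line in table.strip().splitlines():
        row, *cells = line.split()
        base = int(row, 16) << 8  # the row is the high byte
        for cell in cells:
            lo, _, hi = cell.partition("-")
            for c in range(int(lo, 16), int(hi or lo, 16) + 1):
                points.add(base | c)
    return points

SECS = parse_table("""
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 BC-BD C6-C7 D8-DD
03 84-86 88-8A 8C 8E-A1 A3-CE D1 D5-D6 F1
04 01-0C 0E-4F 51-5C 5E-5F 90-91
20 13-15 17-1A 1C-1E 20-22 26 30 32-34 39-3A 70 7F-83 A7 AC
21 02 15-16 1A 1D 22 24 26 5B-5E 90-95 A4-A7 D0-D5
22 00-09 0B-0C 12-13 18-1A 1D-1E 24-2A 3C 43 45 48-49 58 5F-62 64-65
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 10 15 20-21 29-2A
25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 B2
25 BA BC C4 CB
26 10-12 3A-3C 40 42 6A-6B 6D-6F
27 13 17
FF FD
""")

NOT_IN_WGL4 = parse_table("""
02 BC-BD
03 D1 D5-D6 F1
20 34 70 80-83
21 02 15 1A 1D 24 A4-A7 D0-D5
22 00-01 03-05 07-09 0B-0C 13 18 1D 24-28 2A 3C 43 45 49 58 5F 62
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 15 29-2A
26 10-12 6D-6F
27 13 17
FF FD
""")

print(len(NOT_IN_WGL4), NOT_IN_WGL4 <= SECS)  # 91 True
math_ops = sum(0x2200 <= cp <= 0x22FF for cp in NOT_IN_WGL4)
print(math_ops, "of them are in the Mathematical Operators block")
```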

Almost all of these are basic mathematical characters that most
high school students should be familiar with. They are very useful to
have available in academic email discussions and source code comments.
It would be nice if the authors of WGL4 would seriously consider extending
their Unicode subset with these few dozen elementary math symbols. Then
SECS would become a subset of WGL4. VSECS is already a subset of WGL4
except for U+FFFD.

The mathematical symbols of SECS will hopefully give US developers
who do not specialize in i18n issues some motivation to take an
interest in 16-bit character sets, as these symbols are more relevant to
their personal use than the accented characters of crazy Europeans.

My dream is that something like SECS soon becomes the common
minimum repertoire in Unix X11 fonts and printer fonts. VSECS is
intended as an intermediate step for applications where the size of the
character set is critical and only Latin script support is required.

I do not think SECS contains any useless symbols. I know for each letter
and symbol why it is in there and in which languages or fields it is
used. Just ask.

Much more information on the two sets is available from

http://www.cl.cam.ac.uk/~mgk25/ucs/vsecs.html
http://www.cl.cam.ac.uk/~mgk25/ucs/secs.html

Even better than just looking at these web pages is to download the
database (Perl required) that generated them from

http://www.cl.cam.ac.uk/~mgk25/ucs/secs.tar.gz

Then you can play around with them and easily test their subset
relations to other sets yourself.

If you want to see example glyphs in the HTML output of this script,
then you'll also need

http://www.cl.cam.ac.uk/~mgk25/ucs/glyphs.zip

The uniset Perl script allows you to comfortably build up your own
database of character collections, to merge and subtract them and to
generate Unicode subsets and study their relations with other subsets.
The mapping files from the Unicode Consortium can be used directly as
input.
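uniset itself is Perl and its interface is not shown here. As a rough Python illustration of the same idea only (`read_mapping` is a hypothetical helper, not a uniset function), the sketch below reads lines in the style of the Unicode Consortium mapping tables (byte value, Unicode value, then a `#` comment with the name) into a set of code points, after which merging, subtracting and subset tests are ordinary set operations. The two tables are short hand-copied excerpts, not complete files.

```python
def read_mapping(text):
    """Collect the Unicode values (second column) of a mapping table whose
    lines look like '0xA4  0x20AC  # EURO SIGN'; '#' starts a comment."""
    points = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:  # skips comment lines and unmapped bytes
            points.add(int(fields[1], 16))
    return points

LATIN1 = """
# excerpt in the style of 8859-1.TXT
0x41  0x0041  # LATIN CAPITAL LETTER A
0xA4  0x00A4  # CURRENCY SIGN
0xA6  0x00A6  # BROKEN BAR
0xBD  0x00BD  # VULGAR FRACTION ONE HALF
"""

LATIN9 = """
# excerpt in the style of 8859-15.TXT
0x41  0x0041  # LATIN CAPITAL LETTER A
0xA4  0x20AC  # EURO SIGN
0xA6  0x0160  # LATIN CAPITAL LETTER S WITH CARON
0xBD  0x0153  # LATIN SMALL LIGATURE OE
0x81          # a byte with no Unicode value is skipped
"""

l1, l9 = read_mapping(LATIN1), read_mapping(LATIN9)
print(len(l1 | l9))                           # merge: 7
print(sorted(f"U+{c:04X}" for c in l9 - l1))  # subtract: ['U+0153', 'U+0160', 'U+20AC']
print(l1 <= (l1 | l9))                        # subset test: True
```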

Please let me know what you think about SECS and VSECS and if this is
something you would like to see widely implemented.

Markus

--
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org, home page: <http://www.cl.cam.ac.uk/~mgk25/>

Tiro Typeworks

Aug 14, 1998
On Fri, 14 Aug 1998 20:35:14 +0100, Markus Kuhn
<Marku...@cl.cam.ac.uk> wrote:

>SECS is somewhat similar to Microsoft/Adobe WGL4. I think SECS is much
>better than WGL4, because WGL4 contains many letters for which I could
>not find out where they are used (for at least three I am sure they
>never existed).

Markus, I'm very interested in your proposal, but would like to know
for which WGL4 letters you could find no use. I have spent a lot of
time researching European (and non-European) orthographies, and may be
able to account for some of the lesser known letters (which is not to
say that I think WGL4 is perfect).

John Hudson, Type Director

Tiro Typeworks
Vancouver, BC
ti...@tiro.com
www.tiro.com

Markus Kuhn

Aug 14, 1998
Tiro Typeworks wrote:
> Markus, I'm very interested in your proposal, but would like to know
> for which WGL4 letters you could find no use. I have spent a lot of
> time researching European (and non-European) orthographies, and may be
> able to account for some of the lesser known letters (which is not to
> say that I think WGL4 is perfect).

I'm very interested in hearing more about what the rationale to have
the following characters in WGL4 might be:

I don't know where the following ones come from:

0114 # LATIN CAPITAL LETTER E WITH BREVE
0115 # LATIN SMALL LETTER E WITH BREVE
012C # LATIN CAPITAL LETTER I WITH BREVE
012D # LATIN SMALL LETTER I WITH BREVE
014E # LATIN CAPITAL LETTER O WITH BREVE
014F # LATIN SMALL LETTER O WITH BREVE
01FA # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
01FB # LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE
01FC # LATIN CAPITAL LETTER AE WITH ACUTE
01FD # LATIN SMALL LETTER AE WITH ACUTE
01FE # LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
01FF # LATIN SMALL LETTER O WITH STROKE AND ACUTE
02C9 # MODIFIER LETTER MACRON
02D6 # MODIFIER LETTER PLUS SIGN
0387 # GREEK ANO TELEIA

The long s might come from German Fraktur fonts, which have been
unused since ~1945. This letter certainly has no equivalent in modern
German roman/antiqua fonts and is certainly not needed to
write German:

017F # LATIN SMALL LETTER LONG S

I understand that the following ones were added by mistake to
ISO 6937:

0132 # LATIN CAPITAL LIGATURE IJ
0133 # LATIN SMALL LIGATURE IJ
013F # LATIN CAPITAL LETTER L WITH MIDDLE DOT
0140 # LATIN SMALL LETTER L WITH MIDDLE DOT
0149 # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE

Usage of LIGATURE IJ is now deprecated in the Netherlands
and the other ones never existed in Catalan or Afrikaans
as originally assumed (source: NL gov manual by J.W. van
Wingen).

The following are claimed to be used in Welsh, but Welsh
native speakers who I asked claimed to have never seen them,
so I suspect they are historic characters that are not in
general use.

1E80 # LATIN CAPITAL LETTER W WITH GRAVE
1E81 # LATIN SMALL LETTER W WITH GRAVE
1E82 # LATIN CAPITAL LETTER W WITH ACUTE
1E83 # LATIN SMALL LETTER W WITH ACUTE
1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
1E85 # LATIN SMALL LETTER W WITH DIAERESIS
1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
1EF3 # LATIN SMALL LETTER Y WITH GRAVE

The purpose of the following characters is also
unclear to me:

201B # SINGLE HIGH-REVERSED-9 QUOTATION MARK
203C # DOUBLE EXCLAMATION MARK
203E # OVERLINE

All these are in WGL4 but (so far) not in SECS.

There are also some mysterious characters in the MES-2
proposal which I have not found anywhere else:

01B7 # LATIN CAPITAL LETTER EZH
01C4 # LATIN CAPITAL LETTER DZ WITH CARON
01C6 # LATIN SMALL LETTER DZ WITH CARON
01C7 # LATIN CAPITAL LETTER LJ
01C9 # LATIN SMALL LETTER LJ
01CA # LATIN CAPITAL LETTER NJ
01CC # LATIN SMALL LETTER NJ
01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON
01E4 # LATIN CAPITAL LETTER G WITH STROKE
01E5 # LATIN SMALL LETTER G WITH STROKE
01E6 # LATIN CAPITAL LETTER G WITH CARON
01E7 # LATIN SMALL LETTER G WITH CARON
01E8 # LATIN CAPITAL LETTER K WITH CARON
01E9 # LATIN SMALL LETTER K WITH CARON
01EE # LATIN CAPITAL LETTER EZH WITH CARON
01EF # LATIN SMALL LETTER EZH WITH CARON
01F1 # LATIN CAPITAL LETTER DZ
01F3 # LATIN SMALL LETTER DZ
01F4 # LATIN CAPITAL LETTER G WITH ACUTE
01F5 # LATIN SMALL LETTER G WITH ACUTE
027C # LATIN SMALL LETTER R WITH LONG LEG
0292 # LATIN SMALL LETTER EZH
0374 # GREEK NUMERAL SIGN
0375 # GREEK LOWER NUMERAL SIGN
037A # GREEK YPOGEGRAMMENI
037E # GREEK QUESTION MARK

Do you know a good reason why any of these characters should
go into a simple European character set?
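The lists in this post pair a hex code with a character name. A small sketch, assuming Python's standard `unicodedata` module (whose names come from the Unicode Character Database), can cross-check such lines so that a typo in either the code or the name stands out; `check_names` is a hypothetical helper, not part of uniset.

```python
import unicodedata

def check_names(listing):
    """Return (code point, listed name, actual name) for every line of the
    form 'XXXX # NAME' whose name does not match the Unicode database."""
    mismatches = []
    for line in listing.strip().splitlines():
        code, _, name = line.partition("#")
        cp = int(code, 16)  # int() tolerates the surrounding spaces
        actual = unicodedata.name(chr(cp), "<unnamed>")
        if actual != name.strip():
            mismatches.append((cp, name.strip(), actual))
    return mismatches

SAMPLE = """
01B7 # LATIN CAPITAL LETTER EZH
01E6 # LATIN CAPITAL LETTER G WITH CARON
027C # LATIN SMALL LETTER R WITH LONG LEG
037E # GREEK QUESTION MARK
"""

print(check_names(SAMPLE))  # [] : every listed name matches
```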

Tiro Typeworks

Aug 14, 1998
cc. comp.std.internat,comp.software.international,comp.fonts,
comp.text.tex
Marku...@cl.cam.ac.uk; cmak...@COMPUSERVE.COM


My browser finally finished downloading Markus' SECS website, and I
have prepared the following comments on some of the WGL4 characters he
has excluded from SECS. I believe that some of these characters should
be included in the SECS, in accordance with Markus' criteria, and have
marked my comments on these characters with an asterisk.

I have not bothered to comment on the heavy linedraw characters, etc.,
and have confined my comments to letters and diacritics.

[I am also concerned that Markus' recommended mathematical set may be
too extensive. Is this really a _basic_ mathematical subset, or
something more?]


0114 LATIN CAPITAL LETTER E WITH BREVE

0115 LATIN SMALL LETTER E WITH BREVE

012C LATIN CAPITAL LETTER I WITH BREVE

012D LATIN SMALL LETTER I WITH BREVE

These characters are not required for the modern writing of any
European language. They are essential to much European prosody, and
are found in most Latin language textbooks and dictionaries. I believe
it would be sound to omit them from SECS if the basic, non-combining
IPA characters are also to be omitted. If the latter are included it
would make sense to include short and long vowel diacritics.


0132 LATIN CAPITAL LIGATURE IJ

0133 LATIN SMALL LIGATURE IJ

These, of course, are the Dutch digraph characters. There is no need
for them to be separately encoded, as Dutch writers commonly type /I/
followed by /J/. These characters can, I believe, be safely omitted
from SECS.


013F LATIN CAPITAL LETTER L WITH MIDDLE DOT

0140 LATIN SMALL LETTER L WITH MIDDLE DOT

These are composite rendering forms for the Catalan lateral
approximant. They are not strictly necessary in a character set which
includes an appropriately sized, positioned and spaced midpoint
character (U+00B7). I am a little concerned that in a monospaced font,
of the kind referred to in Markus' SECS criteria, reliance on the
midpoint character will produce gaping holes in the middle of many
Catalan words. I am undecided about the possible inclusion of these
characters.


0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE

This is a Hewlett-Packard character, apparently used by them for
Afrikaans. I've never heard a clear explanation of its purpose, or its
inclusion in WGL4 or other character sets (other than the fact that HP
wanted it to be included). In any case, Afrikaans is beyond the scope
of SECS, so this character may be safely omitted.
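The argument that these five characters are dispensable can be checked against Unicode's own data: each has only a compatibility decomposition, so NFKC normalization rewrites it as a plain sequence (I + J, L + MIDDLE DOT, MODIFIER LETTER APOSTROPHE + n), while NFC leaves it alone. All resulting pieces are already in SECS (U+02BC is in row 02, cells BC-BD; the rest are in row 00). A short sketch using Python's `unicodedata`:

```python
import unicodedata

for ch in "\u0132\u0133\u013f\u0140\u0149":
    parts = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ->",
          " ".join(f"U+{ord(p):04X}" for p in parts))
```

This does not settle the rendering question raised for Catalan: a monospaced font still has to give U+00B7 a full cell of its own.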


014E LATIN CAPITAL LETTER O WITH BREVE

014F LATIN SMALL LETTER O WITH BREVE

These characters are not required for the modern writing of any
European language. They are essential to much European prosody, and
are found in most Latin language textbooks and dictionaries. I believe
it would be sound to omit them from SECS if the basic, non-combining
IPA characters are also to be omitted. If the latter are included it
would make sense to include short and long vowel diacritics.


017F LATIN SMALL LETTER LONG S

Archaic. This may be safely omitted.


01A0 LATIN CAPITAL LETTER O WITH HORN

01A1 LATIN SMALL LETTER O WITH HORN

01AF LATIN CAPITAL LETTER U WITH HORN

01B0 LATIN SMALL LETTER U WITH HORN

Vietnamese. These characters may be safely omitted (although there are
sizeable Vietnamese speaking populations in parts of Europe, notably
in the Netherlands).


01FA LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE

01FB LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE

01FC LATIN CAPITAL LETTER AE WITH ACUTE

01FD LATIN SMALL LETTER AE WITH ACUTE

01FE LATIN CAPITAL LETTER O WITH STROKE AND ACUTE

01FF LATIN SMALL LETTER O WITH STROKE AND ACUTE

* These characters are used in Danish and their inclusion in both
Unicode and the WGL4 set was at the request of the Danish standards
organization. My understanding is that there is some debate over the
status of these characters in modern Danish. Some sources claim that
they are archaic, others that they are orthographically correct and
that to omit them is a mistake. I believe they should not be omitted
from SECS without further research.
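For context on what omitting them would mean: each of these six letters is canonically equivalent to a Latin-1 letter followed by U+0301 COMBINING ACUTE ACCENT, and NFC composes such a sequence back into the single code point. SECS row 03 starts at cell 84, so it contains no combining marks from U+0300..U+036F, and a SECS-only system could not show these letters as sequences either. A sketch with Python's `unicodedata`:

```python
import unicodedata

# decomposed sequence -> expected precomposed letter
SEQUENCES = {
    "A\u030a\u0301": "\u01fa",  # A + ring above + acute
    "a\u030a\u0301": "\u01fb",
    "\u00c6\u0301": "\u01fc",   # AE + acute
    "\u00e6\u0301": "\u01fd",
    "\u00d8\u0301": "\u01fe",   # O with stroke + acute
    "\u00f8\u0301": "\u01ff",
}

for seq in SEQUENCES:
    composed = unicodedata.normalize("NFC", seq)
    print(" + ".join(f"U+{ord(c):04X}" for c in seq), "->",
          f"U+{ord(composed):04X}", unicodedata.name(composed))
```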


02D6 MODIFIER LETTER PLUS SIGN

This may be safely omitted.


1E80 LATIN CAPITAL LETTER W WITH GRAVE

1E81 LATIN SMALL LETTER W WITH GRAVE

1E82 LATIN CAPITAL LETTER W WITH ACUTE

1E83 LATIN SMALL LETTER W WITH ACUTE

1E84 LATIN CAPITAL LETTER W WITH DIAERESIS

1E85 LATIN SMALL LETTER W WITH DIAERESIS

1EF2 LATIN CAPITAL LETTER Y WITH GRAVE

1EF3 LATIN SMALL LETTER Y WITH GRAVE

* All these characters are used in modern Welsh and should _not_ be
omitted from SECS. Their use is less common than the W and Y
circumflex diacritics, but all are essential to semantic distinction
and/or pronunciation. My source for this information is Andrew Hawke
(a...@pophost.aber.ac.uk), assistant editor of the University of Wales
dictionary of the Welsh language. I can provide a Welsh word list if
required.
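It may help to see where the Welsh accented w and y live in Unicode. Composing each base letter with circumflex, grave, acute and diaeresis (NFC, via Python's `unicodedata`) shows that the circumflex forms are in row 01 (cells 74-77, already inside SECS row 01 50-7E) and y-acute and y-diaeresis are plain Latin-1, so exactly the listed forms (w grave/acute/diaeresis and y grave, in both cases) need row 1E:

```python
import unicodedata

MARKS = {"circumflex": "\u0302", "grave": "\u0300",
         "acute": "\u0301", "diaeresis": "\u0308"}

for base in "wy":
    for label, mark in MARKS.items():
        ch = unicodedata.normalize("NFC", base + mark)
        print(f"{base} + {label:10} -> U+{ord(ch):04X} (row {ord(ch) >> 8:02X})")
```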

Tiro Typeworks

Aug 14, 1998
On Fri, 14 Aug 1998 23:29:28 +0100, Markus Kuhn
<Marku...@cl.cam.ac.uk> wrote:

>There are also some mysterious characters in the MES-2
>proposal which I have not found anywhere else:

>01B7 # LATIN CAPITAL LETTER EZH

Ezh (or Yogh) is found in Old and Middle English texts, and is a
letter in the orthographies of a number of African languages. The only
modern European language I associate it with is Skolt Saami (see
below). The number of speakers/writers of Skolt Saami is probably well
below the 10,000 minimum set in Markus' criteria.


>01C4 # LATIN CAPITAL LETTER DZ WITH CARON
>01C6 # LATIN SMALL LETTER DZ WITH CARON
>01C7 # LATIN CAPITAL LETTER LJ
>01C9 # LATIN SMALL LETTER LJ
>01CA # LATIN CAPITAL LETTER NJ
>01CC # LATIN SMALL LETTER NJ

These are digraphs which were separately encoded in ISO/IEC 10646 and
Unicode to facilitate compatible font mappings between Latin and
Cyrillic fonts for Serbo-Croatian. Language reform policies in the
former Yugoslav republics -- particularly in Croatia -- have greatly
reduced the need for such compatibility. I believe these digraph
characters may still be of use in Serbia, if transliteration to Latin
script is a requirement, but such specialised usage may fall beyond
the proposed scope of SECS.
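The compatibility role is easy to see in a sketch (Python's `unicodedata`; the three-entry table is illustrative, lowercase only): each Serbian Cyrillic letter џ, љ, њ has exactly one Latin code point counterpart, which keeps a letter-for-letter conversion possible. The digraphs come in triples because Unicode gives each an uppercase, a titlecase and a lowercase form, and NFKC expands them into ordinary two-letter spellings whose pieces (including ž, U+017E) are already in SECS row 01, as are the Cyrillic originals in row 04.

```python
import unicodedata

# Serbian Cyrillic -> single-code-point Latin digraph (lowercase forms only)
CYR_TO_LATIN = {
    "\u045f": "\u01c6",  # dzhe -> dz with caron
    "\u0459": "\u01c9",  # lje  -> lj
    "\u045a": "\u01cc",  # nje  -> nj
}

for cyr, lat in CYR_TO_LATIN.items():
    print(unicodedata.name(cyr), "->", unicodedata.name(lat))
    print("   upper/title/lower:", lat.upper(), lat.title(), lat,
          "| NFKC:", unicodedata.normalize("NFKC", lat))
```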


>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON

>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON

Unicode 2.0 identifies these characters as Lappish. In the first
place, Lappish is generally considered a derogatory term; in the
second these characters do not appear in any of the Saami
orthographies I have collected. Note that I only have Latin
orthographies for five of the nine Saami languages.


>01E4 # LATIN CAPITAL LETTER G WITH STROKE
>01E5 # LATIN SMALL LETTER G WITH STROKE

These characters are used to write Skolt Saami in the Latin script
(Skolt Saami is also written by some using the Cyrillic script). The
number of speakers/writers of Skolt Saami is probably well below the
10,000 minimum set in Markus' criteria.


>01E6 # LATIN CAPITAL LETTER G WITH CARON
>01E7 # LATIN SMALL LETTER G WITH CARON

I can find no reference for these characters. Their use in Turkish is
incorrect and an unacceptable substitute for the G breve diacritics.
Unicode 2.0 indicates 'Lappish', but they do not occur in any of the
Saami orthographies I have on file.


>01E8 # LATIN CAPITAL LETTER K WITH CARON
>01E9 # LATIN SMALL LETTER K WITH CARON
>01EE # LATIN CAPITAL LETTER EZH WITH CARON
>01EF # LATIN SMALL LETTER EZH WITH CARON

These characters are used to write Skolt Saami in the Latin script
(Skolt Saami is also written by some using the Cyrillic script). The
number of speakers/writers of Skolt Saami is probably well below the
10,000 minimum set in Markus' criteria.


>01F1 # LATIN CAPITAL LETTER DZ
>01F3 # LATIN SMALL LETTER DZ
>01F4 # LATIN CAPITAL LETTER G WITH ACUTE
>01F5 # LATIN SMALL LETTER G WITH ACUTE

Your guess is as good as mine. I believe these can be safely omitted.


>027C # LATIN SMALL LETTER R WITH LONG LEG

I know of no usage of this character outside of phonetic transcription
(strident apico-alveolar trill). I'm not even sure that it remains
part of the official IPA standard set.


>0292 # LATIN SMALL LETTER EZH

See note above for uppercase Ezh/Yogh. Of course, if it is decided to
include a basic IPA subset, this character would become necessary.


>0374 # GREEK NUMERAL SIGN
>0375 # GREEK LOWER NUMERAL SIGN

I believe these are archaic; they are only of use when Greek letters
are serving as numerals (as they did before the introduction of
'Arabic' numerals).


>037A # GREEK YPOGEGRAMMENI

This is the Greek subscript iota. It is not used in modern, monotonic
Greek, so may be safely omitted from SECS.


>037E # GREEK QUESTION MARK

I'm unable to confirm, at this time, whether this punctuation mark is
still in use or not. I suspect not, and most readers would be unlikely
to distinguish it from a semicolon.
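On the semicolon point, Unicode itself treats U+037E GREEK QUESTION MARK as canonically equivalent to U+003B SEMICOLON, so any NFC normalizer replaces it. The same holds for U+0387 GREEK ANO TELEIA (to U+00B7 MIDDLE DOT) and U+0374 GREEK NUMERAL SIGN (to U+02B9 MODIFIER LETTER PRIME), whereas U+0375 has no decomposition. Semicolon and middle dot are both in SECS row 00. A sketch with Python's `unicodedata`:

```python
import unicodedata

for ch in "\u037e\u0387\u0374":
    nfc = unicodedata.normalize("NFC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          f"U+{ord(nfc):04X} {unicodedata.name(nfc)}")
```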

Moocows hate spam.

Aug 15, 1998
On Fri, 14 Aug 1998 23:29:28 +0100, Markus Kuhn
<Marku...@cl.cam.ac.uk> mooed:

>Tiro Typeworks wrote:
>> Markus, I'm very interested in your proposal, but would like to know
>> for which WGL4 letters you could find no use. I have spent a lot of
>> time researching European (and non-European) orthographies, and may be
>> able to account for some of the lesser known letters (which is not to
>> say that I think WGL4 is perfect).
>
>I'm very interested in hearing more about what the rationale to have
>the following characters in WGL4 might be:
>
>I don't know where the following ones come from:

(snip)


>The following are claimed to be used in Welsh, but Welsh
>native speakers who I asked claimed to have never seen them,
>so I suspect they are historic characters that are not in
>general use.
>

>1E80 # LATIN CAPITAL LETTER W WITH GRAVE
>1E81 # LATIN SMALL LETTER W WITH GRAVE
>1E82 # LATIN CAPITAL LETTER W WITH ACUTE
>1E83 # LATIN SMALL LETTER W WITH ACUTE
>1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
>1E85 # LATIN SMALL LETTER W WITH DIAERESIS
>1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
>1EF3 # LATIN SMALL LETTER Y WITH GRAVE

Actually, if one is doing a Welsh *pronunciation* guide these could be
potentially useful; I've also seen at least "w" and "y" acute in the past.
(The others, I'll admit, *are* weird--I'm not entirely sure where they'd
be used, save *maybe* in other languages in the same subfamily of Celtic
languages Cymru/Welsh is in [for example, Breton or Manx]. I'm rather
afraid I don't speak any Celtic tongue, so I can't be sure of this;
if memory serves, there is a Manx dictionary online, though. IF my
memory of that serves at ALL well, Manx doesn't use "w" as a vowel but
*does* use "y"; I know exactly nothing on Breton.)

*POSSIBLY* w-diaeresis and y-diaeresis occur in *some* transcription schemes
for Native American languages (if they occur in this, it'd likely be for
Northwest languages that have vowels and consonants that literally cannot
be expressed in any other way without resorting to the International
Phonetic Alphabet).

Y-dieresis and y-dieresis do occur in the standard character sets of most
English-language PostScript and TrueType fonts.

Offhand, as an aside--I expect some of the other oddish characters
(AE-grave, etc.) are also used mostly in pronunciation guides as well.

A-dieresis-grave, etc. *may* be used in Vietnamese, but I'm not sure.

>The purpose of the following characters is also
>unclear to me:
>
>201B # SINGLE HIGH-REVERSED-9 QUOTATION MARK
>203C # DOUBLE EXCLAMATION MARK
>203E # OVERLINE
>
>All these are in WGL4 but (so far) not in SECS.

Double-exclamation sounds more like a "typesetting character"; so does
"high reversed-9 quot mark" (maybe this is equivalent to leftquot?)

>There are also some mysterious characters in the MES-2
>proposal which I have not found anywhere else:
>
>01B7 # LATIN CAPITAL LETTER EZH

>01C4 # LATIN CAPITAL LETTER DZ WITH CARON
>01C6 # LATIN SMALL LETTER DZ WITH CARON
>01C7 # LATIN CAPITAL LETTER LJ
>01C9 # LATIN SMALL LETTER LJ

EZH I'm not sure on, but it *may* be used in some Turkic languages; DZ and
its variants, and LJ and its variants, occur in some Slavic languages and
also possibly in some Turkic languages (mostly those spoken in countries
that split off from the old USSR and are going back to Romanised
characters).

(In Cyrillic, separate letters *do* exist for each of these in regional
variants that were used before the USSR split up. This is probably why
they carry over.)

LJ/lj is roughly equivalent to slash-l in Polish, BTW.

>01CA # LATIN CAPITAL LETTER NJ
>01CC # LATIN SMALL LETTER NJ

Used in some Slavic languages, and occasionally in various African
languages. (In Slavic languages, it indicates a palatalised N, similar to
n-acute in some Slavic languages; the "j" essentially means the same as
the "soft sign" in Cyrillic. In the African languages where this is an
actual character, it indicates exactly what it says--an "nj" sound, like
"ng" only one doesn't touch one's palate.) :)

>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON

If memory serves, used in Vietnamese (in this case, the macron is a tone
character) and in some transcription schemes for Native American
languages.

>01E4 # LATIN CAPITAL LETTER G WITH STROKE
>01E5 # LATIN SMALL LETTER G WITH STROKE

I've only seen this offhand in *some* transcription schemes for Native
American languages [this indicates *roughly* the same as g-caron; see
below] but it may occur in Turkic languages that are converting to Roman
characters.

>01E6 # LATIN CAPITAL LETTER G WITH CARON
>01E7 # LATIN SMALL LETTER G WITH CARON

Commonly used in Turkish and some other Turkic languages to indicate a
"hard G" sound. Also occurs, for the same sound, in some Native American
language transcription schemes.

>01E8 # LATIN CAPITAL LETTER K WITH CARON
>01E9 # LATIN SMALL LETTER K WITH CARON

Less common, but does occur in some Turkic languages; indicates a "hard K"
sound (like hard G--you say it in the back of your throat). Occurs in
some transcription schemes for Native American languages as well.

(As a minor aside--you will find many, MANY standards for transcription
and, in some cases, transliteration of Native American languages. These
vary from fitting the closest Roman equivalent, to using diacritical marks
for consonants that are "sort" of close [many languages have literally two
to four different ways you can pronounce a consonant sound where we might
have one in English, for example] to using unused characters to represent
sounds ["x" for "sh" and "c" for "soft ch" are rather common] to resorting
to the IPA when there's no really good way to represent it via Roman
characters. Hence my notes on this. :)

>01EE # LATIN CAPITAL LETTER EZH WITH CARON
>01EF # LATIN SMALL LETTER EZH WITH CARON

Possibly used in some Slavic languages and Slavic transliteration schemes.
Possibly occurs in Turkic languages. (Again, an "ezh-caron" equivalent
does occur in several local variants of Cyrillic used for "minority
languages" in the USSR.)

>01F1 # LATIN CAPITAL LETTER DZ
>01F3 # LATIN SMALL LETTER DZ

Commonly used in Slavic and Turkic languages; occurs in some Native
American languages as well (most notably the Na-Dene family, which
includes Dine' [Navaho]).

>01F4 # LATIN CAPITAL LETTER G WITH ACUTE
>01F5 # LATIN SMALL LETTER G WITH ACUTE

Fairly unusual, but does occur in some Native American and Slavic (and
possibly Turkic as well, depending on the country's Romanisation scheme)
languages. Usually indicates a palatalised g sound in the few places
where I've seen it.

>027C # LATIN SMALL LETTER R WITH LONG LEG

Fairly unusual; used in some Native American languages as an R-variant.
This is borrowed from the IPA, offhand. This also, occasionally, occurs
in transcription schemes for some African languages.

Some Turkic languages may use it; not sure (at least I've not *seen* any)
however.

>Do you know a good reason why any of these characters should
>go into a simple European character set?

Some of them I'm sort of puzzled on m'self. Some (like Y-dieresis and
Y-acute-dieresis, for example) I can see as they are used in languages
with a known, large audience on Usenet (for instance, Vietnamese-language
or Cymru-language newsgroups).

Some of them, I will frankly admit (namely, *all* the Greek characters
noted and, possibly, some of the other *unusual* letters like longleg-r
and k-macron, etc.) leave me puzzled as to why they're included. (As far as I know,
longleg-r only exists in a few Native American transcription schemes and
in some African-language transcription schemes; unless there is a large
Usenet population of folks wishing to type in Salish, I'm not sure why it
should be there. [If it is in there, we should go ahead and add upside-
down K/k, upside-down T/t, cedilla-H, Latin-omega-acute-dieresis,
Latin-chi, etc. and all the other IPA characters you *have* to import from
the IPA to write some of the languages of that area. :) And, of course,
import Latin capital-schwa and Latin small-schwa for our friends in
Azerbaijan; hell, let's just import the entire IPA and be done with it :)]

Ah well...I'm sure the author will be glad to explain, in any case. :)

-moo
who, incidentally, still wants to know when the author will write the
terminal patch that will allow a VT100 terminal hooked up to an IBM 3090
mainframe to actually *read* these strange and ferlie characters, or pay
for the unis still using these beasts for student Internet access to
upgrade to nice happy spanking new DEC Alphas and upgrade everyone's
computer to a Pentium whilst they're at it :)


Thomas Chan

Aug 15, 1998
On 15 Aug 1998 05:09:31 GMT, Moocows hate spam.
<pmba...@spamtrampling-moocow.slug.louisville.edu> wrote:
>A-dieresis-grave, etc. *may* be used in Vietnamese, but I'm not sure.
>>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
>>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON
>
>If memory serves, used in Vietnamese (in this case, the macron is a tone
>character) and in some transcription schemes for Native American
>languages.

There are no diaereses or macrons in Vietnamese.

One needs:

one of <a>, <i>, <u>, <e>, <o>, <y>

with possibility of a circumflex on <a>,
or a horn on <o>,
or a horn on <u>,
or a circumflex on <o>,
or a circumflex on <e>

plus nothing,
or acute accent,
or grave accent,
or "curl" (sorry, I do not know the technical name for this),
or tilde,
or dot underneath (is there a technical name for this?)

(Not all of the above combinations will exist.)

(Optionally, a <2> or <z>-like hybrid of "curl" and tilde
may occur in the handwriting of southern Vietnamese
speakers who do not distinguish the two tones
marked by those diacritics.)


Thomas Chan
tc...@cornell.edu

Christoph Nahr

Aug 15, 1998
On Fri, 14 Aug 1998 23:29:28 +0100, Markus Kuhn
<Marku...@cl.cam.ac.uk> wrote:

>The long s might come from German Fraktur fonts, which have been
>unused since ~1945. This letter certainly has no equivalent in modern
>German roman/antiqua fonts and is certainly not needed to
>write German:
>

>017F # LATIN SMALL LETTER LONG S

While I agree that this letter is not needed in a basic European
character set, your reasoning is quite wrong.

The long s was actually used in *both* Fraktur and Antiqua (i.e.
non-Fraktur) typefaces for centuries, and is completely unrelated to
any "Germanness". You should see lots of long s in any older English
(French, Italian, ...) book. The only difference is that Antiqua (or
"Latin") typefaces eventually dropped the long s while Fraktur
typefaces kept it to this day.

As for Fraktur going out of fashion in Germany by 1945... well, the
connection between Nazis and Fraktur is a common misconception.
Actually, the Nazi government *discouraged* use of Fraktur in 1940
because Hitler thought it outdated and contrary to his plans to
"modernise" Germany according to Nazi ideology.

As for Fraktur being "unused" today... several new Fraktur typefaces
have been designed during the past few decades by German designers.
If you go to any newsstand you'll see plenty of Fraktur
headlines on newspapers of any nationality. Station and street signs
are also frequently set in Fraktur. But I agree that Fraktur
typefaces are only being used as decorative fonts these days, not as
text fonts, which is the important criterion for this discussion.
--
Chris Nahr (cn...@hal9000.net, replace hal9000 with ibm to e-mail me)
Please don't e-mail me if you post! PGP key at wwwkeys.ch.pgp.net

William Ehrich

Aug 15, 1998
If we can afford to include just one letter for historical / sentimental
reasons I would like that to be:

> 017F # LATIN SMALL LETTER LONG S

It is useful for quoting most old English and German literature.

-- William Ehrich
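Whether or not long s makes it into a small repertoire, one plausible fallback for systems whose fonts lack U+017F is Unicode compatibility normalization: NFKC folds long s to an ordinary s (NFC leaves it untouched), and its uppercase form is plain S. A sketch with Python's `unicodedata`; the word is only an illustrative example.

```python
import unicodedata

old = "Congre\u017fs"  # "Congreſs", spelled with a long s
print(unicodedata.normalize("NFKC", old))     # Congress
print("\u017f".upper(), "\u017f".casefold())  # S s
```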


Thomas Chan

Aug 15, 1998
On 15 Aug 1998 06:59:51 GMT, Thomas Chan <tc...@cornell.edu> wrote:
>On 15 Aug 1998 05:09:31 GMT, Moocows hate spam.
><pmba...@spamtrampling-moocow.slug.louisville.edu> wrote:
>>A-dieresis-grave, etc. *may* be used in Vietnamese, but I'm not sure.
>>>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
>>>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON
>>
>>If memory serves, used in Vietnamese (in this case, the macron is a tone
>>character) and in some transcription schemes for Native American
>>languages.
>
>There are no diaereses or macrons in Vietnamese.
>
>One needs:
>
>one of <a>, <i>, <u>, <e>, <o>, <y>
>
>with possibility of a circumflex on <a>,
>or a horn on <o>,
>or a horn on <u>,
>or a circumflex on <o>,
>or a circumflex on <e>

Correction to myself: There's also the possibility of
a breve on <a>.
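Combining Thomas Chan's list with this correction gives 12 base vowels (a, a-breve, a-circumflex, e, e-circumflex, i, o, o-circumflex, o-horn, u, u-horn, y) and six tone options. The "curl" is encoded as U+0309 COMBINING HOOK ABOVE and the dot underneath as U+0323 COMBINING DOT BELOW. The sketch below (Python's `unicodedata`) builds every combination as letter plus combining marks and NFC-normalizes it; Unicode happens to have a precomposed code point for all 72 lowercase combinations, although, as noted above, not every combination need occur in actual words.

```python
import unicodedata

# Base vowels: plain a e i o u y, plus a-breve, a/e/o-circumflex, o/u-horn.
BASES = ["a", "a\u0306", "a\u0302", "e", "e\u0302", "i",
         "o", "o\u0302", "o\u031b", "u", "u\u031b", "y"]
# Tones: none, acute, grave, hook above ("curl"), tilde, dot below.
TONES = ["", "\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]

forms = [unicodedata.normalize("NFC", b + t) for b in BASES for t in TONES]
print(len(forms), all(len(f) == 1 for f in forms))  # 72 True
print("".join(forms))

in_latin1 = sum(len(f) == 1 and ord(f) <= 0xFF for f in forms)
print(in_latin1, "of", len(forms), "are in Latin-1")
```

NFC also handles mark order: a dot below typed after a circumflex is reordered and still composes to the single letter U+1EAD.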

Paul L. Allen

Aug 15, 1998
In article <35d594f8...@news3.newscene.com>
cn...@hal9000.net (Christoph Nahr) writes:

> As for Fraktur going out of fashion in Germany by 1945... well, the
> connection between Nazis and Fraktur is a common misconception.
> Actually, the Nazi government *discouraged* use of Fraktur in 1940
> because Hitler thought it outdated and contrary to his plans to
> "modernise" Germany according to Nazi ideology.

I don't know if he ever stated that. However, on the 23rd of January 1941,
an official order of the German Nazi Party (Anordnung 2/41; Ordnungsziffer
111) abolished Fraktur and Schwabacher from all printed items, saying:

...It is ordered that from now on only the normal type is to be used
for all printed documents. As normal type, the antiqua type is meant.
The so-called gothic type (Fraktur) is not a german type but goes back
to the schwabacher jew-letters. This type has been strongly used in
Germany because Jews owned the printing works already since typography
was introduced, and later on the newspapers...

I don't know who did the translation (possibly Yannis Haralambous) but
it was accompanied by a photostat of the order. See TUGboat vol. 12 no.
1, March 1991.

--Paul

Erland Sommarskog

Aug 15, 1998
ti...@tiro.com (Tiro Typeworks) writes:

>I am a little concerned that in a monospaced font, of the kind referred to in
>Markus' SECS criteria, reliance on the midpoint character will produce gaping
>holes in the middle of many Catalan words. I am undecided about the possible
>inclusion of these characters.

If you take the Barcelona metro, you will find a station whose name
appears to be Paral-lel, so thick is the middle dot, and this is not
the only instance I've seen.

>* These characters are used in Danish and their inclusion in both
>Unicode and the WGL4 set was at the request of the Danish standards
>organization. My understanding is that there is some debate over the
>status of these characters in modern Danish. Some sources claim that
>they are archaic, others that they are orthographically correct and
>that to omit them is a mistake. I believe they should not be omitted
>from SECS without further research.

Of course, coming from a neighbouring country, I cannot be taken
as an authority, but this much I can say: I have never seen them.

Then again, most Russian I have read had accented vowels, and there
appear to be no accented Cyrillic letters in Markus's set. But as
you might have guessed, I never came much further than my beginner's
textbook...


--
Erland Sommarskog, Stockholm, som...@algonet.se
This could have been my two cents worth, but alas the Swedish
government has decided that I am not to have any cents.

Markus Kuhn

Aug 17, 1998
Tiro Typeworks wrote:
[<http://www.cl.cam.ac.uk/~mgk25/ucs/secs.html>]

> My browser finally finished downloading Markus' SECS website, and I
> have prepared the following comments on some of the WGL4 characters he
> has excluded from SECS. I believe that some of these characters should
> be included in the SECS, in accordance with Markus' criteria, and have
> marked my comments on these characters with an asterisk.

Thanks for your comments, they have been very useful.

> [I am also concerned that Markus' recommended mathematical set may be
> too extensive. Is this really a _basic_ mathematical subset, or
> something more?]

There is an international standard ISO 31-11 that defines large parts
of the mathematical notation that is commonly used all over the world.
Most of the characters that I have included are from ISO 31-11. I
tried to cover this standard entirely as far as this is possible
in a fixed-width font.

The actual list of math characters that I have included is appended
below. It contains a few remarks about why I think this character
should be covered. Comments welcome.

It is a quite comprehensive set of symbols, so I certainly would not
argue that the math collection should become any larger. I admit that
there might be a few less common symbols in it that are mostly
of concern to computer scientists, but after all, these are computer
character sets and I can well imagine that most of these symbols
will be used in source code comments etc.

MATH

Markus Kuhn

Aug 17, 1998
The Windows standard character set CP1252 extends ISO 8859-1
by the following 27 characters:

0x80 0x20AC #EURO SIGN
0x81 #UNDEFINED
0x82 0x201A #SINGLE LOW-9 QUOTATION MARK
0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
0x84 0x201E #DOUBLE LOW-9 QUOTATION MARK
0x85 0x2026 #HORIZONTAL ELLIPSIS
0x86 0x2020 #DAGGER
0x87 0x2021 #DOUBLE DAGGER
0x88 0x02C6 #MODIFIER LETTER CIRCUMFLEX ACCENT
0x89 0x2030 #PER MILLE SIGN
0x8A 0x0160 #LATIN CAPITAL LETTER S WITH CARON
0x8B 0x2039 #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C 0x0152 #LATIN CAPITAL LIGATURE OE
0x8D #UNDEFINED
0x8E 0x017D #LATIN CAPITAL LETTER Z WITH CARON
0x8F #UNDEFINED
0x90 #UNDEFINED
0x91 0x2018 #LEFT SINGLE QUOTATION MARK
0x92 0x2019 #RIGHT SINGLE QUOTATION MARK
0x93 0x201C #LEFT DOUBLE QUOTATION MARK
0x94 0x201D #RIGHT DOUBLE QUOTATION MARK
0x95 0x2022 #BULLET
0x96 0x2013 #EN DASH
0x97 0x2014 #EM DASH
0x98 0x02DC #SMALL TILDE
0x99 0x2122 #TRADE MARK SIGN
0x9A 0x0161 #LATIN SMALL LETTER S WITH CARON
0x9B 0x203A #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C 0x0153 #LATIN SMALL LIGATURE OE
0x9D #UNDEFINED
0x9E 0x017E #LATIN SMALL LETTER Z WITH CARON
0x9F 0x0178 #LATIN CAPITAL LETTER Y WITH DIAERESIS

Most of them make perfect sense and are useful extensions; however,
I have no idea what the purpose of the following three is:

0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
0x88 0x02C6 #MODIFIER LETTER CIRCUMFLEX ACCENT
0x98 0x02DC #SMALL TILDE

Any ideas?
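For readers who want to reproduce the table above, the sketch below decodes each byte from 0x80 to 0x9F and looks up the Unicode name of the result. It is a minimal Python example that assumes the standard `cp1252` codec, which treats 0x81, 0x8D, 0x8F, 0x90 and 0x9D as undefined, matching the list; the function name `cp1252_c1_extensions` is made up for illustration.

```python
import unicodedata


def cp1252_c1_extensions():
    """Map each CP1252 byte 0x80-0x9F to (code point, Unicode name), or None if undefined."""
    table = {}
    for byte in range(0x80, 0xA0):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            table[byte] = None  # position left unassigned in CP1252
            continue
        table[byte] = (ord(ch), unicodedata.name(ch))
    return table


if __name__ == "__main__":
    table = cp1252_c1_extensions()
    defined = {b: v for b, v in table.items() if v is not None}
    print(len(defined), "defined characters")
    # The three characters whose purpose is asked about above.
    for b in (0x83, 0x88, 0x98):
        cp, name = defined[b]
        print(f"0x{b:02X} U+{cp:04X} {name}")
```

Run as a script, it should report 27 defined characters and then print the names of the three code points in question.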

Dik T. Winter

Aug 17, 1998
In article <35D84110...@cl.cam.ac.uk> Markus Kuhn <Marku...@cl.cam.ac.uk> writes:
> 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK

Probably the Dutch Gulden symbol.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Paul L. Allen

Aug 17, 1998
In article <ExuC1...@cwi.nl>

d...@cwi.nl (Dik T. Winter) writes:

> In article <35D84110...@cl.cam.ac.uk> Markus Kuhn <Marku...@cl.cam.ac.uk> writes:
> > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
>
> Probably the Dutch Gulden symbol.

Certainly the Dutch Guilder/Gulden/Florin symbol.

--Paul

Markus Kuhn

Aug 17, 1998
Dik T. Winter wrote:
> > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
>
> Probably the Dutch Gulden symbol.

Shall we still include it in VSECS (same for the Peseta sign from
CP437)? If we include Peseta and Gulden, then we would also
have to include the Franc and Lira symbols. All these currency
symbols are expected to be superseded by the Euro symbol from
mid-2002 on and would only be of historical value.

Paul L. Allen

Aug 17, 1998
In article <35D87587...@cl.cam.ac.uk>
Markus Kuhn <Marku...@cl.cam.ac.uk> writes:

> Dik T. Winter wrote:
> > > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
> >
> > Probably the Dutch Gulden symbol.
>
> Shall we still include it in VSECS (same for the Peseta sign from
> CP437)? If we include Peseta and Gulden, then we would also
> have to include the Franc and Lira symbols.

What puzzles me is why Unicode added a Lira symbol. The lira symbol
is essentially identical to the pound symbol. A couple of Unicode
fonts I've seen give the pound one cross-stroke and the lira two
cross-strokes, but that's a matter of aesthetics and font design.
Even if there is some inherent national preference one way or the other
between UK and Italian typography, typesetters in each country will tend to
use the same glyph for both symbols (or, more usually, use the symbol
for their national currency and a letter for the other one to avoid
confusion).

--Paul

Tiro Typeworks

Aug 17, 1998
On 15 Aug 1998 05:09:31 GMT,
pmba...@spamtrampling-moocow.slug.louisville.edu (Moocows hate spam.)
wrote:

>>The following are claimed to be used in Welsh, but Welsh
>>native speakers who I asked claimed to have never seen them,
>>so I suspect they are historic characters that are not in
>>general use.

>>1E80 # LATIN CAPITAL LETTER W WITH GRAVE
>>1E81 # LATIN SMALL LETTER W WITH GRAVE
>>1E82 # LATIN CAPITAL LETTER W WITH ACUTE
>>1E83 # LATIN SMALL LETTER W WITH ACUTE
>>1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
>>1E85 # LATIN SMALL LETTER W WITH DIAERESIS
>>1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
>>1EF3 # LATIN SMALL LETTER Y WITH GRAVE

For the record, as provided to me by Andrew Hawke, assistant editor of
the University of Wales Dictionary of the Welsh Language:

Modern usage of the diacritics in Welsh is as follows:

The circumflex is used solely to indicate that a vowel is long
in a context in which it would normally be expected to be
short, e.g.:

gwa^n (he pierces) vs. gwan (weak)
gwe^n (a smile) vs. gwen (white (fem.))
pi^n (pine (wood, tree)) vs. pi`n (a pin)
co^r (a choir) vs. cor (a dwarf)
bu^m (I was (perfect)) vs. bum (five (mutated))
tw^r (a tower) vs. twr (a group)
y^m (we are) vs. ym (in (before m))

The diaeresis is used to separate vowels, as in English:

prosa"ig (prosaic)
cre"wr (creator)
copi"o (to copy)
tro"edigaeth (conversion)
du"wch (blackness)
Rebacay"ddiaeth (lit. Rebaccaism)
cyw"res (concubine)

The acute accent is used to indicate unexpected stress (i.e.
not on the penultimate):

casa'u (to hate)
case't (cassette)
ricri'wt (a recruit)
paraso'l (a parasol)
rebu'wc (a rebuke)
caridy'ms (riff-raff)
gw'raidd (manly)
[this last is on the penult, but is to distinguish it
from the word gwraidd (root), which is monosyllabic]

The grave accent is used to indicate that a vowel is short in
a context in which it would normally be expected to be long:

pa`s (a pass, permit) vs. pas (a cough)
sie`d (a shed) vs. sie^d/sied (escheat)
sgi`l (a skill) vs. sgi^l/sgil (following)
no`d (a nod) vs. nod (a target, an aim)
cu`l (a hut) vs. cul (narrow)
mw`g (a mug) vs. mwg (smoke (n.))
py`g (dirty) vs. pyg (pitch, tar)

Generally speaking, diacritics in Welsh cannot reasonably be
omitted as they are used either to show unusual stress, or to
differentiate between pairs of otherwise identical words with
different pronunciations. As such they are equally necessary
in upper- and lower-case forms.

The commonest diacritic is the circumflex, followed by the
acute and diaeresis probably about equally. The grave is rare,
but as more and more words are borrowed from English, and new
compounds coined for technical terms, their use will
undoubtedly increase.

To give a very rough indication, according to the headwords in
our (unfinished) dictionary (which we estimate will contain
about 84,500 entries), the number of accented keywords
(extrapolated to the expected finished size of the dictionary)
will be roughly:

circumflex: 2,000
diaeresis: 880
acute: 500
grave: 160


Clearly it would be a mistake to omit these diacritics from any
character set intended to support the Welsh language.
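One rough way to see which of these letters an 8-bit set such as ISO 8859-1 can carry is to compose every Welsh vowel (a, e, i, o, u, w, y) with each of the four diacritics and try to encode the result. The Python sketch below does this with the standard `unicodedata` module and NFC normalization; the function names are illustrative, and it assumes only precomposed letters matter (which holds for all 56 combinations here).

```python
import unicodedata

WELSH_VOWELS = "aeiouwy"
# Combining marks for the four diacritics described in the usage notes.
DIACRITICS = {
    "circumflex": "\u0302",
    "acute": "\u0301",
    "grave": "\u0300",
    "diaeresis": "\u0308",
}


def welsh_accented_letters():
    """Return all precomposed accented Welsh vowels, lower case first, then upper case."""
    letters = []
    for vowel in WELSH_VOWELS + WELSH_VOWELS.upper():
        for mark in DIACRITICS.values():
            composed = unicodedata.normalize("NFC", vowel + mark)
            if len(composed) == 1:  # keep only letters with a single precomposed code point
                letters.append(composed)
    return letters


def not_encodable(letters, encoding):
    """Return the letters that the given legacy encoding cannot represent."""
    missing = []
    for ch in letters:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    for ch in not_encodable(welsh_accented_letters(), "latin-1"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

With Latin-1 it should list 13 letters: all accented forms of w and W, plus the y/Y forms with circumflex and grave and the capital Y with diaeresis. That covers every character in the 1E80-1E85 and 1EF2-1EF3 list quoted above.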

Chris Maden

Aug 17, 1998
Markus Kuhn <Marku...@cl.cam.ac.uk> writes:

> The Windows standard character set CP1252 extends ISO 8859-1
> by the following 27 characters:
>

[...]


>
> Most of them make perfect sense and are useful extensions; however,
> I have no idea what the purpose of the following three is:
>

> 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK

This is the guilder sign. Unicode, for whatever reason, doesn't
include an actual guilder/florin sign, but the small f with hook looks
right. This mapping is an approximation. Both the Windows and
Macintosh character sets include the character, so its omission from
Unicode was a surprise to me.

> 0x88 0x02C6 #MODIFIER LETTER CIRCUMFLEX ACCENT
> 0x98 0x02DC #SMALL TILDE

These are to distinguish between the character and the accent. The
circumflex (shift-6 on most US keyboards) is now used for the literal
character (for TeX superscript, regexp inversion...), and so a
distinct character is needed for the diacritic. Similarly, the tilde
is now used for home directories or approximation; a smaller tilde is
needed for use as a diacritic.

-Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

H. Peter Anvin

Aug 18, 1998
Followup to: <35D87587...@cl.cam.ac.uk>
By author: Markus Kuhn <Marku...@cl.cam.ac.uk>
In newsgroup: comp.std.internat

>
> Dik T. Winter wrote:
> > > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
> >
> > Probably the Dutch Gulden symbol.
>
> Shall we still include it in VSECS (same for the Peseta sign from
> CP437)? If we include Peseta and Gulden, then we would also
> have to include the Franc and Lira symbols. All these currency
> symbols are expected to be superseded by the Euro symbol from
> mid-2002 on and would only be of historical value.
>

Include them. It is going to be much more painful to omit them,
IMNSHO. However, my understanding is that the Franc symbol isn't in
common use; in fact, I've had French people tell me "what Franc
symbol", pretty much what I'd tell anyone who'd ask me what the symbol
for a Swedish Crown is.

-hpa
--
PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD 1E DF FE 69 EE 35 BD 74
See http://www.zytor.com/~hpa/ for web page and full PGP public key
I am Bahá'í -- ask me about it or see http://www.bahai.org/
"To love another person is to see the face of God." -- Les Misérables

H. Peter Anvin

Aug 18, 1998
Followup to: <evale...@sktb.demon.co.uk>
By author: p...@sktb.demon.co.uk
In newsgroup: comp.std.internat

>
> What puzzles me is why Unicode added a Lira symbol. The lira symbol
> is essentially identical to the pound symbol. A couple of Unicode
> fonts I've seen give the pound one cross-stroke and the lira two
> cross-strokes, but that's a matter of aesthetics and font design.
> Even if there is some inherent national preference one way or the other
> between UK and Italian typography, typesetters in each country will tend to
> use the same glyph for both symbols (or, more usually, use the symbol
> for their national currency and a letter for the other one to avoid
> confusion).
>

*BZZZZT* Wrong answer! The British Pound (Sterling) symbol has
exactly one cross-stroke, the Italian Lira symbol has two. You
*never* see the other way around, and they are not interchangeable. It
is not like the one or two strokes on the dollar sign!

Stephen Baynes

Aug 18, 1998
H. Peter Anvin wrote:
>
> Followup to: <35D87587...@cl.cam.ac.uk>
> By author: Markus Kuhn <Marku...@cl.cam.ac.uk>
> In newsgroup: comp.std.internat
> >
> > Dik T. Winter wrote:
> > > > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
> > >
> > > Probably the Dutch Gulden symbol.
> >
> > Shall we still include it in VSECS (same for the Peseta sign from
> > CP437)? If we include Peseta and Gulden, then we would also
> > have to include the Franc and Lira symbols. All these currency
> > symbols are expected to be superseded by the Euro symbol from
> > mid-2002 on and would only be of historical value.
> >
>
> Include them. It is going to be much more painful to omit them,
> IMNSHO. However, my understanding is that the Franc symbol isn't in
> common use; in fact, I've had French people tell me "what Franc
> symbol", pretty much what I'd tell anyone who'd ask me what the symbol
> for a Swedish Crown is.
>

Can anyone tell me if the Turkish Lira symbol (a sort of TL monogram similar
to the TM (trademark) symbol)
which exists in the Teletext character set but not in Unicode is another
currency symbol that is not used in practice (or just an invention of the
Teletext standards authority)?

--
Stephen Baynes CEng MBCS Stephen...@soton.sc.philips.com
Philips Semiconductors Ltd
Southampton SO15 0DJ +44 (01703) 316431
United Kingdom My views are my own.
Do you use ISO8859-1? Yes if you see © as copyright, ÷ as division and ½ as 1/2.

The Graphical Gnome

Aug 18, 1998
In article <35d4ae45....@news.portal.ca>, ti...@tiro.com (Tiro Typeworks) wrote:
>
>0132 LATIN CAPITAL LIGATURE IJ
>
>0133 LATIN SMALL LIGATURE IJ
>
>These, of course, are the Dutch digraph characters. There is no need
>for them to be separately encoded, as Dutch writers commonly type /I/
>followed by /J/. These characters can, I believe, be safely omitted
>from SECS.
>
You can also write oe, ue and ae. Does this mean that the o-umlaut, u-umlaut
and a-umlaut should be removed?? The same applies to the German sharp s: you
can write it as ss.

We write IJ and ij because the glyph is not available. If you look at TeX you
see that it is added because we (the Dutch) wanted and needed it.

<smily on>
We do not have much of a culture, so don't take away the little we have left.
<smily off>

The Graphical Gnome (r...@ktibv.nl)
Sr. Software Engineer IT Department
-----------------------------------------
The Unofficial Delphi Developers FAQ
http://www.gnomehome.demon.nl/uddf/index.htm

The Graphical Gnome

Aug 18, 1998
In article <35D4BA48...@cl.cam.ac.uk>, Markus Kuhn <Marku...@cl.cam.ac.uk> wrote:
>0132 # LATIN CAPITAL LIGATURE IJ
>0133 # LATIN SMALL LIGATURE IJ
>
>Usage of LIGATURE IJ is now deprecated in the Netherlands
>and the other ones never existed in Catalan or Afrikaans
>as originally assumed (source: NL gov manual by J.W. van
>Wingen).
Say What!!!!!!.

The fact that most old typewriting systems could not cope with this
glyph does not mean it is deprecated in the Netherlands. It's a Dutch glyph,
and we are mighty proud of it!.

Stewart C. Russell

Aug 18, 1998
h...@transmeta.com (H. Peter Anvin) wrote:
>*BZZZZT* Wrong answer! The British Pound (Sterling) symbol has
>exactly one cross-stroke, the Italian Lira symbol has two. You
>*never* see the other way around, and they are not interchangeable.

At school, I was taught to write a pound sign with two strokes
(Scotland, mid-70's). I don't write it that way now, for with age
comes laziness.

Peter, there are ways of disagreeing with people that are not so
inflammatory. Always think it possible that you might be mistaken.

--
Stewart C. Russell, Glasgow, Scotland - scr...@enterprise.net
"Hang on... This is the real thing... The truth, my friend,
and nothing but the truth" - Mervyn Peake
http://homepages.enterprise.net/scruss/

Paul L. Allen

Aug 18, 1998
In article <6rb916$fuo$1...@palladium.transmeta.com>

h...@transmeta.com (H. Peter Anvin) writes:

> Followup to: <evale...@sktb.demon.co.uk>
> By author: p...@sktb.demon.co.uk
> In newsgroup: comp.std.internat
> >
> > What puzzles me is why Unicode added a Lira symbol. The lira symbol
> > is essentially identical to the pound symbol. A couple of Unicode
> > fonts I've seen give the pound one cross-stroke and the lira two
> > cross-strokes, but that's a matter of aesthetics and font design.
> > Even if there is some inherent national preference one way or the other
> > between UK and Italian typography, typesetters in each country will tend to
> > use the same glyph for both symbols (or, more usually, use the symbol
> > for their national currency and a letter for the other one to avoid
> > confusion).
> >
>

> *BZZZZT* Wrong answer! The British Pound (Sterling) symbol has
> exactly one cross-stroke, the Italian Lira symbol has two.

*BZZZZZZT*. Totally wrong answer! I'm in the UK and have been for
the 40-odd years of my life. My father was a printer, as were my brother,
grandfather, three uncles and various cousins (in case you're interested, my
father was a laserjet). I'm old enough to remember when the two
cross-stroke form was the norm in the UK. In fact I'm old enough that I was
*taught* that the two-stroke form should be used.

> You *never* see the other way around,

I admit that the one-stroke form predominates in the UK these days. But
that is a matter of typographic style, not an absolute rule. Either is
acceptable.

> and they are not interchangeable.

<panto>Oh yes they are</panto>.

Take a look at Whitaker's Almanack in the foreign currency section. It
uses the one-stroke form for pound, punt and lira.

> It is not like the one or two strokes on the dollar sign!

Ah, but it is. There may be national preferences involved and these may
change over time, but one- or two-cross stroke forms are entirely
interchangeable in the UK. Dunno about Italy.

--Paul

Tiro Typeworks

Aug 18, 1998
On Tue, 18 Aug 1998 09:52:08 GMT, r...@ktibv.nl (The Graphical Gnome)
wrote:

>You can also write oe, ue and ae. Does this mean that the o-umlaut, u-umlaut
>and a-umlaut should be removed?? The same applies to the German sharp s: you
>can write it as ss.

These examples are hardly parallel. Apart from spacing considerations,
the IJ glyph is identical in appearance to an I followed by a J.
Obviously the same cannot be said of the o-umlaut which, in any case,
is also required as an o-diaeresis for non-Germanic languages. The
German eszett cannot, in standard German, be replaced by /ss/, as
there exist words which are semantically distinguished by the use of
/ss/ or eszett.

That said, I'm perfectly happy to endorse inclusion of the IJ and ij
digraphs as characters in any font I make for Dutch clients, if they
want them. Most Dutch type designers I know (and I know a _lot_) seem
quite ambivalent about this digraph.

John Hudson

Rodger Whitlock

Aug 18, 1998
p...@sktb.demon.co.uk (Paul L. Allen) wrote:
>What puzzles me is why Unicode added a Lira symbol. The lira symbol
>is essentially identical to the pound symbol.

Although the glyphs may be the same or similar, they signify different
things. For much the same reason, the shape A turns up at least twice in
Unicode: once in the Roman alphabet and once in the Cyrillic. (I think
it was John Hudson who mentioned this a while ago.)

Likewise, the ring accent, a superscript zero, and a degree mark all
look about the same, but they are (I hope!) distinct in Unicode.

----
Rodger Whitlock

Dik T. Winter

Aug 18, 1998
In article <ke3eavj...@rosetta.ora.com> Chris Maden <cr...@oreilly.com> writes:

> Markus Kuhn <Marku...@cl.cam.ac.uk> writes:
> > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
>
> This is the guilder sign. Unicode, for whatever reason, doesn't
> include an actual guilder/florin sign, but the small f with hook looks
> right.

Actually this symbol is no longer used very much. So omission is no
problem (and it should go away anyhow by 2002). What you mainly see
is one of: NLG, DFL, F, f, fl or simply nothing.

Claus André Färber

Aug 18, 1998
Paul L. Allen <p...@sktb.demon.co.uk> schrieb:

> What puzzles me is why Unicode added a Lira symbol. The lira symbol
> is essentially identical to the pound symbol. A couple of Unicode
> fonts I've seen give the pound one cross-stroke and the lira two
> cross-strokes, but that's a matter of aesthetics and font design.

In some fonts, O and 0 look very similar. Still, they are different
characters. Same for A and Greek Alpha and numerous other characters.
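The point made by Rodger and Claus, that look-alike glyphs can still be separate characters, can be checked against the Unicode character database. This small Python sketch prints the code point and official name of each look-alike group mentioned in this thread, using only the standard `unicodedata` module; the grouping itself is just illustrative.

```python
import unicodedata

# Groups of characters that may look alike in some fonts but are encoded separately.
LOOKALIKES = [
    ("\u00a3", "\u20a4"),            # pound sign vs. lira sign
    ("A", "\u0410", "\u0391"),       # Latin A, Cyrillic A, Greek Alpha
    ("\u02da", "\u2070", "\u00b0"),  # spacing ring accent, superscript zero, degree sign
    ("O", "0"),                      # letter O vs. digit zero
]

for group in LOOKALIKES:
    print("  ".join(f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in group))
```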

--
Claus André Färber <http://www.muc.de/~cfaerber/> Fax: +49_8061_3361
PGP: ID=1024/527CADCD FP=12 20 49 F3 E1 04 9E 9E 25 56 69 A5 C6 A0 C9 DC

Ruben Prins

Aug 19, 1998
Tiro Typeworks wrote in message
<35d9d957...@news.portal.ca>...

>These examples are hardly parallel. Apart from spacing considerations,
>the IJ glyph is identical in appearance to an I followed by a J.
>Obviously the same cannot be said of the o-umlaut which, in any case,
>is also required as an o-diaeresis for non-Germanic languages. The
>German eszett cannot, in standard German, be replaced by /ss/, as
>there exist words which are semantically distinguished by the use of
>/ss/ or eszett.
>
>That said, I'm perfectly happy to endorse inclusion of the IJ and ij
>digraphs as characters in any font I make for Dutch clients, if they
>want them. Most Dutch type designers I know (and I know a _lot_) seem
>quite ambivalent about this digraph.
>
>John Hudson


I'm afraid you're a bit wrong there, since we TYPE i+j but WRITE y+umlaut
(mostly; many people even omit the two dots). Also, in an italic font it
usually looks better to use a round y with two dots (as in Computer Modern)
instead of i+j (which looks downright awful in italic CM); the same holds
even more strongly for script fonts (ever seen the capital IJ that kids
learn at school in the Netherlands? It's neither Y nor I+J). Also, IJ is
not always I+J in roman types, since some designers place a raised, smaller
I just above the curl of the J (for capitals, that is; in this case ij is
often a round y with dots too). This is NOT a kerning/ligature matter, since
Dutch also has words like bijectie (bijection) where ij really is i+j;
besides, you'd need a different font just for typesetting Dutch!

Another omission in font/code page design (well, you can't help that) is
that the Dutch can also accent ij (and IJ) by putting two acutes on top of
it (like a Hungarian umlaut, omitting the dots on the i and j); I haven't
seen that option anywhere yet, not even for j+acute. Note that this is not
merely an informal rule but an official one of the Dutch and Flemish
governments (i+acute+j can, however, be used as a last resort). Add to this
that IJ is often treated as a single letter in alphabetization, and you can
safely say it is a letter that LOOKS like i+j (in standard typing) but is
quite different.

PS
The reason many printers are ambivalent, I think, is that they're not
used to the glyph ij, since you can hardly find it anywhere and there are
no computer keyboards (that I know of) that support it either. That is
probably also a reason why /oe/ is infrequent in French (all arguments
for and against using IJ and OE are interchangeable, I think).

Ruben Prins

Sorry, but being a know-it-all is in the genes.

H. Peter Anvin

Aug 19, 1998
Followup to: <6rcfk1$h92$1...@news.enterprise.net>
By author: scr...@enterprise.net
In newsgroup: comp.std.internat

>
> h...@transmeta.com (H. Peter Anvin) wrote:
> >*BZZZZT* Wrong answer! The British Pound (Sterling) symbol has
> >exactly one cross-stroke, the Italian Lira symbol has two. You
> >*never* see the other way around, and they are not interchangeable.
>
> At school, I was taught to write a pound sign with two strokes
> (Scotland, mid-70's). I don't write it that way now, for with age
> comes laziness.
>
> Peter, there are ways of disagreeing with people that are not so
> inflammatory. Always think it possible that you might be mistaken.
>

You're right, I'm sorry. Let it suffice to say I wasn't in the right
frame of mind while posting that message.

Anyway, I gather that while to the Brits the dual-stroke £
character may be acceptable (which makes it a stylistic difference
in Britain), the same is, as far as I understand, distinctly NOT
true for the Italians. In Italy you invariably see the two-stroke
version, unless there has been a dramatic change very recently. The
spacing between the two strokes in the Lira symbol is usually pretty
wide; I don't know whether that is a stylistic difference or not.

Either way, I do not believe it is correct to say that they are
interchangeable. At least the Brits permit the single-stroke form.

H. Peter Anvin

Aug 19, 1998
Followup to: <ke3eavj...@rosetta.ora.com>
By author: Chris Maden <cr...@oreilly.com>
In newsgroup: comp.std.internat

>
> > Most of them make perfect sense and are useful extensions; however,
> > I have no idea what the purpose of the following three is:
> >
> > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
>
> This is the guilder sign. Unicode, for whatever reason, doesn't
> include an actual guilder/florin sign, but the small f with hook looks
> right. This mapping is an approximation. Both the Windows and
> Macintosh character sets include the character, so its omission from
> Unicode was a surprise to me.
>

Presumably because LATIN SMALL LETTER F WITH HOOK was considered an
adequate mapping. I believe there is a usage note in the Unicode
manual saying this is used for the Dutch guilder.

Markus Kuhn

Aug 19, 1998
Dik T. Winter wrote:
> > > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
> >
> > This is the guilder sign. Unicode, for whatever reason, doesn't
> > include an actual guilder/florin sign, but the small f with hook
> > looks right.
>
> Actually this symbol is no longer used very much. So omission is no
> problem (and it should go away anyhow by 2002). What you mainly see
> is one of: NLG, DFL, F, f, fl or simply nothing.

I have found another potential origin for the LATIN SMALL LETTER F
WITH HOOK: The ISO registered character set number 143 for mathematical
symbols (<http://www.cl.cam.ac.uk/~mgk25/ucs/IR-143.pdf>) contains
on position 05/13 a symbol called FUNCTION OF SIGN which looks
very similar to LATIN SMALL LETTER F WITH HOOK.

I think I have a quite decent mathematical background, but I have
never seen a FUNCTION OF SIGN used in any math course that I
attended or any math book that I read. It is also not defined in
ISO 31-11, a standard that covers large parts of the global
mathematical notation. What is it good for and where is this
symbol widely used?

Markus Kuhn

Aug 19, 1998
Claus André Färber wrote:
> In some fonts, O and 0 look very similar. Still, they are different
> characters. Same for A and Greek Alpha and numerous other characters.

Good fonts for data processing usage should be carefully designed to
make the glyphs easily distinguishable, and standardized simple Unicode
subsets should support this and should avoid homoglyphs wherever
this is feasible.

Terminal fonts usually add a dot or a slash to zeros to make them
distinguishable from Os. In OCR-B, the O looks more like a square
while the 0 looks more like a lozenge:

  /\
  \/

Designing an OCR-B extension for all of Unicode or even for a big
subset such as MES-3 would be quite a challenging task.

JHB NIJHOF

Aug 19, 1998
In comp.text.tex The Graphical Gnome <r...@ktibv.nl> wrote:

: In article <35D4BA48...@cl.cam.ac.uk>, Markus Kuhn <Marku...@cl.cam.ac.uk> wrote:
:>0132 # LATIN CAPITAL LIGATURE IJ
:>0133 # LATIN SMALL LIGATURE IJ
:>
:>Usage of LIGATURE IJ is now deprecated in the Netherlands
:>and the other ones never existed in Catalan or Afrikaans
:>as originally assumed (source: NL gov manual by J.W. van
:>Wingen).
: Say What!!!!!!.

: The fact that most old typewriting systems could not cope
: with this glyph does not mean it is deprecated in the
: Netherlands. It's a Dutch glyph, and we are mighty proud of it!.

Yes indeed! Most of the time it is considered to be a single letter,
the 'lange ij' ('long ij'). Battus, in his marvelous
'Opperlandse taal en letterkunde', recognizes three alphabets:
Oudhollands, Old Dutch: ... x ij z;
PTT: ... x y z; and
tolerant: ... x y ij z.
The distinction is relevant for pangrams (sentences containing each
letter of the alphabet) and the like.

Some dictionaries and encyclopedias alphabetize the ij after the x,
as do all crossword puzzles.

It is similar to the CH in Hungarian (or was that Czech): there you will
find signs saying

W
E
CH
S
E
L
S
T
U
B
E


!
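The alphabetization Jeroen describes can be sketched as a sort key that treats ij as a single letter. This is a hypothetical Python helper, not any dictionary's documented rule: it folds the ligature characters U+0132/U+0133 to plain IJ/ij with NFKC normalization and then ranks ij either right after x (the Old Dutch alphabet) or between y and z (the tolerant one). The PTT alphabet is left out.

```python
import unicodedata

# Hypothetical ranks for the single letter "ij" in two of the alphabets Battus lists.
IJ_RANK = {
    "old_dutch": ord("x") + 0.5,  # ... x ij z: sorts right after x
    "tolerant": ord("y") + 0.5,   # ... x y ij z: sorts between y and z
}


def dutch_sort_key(word, alphabet="old_dutch"):
    """Sort key treating every 'ij' as one letter (a simplification, see note below)."""
    # NFKC maps the ligatures U+0132/U+0133 to plain "IJ"/"ij".
    text = unicodedata.normalize("NFKC", word).lower()
    key = []
    i = 0
    while i < len(text):
        if text.startswith("ij", i):
            key.append(IJ_RANK[alphabet])
            i += 2
        else:
            key.append(float(ord(text[i])))
            i += 1
    return key


words = ["zee", "\u0133s", "yoga", "xylofoon"]
print(sorted(words, key=dutch_sort_key))
print(sorted(words, key=lambda w: dutch_sort_key(w, "tolerant")))
```

As Ruben pointed out, this naive rule is wrong for words such as bijectie, where i and j really are separate letters; a real collation would need a word list or explicit markup to tell the two cases apart.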

--
Jeroen Nijhof J.H.B....@aston.ac.uk
Accordion Links http://www-th.phys.rug.nl/~nijhof/accordions.html

Marc Joosen

Aug 19, 1998
Ruben Prins wrote:
> PS
> The reason many printers are ambivalent, I think, is that they're not
> used to the glyph ij, since you can hardly find it anywhere and there are
> no computer keyboards (that I know of) that support it either. That is
> probably also a reason why /oe/ is infrequent in French (all arguments
> for and against using IJ and OE are interchangeable, I think).

In fact, there are (or at least, were) keyboards that support the glyph ij.
I have an obsolete Burroughs B20 system with Dutch keyboards (it's a small
network). There are keys for ij and even ƒ (the italic long f discussed
somewhere else in this thread). The accompanying daisywheel printer, a
standard Diablo 630, has no problems printing the ij since the correct
kerning is present in the driver table.

--
Marc Joosen

Antti-Juhani Kaijanaho

Aug 19, 1998, 3:00:00 AM
r...@ktibv.nl (The Graphical Gnome) writes:

> You can also write oe, ue and ae.

As replacements for "u¹, ä, and ö? No, you can't. Here in Finland,
ae and oe are _not_ acceptable replacements for ä and ö - a and o are
much better.

Do you know how much we Finns cry every time we see a reference to the
skier Marja-Liisa Haemaelaeinen (now Kirvesniemi due to marriage)? I
tell you, very much. An "ae" instead of "ä" _hurts_ the eye, and it's
hard to read, too!
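The two fallbacks contrasted here are easy to compare side by side. A minimal Python sketch using only the standard unicodedata module; the function names strip_diacritics and german_style are hypothetical, and the German-style table covers only the six umlaut letters.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (ä -> a, ö -> o)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# German-style transliteration: umlaut vowel -> vowel + "e".
GERMAN_STYLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def german_style(text: str) -> str:
    return text.translate(GERMAN_STYLE)


name = "Hämäläinen"
print(german_style(name))      # Haemaelaeinen
print(strip_diacritics(name))  # Hamalainen
```

Note that strip_diacritics removes every combining mark, not just the diaeresis, so it would also turn é into e.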

Antti-Juhani

¹ Sorry, my keyboard config does not give me fast access to u diaeresis.
--
Antti-Juhani Kaijanaho <ga...@iki.fi> ** <URL:http://www.iki.fi/gaia/> **

All GNU users have more. Most of them have less.
Some of them have most.

Paul L. Allen

Aug 19, 1998, 3:00:00 AM
In article <6rdvtu$urj$1...@palladium.transmeta.com>

h...@transmeta.com (H. Peter Anvin) writes:

> Followup to: <6rcfk1$h92$1...@news.enterprise.net>
> By author: scr...@enterprise.net
> In newsgroup: comp.std.internat
> >
> > h...@transmeta.com (H. Peter Anvin) wrote:
> > >*BZZZZT* Wrong answer! The British Pound (Sterling) symbol has
> > >exactly one cross-stroke, the Italian Lira symbol has two. You
> > >*never* see the other way around, and they are not interchangeable.
> >
> > At school, I was taught to write a pound sign with two strokes
> > (Scotland, mid-70's). I don't write it that way now, for with age
> > comes laziness.
> >
> > Peter, there are ways of disagreeing with people that are not so
> > inflammatory. Always think it possible that you might be mistaken.
>
> You're right, I'm sorry. Let it suffice to say I wasn't in the right
> frame of mind while posting that message.
>
> Anyway, HOWEVER, I gather while to the Brits the dual-stroke £
> character may be acceptable (and hence, in Britain this being a
> stylistic difference), the same is -- as far as I understand --
> distinctly NOT true for the Italians.

And is this something you have been *told* by *experienced* Italian
typographers or is this merely observation? It's very difficult to spot
any usage of the two-stroke pound symbol in the UK these days because
it appears to have gone out of fashion. Unless you were specifically
told otherwise, the same argument may apply to the Italian Lira.

> In Italy you invariably see the two-stroke version, unless there has been
> a dramatic change very recently.

Unless there was a dramatic change a long time ago...

> Either way, I do not believe it is correct to say that they are
> interchangeable. At least the Brits permit the single-stroke form.

Which is something you were adamant we did not until two people
corrected you. Which is why I'd prefer an Italian typographer's comment
on this one...

--Paul

Paul L. Allen

Aug 19, 1998, 3:00:00 AM
In article <35DA9705...@cl.cam.ac.uk>
Markus Kuhn <Marku...@cl.cam.ac.uk> writes:

> Dik T. Winter wrote:
> > > > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
> > >
> > > This is the guilder sign. Unicode, for whatever reason, doesn't
> > > include an actual guilder/florin sign, but the small f with hook
> > > looks right.
> >
> > Actually this symbol is no longer used very much. So omission is no
> > problem (and it should go away anyhow by 2002). What you mainly see
> > is one of: NLG, DFL, F, f, fl or simply nothing.
>
> I have found another potential origin for the LATIN SMALL LETTER F
> WITH HOOK: The ISO registered character set number 143 for mathematical
> symbols (<http://www.cl.cam.ac.uk/~mgk25/ucs/IR-143.pdf>) contains
> on position 05/13 a symbol called FUNCTION OF SIGN which looks
> very similar to LATIN SMALL LETTER F WITH HOOK.
>
> I think I have a quite decent mathematical background, but I have
> never seen a FUNCTION OF SIGN used in any math courses that I
> visited or math book that I read.

You're kidding, right? You never came across the notation y = f(x)
to denote that y is some function of x?

> It is also not defined in ISO 31-11, a standard that covers large parts
> of the global mathematical notation. What is it good for and where is this
> symbol widely used?

It's used in just about every aspect of mathematics at secondary level
and beyond. The f is an italic f. Actually, in well-designed fonts,
maths italic is slightly different from text italic in the design and
kerning of various letters (but you need to compare the two side-by-side
for most people to notice it).

--Paul

Helge Nareid

Aug 19, 1998, 3:00:00 AM
On Wed, 19 Aug 1998 10:18:27 +0100, Markus Kuhn
<Marku...@cl.cam.ac.uk> wrote:

>Terminal fonts usually add a dot or a slash to zeros to make them
>distinguishable from Os.

The slashes aren't a particularly good idea for us Norwegians and
Danes - we rather like our Ø's (O slash for those without proper
terminal software).

--
- Helge Nareid
Nordmann i utlendighet, Aberdeen, Scotland

Tiro Typeworks

Aug 19, 1998, 3:00:00 AM
On Wed, 19 Aug 1998 18:51:31 +0100, p...@sktb.demon.co.uk (Paul L.
Allen) wrote:

>And is this something you have been *told* by *experienced* Italian
>typographers or is this merely observation? It's very difficult to spot
>any usage of the two-stroke pound symbol in the UK these days because
>it appears to have gone out of fashion. Unless you were specifically
>told otherwise, the same argument may apply to the Italian Lira.

Really, this is a moot point. Regardless of appearances, the Italian
Lira sign and the British Sterling sign are semantically distinct and
are separately encoded in Unicode. Even if the two glyphs were
identical, I would still expect an Italian keyboard driver to map the
Lira codepoint, rather than the Sterling.
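That the two signs are separately encoded can be checked with Python's unicodedata module. In this sketch (the helper name describe is hypothetical) each sign's name, general category and decomposition are printed; both are currency symbols (Sc) with no decomposition, so not even compatibility normalization maps one onto the other.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str, str]:
    """Return code point, character name, general category and decomposition."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch),
            unicodedata.category(ch), unicodedata.decomposition(ch))


for sign in ("\u00a3", "\u20a4"):  # POUND SIGN, LIRA SIGN
    print(describe(sign))

# NFKC does not fold the lira sign into the pound sign.
print(unicodedata.normalize("NFKC", "\u20a4") == "\u00a3")  # False
```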

My recommendation for SECS and even VSECS is that _all_ encoded
European currency signs be included. The debate then becomes which
non-European currency signs should be included. US Dollar? Yen?

John Hudson, Type Director

Ruben Prins

Aug 19, 1998, 3:00:00 AM
Marc Joosen wrote in message
<35DAF590...@historia.et.tudelft.nl>...

> In fact, there are (or at least, were) keyboards that support the glyph ij.
>I have an obsolete Burroughs B20 system with Dutch keyboards (it's a small
>network). There are keys for ij and even ƒ (the italic long f discussed
>somewhere else in this thread). The accompanying daisywheel printer, a
>standard Diablo 630, has no problems printing the ij since the correct
>kerning is present in the driver table.
>
>--
> Marc Joosen

Really? I know that there were some typewriters that had the ij (mostly
only lowercase), but not of any computer. But since no Windows/DOS/Unix/OS/2
system supports it, it's pretty useless to have such a computer keyboard
anyway. (And probably they never will--I've got an official Dutch IBM
keyboard, but no IJ or florin/guilder.) Sob, snivel :(
I guess you can say I'm a little hooked on that letter, ah well...

Ruben Prins

WIJ EISEN IJ'S!

[freely adapted from Annie M.G. Schmidt; for non-Dutch/Flemish readers:
we "demand IJs", but "ijs" also means "ice cream"]

Tor Arntsen

Aug 19, 1998, 3:00:00 AM
In article <35e02343...@news.demon.co.uk>,

he...@nareid.demon.co.uk (Helge Nareid) writes:
>On Wed, 19 Aug 1998 10:18:27 +0100, Markus Kuhn
><Marku...@cl.cam.ac.uk> wrote:
>
>>Terminal fonts usually add a dot or a slash to zeros to make them
>>distinguishable from Os.
>
>The slashes aren't a particularly good idea for us Norwegians and
>Danes - we rather like our Ø's (O slash for those without proper
>terminal software).

That's why we prefer backslashed O's for zeros (quite common practice,
although not global, unfortunately)

Tor

Tiro Typeworks

Aug 19, 1998, 3:00:00 AM
cc. Marku...@cl.cam.ac.uk


On Mon, 17 Aug 1998 18:36:48 +0100, p...@sktb.demon.co.uk (Paul L.
Allen) wrote:

>> In article <35D84110...@cl.cam.ac.uk> Markus Kuhn <Marku...@cl.cam.ac.uk> writes:
>> > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK

>Certainly the Dutch Guilder/Gulden/Florin symbol.

It should be noted that, while U+0192 has become a de facto standard
codepoint for the florin sign (i.e. Dutch guilder), the lowercase
letter f with hook is actually a nasalised consonant in orthographies
for a number of African languages. As such, it is paired with U+0191,
the uppercase letter F with hook.
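This dual role has a practical side effect. U+0192 is a lowercase letter (general category Ll) with U+0191 as its uppercase partner, so ordinary case conversion also applies to the florin sign. A minimal Python sketch; the price string is an arbitrary example.

```python
import unicodedata

florin = "\u0192"
print(unicodedata.name(florin))          # LATIN SMALL LETTER F WITH HOOK
print(unicodedata.category(florin))      # Ll: a lowercase letter, not a currency symbol (Sc)
print(unicodedata.name(florin.upper()))  # LATIN CAPITAL LETTER F WITH HOOK

# Upper-casing a price therefore changes the "currency sign" as well.
price = "\u0192 10,00"
print(price.upper())                     # Ƒ 10,00
```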

This is a typical Unicode implementation mess: reliance on a single
codepoint to map two semantically distinct characters. In this case,
the characters are not only semantically distinct but graphically
incompatible. The florin sign is traditionally slanted to the right,
in the manner of an italic or script f. Of course, whether the hooked
f as used in African languages is slanted depends on whether the font
is roman or italic.

I'm currently working on a font which supports some 300 African
languages and also includes a florin sign. I have to include two
distinct glyphs, only one of which I can encode. Because U+0192 is
used as a de facto codepoint for the florin sign, I've opted to leave
the African hooked f unencoded. This means that this important letter
will have to be accessed through a complicated 'African alternate
forms' glyph substitution routine.

Ruben Prins

Aug 19, 1998, 3:00:00 AM

Dik T. Winter wrote in message ...

>Actually this symbol is no longer used very much. So omission is no
>problem (and it should go away anyhow by 2002). What you mainly see
>is one of: NLG, DFL, F, f, fl or simply nothing.


That's no excuse! All the abbreviations you mentioned are ugly compared to the
elegant ƒ. But I know there's no point in a discussion, since most people
don't care or don't know where to find the symbol.
It's a bit like omitting the trema (umlaut) on vowels: neither right nor
aesthetically pleasing. Well, you can differ in opinion about "right" for ƒ,
since on a one-guilder coin, for example, it says 1G and not ƒ1 (or any of the
above).

Ruben Prins


Paul L. Allen

Aug 19, 1998, 3:00:00 AM
In article <35db2c9c...@news.portal.ca>
ti...@tiro.com (Tiro Typeworks) writes:

> On Wed, 19 Aug 1998 18:51:31 +0100, p...@sktb.demon.co.uk (Paul L.
> Allen) wrote:
>
> >And is this something you have been *told* by *experienced* Italian
> >typographers or is this merely observation? It's very difficult to spot
> >any usage of the two-stroke pound symbol in the UK these days because
> >it appears to have gone out of fashion. Unless you were specifically
> >told otherwise, the same argument may apply to the Italian Lira.
>
> Really, this is a moot point.

Not entirely.

> Regardless of appearances, the Italian Lira sign and the British Sterling
> sign are semantically distinct and are separately encoded in Unicode.

Uh-huh. I know that Unicode started out by assigning different code-points
to identical glyphs with different semantics, but they abandoned that
particular idea when they started dealing with Japanese/Chinese/Korean/etc.

If you *were* right that the Lira and Pound Sterling should have different
code-points even if the glyphs were identical because they have different
semantics, then we would need separate characters for each of the following:

Pound: Cypriot, Egyptian, Falkland, Gibraltar, Lebanese, St Helena,
Sterling [UK], Sudanese, Syrian.

Punt [name is "Irishization" of pound]: Irish.

Colon: Costa Rican, El Salvadorian.

Dollar: Australian, Bahamian, Barbados, Belize, Bermudian, Brunei,
Canadian, Cayman Islands, East Caribbean, Fijian, Guyanan, Hong Kong,
Jamaican, Liberian, Malaysian, Namibian, New Zealand, Singapore,
Solomon Islands, Taiwan, Trinidad and Tobago, United States (who?),
Zimbabwe.

Guilder: Aruban, Dutch, Netherlands Antilles, Surinam

Franc: Central African, Pacific, Belgian, Burkina Faso, Burundi,
Comorian, Djibouti, French, Guinea, Luxembourg, Malagasy, Malian,
Rwandan, Swiss, West African.

Lira: Italian, Maltese, Turkish.

Peseta: Andorran, Spanish.

Rupee: Indian, Mauritius, Nepalese, Pakistani, Seychelles, Sri Lankan.

Won: North Korean, South Korean.

Notes on the above:

1) All of the above are theoretically independent currencies. In
practise some are tied to a 1:1 exchange ratio with a parent currency
(e.g. Falkland Pound is tied to Pound Sterling) and are effectively
the same currency with different banknotes.

2) I don't know how many of those in the list actually use the symbol
that is assigned to a currency of that name in the Unicode table. Most
of those with the Pound use the Pound Sterling symbol; most of those
with the Dollar use the Dollar symbol.

3) There's a Bengali Rupee symbol (and a Bengali Rupee mark) as well
as the "ordinary" Rupee symbol. Perhaps somebody knows why the Bengalis
apparently need a different symbol for the Indian Rupee.

4) There may well be other currency symbols in use around the world
which Unicode has not yet assigned codepoints to - those are the ones
that I could find in the Unicode tables.
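This kind of survey of the Unicode tables can be repeated mechanically by listing every character whose general category is Sc (currency symbol). The following is a sketch with a hypothetical helper currency_signs. The result reflects the Unicode version bundled with the running Python, which is much newer than the tables discussed here. U+0192 does not appear because it is classified as a letter.

```python
import sys
import unicodedata


def currency_signs(start: int = 0, stop: int = sys.maxunicode + 1):
    """Yield (code point label, name) for each character in category Sc."""
    for cp in range(start, stop):
        ch = chr(cp)
        if unicodedata.category(ch) == "Sc":
            yield f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>")


# Restrict to the Basic Multilingual Plane to keep the listing short.
for label, name in currency_signs(0, 0x10000):
    print(label, name)
```

Among others, the listing includes the two Bengali characters mentioned in note 3 (U+09F2 BENGALI RUPEE MARK, U+09F3 BENGALI RUPEE SIGN) next to U+20A8 RUPEE SIGN.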

So let's at least have a little consistency here. And don't forget that
if Spain splits into separate countries we will need several more Peseta
symbols to give each one their own semantic meaning. Oh, and if Canada
becomes part of the US (as sometimes looks likely) then we will have to
scrap one of the 21 new dollar symbols (that we *must* have, according to
your theory) as no longer meaningful.

> Even if the two glyphs were identical, I would still expect an Italian
> keyboard driver to map the Lira codepoint, rather than the Sterling.

So what does a Maltese or Turkish keyboard driver map to? By your
principle, it is equally wrong for it to map to the Italian Lira as for it
to map to the Pound Sterling symbol. As for all those people in various
countries who just type $ when talking of their currency when that
code-point belongs solely to the US...

> My recommendation for SECS and even VSECS is that _all_ encoded
> European currency signs be included. The debate then becomes which
> non-European currency signs should be included. US Dollar? Yen?

See above for other currency symbols. Not only do you have to decide
who you include and who you exclude, but also whether or not you give
them a separate code-point when they use the same glyph as a different
currency.

Which brings me back to my original question: I wonder why Unicode bothered
to give the Italian Lira a separate code-point. Let me expand on that: I
wonder why Unicode bothered to give the Italian Lira a separate code-point
but did not do so for the other 11 countries that use the same glyph for
their currency symbol (whether the currency is called pound, punt or lira).

It all seems to hinge upon whether or not the two-stroke form is merely
a matter of current typographical taste in Italy or if, as suggested, the
one-stroke form is truly unacceptable. I wonder what the Maltese and
Turkish think about this one...

--Paul

Chris Maden

Aug 19, 1998, 3:00:00 AM
h...@transmeta.com (H. Peter Anvin) writes:

> Presumably because LATIN SMALL LETTER F WITH HOOK was considered an
> adequate mapping. I believe there is a usage note in the Unicode
> manual saying this is used for the Dutch guilder.

0192 [f] LATIN SMALL LETTER F WITH HOOK
= LATIN SMALL LETTER SCRIPT F
= Florin currency symbol (Dutch)
= function symbol

HTH,
Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

Claus André Färber

Aug 20, 1998, 3:00:00 AM
Paul L. Allen <p...@sktb.demon.co.uk> schrieb:
> You're kidding, right? You never came across the notation y = f(x)
> to denote that y is some function of x?

There is no need to introduce a special character for that. The
lowercase f is enough for that purpose.

It is really only a stylistic issue whether to set this f in another
font for mathematical equations.

Paul L. Allen

Aug 20, 1998, 3:00:00 AM
In article <35db3e38...@news.portal.ca>
ti...@tiro.com (Tiro Typeworks) writes:

> cc. Marku...@cl.cam.ac.uk


>
>
> On Mon, 17 Aug 1998 18:36:48 +0100, p...@sktb.demon.co.uk (Paul L.
> Allen) wrote:
>
> >> In article <35D84110...@cl.cam.ac.uk> Markus Kuhn <Marku...@cl.cam.ac.uk> writes:
> >> > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
>
> >Certainly the Dutch Guilder/Gulden/Florin symbol.
>
> It should be noted that, while U+0192 has become a de facto standard
> codepoint for the florin sign (i.e. Dutch guilder), the lowercase
> letter f with hook is actually a nasalised consonant in orthographies
> for a number of African languages. As such, it is paired with U+0191,
> the uppercase letter F with hook.

I appreciate the background which explains why it isn't flagged as a
currency symbol. However, it appears that as far as Unicode are concerned
it is a de jure code-point for the florin sign because their technical
notes say that's what it is.

> This is a typical Unicode implementation mess: reliance on a single
> codepoint to map two semantically distinct characters. In this case,
> the characters are not only semantically distinct but graphically
> incompatible.

I can live with two or more characters with different semantics mapping
to the same glyph. I can live with characters that have slightly differing
national tastes for what is essentially the same glyph (national versions of
fonts solve this one). But when there are two distinct uses with
incompatible glyphs then it's wrong to use the same code-point.

--Paul

Paul L. Allen

Aug 20, 1998, 3:00:00 AM
In article <6-FSZ...@faerber.muc.de>
claus+...@faerber.muc.de (Claus André Färber) writes:

> Paul L. Allen <p...@sktb.demon.co.uk> schrieb:
> > You're kidding, right? You never came across the notation y = f(x)
> > to denote that y is some function of x?
>
> There is no need to introduce a special character for that. The
> lowercase f is enough for that purpose.

Correction. The lower-case *italic* f is *close enough* that *most* people
*wouldn't notice* the difference. You will have seen others argue in the past
that Unicode has assigned different code-points to identical glyphs
with different semantics.

> It is really only a stylistic issue whether to set this f in another
> font for mathematical equations.

Tell that to mathematicians. Tell that to Donald Knuth. Tell that to
all the TeX users who know that maths-italic f is decidedly different
from ordinary italic f, which is itself different from Roman (ordinary)
lower-case f.

--Paul

Tiro Typeworks

Aug 20, 1998, 3:00:00 AM

Probably because the Italians asked them to. A lot of stuff in Unicode
seems to be there because national standards organisations insisted on
its inclusion in ISO/IEC 10646.

Your points about multiplying glyphs are well taken, and I should have
clarified my position. I think that when there are recognisable, even
if inconsistent, preferences of form, it is very awkward to have only
one codepoint available if one is making a font for both markets.

H. Peter Anvin

Aug 20, 1998, 3:00:00 AM
Followup to: <6rff3u$bf8$4...@o.online.no>
By author: t...@spacetec.no (Tor Arntsen)
In newsgroup: comp.std.internat

> >
> >>Terminal fonts usually add a dot or a slash to zeros to make them
> >>distinguishable from Os.
> >
> >The slashes aren't a particularly good idea for us Norwegians and
> >Danes - we rather like our Ø's (O slash for those without proper
> >terminal software).
>
> That's why we prefer backslashed O's for zeros (quite common practice,
> although not global, unfortunately)
>

I prefer either dotted 0's, or (preferred) a thinner/more rounded
form.

Lars Engebretsen

Aug 20, 1998, 3:00:00 AM
p...@sktb.demon.co.uk (Paul L. Allen) writes:

>
> In article <6-FSZ...@faerber.muc.de>


> claus+...@faerber.muc.de (Claus André Färber) writes:
>
> > Paul L. Allen <p...@sktb.demon.co.uk> schrieb:
> > > You're kidding, right? You never came across the notation y = f(x)
> > > to denote that y is some function of x?
> >
> > There is no need to introduce a special character for that. The
> > lowercase f is enough for that purpose.
>
> Correction. The lower-case *italic* f is *close enough* that *most* people
> *wouldn't notice* the difference. You will have seen others argue in the past
> that Unicode has assigned different code-points to identical glyphs
> with different semantics.

It seems to me that you are confusing characters with glyphs. If there
were to be a separate "function character" or "function symbol", it
would be a good idea to include it in Unicode. I have never heard of such
a symbol.

A related example is the integral sign, which was originally a long s,
but then evolved into its own symbol.
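The integral example can be checked against the Unicode data. The long s and the integral sign are separate characters, and only the long s carries a (compatibility) link back to an ordinary letter. A minimal Python sketch using the standard unicodedata module:

```python
import unicodedata

long_s, integral = "\u017f", "\u222b"
print(unicodedata.name(long_s))    # LATIN SMALL LETTER LONG S
print(unicodedata.name(integral))  # INTEGRAL

# NFKC folds the long s to a plain "s" ...
print(unicodedata.normalize("NFKC", long_s))                # s
# ... but leaves the integral sign alone: it has no decomposition.
print(unicodedata.normalize("NFKC", integral) == integral)  # True
```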

> > It is really only a stylistic issue whether to set this f in another
> > font for mathematical equations.
>
> Tell that to mathematicians. Tell that to Donald Knuth. Tell that to
> all the TeX users who know that maths-italic f is decidedly different
> from ordinary italic f, which is itself different from Roman (ordinary)
> lower-case f.

But what you mention are indeed stylistic issues. With your
reasoning, it would be necessary to have a special math version of
every character, since math italic and text italic need not be
identical for any letter, and the semantics are indeed different.

Furthermore, IMHO it would be a mess if there was to be a separate f,
distinct from the math italic f, meaning "the function of".

/Lars

Markus Kuhn

Aug 20, 1998, 3:00:00 AM
Paul L. Allen wrote:
> > > You're kidding, right? You never came across the notation y=f(x)
> > > to denote that y is some function of x?

Of course I have seen f(x) = g'(x) etc., but the function was
always represented by an italic letter f, g, etc. and never by
an italic letter f with hook or by any other special symbol that
was not just a letter! I've never seen a special symbol that would
deserve a name such as FUNCTION OF SIGN. Mathematicians use any
Latin and Greek letter to denote functions and variables, and they
just set them in a normal italic font with a kerning that makes
it clear that the symbols are individual entities and not part
of words.

> Correction. The lower-case *italic* f is *close enough* that *most* people
> *wouldn't notice* the difference.

The trouble with typographers is that they constantly think they have
discovered new characters where there actually is no new one. The
mathematical variables and functions are normal italic letters.
The OHM SIGN and KELVIN SIGN are just the capital letters omega and
K. The Angstroem sign is just an A WITH RING ABOVE. It is
complete nonsense that separate code points have been
introduced for these. Knuth used only a separate cmmi (math italics)
font for the formula symbols, because this was a convenient hack to
store the math-specific kerning information (variables in formulas
are spaced wider apart than letters in italic words, but they are
still the same characters).
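Unicode itself appears to treat these three signs the same way. OHM SIGN, KELVIN SIGN and ANGSTROM SIGN have canonical (singleton) decompositions to ordinary letters, so even canonical normalization (NFC) replaces them. A short Python sketch; the helper fold is a hypothetical name.

```python
import unicodedata


def fold(ch: str) -> tuple[str, str]:
    """Return the names of a character before and after NFC normalization."""
    return unicodedata.name(ch), unicodedata.name(unicodedata.normalize("NFC", ch))


for sign in ("\u2126", "\u212a", "\u212b"):
    before, after = fold(sign)
    print(f"{before} -> {after}")
# OHM SIGN -> GREEK CAPITAL LETTER OMEGA
# KELVIN SIGN -> LATIN CAPITAL LETTER K
# ANGSTROM SIGN -> LATIN CAPITAL LETTER A WITH RING ABOVE
```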

> Tell that to mathematicians. Tell that to Donald Knuth. Tell that to
> all the TeX users who know that maths-italic f is decidedly different
> from ordinary italic f, which is itself different from Roman
> (ordinary) lower-case f.

All of these are latin letters f, just in different font styles
(roman, italic text, italic math). No need to encode separate
characters here. An f stays an f.
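Later versions of Unicode (3.1 onwards) did add styled mathematical letters such as MATHEMATICAL ITALIC SMALL F (U+1D453), but only as compatibility characters. Compatibility normalization (NFKC) folds them back to a plain f, which fits "an f stays an f". A minimal Python sketch:

```python
import unicodedata

math_f = "\U0001D453"
print(unicodedata.name(math_f))                        # MATHEMATICAL ITALIC SMALL F
print(unicodedata.decomposition(math_f))               # <font> 0066
print(unicodedata.normalize("NFKC", math_f))           # f
print(unicodedata.normalize("NFC", math_f) == math_f)  # True: NFC keeps it
```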

Markus Kuhn

Aug 20, 1998, 3:00:00 AM
Helge Nareid wrote:
>
> On Wed, 19 Aug 1998 10:18:27 +0100, Markus Kuhn
> <Marku...@cl.cam.ac.uk> wrote:
>
> >Terminal fonts usually add a dot or a slash to zeros to make them
> >distinguishable from Os.
>
> The slashes aren't a particularly good idea for us Norwegians and
> Danes - we rather like our Ø's (O slash for those without proper
> terminal software).

The slash through zeros in terminal fonts is usually clipped to the
inside of the circle, unlike the stroke of O WITH STROKE, which extends outside it.

There are two more Unicode characters that can be mixed up with the Ø:

DIAMETER SIGN (a circle with slash)
EMPTY SET (a zero with slash)
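These look-alikes are all distinct characters with different general categories. A sketch like the following can list them. The LOOKALIKES list is a hand-picked example, not an official confusables table.

```python
import unicodedata

# Digit zero, letter O, O with stroke (both cases), and the two symbols above.
LOOKALIKES = ["0", "O", "\u00d8", "\u00f8", "\u2205", "\u2300"]

for ch in LOOKALIKES:
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")
# U+0030  Nd  DIGIT ZERO
# U+004F  Lu  LATIN CAPITAL LETTER O
# U+00D8  Lu  LATIN CAPITAL LETTER O WITH STROKE
# U+00F8  Ll  LATIN SMALL LETTER O WITH STROKE
# U+2205  Sm  EMPTY SET
# U+2300  So  DIAMETER SIGN
```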