VSECS & SECS: Small Extended Character Sets


Markus Kuhn

Aug 14, 1998
Small European Character Sets
-----------------------------

I have recently spent quite some time working out a proposal for two
Unicode/ISO 10646 subsets that are so small that I hope they will become
widely implemented in Europe and America. Both are specifically designed
to be suitable for systems where characters are represented in
low-resolution fixed-width fonts. This includes, for instance, your xterm
and Emacs window under Unix (or more generally VT100 emulators and source
code editors), but also applications such as portable LCD devices
(pagers, mobile phones), where it makes sense to implement only a small
subset of Unicode and where no single 8-bit set can cover a reasonable
number of languages. These subsets are not really intended for
applications such as the publishing industry, where these display
restrictions do not exist and larger Unicode subsets or even full
implementations might be adequate.

The two subsets are:

- Very Simple European Character Set (VSECS)
345 characters, basically the superset of Latin 1-4,9,10,15 and CP1252
plus a very few ISO 6937 characters

Rows Positions (Cells)
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 C6-C7 D8-DD
20 13-15 18-1A 1C-1E 20-22 26 30 39-3A AC
21 22 26 5B-5E 90-93
26 6A
FF FD
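
As an illustration (not part of the original posting), the row/cell
notation above can be expanded mechanically: the first field of each line
is the row (high byte of the code point), the remaining fields are single
cells or inclusive cell ranges (low byte). The following is a minimal
Python sketch; the name parse_rows is made up for this example.

```python
VSECS_TABLE = """\
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 C6-C7 D8-DD
20 13-15 18-1A 1C-1E 20-22 26 30 39-3A AC
21 22 26 5B-5E 90-93
26 6A
FF FD
"""

def parse_rows(table):
    """Expand 'row cell[-cell] ...' lines into a set of code points."""
    code_points = set()
    for line in table.splitlines():
        fields = line.split()
        if not fields:
            continue
        row = int(fields[0], 16) << 8          # high byte of the code point
        for cells in fields[1:]:
            first, _, last = cells.partition("-")
            lo = int(first, 16)
            hi = int(last, 16) if last else lo  # single cell: lo == hi
            code_points.update(range(row | lo, (row | hi) + 1))
    return code_points

vsecs = parse_rows(VSECS_TABLE)
print(len(vsecs))  # 345
```

The count agrees with the 345 characters stated for VSECS.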

- Simple European Character Set (SECS)
683 characters, covers in addition to VSECS also Cyrillic, Greek,
MS-DOS blockgraphics, and a moderate set of mathematical characters
that is likely to be used in academic email and source code comments.

Rows Positions (Cells)
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 BC-BD C6-C7 D8-DD
03 84-86 88-8A 8C 8E-A1 A3-CE D1 D5-D6 F1
04 01-0C 0E-4F 51-5C 5E-5F 90-91
20 13-15 17-1A 1C-1E 20-22 26 30 32-34 39-3A 70 7F-83 A7 AC
21 02 15-16 1A 1D 22 24 26 5B-5E 90-95 A4-A7 D0-D5
22 00-09 0B-0C 12-13 18-1A 1D-1E 24-2A 3C 43 45 48-49 58 5F-62 64-65
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 10 15 20-21 29-2A
25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 B2
25 BA BC C4 CB
26 10-12 3A-3C 40 42 6A-6B 6D-6F
27 13 17
FF FD
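
Going the other way, a set of code points can be rendered back into the
same row/cell notation. This is only a sketch under stated assumptions:
format_rows is a hypothetical helper, it handles BMP code points only, and
unlike the listing above it does not wrap long rows onto a second line.

```python
from itertools import groupby

def format_rows(code_points):
    """Render a set of BMP code points as 'row cell[-cell] ...' lines."""
    lines = []
    for row, members in groupby(sorted(code_points), key=lambda cp: cp >> 8):
        cells = [cp & 0xFF for cp in members]
        runs = []
        start = prev = cells[0]
        for cell in cells[1:]:
            if cell != prev + 1:        # gap: close the current run
                runs.append((start, prev))
                start = cell
            prev = cell
        runs.append((start, prev))
        parts = [f"{a:02X}" if a == b else f"{a:02X}-{b:02X}" for a, b in runs]
        lines.append(f"{row:02X} " + " ".join(parts))
    return lines

# The first few Greek cells of SECS row 03, written out as code points.
greek = set(range(0x0384, 0x0387)) | set(range(0x0388, 0x038B)) | {0x038C}
print(format_rows(greek))  # ['03 84-86 88-8A 8C']
```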

VSECS is somewhat similar to ISO 6937 with some bugs fixed (e.g., the
Euro symbol is included, as are the directed quotation marks).

SECS is somewhat similar to Microsoft/Adobe WGL4. I think SECS is much
better than WGL4, because WGL4 contains many letters for which I could
not find out where they are used (for at least three I am sure they
never existed). SECS contains the following 91 characters that are not
part of WGL4:

Rows Positions (Cells)
02 BC-BD
03 D1 D5-D6 F1
20 34 70 80-83
21 02 15 1A 1D 24 A4-A7 D0-D5
22 00-01 03-05 07-09 0B-0C 13 18 1D 24-28 2A 3C 43 45 49 58 5F 62
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 15 29-2A
26 10-12 6D-6F
27 13 17
FF FD

Almost all of these belong to a set of basic mathematical characters that most
high school students should be familiar with. They are very useful to
have available in academic email discussions and source code comments.
It would be nice if the authors of WGL4 seriously considered extending
their Unicode subset by those few dozen elementary math symbols. Then
SECS would become a subset of WGL4. VSECS is already a subset of WGL4
except for U+FFFD.
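
The subset claims above can be cross-checked from the three listings in
this posting alone. The sketch below is illustrative Python, not the uniset
script; parse_rows is a made-up helper, and the full WGL4 repertoire is not
reproduced here, only the "not in WGL4" difference table.

```python
def parse_rows(table):
    """Expand 'row cell[-cell] ...' lines into a set of code points."""
    code_points = set()
    for line in table.splitlines():
        fields = line.split()
        if not fields:
            continue
        row = int(fields[0], 16) << 8
        for cells in fields[1:]:
            first, _, last = cells.partition("-")
            lo, hi = int(first, 16), int(last or first, 16)
            code_points.update(range(row | lo, (row | hi) + 1))
    return code_points

VSECS = parse_rows("""
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 C6-C7 D8-DD
20 13-15 18-1A 1C-1E 20-22 26 30 39-3A AC
21 22 26 5B-5E 90-93
26 6A
FF FD
""")

SECS = parse_rows("""
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 BC-BD C6-C7 D8-DD
03 84-86 88-8A 8C 8E-A1 A3-CE D1 D5-D6 F1
04 01-0C 0E-4F 51-5C 5E-5F 90-91
20 13-15 17-1A 1C-1E 20-22 26 30 32-34 39-3A 70 7F-83 A7 AC
21 02 15-16 1A 1D 22 24 26 5B-5E 90-95 A4-A7 D0-D5
22 00-09 0B-0C 12-13 18-1A 1D-1E 24-2A 3C 43 45 48-49 58 5F-62 64-65
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 10 15 20-21 29-2A
25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 B2
25 BA BC C4 CB
26 10-12 3A-3C 40 42 6A-6B 6D-6F
27 13 17
FF FD
""")

NOT_IN_WGL4 = parse_rows("""
02 BC-BD
03 D1 D5-D6 F1
20 34 70 80-83
21 02 15 1A 1D 24 A4-A7 D0-D5
22 00-01 03-05 07-09 0B-0C 13 18 1D 24-28 2A 3C 43 45 49 58 5F 62
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 15 29-2A
26 10-12 6D-6F
27 13 17
FF FD
""")

print(len(SECS), len(NOT_IN_WGL4))     # 683 91
print(VSECS <= SECS)                   # True: VSECS is a subset of SECS
print(NOT_IN_WGL4 <= SECS)             # True
# The only VSECS character in the difference table is U+FFFD.
print(VSECS & NOT_IN_WGL4 == {0xFFFD}) # True
```

By these listings, 683 - 91 = 592 SECS characters are shared with WGL4.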

The mathematical symbols of SECS will hopefully give US developers who
do not specialize in i18n issues some motivation to get interested in
16-bit character sets, as these symbols are more relevant for their
personal use than the accented characters of crazy Europeans.

My dream is that something like SECS soon becomes the common
minimum repertoire in Unix X11 fonts and printer fonts. VSECS is
intended as an intermediate step for applications where the size of the
character set is critical and only Latin script support is required.

I do not think SECS contains any useless symbol. I know for each letter
and symbol why it is in there and in which languages or fields it is
used. Just ask.

Much more information on the two sets is available from

http://www.cl.cam.ac.uk/~mgk25/ucs/vsecs.html
http://www.cl.cam.ac.uk/~mgk25/ucs/secs.html

Much better than just looking at these web pages is to download the
database (Perl needed) that generated them from

http://www.cl.cam.ac.uk/~mgk25/ucs/secs.tar.gz

Then you can play around with them and test the subset properties with
regard to other sets easily yourself.

If you want to see example glyphs on the HTML output of this script,
then you'll also need

http://www.cl.cam.ac.uk/~mgk25/ucs/glyphs.zip

The uniset Perl script allows you to comfortably build up your own
database of character collections, to merge and subtract them and to
generate Unicode subsets and study their relations with other subsets.
The mapping files from the Unicode Consortium can be used directly as
input.
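
For readers without Perl at hand, the general idea can be sketched in a
few lines of Python. This is not the uniset script and does not reproduce
its options; it only assumes the usual layout of the Unicode Consortium
mapping tables (byte value, Unicode value, then a '#' comment, separated by
tabs). The sample lines and the name repertoire are illustrative.

```python
SAMPLE_MAPPING = """\
# Excerpt in the style of an ISO 8859-2 mapping table
0xA0\t0x00A0\t#\tNO-BREAK SPACE
0xA1\t0x0104\t#\tLATIN CAPITAL LETTER A WITH OGONEK
0xA2\t0x02D8\t#\tBREVE
0xA3\t0x0141\t#\tLATIN CAPITAL LETTER L WITH STROKE
"""

def repertoire(mapping_text):
    """Collect the Unicode column of a 'byte<TAB>unicode<TAB># name' table."""
    code_points = set()
    for line in mapping_text.splitlines():
        data = line.split("#", 1)[0].split()   # drop the comment part
        # Skip comment-only lines and entries with no Unicode value.
        if len(data) >= 2 and data[1].lower().startswith("0x"):
            code_points.add(int(data[1], 16))
    return code_points

latin2_part = repertoire(SAMPLE_MAPPING)
other = {0x00A0, 0x0104, 0x20AC}

print(sorted(hex(cp) for cp in latin2_part | other))  # merge
print(sorted(hex(cp) for cp in latin2_part - other))  # subtract
print(latin2_part <= other)                           # subset test
```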

Please let me know what you think about SECS and VSECS and if this is
something you would like to see widely implemented.

Markus

--
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org, home page: <http://www.cl.cam.ac.uk/~mgk25/>

Tiro Typeworks

Aug 14, 1998
On Fri, 14 Aug 1998 20:35:14 +0100, Markus Kuhn
<Marku...@cl.cam.ac.uk> wrote:

>SECS is somewhat similar to Microsoft/Adobe WGL4. I think SECS is much
>better than WGL4, because WGL4 contains many letters for which I could
>not find out where they are used (for at least three I am sure they
>never existed).

Markus, I'm very interested in your proposal, but would like to know
for which WGL4 letters you could find no use. I have spent a lot of
time researching European (and non-European) orthographies, and may be
able to account for some of the lesser known letters (which is not to
say that I think WGL4 is perfect).

John Hudson, Type Director

Tiro Typeworks
Vancouver, BC
ti...@tiro.com
www.tiro.com

Markus Kuhn

Aug 14, 1998
Tiro Typeworks wrote:
> Markus, I'm very interested in your proposal, but would like to know
> for which WGL4 letters you could find no use. I have spent a lot of
> time researching European (and non-European) orthographies, and may be
> able to account for some of the lesser known letters (which is not to
> say that I think WGL4 is perfect).

I'm very interested in hearing more about what the rationale to have
the following characters in WGL4 might be:

I don't know where the following ones come from:

0114 # LATIN CAPITAL LETTER E WITH BREVE
0115 # LATIN SMALL LETTER E WITH BREVE
012C # LATIN CAPITAL LETTER I WITH BREVE
012D # LATIN SMALL LETTER I WITH BREVE
014E # LATIN CAPITAL LETTER O WITH BREVE
014F # LATIN SMALL LETTER O WITH BREVE
01FA # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
01FB # LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE
01FC # LATIN CAPITAL LETTER AE WITH ACUTE
01FD # LATIN SMALL LETTER AE WITH ACUTE
01FE # LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
01FF # LATIN SMALL LETTER O WITH STROKE AND ACUTE
02C9 # MODIFIER LETTER MACRON
02D6 # MODIFIER LETTER PLUS SIGN
0387 # GREEK ANO TELEIA

The long s might be from German Fraktur fonts which is unused
since ~1945. This letter has certainly no equivalent in modern
German roman/antiqua fonts and is certainly not needed to
write German:

017F # LATIN SMALL LETTER LONG S

I understand that the following ones were added by mistake to
ISO 6937:

0132 # LATIN CAPITAL LIGATURE IJ
0133 # LATIN SMALL LIGATURE IJ
013F # LATIN CAPITAL LETTER L WITH MIDDLE DOT
0140 # LATIN SMALL LETTER L WITH MIDDLE DOT
0149 # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE

Usage of LIGATURE IJ is now deprecated in the Netherlands
and the other ones never existed in Catalan or Afrikaans
as originally assumed (source: NL gov manual by J.W. van
Wingen).

The following are claimed to be used in Welsh, but Welsh
native speakers who I asked claimed to have never seen them,
so I suspect they are historic characters that are not in
general use.

1E80 # LATIN CAPITAL LETTER W WITH GRAVE
1E81 # LATIN SMALL LETTER W WITH GRAVE
1E82 # LATIN CAPITAL LETTER W WITH ACUTE
1E83 # LATIN SMALL LETTER W WITH ACUTE
1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
1E85 # LATIN SMALL LETTER W WITH DIAERESIS
1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
1EF3 # LATIN SMALL LETTER Y WITH GRAVE

The purpose of the following characters is also
unclear to me:

201B # SINGLE HIGH-REVERSED-9 QUOTATION MARK
203C # DOUBLE EXCLAMATION MARK
203E # OVERLINE

All these are in WGL4 but (so far) not in SECS.

There are also some mysterious characters in the MES-2
proposal which I have not found anywhere else:

01B7 # LATIN CAPITAL LETTER EZH
01C4 # LATIN CAPITAL LETTER DZ WITH CARON
01C6 # LATIN SMALL LETTER DZ WITH CARON
01C7 # LATIN CAPITAL LETTER LJ
01C9 # LATIN SMALL LETTER LJ
01CA # LATIN CAPITAL LETTER NJ
01CC # LATIN SMALL LETTER NJ
01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON
01E4 # LATIN CAPITAL LETTER G WITH STROKE
01E5 # LATIN SMALL LETTER G WITH STROKE
01E6 # LATIN CAPITAL LETTER G WITH CARON
01E7 # LATIN SMALL LETTER G WITH CARON
01E8 # LATIN CAPITAL LETTER K WITH CARON
01E9 # LATIN SMALL LETTER K WITH CARON
01EE # LATIN CAPITAL LETTER EZH WITH CARON
01EF # LATIN SMALL LETTER EZH WITH CARON
01F1 # LATIN CAPITAL LETTER DZ
01F3 # LATIN SMALL LETTER DZ
01F4 # LATIN CAPITAL LETTER G WITH ACUTE
01F5 # LATIN SMALL LETTER G WITH ACUTE
027C # LATIN SMALL LETTER R WITH LONG LEG
0292 # LATIN SMALL LETTER EZH
0374 # GREEK NUMERAL SIGN
0375 # GREEK LOWER NUMERAL SIGN
037A # GREEK YPOGEGRAMMENI
037E # GREEK QUESTION MARK

Do you know a good reason why any of these characters should
go into a simple European character set?
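
Listings in the "XXXX # NAME" form used above can be checked against a
Unicode character database. The following is a small Python sketch that
uses the standard unicodedata module; check_listing is a hypothetical name
and only a few of the entries above are repeated.

```python
import unicodedata

LISTING = """\
01B7 # LATIN CAPITAL LETTER EZH
01C4 # LATIN CAPITAL LETTER DZ WITH CARON
027C # LATIN SMALL LETTER R WITH LONG LEG
0292 # LATIN SMALL LETTER EZH
0374 # GREEK NUMERAL SIGN
037A # GREEK YPOGEGRAMMENI
037E # GREEK QUESTION MARK
"""

def check_listing(text):
    """Return entries whose name disagrees with the Unicode database."""
    mismatches = []
    for line in text.splitlines():
        code, _, name = line.partition("#")
        cp = int(code, 16)
        actual = unicodedata.name(chr(cp), "<unnamed>")
        if actual != name.strip():
            mismatches.append((f"{cp:04X}", name.strip(), actual))
    return mismatches

print(check_listing(LISTING))  # [] when every name matches
```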

Tiro Typeworks

Aug 14, 1998
cc. comp.std.internat,comp.software.international,comp.fonts,
comp.text.tex
Marku...@cl.cam.ac.uk; cmak...@COMPUSERVE.COM


My browser finally finished downloading Markus' SECS website, and I
have prepared the following comments on some of the WGL4 characters he
has excluded from SECS. I believe that some of these characters should
be included in the SECS, in accordance with Markus' criteria, and have
marked my comments on these characters with an asterisk.

I have not bothered to comment on the heavy linedraw characters, etc.,
and have confined my comments to letters and diacritics.

[I am also concerned that Markus' recommended mathematical set may be
too extensive. Is this really a _basic_ mathematical subset, or
something more?]


0114 LATIN CAPITAL LETTER E WITH BREVE

0115 LATIN SMALL LETTER E WITH BREVE

012C LATIN CAPITAL LETTER I WITH BREVE

012D LATIN SMALL LETTER I WITH BREVE

These characters are not required for the modern writing of any
European language. They are essential to much European prosody, and
are found in most Latin language textbooks and dictionaries. I believe
it would be sound to omit them from SECS if the basic, non-combining
IPA characters are also to be omitted. If the latter are included it
would make sense to include short and long vowel diacritics.


0132 LATIN CAPITAL LIGATURE IJ

0133 LATIN SMALL LIGATURE IJ

These, of course, are the Dutch digraph characters. There is no need
for them to be separately encoded, as Dutch writers commonly type /I/
followed by /J/. These characters can, I believe, be safely omitted
from SECS.


013F LATIN CAPITAL LETTER L WITH MIDDLE DOT

0140 LATIN SMALL LETTER L WITH MIDDLE DOT

These are composite rendering forms for the Catalan lateral
approximant. They are not strictly necessary in a character set which
includes an appropriately sized, positioned and spaced midpoint
character (U+00B7). I am a little concerned that in a monospaced font,
of the kind referred to in Markus' SECS criteria, reliance on the
midpoint character will produce gaping holes in the middle of many
Catalan words. I am undecided about the possible inclusion of these
characters.


0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE

This is a Hewlett-Packard character, apparently used by them for
Afrikaans. I've never heard a clear explanation of its purpose, or its
inclusion in WGL4 or other character sets (other than the fact that HP
wanted it to be included). In any case, Afrikaans is beyond the scope
of SECS, so this character may be safely omitted.
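
For what it is worth, the Unicode character database itself records the
characters discussed so far as compatibility characters that expand to
ordinary letter sequences. A short Python sketch (the helper name compat is
made up here; the data comes from the standard unicodedata module):

```python
import unicodedata

def compat(ch):
    """Compatibility decomposition of one character, as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKD", ch)]

# IJ ligatures, L with middle dot, n preceded by apostrophe
for ch in "\u0132\u0133\u013F\u0140\u0149":
    print(f"U+{ord(ch):04X} ->", " ".join(compat(ch)))
```

The L with middle dot pair expands to L or l followed by U+00B7, the
midpoint character mentioned above.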


014E LATIN CAPITAL LETTER O WITH BREVE

014F LATIN SMALL LETTER O WITH BREVE

These characters are not required for the modern writing of any
European language. They are essential to much European prosody, and
are found in most Latin language textbooks and dictionaries. I believe
it would be sound to omit them from SECS if the basic, non-combining
IPA characters are also to be omitted. If the latter are included it
would make sense to include short and long vowel diacritics.


017F LATIN SMALL LETTER LONG S

Archaic. This may be safely omitted.


01A0 LATIN CAPITAL LETTER O WITH HORN

01A1 LATIN SMALL LETTER O WITH HORN

01AF LATIN CAPITAL LETTER U WITH HORN

01B0 LATIN SMALL LETTER U WITH HORN

Vietnamese. These characters may be safely omitted (although there are
sizeable Vietnamese speaking populations in parts of Europe, notably
in the Netherlands).


01FA LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE

01FB LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE

01FC LATIN CAPITAL LETTER AE WITH ACUTE

01FD LATIN SMALL LETTER AE WITH ACUTE

01FE LATIN CAPITAL LETTER O WITH STROKE AND ACUTE

01FF LATIN SMALL LETTER O WITH STROKE AND ACUTE

* These characters are used in Danish and their inclusion in both
Unicode and the WGL4 set was at the request of the Danish standards
organization. My understanding is that there is some debate over the
status of these characters in modern Danish. Some sources claim that
they are archaic, others that they are orthographically correct and
that to omit them is a mistake. I believe they should not be omitted
from SECS without further research.


02D6 MODIFIER LETTER PLUS SIGN

This may be safely omitted.


1E80 LATIN CAPITAL LETTER W WITH GRAVE

1E81 LATIN SMALL LETTER W WITH GRAVE

1E82 LATIN CAPITAL LETTER W WITH ACUTE

1E83 LATIN SMALL LETTER W WITH ACUTE

1E84 LATIN CAPITAL LETTER W WITH DIAERESIS

1E85 LATIN SMALL LETTER W WITH DIAERESIS

1EF2 LATIN CAPITAL LETTER Y WITH GRAVE

1EF3 LATIN SMALL LETTER Y WITH GRAVE

* All these characters are used in modern Welsh and should _not_ be
omitted from SECS. Their use is less common than the W and Y
circumflex diacritics, but all are essential to semantic distinction
and/or pronunciation. My source for this information is Andrew Hawke
(a...@pophost.aber.ac.uk), assistant editor of the University of Wales
dictionary of the Welsh language. I can provide a Welsh word list if
required.

Tiro Typeworks

Aug 14, 1998
On Fri, 14 Aug 1998 23:29:28 +0100, Markus Kuhn
<Marku...@cl.cam.ac.uk> wrote:

>There are also some mysterious characters in the MES-2
>proposal which I have not found anywhere else:

>01B7 # LATIN CAPITAL LETTER EZH

Ezh (or Yogh) is found in Old and Middle English texts, and is a
letter in the orthographies of a number of African languages. The only
modern European language I associate it with is Skolt Saami (see
below). The number of speakers/writers of Skolt Saami is probably well
below the 10,000 minimum set in Markus' criteria.


>01C4 # LATIN CAPITAL LETTER DZ WITH CARON
>01C6 # LATIN SMALL LETTER DZ WITH CARON
>01C7 # LATIN CAPITAL LETTER LJ
>01C9 # LATIN SMALL LETTER LJ
>01CA # LATIN CAPITAL LETTER NJ
>01CC # LATIN SMALL LETTER NJ

These are digraphs which were separately encoded in ISO/IEC 10646 and
Unicode to facilitate compatible font mappings between Latin and
Cyrillic fonts for Serbo-Croatian. Language reform policies in the
former Yugoslav republic -- particularly in Croatia -- have greatly
reduced the need for such compatibility. I believe these digraph
characters may still be of use in Serbia, if transliteration to Latin
script is a requirement, but such specialised usage may fall beyond
the proposed scope of SECS.
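
To illustrate the one-to-one mapping these digraph characters allow, here
is a minimal Python sketch. The three digraph pairs are the Serbian
Cyrillic letters lje, nje and dzhe with their single-code-point Latin
counterparts; the remaining table entries and the function name to_latin
are a deliberately tiny, hypothetical example, not a full transliterator.

```python
import unicodedata

# Cyrillic letter -> single-code-point Latin digraph (lowercase only).
DIGRAPHS = {
    "\u0459": "\u01C9",  # CYRILLIC SMALL LETTER LJE  -> LATIN SMALL LETTER LJ
    "\u045A": "\u01CC",  # CYRILLIC SMALL LETTER NJE  -> LATIN SMALL LETTER NJ
    "\u045F": "\u01C6",  # CYRILLIC SMALL LETTER DZHE -> LATIN SMALL LETTER DZ WITH CARON
}
# Just enough plain letters for the example word below (u, be, a, ve).
PLAIN = {"\u0443": "u", "\u0431": "b", "\u0430": "a", "\u0432": "v"}

def to_latin(word):
    """Map each Cyrillic letter to exactly one Latin code point."""
    table = {**PLAIN, **DIGRAPHS}
    return "".join(table.get(ch, ch) for ch in word)

cyrillic = "\u0459\u0443\u0431\u0430\u0432"    # five Cyrillic letters
latin = to_latin(cyrillic)
print(len(cyrillic), len(latin))               # 5 5
# With ordinary letters instead, the word grows by one character.
print(unicodedata.normalize("NFKC", latin))    # ljubav
```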


>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON

>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON

Unicode 2.0 identifies these characters as Lappish. In the first
place, Lappish is generally considered a derogatory term; in the
second these characters do not appear in any of the Saami
orthographies I have collected. Note that I only have Latin
orthographies for five of the nine Saami languages.


>01E4 # LATIN CAPITAL LETTER G WITH STROKE
>01E5 # LATIN SMALL LETTER G WITH STROKE

These characters are used to write Skolt Saami in the Latin script
(Skolt Saami is also written by some using the Cyrillic script). The
number of speakers/writers of Skolt Saami is probably well below the
10,000 minimum set in Markus' criteria.


>01E6 # LATIN CAPITAL LETTER G WITH CARON
>01E7 # LATIN SMALL LETTER G WITH CARON

I can find no reference for these characters. Their use in Turkish is
incorrect and an unacceptable substitute for the G breve diacritics.
Unicode 2.0 indicates 'Lappish', but they do not occur in any of the
Saami orthographies I have on file.


>01E8 # LATIN CAPITAL LETTER K WITH CARON
>01E9 # LATIN SMALL LETTER K WITH CARON
>01EE # LATIN CAPITAL LETTER EZH WITH CARON
>01EF # LATIN SMALL LETTER EZH WITH CARON

These characters are used to write Skolt Saami in the Latin script
(Skolt Saami is also written by some using the Cyrillic script). The
number of speakers/writers of Skolt Saami is probably well below the
10,000 minimum set in Markus' criteria.


>01F1 # LATIN CAPITAL LETTER DZ
>01F3 # LATIN SMALL LETTER DZ
>01F4 # LATIN CAPITAL LETTER G WITH ACUTE
>01F5 # LATIN SMALL LETTER G WITH ACUTE

Your guess is as good as mine. I believe these can be safely omitted.


>027C # LATIN SMALL LETTER R WITH LONG LEG

I know of no usage of this character outside of phonetic transcription
(strident apico-alveolar trill). I'm not even sure that it remains
part of the official IPA standard set.


>0292 # LATIN SMALL LETTER EZH

See note above for uppercase Ezh/Yogh. Of course, if it is decided to
include a basic IPA subset, this character would become necessary.


>0374 # GREEK NUMERAL SIGN
>0375 # GREEK LOWER NUMERAL SIGN

I believe these to be archaic, and only of use when Greek letters
are serving as numerals (as they did before the introduction of
'Arabic' numerals).


>037A # GREEK YPOGEGRAMMENI

This is the Greek subscript iota. It is not used in modern, monotonic
Greek, so may be safely omitted from SECS.


>037E # GREEK QUESTION MARK

I'm unable to confirm, at this time, whether this punctuation mark is
still in use or not. I suspect not, and most readers would be unlikely
to distinguish it from a semicolon.
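
On the last point, the Unicode character database treats U+037E as
canonically equivalent to the ordinary semicolon, so normalization replaces
it outright. A short Python check using the standard unicodedata module
(the helper name nfc is made up for this sketch):

```python
import unicodedata

def nfc(text):
    """Normalize to NFC, which also replaces singleton equivalents."""
    return unicodedata.normalize("NFC", text)

print(nfc("\u037E") == ";")        # True: GREEK QUESTION MARK -> SEMICOLON
print(nfc("\u0387") == "\u00B7")   # True: GREEK ANO TELEIA -> MIDDLE DOT
```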

Moocows hate spam.

Aug 15, 1998
On Fri, 14 Aug 1998 23:29:28 +0100, Markus Kuhn
<Marku...@cl.cam.ac.uk> mooed:

>Tiro Typeworks wrote:
>> Markus, I'm very interested in your proposal, but would like to know
>> for which WGL4 letters you could find no use. I have spent a lot of
>> time researching European (and non-European) orthographies, and may be
>> able to account for some of the lesser known letters (which is not to
>> say that I think WGL4 is perfect).
>
>I'm very interested in hearing more about what the rationale to have
>the following characters in WGL4 might be:
>
>I don't know where the following ones come from:

(snip)


>The following are claimed to be used in Welsh, but Welsh
>native speakers who I asked claimed to have never seen them,
>so I suspect they are historic characters that are not in
>general use.
>

>1E80 # LATIN CAPITAL LETTER W WITH GRAVE
>1E81 # LATIN SMALL LETTER W WITH GRAVE
>1E82 # LATIN CAPITAL LETTER W WITH ACUTE
>1E83 # LATIN SMALL LETTER W WITH ACUTE
>1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
>1E85 # LATIN SMALL LETTER W WITH DIAERESIS
>1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
>1EF3 # LATIN SMALL LETTER Y WITH GRAVE

Actually, if one is doing a Welsh *pronunciation* guide these could be
potentially useful; I've also seen at least "w" and "y" acute in the past.
(The others, I'll admit, *are* weird--I'm not entirely sure where they'd
be used, save *maybe* in other languages in the same subfamily of Celtic
languages Cymru/Welsh is in [for example, Breton or Manx]. I'm rather
afraid I don't speak any Celtic tongue so I can't be for sure on this;
if memory serves, there is a Manx dictionary online, though. IF my
memory of that serves at ALL well, Manx doesn't use "w" as a vowel but
*does* use "y"; I know exactly nothing on Breton.)

*POSSIBLY* w-dieresis and y-dieresis occur in *some* transcription schemes
for Native American languages (if they occur in this, it'd likely be for
Northwest languages that have vowels and consonants that literally cannot
be expressed in any other way without resorting to the International
Phonetic Alphabet).

Y-dieresis and y-dieresis do occur in the standard character sets of most
English-language Postscript and Truetype fonts.

Offhand, as an aside--I expect some of the other oddish characters
(AE-grave, etc.) are also used mostly in pronunciation guides as well.

A-dieresis-grave, etc. *may* be used in Vietnamese, but I'm not sure.

>The purpose of the following characters is also
>unclear to me:
>
>201B # SINGLE HIGH-REVERSED-9 QUOTATION MARK
>203C # DOUBLE EXCLAMATION MARK
>203E # OVERLINE
>
>All these are in WGL4 but (so far) not in SECS.

Double-exclamation sounds more like a "typesetting character"; so does
"high reversed-9 quot mark" (maybe this is equivalent to leftquot?)

>There are also some mysterious characters in the MES-2
>proposal which I have not found anywhere else:
>
>01B7 # LATIN CAPITAL LETTER EZH

>01C4 # LATIN CAPITAL LETTER DZ WITH CARON
>01C6 # LATIN SMALL LETTER DZ WITH CARON
>01C7 # LATIN CAPITAL LETTER LJ
>01C9 # LATIN SMALL LETTER LJ

EZH I'm not sure on, but it *may* be used in some Turkic languages; DZ and
its variants, and LJ and its variants, occur in some Slavic languages and
also possibly in some Turkic languages (mostly those spoken in countries
that split off from the old USSR and are going back to Romanised
characters).

(In Cyrillic, separate letters *do* exist for each of these in regional
variants that were used before the USSR split up. This is probably why
they carry over.)

LJ/lj is roughly equivalent to slash-l in Polish, BTW.

>01CA # LATIN CAPITAL LETTER NJ
>01CC # LATIN SMALL LETTER NJ

Used in some Slavic languages, and occasionally in various African
languages. (In Slavic languages, indicates a palatalised-N (similar to
n-acute in some Slavic languages; the "j" essentially means the same as
the "soft mark" in Cyrillic); in the African languages where this is an
actual character, indicates exactly what it says--an "nj" sound (like "ng"
only one doesn't touch one's palate). :)

>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON

If memory serves, used in Vietnamese (in this case, the macron is a tone
character) and in some transcription schemes for Native American
languages.

>01E4 # LATIN CAPITAL LETTER G WITH STROKE
>01E5 # LATIN SMALL LETTER G WITH STROKE

I've only seen this offhand in *some* transcription schemes for Native
American languages [this indicates *roughly* the same as g-caron; see
below] but it may occur in Turkic languages that are converting to Roman
characters.

>01E6 # LATIN CAPITAL LETTER G WITH CARON
>01E7 # LATIN SMALL LETTER G WITH CARON

Commonly used in Turkish and some other Turkic languages to indicate a
"hard G" sound. Also occurs, for the same sound, in some Native American
language transcription schemes.

>01E8 # LATIN CAPITAL LETTER K WITH CARON
>01E9 # LATIN SMALL LETTER K WITH CARON

Less common, but does occur in some Turkic languages; indicates a "hard K"
sound (like hard G--you say it in the back of your throat). Occurs in
some transcription schemes for Native American languages as well.

(As a minor aside--you will find many, MANY standards for transcription
and, in some cases, transliteration of Native American languages. These
vary from fitting the closest Roman equivalent, to using diacritical marks
for consonants that are "sort" of close [many languages have literally two
to four different ways you can pronounce a consonant sound where we might
have one in English, for example] to using unused characters to represent
sounds ["x" for "sh" and "c" for "soft ch" are rather common] to resorting
to the IPA when there's no really good way to represent it via Roman
characters. Hence my notes on this. :)

>01EE # LATIN CAPITAL LETTER EZH WITH CARON
>01EF # LATIN SMALL LETTER EZH WITH CARON

Possibly used in some Slavic languages and Slavic transliteration schemes.
Possibly occurs in Turkic languages. (Again, an "ezh-caron" equivalent
does occur in several local variants of Cyrillic used for "minority
languages" in the USSR.)

>01F1 # LATIN CAPITAL LETTER DZ
>01F3 # LATIN SMALL LETTER DZ

Commonly used in Slavic and Turkic languages; occurs in some Native
American languages as well (most notably the Na-Dene family, which
includes Dine' [Navaho]).

>01F4 # LATIN CAPITAL LETTER G WITH ACUTE
>01F5 # LATIN SMALL LETTER G WITH ACUTE

Fairly unusual, but does occur in some Native American and Slavic (and
possibly Turkic as well, depending on the country's Romanisation scheme)
languages. Usually indicates a palatalised g sound in the few places
where I've seen it.

>027C # LATIN SMALL LETTER R WITH LONG LEG

Fairly unusual; used in some Native American languages as an R-variant.
This is borrowed from the IPA, offhand. This also, occasionally, occurs
in transcription schemes for some African languages.

Some Turkic languages may use it; not sure (at least I've not *seen* any)
however.

>Do you know a good reason why any of these characters should
>go into a simple European character set?

Some of them I'm sort of puzzled on m'self. Some (like Y-dieresis and
Y-acute-dieresis, for example) I can see as they are used in languages
with a known, large audience on Usenet (for instance, Vietnamese-language
or Cymru-language newsgroups).

Some of them, I will frankly admit (namely, *all* the Greek characters
noted and, possibly, some of the other *unusual* letters like longleg-r
and k-macron, etc.) puzzle me why they're included. (As far as I know,
longleg-r only exists in a few Native American transcription schemes and
in some African-language transcription schemes; unless there is a large
Usenet population of folks wishing to type in Salish, I'm not sure why it
should be there. [If it is in there, we should go ahead and add upside-
down K/k, upside-down T/t, cedilla-H, Latin-omega-acute-dieresis,
Latin-chi, etc. and all the other IPA characters you *have* to import from
the IPA to write some of the languages of that area. :) And, of course,
import Latin capital-schwa and Latin small-schwa for our friends in
Azerbaijan; hell, let's just import the entire IPA and be done with it :)

Ah well...I'm sure the author will be glad to explain, in any case. :)

-moo
who, incidentally, still wants to know when the author will write the
terminal patch that will allow a VT100 terminal hooked up to an IBM 3090
mainframe to actually *read* these strange and ferlie characters, or pay
for the unis still using these beasts for student Internet access to
upgrade to nice happy spanking new DEC Alphas and upgrade everyone's
computer to a Pentium whilst they're at it :)


Thomas Chan

Aug 15, 1998
On 15 Aug 1998 05:09:31 GMT, Moocows hate spam.
<pmba...@spamtrampling-moocow.slug.louisville.edu> wrote:
>A-dieresis-grave, etc. *may* be used in Vietnamese, but I'm not sure.
>>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
>>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON
>
>If memory serves, used in Vietnamese (in this case, the macron is a tone
>character) and in some transcription schemes for Native American
>languages.

No diaereses and macrons in Vietnamese.

One needs:

one of <a>, <i>, <u>, <e>, <o>, <y>

with possibility of a circumflex on <a>,
or a horn on <o>,
or a horn on <u>,
or a circumflex on <o>,
or a circumflex on <e>

plus nothing,
or acute accent,
or grave accent,
or "curl", (sorry, do not know technical name for this)
or tilde,
or dot underneath (is there a technical name for this?)

(Not all of the above combinations will exist.)

(Optionally, a <2> or <z>-like hybrid of "curl" and tilde
may occur in the handwriting of southern Vietnamese
speakers who do not distinguish the two tones
marked by those diacritics.)
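
The scheme above can be spelled out with Unicode combining marks. The
sketch below is illustrative Python, not something from this thread: it
takes the "curl" to be COMBINING HOOK ABOVE (U+0309) and the dot underneath
to be COMBINING DOT BELOW (U+0323), and it simply forms every base and tone
pairing listed, without judging which ones occur in real words.

```python
import unicodedata
from itertools import product

# Base vowels, then the vowels carrying a circumflex (U+0302) or horn (U+031B).
VOWELS = ["a", "i", "u", "e", "o", "y",
          "a\u0302", "o\u031B", "u\u031B", "o\u0302", "e\u0302"]
# Tone marks: none, acute, grave, hook above ("curl"), tilde, dot below.
TONES = ["", "\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]

forms = [unicodedata.normalize("NFC", vowel + tone)
         for vowel, tone in product(VOWELS, TONES)]

print(len(forms))                        # 66
print(all(len(f) == 1 for f in forms))   # True: each has a precomposed form
print("\u01DF" in forms)                 # False: no a with diaeresis and macron
```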


Thomas Chan
tc...@cornell.edu

Christoph Nahr

Aug 15, 1998
On Fri, 14 Aug 1998 23:29:28 +0100, Markus Kuhn
<Marku...@cl.cam.ac.uk> wrote:

>The long s might be from German Fraktur fonts which is unused
>since ~1945. This letter has certainly no equivalent in modern
>German roman/antiqua fonts and is certainly not needed to
>write German:
>

>017F # LATIN SMALL LETTER LONG S

While I agree that this letter is not needed in a basic European
character set your reasoning is quite wrong.

The long s was actually used in *both* Fraktur and Antiqua (i.e.
non-Fraktur) typefaces for centuries, and is completely unrelated to
any "Germanness". You should see lots of long s in any older English
(French, Italian, ...) book. The only difference is that Antiqua (or
"Latin") typefaces eventually dropped the long s while Fraktur
typefaces kept it to this day.

As for Fraktur going out of fashion in Germany by 1945... well, the
connection between Nazis and Fraktur is a common misconception.
Actually, the Nazi government *discouraged* use of Fraktur in 1940
because Hitler thought it outdated and contrary to his plans to
"modernise" Germany according to Nazi ideology.

As for Fraktur being "unused" today... several new Fraktur typefaces
have been designed during the past few decades by German designers.
If you go to any newspaper stand you'll see plenty of Fraktur
headlines on newspapers of any nationality. Station and street signs
are also frequently set in Fraktur. But I agree that Fraktur
typefaces are only being used as decorative fonts these days, not as
text fonts, which is the important criterion for this discussion.
--
Chris Nahr (cn...@hal9000.net, replace hal9000 with ibm to e-mail me)
Please don't e-mail me if you post! PGP key at wwwkeys.ch.pgp.net

William Ehrich

Aug 15, 1998
If we can afford to include just one letter for historical / sentimental
reasons I would like that to be:

> 017F # LATIN SMALL LETTER LONG S

It is useful for quoting most old English and German literature.

-- William Ehrich


Thomas Chan

Aug 15, 1998
On 15 Aug 1998 06:59:51 GMT, Thomas Chan <tc...@cornell.edu> wrote:
>On 15 Aug 1998 05:09:31 GMT, Moocows hate spam.
><pmba...@spamtrampling-moocow.slug.louisville.edu> wrote:
>>A-dieresis-grave, etc. *may* be used in Vietnamese, but I'm not sure.
>>>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
>>>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON
>>
>>If memory serves, used in Vietnamese (in this case, the macron is a tone
>>character) and in some transcription schemes for Native American
>>languages.
>
>No diaereses and macrons in Vietnamese.
>
>One needs:
>
>one of <a>, <i>, <u>, <e>, <o>, <y>
>
>with possibility of a circumflex on <a>,
>or a horn on <o>,
>or a horn on <u>,
>or a circumflex on <o>,
>or a circumflex on <e>

Correction to myself: There's also the possibility of
a breve on <a>.

Paul L. Allen

Aug 15, 1998
In article <35d594f8...@news3.newscene.com>
cn...@hal9000.net (Christoph Nahr) writes:

> As for Fraktur going out of fashion in Germany by 1945... well, the
> connection between Nazis and Fraktur is a common misconception.
> Actually, the Nazi government *discouraged* use of Fraktur in 1940
> because Hitler thought it outdated and contrary to his plans to
> "modernise" Germany according to Nazi ideology.

I don't know if he ever stated that. However, on the 23rd of January 1941,
an official order of the Nazi Party (Anordnung 2/41; Ordnungsziffer
111) abolished Fraktur and Schwabacher from all printed items, saying:

...It is ordered that from now on only the normal type is to be used
for all printed documents. As normal type, the antiqua type is meant.
The so-called gothic type (Fraktur) is not a german type but goes back
to the schwabacher jew-letters. This type has been strongly used in
Germany because Jews owned the printing works already since typography
was introduced, and later on the newspapers...

I don't know who did the translation (possibly Yannis Haralambous) but
it was accompanied by a photostat of the order. See TUGboat vol. 12 no.
1, March 1991.

--Paul

Erland Sommarskog

Aug 15, 1998, 3:00:00 AM
ti...@tiro.com (Tiro Typeworks) skriver:

>I am a little concerned that in a monospaced font, of the kind referred to in
>Markus' SECS criteria, reliance on the midpoint character will produce gaping
>holes in the middle of many Catalan words. I am undecided about the possible
>inclusion of these characters.

If you go with the Barcelona metro you will find that there is a station
which appears to be named Paral-lel, so thick is the middle dot,
and this is not the only instance I've seen.

>* These characters are used in Danish and their inclusion in both
>Unicode and the WGL4 set was at the request of the Danish standards
>organization. My understanding is that there is some debate over the
>status of these characters in modern Danish. Some sources claim that
>they are archaic, others that they are orthographically correct and
>that to omit them is a mistake. I believe they should not be omitted
>from SECS without further research.

Of course, as I come from a neighbouring country, I cannot be taken
for an authority, but this much I can say: I have never seen them.

Then again, most Russian I have read had accented vowels, and there
appear to be no accented Cyrillic letters in Markus's set. But as
you might have guessed, I never came much further than my beginner's
textbook...


--
Erland Sommarskog, Stockholm, som...@algonet.se
This could have been my two cents worth, but alas the Swedish
government has decided that I am not to have any cents.

Markus Kuhn

Aug 17, 1998, 3:00:00 AM
Tiro Typeworks wrote:
[<http://www.cl.cam.ac.uk/~mgk25/ucs/secs.html>]

> My browser finally finished downloading Markus' SECS website, and I
> have prepared the following comments on some of the WGL4 characters he
> has excluded from SECS. I believe that some of these characters should
> be included in the SECS, in accordance with Markus' criteria, and have
> marked my comments on these characters with an asterisk.

Thanks for your comments, they have been very useful.

> [I am also concerned that Markus' recommended mathematical set may be
> too extensive. Is this really a _basic_ mathematical subset, or
> something more?]

There is an international standard ISO 31-11 that defines large parts
of the mathematical notation that is commonly used all over the world.
Most of the characters that I have included are from ISO 31-11. I
tried to cover this standard entirely as far as this is possible
in a fixed-width font.

The actual list of math characters that I have included is appended
below. It contains a few remarks about why I think each character
should be covered. Comments welcome.

It is quite a comprehensive set of symbols, so I certainly would not
argue that the math collection should become any larger. I admit that
there might be a few less common symbols in it that are mostly
of concern to computer scientists, but after all, these are computer
character sets and I can well imagine that most of these symbols
will be used in source code comments etc.

MATH

Markus Kuhn

Aug 17, 1998, 3:00:00 AM
The Windows standard character set CP1252 extends ISO 8859-1
by the following 27 characters:

0x80 0x20AC #EURO SIGN
0x81 #UNDEFINED
0x82 0x201A #SINGLE LOW-9 QUOTATION MARK
0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
0x84 0x201E #DOUBLE LOW-9 QUOTATION MARK
0x85 0x2026 #HORIZONTAL ELLIPSIS
0x86 0x2020 #DAGGER
0x87 0x2021 #DOUBLE DAGGER
0x88 0x02C6 #MODIFIER LETTER CIRCUMFLEX ACCENT
0x89 0x2030 #PER MILLE SIGN
0x8A 0x0160 #LATIN CAPITAL LETTER S WITH CARON
0x8B 0x2039 #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C 0x0152 #LATIN CAPITAL LIGATURE OE
0x8D #UNDEFINED
0x8E 0x017D #LATIN CAPITAL LETTER Z WITH CARON
0x8F #UNDEFINED
0x90 #UNDEFINED
0x91 0x2018 #LEFT SINGLE QUOTATION MARK
0x92 0x2019 #RIGHT SINGLE QUOTATION MARK
0x93 0x201C #LEFT DOUBLE QUOTATION MARK
0x94 0x201D #RIGHT DOUBLE QUOTATION MARK
0x95 0x2022 #BULLET
0x96 0x2013 #EN DASH
0x97 0x2014 #EM DASH
0x98 0x02DC #SMALL TILDE
0x99 0x2122 #TRADE MARK SIGN
0x9A 0x0161 #LATIN SMALL LETTER S WITH CARON
0x9B 0x203A #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C 0x0153 #LATIN SMALL LIGATURE OE
0x9D #UNDEFINED
0x9E 0x017E #LATIN SMALL LETTER Z WITH CARON
0x9F 0x0178 #LATIN CAPITAL LETTER Y WITH DIAERESIS

Most of them make perfect sense and are useful extensions; however,
I have no idea what the purpose of the following three is:

0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
0x88 0x02C6 #MODIFIER LETTER CIRCUMFLEX ACCENT
0x98 0x02DC #SMALL TILDE

Any ideas?
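The count of 27 in the table above can be checked against a codec implementation. This is a sketch that relies on Python's built-in "cp1252" codec, which may not match every vendor's version of the code page; the function name is made up for illustration.

```python
import unicodedata


def cp1252_extensions():
    """Map each byte 0x80..0x9F to its Unicode character, or None if undefined."""
    table = {}
    for byte in range(0x80, 0xA0):
        try:
            table[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's codec treats the unassigned positions as errors.
            table[byte] = None
    return table


table = cp1252_extensions()
defined = {b: c for b, c in table.items() if c is not None}

for byte, char in table.items():
    if char is None:
        print(f"0x{byte:02X}        #UNDEFINED")
    else:
        print(f"0x{byte:02X} 0x{ord(char):04X} #{unicodedata.name(char)}")

print(len(defined), "defined positions in 0x80..0x9F")
```

ISO 8859-1 reserves this whole byte range for C1 control codes, which is why every defined position here is an extension.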

Dik T. Winter

Aug 17, 1998, 3:00:00 AM
In article <35D84110...@cl.cam.ac.uk> Markus Kuhn <Marku...@cl.cam.ac.uk> writes:
> 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK

Probably the Dutch Gulden symbol.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Paul L. Allen

Aug 17, 1998, 3:00:00 AM
In article <ExuC1...@cwi.nl>

d...@cwi.nl (Dik T. Winter) writes:

> In article <35D84110...@cl.cam.ac.uk> Markus Kuhn <Marku...@cl.cam.ac.uk> writes:
> > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
>
> Probably the Dutch Gulden symbol.

Certainly the Dutch Guilder/Gulden/Florin symbol.

--Paul

Markus Kuhn

Aug 17, 1998, 3:00:00 AM
Dik T. Winter wrote:
> > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
>
> Probably the Dutch Gulden symbol.

Shall we still include it in VSECS (same for the Peseta sign from
CP437)? If we include Peseta and Gulden, then we would also
have to include the Franc and Lira symbols. All these currency
symbols are expected to be superseded by the Euro symbol from
mid-2002 on and would only be of historical value.
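To see which of these currency symbols the common 8-bit sets already carry, one can try encoding each of them. This is a rough sketch using Python's built-in codecs as stand-ins for ISO 8859-1, CP437 and CP1252; the choice of Unicode code points for the guilder, peseta, franc and lira is my assumption about what is meant, and the helper name is made up.

```python
import unicodedata

# Assumed Unicode code points for the symbols under discussion.
SYMBOLS = ["\u0192", "\u20a7", "\u20a3", "\u20a4", "\u00a3", "\u20ac"]
CODECS = ["latin-1", "cp437", "cp1252"]


def byte_in(char, codec):
    """Return the encoded byte for char in codec, or None if it is absent."""
    try:
        return char.encode(codec)
    except UnicodeEncodeError:
        return None


for char in SYMBOLS:
    found = []
    for codec in CODECS:
        encoded = byte_in(char, codec)
        if encoded is not None:
            found.append(f"{codec}=0x{encoded[0]:02X}")
    where = ", ".join(found) if found else "none of the three"
    print(f"U+{ord(char):04X} {unicodedata.name(char)}: {where}")
```

With these codecs the franc and lira signs appear in none of the three sets, so including them would rest on the consistency argument alone rather than on existing 8-bit practice.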

Paul L. Allen

Aug 17, 1998, 3:00:00 AM
In article <35D87587...@cl.cam.ac.uk>
Markus Kuhn <Marku...@cl.cam.ac.uk> writes:

> Dik T. Winter wrote:
> > > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
> >
> > Probably the Dutch Gulden symbol.
>
> Shall we still include it in VSECS (same for the Peseta sign from
> CP437)? If we include Peseta and Gulden, then we would also
> have to include the Franc and Lira symbols.

What puzzles me is why Unicode added a Lira symbol. The lira symbol
is essentially identical to the pound symbol. A couple of Unicode
fonts I've seen give the pound one cross-stroke and the lira two
cross-strokes, but that's a matter of aesthetics and font design.
Even if there is some inherent national preference one way or the other
between UK and Italian typography, typesetters in each country will tend to
use the same glyph for both symbols (or, more usually, use the symbol
for their national currency and a letter for the other one to avoid
confusion).

--Paul

Tiro Typeworks

Aug 17, 1998, 3:00:00 AM
On 15 Aug 1998 05:09:31 GMT,
pmba...@spamtrampling-moocow.slug.louisville.edu (Moocows hate spam.)
wrote:

>>The following are claimed to be used in Welsh, but Welsh
>>native speakers who I asked claimed to have never seen them,
>>so I suspect they are historic characters that are not in
>>general use.

>>1E80 # LATIN CAPITAL LETTER W WITH GRAVE
>>1E81 # LATIN SMALL LETTER W WITH GRAVE
>>1E82 # LATIN CAPITAL LETTER W WITH ACUTE
>>1E83 # LATIN SMALL LETTER W WITH ACUTE
>>1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
>>1E85 # LATIN SMALL LETTER W WITH DIAERESIS
>>1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
>>1EF3 # LATIN SMALL LETTER Y WITH GRAVE

For the record, as provided to me by Andrew Hawke, assistant editor of
the University of Wales Dictionary of the Welsh Language:

Modern usage of the diacritics in Welsh is as follows:

The circumflex is used solely to indicate that a vowel is long
in a context in which it would normally be expected to be
short, e.g.:

gwa^n (he pierces) vs. gwan (weak)
gwe^n (a smile) vs. gwen (white (fem.))
pi^n (pine (wood, tree)) vs. pi`n (a pin)
co^r (a choir) vs. cor (a dwarf)
bu^m (I was (perfect)) vs. bum (five (mutated))
tw^r (a tower) vs. twr (a group)
y^m (we are) vs. ym (in (before m))

The diaeresis is used to separate vowels, as in English:

prosa"ig (prosaic)
cre"wr (creator)
copi"o (to copy)
tro"edigaeth (conversion)
du"wch (blackness)
Rebacay"ddiaeth (lit. Rebaccaism)
cyw"res (concubine)

The acute accent is used to indicate unexpected stress (i.e.
not on the penultimate):

casa'u (to hate)
case't (cassette)
ricri'wt (a recruit)
paraso'l (a parasol)
rebu'wc (a rebuke)
caridy'ms (riff-raff)
gw'raidd (manly)
[this last is on the penult, but is to distinguish it
from the word gwraidd (root), which is monosyllabic]

The grave accent is used to indicate that a vowel is short in
a context in which it would normally be expected to be long:

pa`s (a pass, permit) vs. pas (a cough)
sie`d (a shed) vs. sie^d/sied (escheat)
sgi`l (a skill) vs. sgi^l/sgil (following)
no`d (a nod) vs. nod (a target, an aim)
cu`l (a hut) vs. cul (narrow)
mw`g (a mug) vs. mwg (smoke (n.))
py`g (dirty) vs. pyg (pitch, tar)

Generally speaking, diacritics in Welsh cannot reasonably be
omitted as they are used either to show unusual stress, or to
differentiate between pairs of otherwise identical words with
different pronunciations. As such they are equally necessary
in upper- and lower-case forms.

The commonest diacritic is the circumflex, followed by the
acute and diaeresis probably about equally. The grave is rare,
but as more and more words are borrowed from English, and new
compounds coined for technical terms, their use will
undoubtedly increase.

To give a very rough indication, according to the headwords in
our (unfinished) dictionary (which we estimate will contain
about 84,500 entries), the number of accented keywords
(extrapolated to the expected finished size of the dictionary)
will be roughly:

circumflex: 2,000
diaeresis: 880
acute: 500
grave: 160


Clearly it would be a mistake to omit these diacritics from any
character set intended to support the Welsh language.
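The ASCII notation used in the word lists above (a vowel followed by ^, ", ' or `) can be turned into the accented letters mechanically, which also shows which precomposed characters a Welsh-capable set needs. This is only a sketch: the function name is made up, and the converter assumes the input is a bare word in that notation, so an ordinary apostrophe or quotation mark after a vowel would be misread as an accent.

```python
import re
import unicodedata

# Postfix ASCII marker -> Unicode combining mark.
MARKS = {
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
}

# A Welsh vowel letter (including w and y) followed by one marker.
PATTERN = re.compile(r"([aeiouwyAEIOUWY])([\^\"'`])")


def welsh_from_ascii(word):
    """Convert e.g. 'tw^r' to the word with a precomposed accented letter."""
    combined = PATTERN.sub(lambda m: m.group(1) + MARKS[m.group(2)], word)
    return unicodedata.normalize("NFC", combined)


for sample in ["tw^r", "y^m", 'cyw"res', "gw'raidd", "mw`g", "py`g"]:
    converted = welsh_from_ascii(sample)
    accented = [c for c in converted if ord(c) > 0x7F]
    codes = " ".join(f"U+{ord(c):04X}" for c in accented)
    print(f"{sample!r:12} -> {converted}  ({codes})")
```

The accented w and y forms come out in the U+0174..U+0177 and U+1E80..U+1EF3 ranges, i.e. outside Latin-1, which is where the W and Y code points listed at the top of this post come in.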

Chris Maden

Aug 17, 1998, 3:00:00 AM
Markus Kuhn <Marku...@cl.cam.ac.uk> writes:

> The Windows standard character set CP1252 extends ISO 8859-1
> by the following 27 characters:
>

[...]


>
> Most of them make perfect sense and are useful extensions; however,
> I have no idea what the purpose of the following three is:
>

> 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK

This is the guilder sign. Unicode, for whatever reason, doesn't
include an actual guilder/florin sign, but the small f with hook looks
right. This mapping is an approximation. Both the Windows and
Macintosh character sets include the character, so its omission from
Unicode was a surprise to me.

> 0x88 0x02C6 #MODIFIER LETTER CIRCUMFLEX ACCENT
> 0x98 0x02DC #SMALL TILDE

These are to distinguish between the character and the accent. The
circumflex (shift-6 on most US keyboards) is now used for the literal
character (for TeX superscript, regexp inversion...), and so a
distinct character is needed for the diacritic. Similarly, the tilde
is now used for home directories or approximation; a smaller tilde is
needed for using as a diacritic.
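The three roles (ASCII character, spacing accent, combining accent) can be listed side by side. A small sketch using Python's unicodedata module; the grouping into triples and the helper name are mine, not anything defined by CP1252 or Unicode.

```python
import unicodedata


def describe(char):
    """Return (code point string, Unicode name) for a single character."""
    return f"U+{ord(char):04X}", unicodedata.name(char)


# (ASCII character, spacing form found in CP1252, combining form)
TRIPLES = [
    ("^", "\u02c6", "\u0302"),
    ("~", "\u02dc", "\u0303"),
]

for ascii_char, spacing, combining in TRIPLES:
    for char in (ascii_char, spacing, combining):
        code, name = describe(char)
        print(f"{code} {name}")
    # Only the combining form composes with a base letter under NFC.
    composed = unicodedata.normalize("NFC", "o" + combining)
    print(f"  o + {describe(combining)[0]} -> {describe(composed)[1]}")
```

Note that in Unicode only the combining form actually attaches to a base letter; the spacing forms at 0x88 and 0x98 stand alone, like the ASCII characters.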

-Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

H. Peter Anvin

Aug 18, 1998, 3:00:00 AM
Followup to: <35D87587...@cl.cam.ac.uk>
By author: Markus Kuhn <Marku...@cl.cam.ac.uk>
In newsgroup: comp.std.internat

>
> Dik T. Winter wrote:
> > > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
> >
> > Probably the Dutch Gulden symbol.
>
> Shall we still include it in VSECS (same for the Peseta sign from
> CP437)? If we include Peseta and Gulden, then we would also
> have to include the Franc and Lira symbols. All these currency
> symbols are expected to be superseded by the Euro symbol from
> mid-2002 on and would only be of historical value.
>

Include them. It is going to be much more painful to omit them,
IMNSHO. However, my understanding is that the Franc symbol isn't in
common use; in fact, I've had French people tell me "what Franc
symbol", pretty much what I'd tell anyone who'd ask me what the symbol
for a Swedish Crown is.

-hpa
--
PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD 1E DF FE 69 EE 35 BD 74
See http://www.zytor.com/~hpa/ for web page and full PGP public key
I am Bahá'í -- ask me about it or see http://www.bahai.org/
"To love another person is to see the face of God." -- Les Misérables

H. Peter Anvin

Aug 18, 1998, 3:00:00 AM
Followup to: <evale...@sktb.demon.co.uk>
By author: p...@sktb.demon.co.uk
In newsgroup: comp.std.internat

>
> What puzzles me is why Unicode added a Lira symbol. The lira symbol
> is essentially identical to the pound symbol. A couple of Unicode
> fonts I've seen give the pound one cross-stroke and the lira two
> cross-strokes, but that's a matter of aesthetics and font design.
> Even if there is some inherent national preference one way or the other
> between UK and Italian typography, typesetters in each country will tend to
> use the same glyph for both symbols (or, more usually, use the symbol
> for their national currency and a letter for the other one to avoid
> confusion).
>

*BZZZZT* Wrong answer! The British Pound (Sterling) symbol has
exactly one cross-stroke, the Italian Lira symbol has two. You
*never* see the other way around, and they are not interchangeable. It
is not like the one or two strokes on the dollar sign!

Stephen Baynes

Aug 18, 1998, 3:00:00 AM
H. Peter Anvin wrote:
>
> Followup to: <35D87587...@cl.cam.ac.uk>
> By author: Markus Kuhn <Marku...@cl.cam.ac.uk>
> In newsgroup: comp.std.internat
> >
> > Dik T. Winter wrote:
> > > > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
> > >
> > > Probably the Dutch Gulden symbol.
> >
> > Shall we still include it in VSECS (same for the Peseta sign from
> > CP437)? If we include Peseta and Gulden, then we would also
> > have to include the Franc and Lira symbols. All these currency
> > symbols are expected to be superseded by the Euro symbol from
> > mid-2002 on and would only be of historical value.
> >
>
> Include them. It is going to be much more painful to omit them,
> IMNSHO. However, my understanding is that the Franc symbol isn't in
> common use; in fact, I've had French people tell me "what Franc
> symbol", pretty much what I'd tell anyone who'd ask me what the symbol
> for a Swedish Crown is.
>

Can anyone tell me if the Turkish Lira symbol (a sort of TL monogram similar
to the TM (trademark) symbol), which exists in the Teletext character set but
not in Unicode, is another currency symbol that is not used in practice (or
just an invention of the Teletext standards authority)?

--
Stephen Baynes CEng MBCS Stephen...@soton.sc.philips.com
Philips Semiconductors Ltd
Southampton SO15 0DJ +44 (01703) 316431
United Kingdom My views are my own.
Do you use ISO8859-1? Yes if you see © as copyright, ÷ as division and ½ as 1/2.

The Graphical Gnome

Aug 18, 1998, 3:00:00 AM
In article <35d4ae45....@news.portal.ca>, ti...@tiro.com (Tiro Typeworks) wrote:
>
>0132 LATIN CAPITAL LIGATURE IJ
>
>0133 LATIN SMALL LIGATURE IJ
>
>These, of course, are the Dutch digraph characters. There is no need
>for them to be separately encoded, as Dutch writers commonly type /I/
>followed by /J/. These characters can, I believe, be safely omitted
>from SECS.
>
You can also write oe, ue and ae. Does this mean that the o-umlaut, u-umlaut
and a-umlaut should be removed?? The same applies to the German sharp s: you
can write it as ss.

We write IJ and ij because the glyph is not available. If you look at TeX you
see that it was added because we (the Dutch) wanted and needed it.

<smily on>
We do not have much of a culture, so don't take away the little we have left.
<smily off>

The Graphical Gnome (r...@ktibv.nl)
Sr. Software Engineer IT Department
-----------------------------------------
The Unofficial Delphi Developers FAQ
http://www.gnomehome.demon.nl/uddf/index.htm

The Graphical Gnome

Aug 18, 1998, 3:00:00 AM
In article <35D4BA48...@cl.cam.ac.uk>, Markus Kuhn <Marku...@cl.cam.ac.uk> wrote:
>0132 # LATIN CAPITAL LIGATURE IJ
>0133 # LATIN SMALL LIGATURE IJ
>
>Usage of LIGATURE IJ is now deprecated in the Netherlands
>and the other ones never existed in Catalan or Afrikaans
>as originally assumed (source: NL gov manual by J.W. van
>Wingen).
Say what!!!!!!

The fact that most old typewriting systems could not cope with this
glyph does not mean it is deprecated in the Netherlands. It's a Dutch glyph,
and we are mighty proud of it!

Stewart C. Russell

Aug 18, 1998, 3:00:00 AM
h...@transmeta.com (H. Peter Anvin) wrote:
>*BZZZZT* Wrong answer! The British Pound (Sterling) symbol has
>exactly one cross-stroke, the Italian Lira symbol has two. You
>*never* see the other way around, and they are not interchangeable.

At school, I was taught to write a pound sign with two strokes
(Scotland, mid-70's). I don't write it that way now, for with age
comes laziness.

Peter, there are ways of disagreeing with people that are not so
inflammatory. Always think it possible that you might be mistaken.

--
Stewart C. Russell, Glasgow, Scotland - scr...@enterprise.net
"Hang on... This is the real thing... The truth, my friend,
and nothing but the truth" - Mervyn Peake
http://homepages.enterprise.net/scruss/

Paul L. Allen

Aug 18, 1998, 3:00:00 AM
In article <6rb916$fuo$1...@palladium.transmeta.com>

h...@transmeta.com (H. Peter Anvin) writes:

> Followup to: <evale...@sktb.demon.co.uk>
> By author: p...@sktb.demon.co.uk
> In newsgroup: comp.std.internat
> >
> > What puzzles me is why Unicode added a Lira symbol. The lira symbol
> > is essentially identical to the pound symbol. A couple of Unicode
> > fonts I've seen give the pound one cross-stroke and the lira two
> > cross-strokes, but that's a matter of aesthetics and font design.
> > Even if there is some inherent national preference one way or the other
> > between UK and Italian typography, typesetters in each country will tend to
> > use the same glyph for both symbols (or, more usually, use the symbol
> > for their national currency and a letter for the other one to avoid
> > confusion).
> >
>

> *BZZZZT* Wrong answer! The British Pound (Sterling) symbol has
> exactly one cross-stroke, the Italian Lira symbol has two.

*BZZZZZZT*. Totally wrong answer! I'm in the UK and have been for
the 40-odd years of my life. My father was a printer, as was my brother,
grandfather, three uncles and various cousins (in case you're interested, my
father was a laserjet). I'm old enough to remember when the two
cross-stroke form was the norm in the UK. In fact I'm old enough that I was
*taught* that the two-stroke form should be used.

> You *never* see the other way around,

I admit that the one-stroke form predominates in the UK these days. But
that is a matter of typographic style, not an absolute rule. Either is
acceptable.

> and they are not interchangeable.

<panto>Oh yes they are</panto>.

Take a look at Whitaker's Almanack in the foreign currency section. It
uses the one-stroke form for pound, punt and lira.

> It is not like the one or two strokes on the dollar sign!

Ah, but it is. There may be national preferences involved and these may
change over time, but one- or two-cross stroke forms are entirely
interchangeable in the UK. Dunno about Italy.

--Paul

Tiro Typeworks

Aug 18, 1998, 3:00:00 AM
On Tue, 18 Aug 1998 09:52:08 GMT, r...@ktibv.nl (The Graphical Gnome)
wrote:

>You can also write oe, ue and ae. Does this mean that the o-umlaut, u-umlaut
>and a-umlaut should be removed?? The same applies to the German sharp s: you
>can write it as ss.

These examples are hardly parallel. Apart from spacing considerations,
the IJ glyph is identical in appearance to an I followed by a J.
Obviously the same cannot be said of the o-umlaut which, in any case,
is also required as an o-diaeresis for non-Germanic languages. The
German eszett cannot, in standard German, be replaced by /ss/, as
there exist words which are semantically distinguished by the use of
/ss/ or eszett.

That said, I'm perfectly happy to endorse inclusion of the IJ and ij
digraphs as characters in any font I make for Dutch clients, if they
want them. Most Dutch type designers I know (and I know a _lot_) seem
quite ambivalent about this digraph.

John Hudson
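The distinction drawn above between the IJ digraph, the umlauts and the eszett is mirrored in how Unicode itself classifies these characters, which can be inspected with Python's unicodedata module. This is a sketch for illustration only; the helper name is made up, and Unicode's decomposition data is of course no ruling on Dutch or German orthography.

```python
import unicodedata


def compat(char):
    """Return the compatibility decomposition (NFKD) of a character."""
    return unicodedata.normalize("NFKD", char)


SAMPLES = ["\u0132", "\u0133", "\u00f6", "\u00df"]

for char in SAMPLES:
    decomposed = compat(char)
    parts = " ".join(f"U+{ord(c):04X}" for c in decomposed)
    changed = "decomposes to" if decomposed != char else "has no decomposition:"
    print(f"U+{ord(char):04X} {unicodedata.name(char)} {changed} {parts}")
```

The IJ ligatures fold to the plain letter pairs I J and i j, the o-umlaut decomposes to o plus a combining diaeresis rather than to "oe", and the sharp s does not decompose at all.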