I have recently spent quite some time working out a proposal for two
Unicode/ISO 10646 subsets that are so small that I hope they will become
widely implemented in Europe and America. Both are specifically designed
to be suitable for systems where characters are represented in
low-resolution fixed-width fonts. This includes for instance your xterm
and Emacs window under Unix (or more generally VT100 emulators and source
code editors), but also applications such as portable LCD devices
(pagers, mobile phones), where only a small subset of Unicode makes sense
to be implemented and where no single 8-bit set can cover a reasonable
number of languages. These subsets are not really intended for
applications such as the publishing industry, where these display
restrictions do not exist and larger Unicode subsets or even full
implementations might be adequate.
The two subsets are:
- Very Simple European Character Set (VSECS)
345 characters, basically the superset of Latin 1-4,9,10,15 and CP1252
plus a very few ISO 6937 characters
Rows Positions (Cells)
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 C6-C7 D8-DD
20 13-15 18-1A 1C-1E 20-22 26 30 39-3A AC
21 22 26 5B-5E 90-93
26 6A
FF FD
- Simple European Character Set (SECS)
683 characters, covers in addition to VSECS also Cyrillic, Greek,
MS-DOS blockgraphics, and a moderate set of mathematical characters
that is likely to be used in academic email and source code comments.
Rows Positions (Cells)
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 BC-BD C6-C7 D8-DD
03 84-86 88-8A 8C 8E-A1 A3-CE D1 D5-D6 F1
04 01-0C 0E-4F 51-5C 5E-5F 90-91
20 13-15 17-1A 1C-1E 20-22 26 30 32-34 39-3A 70 7F-83 A7 AC
21 02 15-16 1A 1D 22 24 26 5B-5E 90-95 A4-A7 D0-D5
22 00-09 0B-0C 12-13 18-1A 1D-1E 24-2A 3C 43 45 48-49 58 5F-62 64-65
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 10 15 20-21 29-2A
25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 B2
25 BA BC C4 CB
26 10-12 3A-3C 40 42 6A-6B 6D-6F
27 13 17
FF FD
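The row/cell tables above are easy to check mechanically. What follows is a minimal sketch in Python (not the Perl uniset script mentioned further down); the helper name parse_rows is made up for this illustration. It expands each "row positions" line into code points, so that the stated sizes and the claim that SECS covers everything in VSECS can be tested.

```python
def parse_rows(table):
    """Expand 'ROW CELL[-CELL] ...' lines into a set of code points."""
    points = set()
    for line in table.strip().splitlines():
        row, *cells = line.split()
        base = int(row, 16) << 8              # row 01 -> U+0100
        for cell in cells:
            lo, _, hi = cell.partition("-")
            first = int(lo, 16)
            last = int(hi, 16) if hi else first
            points.update(base + c for c in range(first, last + 1))
    return points

VSECS = parse_rows("""
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 C6-C7 D8-DD
20 13-15 18-1A 1C-1E 20-22 26 30 39-3A AC
21 22 26 5B-5E 90-93
26 6A
FF FD
""")

SECS = parse_rows("""
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 BC-BD C6-C7 D8-DD
03 84-86 88-8A 8C 8E-A1 A3-CE D1 D5-D6 F1
04 01-0C 0E-4F 51-5C 5E-5F 90-91
20 13-15 17-1A 1C-1E 20-22 26 30 32-34 39-3A 70 7F-83 A7 AC
21 02 15-16 1A 1D 22 24 26 5B-5E 90-95 A4-A7 D0-D5
22 00-09 0B-0C 12-13 18-1A 1D-1E 24-2A 3C 43 45 48-49 58 5F-62 64-65
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 10 15 20-21 29-2A
25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 B2
25 BA BC C4 CB
26 10-12 3A-3C 40 42 6A-6B 6D-6F
27 13 17
FF FD
""")

print(len(VSECS), len(SECS), VSECS <= SECS)   # sizes and subset check
```

A row that is split over two lines (rows 22 and 25) simply contributes to the same set twice, so no special handling is needed.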
VSECS is somewhat similar to ISO 6937 with some bugs fixed (e.g., the
Euro symbol is included, as are the directed quotation marks).
SECS is somewhat similar to Microsoft/Adobe WGL4. I think SECS is much
better than WGL4, because WGL4 contains many letters for which I could
not find out where they are used (for at least three I am sure they
never existed). SECS contains the following 91 characters that are not
part of WGL4:
Rows Positions (Cells)
02 BC-BD
03 D1 D5-D6 F1
20 34 70 80-83
21 02 15 1A 1D 24 A4-A7 D0-D5
22 00-01 03-05 07-09 0B-0C 13 18 1D 24-28 2A 3C 43 45 49 58 5F 62
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 15 29-2A
26 10-12 6D-6F
27 13 17
FF FD
Almost all of these are a set of basic mathematical characters that most
high school students should be familiar with. They are very useful to
have available in academic email discussions and source code comments.
It would be nice if the authors of WGL4 seriously considered extending
their Unicode subset by those few dozen elementary math symbols. Then
SECS would become a subset of WGL4. VSECS is already a subset of WGL4
except for U+FFFD.
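The list of SECS characters missing from WGL4 can be cross-checked in the same way. This is only a sketch under assumptions: it is written in Python, parse_rows is a hypothetical helper, and WGL4 itself is not loaded, so the code checks only the size of the list and which of its entries also occur in VSECS.

```python
def parse_rows(table):
    """Expand 'ROW CELL[-CELL] ...' lines into a set of code points."""
    points = set()
    for line in table.strip().splitlines():
        row, *cells = line.split()
        base = int(row, 16) << 8
        for cell in cells:
            lo, _, hi = cell.partition("-")
            first = int(lo, 16)
            last = int(hi, 16) if hi else first
            points.update(base + c for c in range(first, last + 1))
    return points

# SECS characters that are not part of WGL4 (table from the post)
NOT_IN_WGL4 = parse_rows("""
02 BC-BD
03 D1 D5-D6 F1
20 34 70 80-83
21 02 15 1A 1D 24 A4-A7 D0-D5
22 00-01 03-05 07-09 0B-0C 13 18 1D 24-28 2A 3C 43 45 49 58 5F 62
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 15 29-2A
26 10-12 6D-6F
27 13 17
FF FD
""")

VSECS = parse_rows("""
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 C6-C7 D8-DD
20 13-15 18-1A 1C-1E 20-22 26 30 39-3A AC
21 22 26 5B-5E 90-93
26 6A
FF FD
""")

# Anything listed here is a VSECS character that WGL4 lacks.
overlap = sorted(NOT_IN_WGL4 & VSECS)
print(len(NOT_IN_WGL4), [f"U+{cp:04X}" for cp in overlap])
```

The overlap is the part that matters for the VSECS claim: every VSECS character outside it is, by this table, already in WGL4.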
The mathematical symbols of SECS will hopefully provide for US
developers who do not specialize in i18n issues some motivation to get
interested in 16-bit character sets, as they are more relevant for their
personal use than the accented characters of crazy Europeans.
My dream is that something like SECS becomes rather soon the common
minimum repertoire in Unix X11 fonts and printer fonts. VSECS is
intended as an intermediate step for applications where the size of the
character set is critical and only Latin script support is required.
I do not think SECS contains any useless symbol. I know for each letter
and symbol why it is in there and in which languages or fields it is
used. Just ask.
Much more information on the two sets is available from
http://www.cl.cam.ac.uk/~mgk25/ucs/vsecs.html
http://www.cl.cam.ac.uk/~mgk25/ucs/secs.html
Much better than just looking at these web pages is to download the
database (Perl needed) that generated them from
http://www.cl.cam.ac.uk/~mgk25/ucs/secs.tar.gz
Then you can play around with them and test the subset properties with
regard to other sets easily yourself.
If you want to see example glyphs on the HTML output of this script,
then you'll also need
http://www.cl.cam.ac.uk/~mgk25/ucs/glyphs.zip
The uniset Perl script allows you to comfortably build up your own
database of character collections, to merge and subtract them and to
generate Unicode subsets and study their relations with other subsets.
The mapping files from the Unicode Consortium can be used directly as
input.
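To give an idea of the kind of operations meant here, this is a rough sketch in Python. It is not the uniset script and does not use its syntax. The function parse_mapping and the four sample lines are assumptions made for this illustration, modelled on the "0xXX 0xYYYY #NAME" layout of the Unicode Consortium mapping files.

```python
def parse_mapping(text):
    """Collect the Unicode code points from 'byte codepoint #name' lines."""
    points = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop the comment part
        if len(fields) >= 2:                     # undefined bytes have no 2nd field
            points.add(int(fields[1], 16))
    return points

# Hypothetical excerpt in mapping-file layout
SAMPLE = """\
0x41 0x0041 #LATIN CAPITAL LETTER A
0x80 0x20AC #EURO SIGN
0x81        #UNDEFINED
0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
"""

collection = parse_mapping(SAMPLE)
ascii_printable = set(range(0x20, 0x7F))

merged = collection | ascii_printable        # merge two collections
extra = collection - ascii_printable         # subtract one from the other
print(sorted(f"U+{cp:04X}" for cp in extra))
print(ascii_printable <= merged)             # subset relation
```

Once every collection is a plain set of code points, merging, subtracting and subset tests are one operator each.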
Please let me know what you think about SECS and VSECS and if this is
something you would like to see widely implemented.
Markus
--
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org, home page: <http://www.cl.cam.ac.uk/~mgk25/>
>SECS is somewhat similar to Microsoft/Adobe WGL4. I think SECS is much
>better than WGL4, because WGL4 contains many letters for which I could
>not find out where they are used (for at least three I am sure they
>never existed).
Markus, I'm very interested in your proposal, but would like to know
for which WGL4 letters you could find no use. I have spent a lot of
time researching European (and non-European) orthographies, and may be
able to account for some of the lesser known letters (which is not to
say that I think WGL4 is perfect).
John Hudson, Type Director
Tiro Typeworks
Vancouver, BC
ti...@tiro.com
www.tiro.com
I'm very interested in hearing more about what the rationale to have
the following characters in WGL4 might be:
I don't know where the following ones come from:
0114 # LATIN CAPITAL LETTER E WITH BREVE
0115 # LATIN SMALL LETTER E WITH BREVE
012C # LATIN CAPITAL LETTER I WITH BREVE
012D # LATIN SMALL LETTER I WITH BREVE
014E # LATIN CAPITAL LETTER O WITH BREVE
014F # LATIN SMALL LETTER O WITH BREVE
01FA # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
01FB # LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE
01FC # LATIN CAPITAL LETTER AE WITH ACUTE
01FD # LATIN SMALL LETTER AE WITH ACUTE
01FE # LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
01FF # LATIN SMALL LETTER O WITH STROKE AND ACUTE
02C9 # MODIFIER LETTER MACRON
02D6 # MODIFIER LETTER PLUS SIGN
0387 # GREEK ANO TELEIA
The long s might be from German Fraktur fonts which is unused
since ~1945. This letter has certainly no equivalent in modern
German roman/antiqua fonts and is certainly not needed to
write German:
017F # LATIN SMALL LETTER LONG S
I understand that the following ones were added by mistake to
ISO 6937:
0132 # LATIN CAPITAL LIGATURE IJ
0133 # LATIN SMALL LIGATURE IJ
013F # LATIN CAPITAL LETTER L WITH MIDDLE DOT
0140 # LATIN SMALL LETTER L WITH MIDDLE DOT
0149 # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
Usage of LIGATURE IJ is now deprecated in the Netherlands
and the other ones never existed in Catalan or Afrikaans
as originally assumed (source: NL gov manual by J.W. van
Wingen).
The following are claimed to be used in Welsh, but Welsh
native speakers who I asked claimed to have never seen them,
so I suspect they are historic characters that are not in
general use.
1E80 # LATIN CAPITAL LETTER W WITH GRAVE
1E81 # LATIN SMALL LETTER W WITH GRAVE
1E82 # LATIN CAPITAL LETTER W WITH ACUTE
1E83 # LATIN SMALL LETTER W WITH ACUTE
1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
1E85 # LATIN SMALL LETTER W WITH DIAERESIS
1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
1EF3 # LATIN SMALL LETTER Y WITH GRAVE
The purpose of the following characters is also
unclear to me:
201B # SINGLE HIGH-REVERSED-9 QUOTATION MARK
203C # DOUBLE EXCLAMATION MARK
203E # OVERLINE
All these are in WGL4 but (so far) not in SECS.
There are also some mysterious characters in the MES-2
proposal which I have not found anywhere else:
01B7 # LATIN CAPITAL LETTER EZH
01C4 # LATIN CAPITAL LETTER DZ WITH CARON
01C6 # LATIN SMALL LETTER DZ WITH CARON
01C7 # LATIN CAPITAL LETTER LJ
01C9 # LATIN SMALL LETTER LJ
01CA # LATIN CAPITAL LETTER NJ
01CC # LATIN SMALL LETTER NJ
01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON
01E4 # LATIN CAPITAL LETTER G WITH STROKE
01E5 # LATIN SMALL LETTER G WITH STROKE
01E6 # LATIN CAPITAL LETTER G WITH CARON
01E7 # LATIN SMALL LETTER G WITH CARON
01E8 # LATIN CAPITAL LETTER K WITH CARON
01E9 # LATIN SMALL LETTER K WITH CARON
01EE # LATIN CAPITAL LETTER EZH WITH CARON
01EF # LATIN SMALL LETTER EZH WITH CARON
01F1 # LATIN CAPITAL LETTER DZ
01F3 # LATIN SMALL LETTER DZ
01F4 # LATIN CAPITAL LETTER G WITH ACUTE
01F5 # LATIN SMALL LETTER G WITH ACUTE
027C # LATIN SMALL LETTER R WITH LONG LEG
0292 # LATIN SMALL LETTER EZH
0374 # GREEK NUMERAL SIGN
0375 # GREEK LOWER NUMERAL SIGN
037A # GREEK YPOGEGRAMMENI
037E # GREEK QUESTION MARK
Do you know a good reason why any of these characters should
go into a simple European character set?
My browser finally finished downloading Markus' SECS website, and I
have prepared the following comments on some of the WGL4 characters he
has excluded from SECS. I believe that some of these characters should
be included in the SECS, in accordance with Markus' criteria, and have
marked my comments on these characters with an asterisk.
I have not bothered to comment on the heavy linedraw characters, etc.,
and have confined my comments to letters and diacritics.
[I am also concerned that Markus' recommended mathematical set may be
too extensive. Is this really a _basic_ mathematical subset, or
something more?]
0114 LATIN CAPITAL LETTER E WITH BREVE
0115 LATIN SMALL LETTER E WITH BREVE
012C LATIN CAPITAL LETTER I WITH BREVE
012D LATIN SMALL LETTER I WITH BREVE
These characters are not required for the modern writing of any
European language. They are essential to much European prosody, and
are found in most Latin language textbooks and dictionaries. I believe
it would be sound to omit them from SECS if the basic, non-combining
IPA characters are also to be omitted. If the latter are included it
would make sense to include short and long vowel diacritics.
0132 LATIN CAPITAL LIGATURE IJ
0133 LATIN SMALL LIGATURE IJ
These, of course, are the Dutch digraph characters. There is no need
for them to be separately encoded, as Dutch writers commonly type /I/
followed by /J/. These characters can, I believe, be safely omitted
from SECS.
013F LATIN CAPITAL LETTER L WITH MIDDLE DOT
0140 LATIN SMALL LETTER L WITH MIDDLE DOT
These are composite rendering forms for the Catalan lateral
approximant. They are not strictly necessary in a character set which
includes an appropriately sized, positioned and spaced midpoint
character (U+00B7). I am a little concerned that in a monospaced font,
of the kind referred to in Markus' SECS criteria, reliance on the
midpoint character will produce gaping holes in the middle of many
Catalan words. I am undecided about the possible inclusion of these
characters.
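For what it is worth, Unicode itself records the IJ and L-with-middle-dot characters as compatibility characters that decompose into two ordinary ones. A small sketch, assuming Python and its standard unicodedata module, makes the equivalence and the extra cell in a fixed-width font visible:

```python
import unicodedata

# Compatibility decompositions of the IJ and L-middle-dot characters
for ch in "\u0132\u0133\u013F\u0140":
    parts = unicodedata.normalize("NFKC", ch)
    codes = " ".join(f"U+{ord(c):04X}" for c in parts)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {codes}")

# The same Catalan word written both ways
with_ldot = "Para\u0140lel"        # uses U+0140
with_midpoint = "Paral\u00B7lel"   # uses l followed by U+00B7
print(len(with_ldot), len(with_midpoint))
print(unicodedata.normalize("NFKC", with_ldot) == with_midpoint)
```

The two-character spelling takes one more cell in a monospaced font, which is exactly where the gap in the middle of the word would come from.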
0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
This is a Hewlett-Packard character, apparently used by them for
Afrikaans. I've never heard a clear explanation of its purpose, or its
inclusion in WGL4 or other character sets (other than the fact that HP
wanted it to be included). In any case, Afrikaans is beyond the scope
of SECS, so this character may be safely omitted.
014E LATIN CAPITAL LETTER O WITH BREVE
014F LATIN SMALL LETTER O WITH BREVE
These characters are not required for the modern writing of any
European language. They are essential to much European prosody, and
are found in most Latin language textbooks and dictionaries. I believe
it would be sound to omit them from SECS if the basic, non-combining
IPA characters are also to be omitted. If the latter are included it
would make sense to include short and long vowel diacritics.
017F LATIN SMALL LETTER LONG S
Archaic. This may be safely omitted.
01A0 LATIN CAPITAL LETTER O WITH HORN
01A1 LATIN SMALL LETTER O WITH HORN
01AF LATIN CAPITAL LETTER U WITH HORN
01B0 LATIN SMALL LETTER U WITH HORN
Vietnamese. These characters may be safely omitted (although there are
sizeable Vietnamese speaking populations in parts of Europe, notably
in the Netherlands).
01FA LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
01FB LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE
01FC LATIN CAPITAL LETTER AE WITH ACUTE
01FD LATIN SMALL LETTER AE WITH ACUTE
01FE LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
01FF LATIN SMALL LETTER O WITH STROKE AND ACUTE
* These characters are used in Danish and their inclusion in both
Unicode and the WGL4 set was at the request of the Danish standards
organization. My understanding is that there is some debate over the
status of these characters in modern Danish. Some sources claim that
they are archaic, others that they are orthographically correct and
that to omit them is a mistake. I believe they should not be omitted
from SECS without further research.
02D6 MODIFIER LETTER PLUS SIGN
This may be safely omitted.
1E80 LATIN CAPITAL LETTER W WITH GRAVE
1E81 LATIN SMALL LETTER W WITH GRAVE
1E82 LATIN CAPITAL LETTER W WITH ACUTE
1E83 LATIN SMALL LETTER W WITH ACUTE
1E84 LATIN CAPITAL LETTER W WITH DIAERESIS
1E85 LATIN SMALL LETTER W WITH DIAERESIS
1EF2 LATIN CAPITAL LETTER Y WITH GRAVE
1EF3 LATIN SMALL LETTER Y WITH GRAVE
* All these characters are used in modern Welsh and should _not_ be
omitted from SECS. Their use is less common than the W and Y
circumflex diacritics, but all are essential to semantic distinction
and/or pronunciation. My source for this information is Andrew Hawke
(a...@pophost.aber.ac.uk), assistant editor of the University of Wales
dictionary of the Welsh language. I can provide a Welsh word list if
required.
>There are also some mysterious characters in the MES-2
>proposal which I have not found anywhere else:
>01B7 # LATIN CAPITAL LETTER EZH
Ezh (or Yogh) is found in Old and Middle English texts, and is a
letter in the orthographies of a number of African languages. The only
modern European language I associate it with is Skolt Saami (see
below). The number of speakers/writers of Skolt Saami is probably well
below the 10,000 minimum set in Markus' criteria.
>01C4 # LATIN CAPITAL LETTER DZ WITH CARON
>01C6 # LATIN SMALL LETTER DZ WITH CARON
>01C7 # LATIN CAPITAL LETTER LJ
>01C9 # LATIN SMALL LETTER LJ
>01CA # LATIN CAPITAL LETTER NJ
>01CC # LATIN SMALL LETTER NJ
These are digraphs which were separately encoded in ISO/IEC 10646 and
Unicode to facilitate compatible font mappings between Latin and
Cyrillic fonts for Serbo-Croatian. Language reform policies in the
former Yugoslav republic -- particularly in Croatia -- have greatly
reduced the need for such compatibility. I believe these digraph
characters may still be of use in Serbia, if transliteration to Latin
script is a requirement, but such specialised usage may fall beyond
the proposed scope of SECS.
>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON
Unicode 2.0 identifies these characters as Lappish. In the first
place, Lappish is generally considered a derogatory term; in the
second these characters do not appear in any of the Saami
orthographies I have collected. Note that I only have Latin
orthographies for five of the nine Saami languages.
>01E4 # LATIN CAPITAL LETTER G WITH STROKE
>01E5 # LATIN SMALL LETTER G WITH STROKE
These characters are used to write Skolt Saami in the Latin script
(Skolt Saami is also written by some using the Cyrillic script). The
number of speakers/writers of Skolt Saami is probably well below the
10,000 minimum set in Markus' criteria.
>01E6 # LATIN CAPITAL LETTER G WITH CARON
>01E7 # LATIN SMALL LETTER G WITH CARON
I can find no reference for these characters. Their use in Turkish is
incorrect and an unacceptable substitute for the G breve diacritics.
Unicode 2.0 indicates 'Lappish', but they do not occur in any of the
Saami orthographies I have on file.
>01E8 # LATIN CAPITAL LETTER K WITH CARON
>01E9 # LATIN SMALL LETTER K WITH CARON
>01EE # LATIN CAPITAL LETTER EZH WITH CARON
>01EF # LATIN SMALL LETTER EZH WITH CARON
These characters are used to write Skolt Saami in the Latin script
(Skolt Saami is also written by some using the Cyrillic script). The
number of speakers/writers of Skolt Saami is probably well below the
10,000 minimum set in Markus' criteria.
>01F1 # LATIN CAPITAL LETTER DZ
>01F3 # LATIN SMALL LETTER DZ
>01F4 # LATIN CAPITAL LETTER G WITH ACUTE
>01F5 # LATIN SMALL LETTER G WITH ACUTE
Your guess is as good as mine. I believe these can be safely omitted.
>027C # LATIN SMALL LETTER R WITH LONG LEG
I know of no usage of this character outside of phonetic transcription
(strident apico-alveolar trill). I'm not even sure that it remains
part of the official IPA standard set.
>0292 # LATIN SMALL LETTER EZH
See note above for uppercase Ezh/Yogh. Of course, if it is decided to
include a basic IPA subset, this character would become necessary.
>0374 # GREEK NUMERAL SIGN
>0375 # GREEK LOWER NUMERAL SIGN
I believe these to be archaic, and are only of use when Greek letters
are serving as numerals (as they did before the introduction of
'Arabic' numerals).
>037A # GREEK YPOGEGRAMMENI
This is the Greek subscript iota. It is not used in modern, monotonic
Greek, so may be safely omitted from SECS.
>037E # GREEK QUESTION MARK
I'm unable to confirm, at this time, whether this punctuation mark is
still in use or not. I suspect not, and most readers would be unlikely
to distinguish it from a semicolon.
(snip)
>The following are claimed to be used in Welsh, but Welsh
>native speakers who I asked claimed to have never seen them,
>so I suspect they are historic characters that are not in
>general use.
>
>1E80 # LATIN CAPITAL LETTER W WITH GRAVE
>1E81 # LATIN SMALL LETTER W WITH GRAVE
>1E82 # LATIN CAPITAL LETTER W WITH ACUTE
>1E83 # LATIN SMALL LETTER W WITH ACUTE
>1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
>1E85 # LATIN SMALL LETTER W WITH DIAERESIS
>1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
>1EF3 # LATIN SMALL LETTER Y WITH GRAVE
Actually, if one is doing a Welsh *pronunciation* guide these could be
potentially useful; I've also seen at least "w" and "y" acute in the past.
(The others, I'll admit, *are* weird--I'm not entirely sure where they'd
be used, save *maybe* in other languages in the same subfamily of Celtic
languages Cymru/Welsh is in [for example, Breton or Manx]. I'm rather
afraid I don't speak any Celtic tongue so I can't be sure on this;
if memory serves, there is a Manx dictionary online, though. IF my
memory of that serves at ALL well, Manx doesn't use "w" as a vowel but
*does* use "y"; I know exactly nothing on Breton.)
*POSSIBLY* w-dieresis and y-dieresis occur in *some* transcription schemes
for Native American languages (if they occur in this, it'd likely be for
Northwest languages that have vowels and consonants that literally cannot
be expressed in any other way without resorting to the International
Phonetic Alphabet).
Y-dieresis and y-dieresis do occur in the standard character sets of most
English-language Postscript and Truetype fonts.
Offhand, as an aside--I expect some of the other oddish characters
(AE-grave, etc.) are also used mostly in pronunciation guides as well.
A-dieresis-grave, etc. *may* be used in Vietnamese, but I'm not sure.
>The purpose of the following characters is also
>unclear to me:
>
>201B # SINGLE HIGH-REVERSED-9 QUOTATION MARK
>203C # DOUBLE EXCLAMATION MARK
>203E # OVERLINE
>
>All these are in WGL4 but (so far) not in SECS.
Double-exclamation sounds more like a "typesetting character"; so does
"high reversed-9 quot mark" (maybe this is equivalent to leftquot?)
>There are also some mysterious characters in the MES-2
>proposal which I have not found anywhere else:
>
>01B7 # LATIN CAPITAL LETTER EZH
>01C4 # LATIN CAPITAL LETTER DZ WITH CARON
>01C6 # LATIN SMALL LETTER DZ WITH CARON
>01C7 # LATIN CAPITAL LETTER LJ
>01C9 # LATIN SMALL LETTER LJ
EZH I'm not sure on, but it *may* be used in some Turkic languages; DZ and
its variants, and LJ and its variants, occur in some Slavic languages and
also possibly in some Turkic languages (mostly those spoken in countries
that split off from the old USSR and are going back to Romanised
characters).
(In Cyrillic, separate letters *do* exist for each of these in regional
variants that were used before the USSR split up. This is probably why
they carry over.)
LJ/lj is roughly equivalent to slash-l in Polish, BTW.
>01CA # LATIN CAPITAL LETTER NJ
>01CC # LATIN SMALL LETTER NJ
Used in some Slavic languages, and occasionally in various African
languages. (In Slavic languages, indicates a palatalised-N (similar to
n-acute in some Slavic languages; the "j" essentially means the same as
the "soft mark" in Cyrillic); in the African languages where this is an
actual character, indicates exactly what it says--an "nj" sound (like "ng"
only one doesn't touch one's palate). :)
>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON
If memory serves, used in Vietnamese (in this case, the macron is a tone
character) and in some transcription schemes for Native American
languages.
>01E4 # LATIN CAPITAL LETTER G WITH STROKE
>01E5 # LATIN SMALL LETTER G WITH STROKE
I've only seen this offhand in *some* transcription schemes for Native
American languages [this indicates *roughly* the same as g-caron; see
below] but it may occur in Turkic languages that are converting to Roman
characters.
>01E6 # LATIN CAPITAL LETTER G WITH CARON
>01E7 # LATIN SMALL LETTER G WITH CARON
Commonly used in Turkish and some other Turkic languages to indicate a
"hard G" sound. Also occurs, for the same sound, in some Native American
language transcription schemes.
>01E8 # LATIN CAPITAL LETTER K WITH CARON
>01E9 # LATIN SMALL LETTER K WITH CARON
Less common, but does occur in some Turkic languages; indicates a "hard K"
sound (like hard G--you say it in the back of your throat). Occurs in
some transcription schemes for Native American languages as well.
(As a minor aside--you will find many, MANY standards for transcription
and, in some cases, transliteration of Native American languages. These
vary from fitting the closest Roman equivalent, to using diacritical marks
for consonants that are "sort" of close [many languages have literally two
to four different ways you can pronounce a consonant sound where we might
have one in English, for example] to using unused characters to represent
sounds ["x" for "sh" and "c" for "soft ch" are rather common] to resorting
to the IPA when there's no really good way to represent it via Roman
characters. Hence my notes on this. :)
>01EE # LATIN CAPITAL LETTER EZH WITH CARON
>01EF # LATIN SMALL LETTER EZH WITH CARON
Possibly used in some Slavic languages and Slavic transliteration schemes.
Possibly occurs in Turkic languages. (Again, an "ezh-caron" equivalent
does occur in several local variants of Cyrillic used for "minority
languages" in the USSR.)
>01F1 # LATIN CAPITAL LETTER DZ
>01F3 # LATIN SMALL LETTER DZ
Commonly used in Slavic and Turkic languages; occurs in some Native
American languages as well (most notably the Na-Dene family, which
includes Dine' [Navaho]).
>01F4 # LATIN CAPITAL LETTER G WITH ACUTE
>01F5 # LATIN SMALL LETTER G WITH ACUTE
Fairly unusual, but does occur in some Native American and Slavic (and
possibly Turkic as well, depending on the country's Romanisation scheme)
languages. Usually indicates a palatalised g sound in the few places
where I've seen it.
>027C # LATIN SMALL LETTER R WITH LONG LEG
Fairly unusual; used in some Native American languages as an R-variant.
This is borrowed from the IPA, offhand. This also, occasionally, occurs
in transcription schemes for some African languages.
Some Turkic languages may use it; not sure (at least I've not *seen* any)
however.
>Do you know a good reason why any of these characters should
>go into a simple European character set?
Some of them I'm sort of puzzled on m'self. Some (like Y-dieresis and
Y-acute-dieresis, for example) I can see as they are used in languages
with a known, large audience on Usenet (for instance, Vietnamese-language
or Cymru-language newsgroups).
Some of them, I will frankly admit (namely, *all* the Greek characters
noted and, possibly, some of the other *unusual* letters like longleg-r
and k-macron, etc.) puzzle me why they're included. (As far as I know,
longleg-r only exists in a few Native American transcription schemes and
in some African-language transcription schemes; unless there is a large
Usenet population of folks wishing to type in Salish, I'm not sure why it
should be there. [If it is in there, we should go ahead and add upside-
down K/k, upside-down T/t, cedilla-H, Latin-omega-acute-dieresis,
Latin-chi, etc. and all the other IPA characters you *have* to import from
the IPA to write some of the languages of that area. :) And, of course,
import Latin capital-schwa and Latin small-schwa for our friends in
Azerbaijan; hell, let's just import the entire IPA and be done with it :)
Ah well...I'm sure the author will be glad to explain, in any case. :)
-moo
who, incidentally, still wants to know when the author will write the
terminal patch that will allow a VT100 terminal hooked up to an IBM 3090
mainframe to actually *read* these strange and ferlie characters, or pay
for the unis still using these beasts for student Internet access to
upgrade to nice happy spanking new DEC Alphas and upgrade everyone's
computer to a Pentium whilst they're at it :)
No diaereses or macrons in Vietnamese.
One needs:
one of <a>, <i>, <u>, <e>, <o>, <y>
with possibility of a circumflex on <a>,
or a horn on <o>,
or a horn on <u>,
or a circumflex on <o>,
or a circumflex on <e>
plus nothing,
or acute accent,
or grave accent,
or "curl", (sorry, do not know technical name for this)
or tilde,
or dot underneath (is there a technical name for this?)
(Not all of the above combinations will exist.)
(Optionally, a <2> or <z>-like hybrid of "curl" and tilde
may occur in the handwriting of southern Vietnamese
speakers who do not distinguish the two tones
marked by those diacritics.)
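The scheme above can be enumerated mechanically. This is only a sketch, assuming Python and its unicodedata module. It takes "curl" to mean U+0309 COMBINING HOOK ABOVE, and it includes the breve on <a> that is added in a correction later in this thread. It builds every vowel/tone pair from combining marks and counts how many of them Unicode can compose into a single character.

```python
import unicodedata
from itertools import product

# Plain and modified vowels: circumflex U+0302, breve U+0306, horn U+031B
BASES = ["a", "a\u0302", "a\u0306", "e", "e\u0302", "i",
         "o", "o\u0302", "o\u031B", "u", "u\u031B", "y"]

# Tone marks: none, acute, grave, hook above ("curl"), tilde, dot below
TONES = ["", "\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]

# NFC reorders the combining marks and composes them where possible
letters = [unicodedata.normalize("NFC", base + tone)
           for base, tone in product(BASES, TONES)]

# Combinations that end up as one precomposed code point
precomposed = [s for s in letters if len(s) == 1]

print(len(letters), len(precomposed))
print(" ".join(precomposed))
```

Whether a given combination actually occurs in Vietnamese words is a separate question from whether Unicode has a precomposed character for it; the sketch only answers the second.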
Thomas Chan
tc...@cornell.edu
>The long s might be from German Fraktur fonts which is unused
>since ~1945. This letter has certainly no equivalent in modern
>German roman/antiqua fonts and is certainly not needed to
>write German:
>
>017F # LATIN SMALL LETTER LONG S
While I agree that this letter is not needed in a basic European
character set, your reasoning is quite wrong.
The long s was actually used in *both* Fraktur and Antiqua (i.e.
non-Fraktur) typefaces for centuries, and is completely unrelated to
any "Germanness". You should see lots of long s in any older English
(French, Italian, ...) book. The only difference is that Antiqua (or
"Latin") typefaces eventually dropped the long s while Fraktur
typefaces kept it to this day.
As for Fraktur going out of fashion in Germany by 1945... well, the
connection between Nazis and Fraktur is a common misconception.
Actually, the Nazi government *discouraged* use of Fraktur in 1940
because Hitler thought it outdated and contrary to his plans to
"modernise" Germany according to Nazi ideology.
As for Fraktur being "unused" today... several new Fraktur typefaces
have been designed during the past few decades by German designers.
If you go to any newspaper stand you'll see plenty of Fraktur
headlines on newspapers of any nationality. Station and street signs
are also frequently set in Fraktur. But I agree that Fraktur
typefaces are only being used as decorative fonts these days, not as
text fonts, which is the important criterion for this discussion.
--
Chris Nahr (cn...@hal9000.net, replace hal9000 with ibm to e-mail me)
Please don't e-mail me if you post! PGP key at wwwkeys.ch.pgp.net
> 017F # LATIN SMALL LETTER LONG S
It is useful for quoting most old English and German literature.
-- William Ehrich
Correction to myself: There's also the possibility of
a breve on <a>.
> As for Fraktur going out of fashion in Germany by 1945... well, the
> connection between Nazis and Fraktur is a common misconception.
> Actually, the Nazi government *discouraged* use of Fraktur in 1940
> because Hitler thought it outdated and contrary to his plans to
> "modernise" Germany according to Nazi ideology.
I don't know if he ever stated that. However, on the 23rd of January 1941,
an official order of the German Nazi Party (Anordnung 2/41; Ordnungsziffer
111) abolished Fraktur and Schwabacher from all printed items, saying:
...It is ordered that from now on only the normal type is to be used
for all printed documents. As normal type, the antiqua type is meant.
The so-called gothic type (Fraktur) is not a german type but goes back
to the schwabacher jew-letters. This type has been strongly used in
Germany because Jews owned the printing works already since typography
was introduced, and later on the newspapers...
I don't know who did the translation (possibly Yannis Haralambous) but
it was accompanied by a photostat of the order. See TUGboat vol. 12 no.
1, March 1991.
--Paul
If you ride the Barcelona metro you will find that there is a station
which appears to be named Paral-lel, so thick is the middle dot,
and this is not the only instance I've seen.
>* These characters are used in Danish and their inclusion in both
>Unicode and the WGL4 set was at the request of the Danish standards
>organization. My understanding is that there is some debate over the
>status of these characters in modern Danish. Some sources claim that
>they are archaic, others that they are orthographically correct and
>that to omit them is a mistake. I believe they should not be omitted
>from SECS without further research.
Of course, as I come from a neighbouring country, I cannot be taken
for an authority, but this much I can say: I have never seen them.
Then again, most Russian I have read had accented vowels, and there
appear to be no accented Cyrillic letters in Markus's set. But as
you might have guessed, I never came much further than my beginner's
textbook...
--
Erland Sommarskog, Stockholm, som...@algonet.se
This could have been my two cents worth, but alas the Swedish
government has decided that I am not to have any cents.
Thanks for your comments, they have been very useful.
> [I am also concerned that Markus' recommended mathematical set may be
> too extensive. Is this really a _basic_ mathematical subset, or
> something more?]
There is an international standard ISO 31-11 that defines large parts
of the mathematical notation that is commonly used all over the world.
Most of the characters that I have included are from ISO 31-11. I
tried to cover this standard entirely as far as this is possible
in a fixed-width font.
The actual list of math characters that I have included is appended
below. It contains a few remarks about why I think each character
should be covered. Comments welcome.
It is a quite comprehensive set of symbols, so I certainly would not
argue that the math collection should become any larger. I admit that
there might be a few less common symbols in it that are mostly
of concern to computer scientists, but after all, these are computer
character sets and I can well imagine that most of these symbols
will be used in source code comments etc.
0x80 0x20AC #EURO SIGN
0x81 #UNDEFINED
0x82 0x201A #SINGLE LOW-9 QUOTATION MARK
0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
0x84 0x201E #DOUBLE LOW-9 QUOTATION MARK
0x85 0x2026 #HORIZONTAL ELLIPSIS
0x86 0x2020 #DAGGER
0x87 0x2021 #DOUBLE DAGGER
0x88 0x02C6 #MODIFIER LETTER CIRCUMFLEX ACCENT
0x89 0x2030 #PER MILLE SIGN
0x8A 0x0160 #LATIN CAPITAL LETTER S WITH CARON
0x8B 0x2039 #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C 0x0152 #LATIN CAPITAL LIGATURE OE
0x8D #UNDEFINED
0x8E 0x017D #LATIN CAPITAL LETTER Z WITH CARON
0x8F #UNDEFINED
0x90 #UNDEFINED
0x91 0x2018 #LEFT SINGLE QUOTATION MARK
0x92 0x2019 #RIGHT SINGLE QUOTATION MARK
0x93 0x201C #LEFT DOUBLE QUOTATION MARK
0x94 0x201D #RIGHT DOUBLE QUOTATION MARK
0x95 0x2022 #BULLET
0x96 0x2013 #EN DASH
0x97 0x2014 #EM DASH
0x98 0x02DC #SMALL TILDE
0x99 0x2122 #TRADE MARK SIGN
0x9A 0x0161 #LATIN SMALL LETTER S WITH CARON
0x9B 0x203A #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C 0x0153 #LATIN SMALL LIGATURE OE
0x9D #UNDEFINED
0x9E 0x017E #LATIN SMALL LETTER Z WITH CARON
0x9F 0x0178 #LATIN CAPITAL LETTER Y WITH DIAERESIS
Most of them make perfect sense and are useful extensions, however
I have no idea what the purpose of the following three is:
0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
0x88 0x02C6 #MODIFIER LETTER CIRCUMFLEX ACCENT
0x98 0x02DC #SMALL TILDE
Any ideas?
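The three mappings in question can be checked against any CP1252 decoder. What follows is only a minimal sketch, assuming Python's built-in `cp1252` codec and `unicodedata` module (neither appears in the original post, and the helper name `c1_mappings` is made up for illustration). It decodes each byte from 0x80 to 0x9F, counts the defined positions, and prints the Unicode names of the three puzzling ones.

```python
import unicodedata

def c1_mappings(codec="cp1252"):
    """Map each byte 0x80..0x9F to its Unicode character, or None if undefined."""
    table = {}
    for byte in range(0x80, 0xA0):
        try:
            table[byte] = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            table[byte] = None  # position left undefined by this codec
    return table

table = c1_mappings()
defined = {b: ch for b, ch in table.items() if ch is not None}
print(len(defined), "defined positions in 0x80..0x9F")

# The three characters whose purpose was asked about.
for byte in (0x83, 0x88, 0x98):
    ch = table[byte]
    print(f"0x{byte:02X} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Other decoders may treat the undefined positions differently (for example by passing them through as C1 controls), so the count depends on the codec used.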
Probably the Dutch Gulden symbol.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
> In article <35D84110...@cl.cam.ac.uk> Markus Kuhn <Marku...@cl.cam.ac.uk> writes:
> > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
>
> Probably the Dutch Gulden symbol.
Certainly the Dutch Guilder/Gulden/Florin symbol.
--Paul
Shall we still include it in VSECS (same for the Peseta sign from
CP437)? If we include Peseta and Gulden, then we would also
have to include the Franc and Lira symbols. All these currency
symbols are expected to be superseded by the Euro symbol from
mid-2002 on and would only be of historical value.
> Dik T. Winter wrote:
> > > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
> >
> > Probably the Dutch Gulden symbol.
>
> Shall we still include it in VSECS (same for the Peseta sign from
> CP437)? If we include Peseta and Gulden, then we would also
> have to include the Franc and Lira symbols.
What puzzles me is why Unicode added a Lira symbol. The lira symbol
is essentially identical to the pound symbol. A couple of Unicode
fonts I've seen give the pound one cross-stroke and the lira two
cross-strokes, but that's a matter of aesthetics and font design.
Even if there is some inherent national preference one way or the other
between UK and Italian typography, typesetters in each country will tend to
use the same glyph for both symbols (or, more usually, use the symbol
for their national currency and a letter for the other one to avoid
confusion).
--Paul
>>The following are claimed to be used in Welsh, but Welsh
>>native speakers who I asked claimed to have never seen them,
>>so I suspect they are historic characters that are not in
>>general use.
>>1E80 # LATIN CAPITAL LETTER W WITH GRAVE
>>1E81 # LATIN SMALL LETTER W WITH GRAVE
>>1E82 # LATIN CAPITAL LETTER W WITH ACUTE
>>1E83 # LATIN SMALL LETTER W WITH ACUTE
>>1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
>>1E85 # LATIN SMALL LETTER W WITH DIAERESIS
>>1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
>>1EF3 # LATIN SMALL LETTER Y WITH GRAVE
For the record, as provided to me by Andrew Hawke, assistant editor of
the University of Wales Dictionary of the Welsh Language:
Modern usage of the diacritics in Welsh is as follows:
The circumflex is used solely to indicate that a vowel is long
in a context in which it would normally be expected to be
short, e.g.:
gwa^n (he pierces) vs. gwan (weak)
gwe^n (a smile) vs. gwen (white (fem.))
pi^n (pine (wood, tree)) vs. pi`n (a pin)
co^r (a choir) vs. cor (a dwarf)
bu^m (I was (perfect)) vs. bum (five (mutated))
tw^r (a tower) vs. twr (a group)
y^m (we are) vs. ym (in (before m))
The diaeresis is used to separate vowels, as in English:
prosa"ig (prosaic)
cre"wr (creator)
copi"o (to copy)
tro"edigaeth (conversion)
du"wch (blackness)
Rebacay"ddiaeth (lit. Rebaccaism)
cyw"res (concubine)
The acute accent is used to indicate unexpected stress (i.e.
not on the penultimate):
casa'u (to hate)
case't (cassette)
ricri'wt (a recruit)
paraso'l (a parasol)
rebu'wc (a rebuke)
caridy'ms (riff-raff)
gw'raidd (manly)
[this last is on the penult, but is to distinguish it
from the word gwraidd (root) which is monosyllabic]
The grave accent is used to indicate that a vowel is short in
a context in which it would normally be expected to be long:
pa`s (a pass, permit) vs. pas (a cough)
sie`d (a shed) vs. sie^d/sied (escheat)
sgi`l (a skill) vs. sgi^l/sgil (following)
no`d (a nod) vs. nod (a target, an aim)
cu`l (a hut) vs. cul (narrow)
mw`g (a mug) vs. mwg (smoke (n.))
py`g (dirty) vs. pyg (pitch, tar)
Generally speaking, diacritics in Welsh cannot reasonably be
omitted as they are used either to show unusual stress, or to
differentiate between pairs of otherwise identical words with
different pronunciations. As such they are equally necessary
in upper- and lower-case forms.
The commonest diacritic is the circumflex, followed by the
acute and diaeresis probably about equally. The grave is rare,
but as more and more words are borrowed from English, and new
compounds coined for technical terms, its use will
undoubtedly increase.
To give a very rough indication, according to the headwords in
our (unfinished) dictionary (which we estimate will contain
about 84,500 entries), the number of accented keywords
(extrapolated to the expected finished size of the dictionary)
will be roughly:
circumflex: 2,000
diaeresis: 880
acute: 500
grave: 160
Clearly it would be a mistake to omit these diacritics from any
character set intended to support the Welsh language.
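The ASCII notation used in the examples above (mark written after the vowel) can be turned into real characters mechanically. This is a rough sketch only, assuming Python's `unicodedata` module; the names `MARKS`, `WELSH_VOWELS` and `from_ascii_notation` are invented for illustration. It will misread a genuine apostrophe or quotation mark that happens to follow a vowel. It replaces each mark with the corresponding combining character and lets NFC normalization pick the precomposed code point where one exists.

```python
import unicodedata

# ASCII mark written after the vowel -> Unicode combining character
MARKS = {
    "^": "\u0302",  # combining circumflex accent
    "`": "\u0300",  # combining grave accent
    "'": "\u0301",  # combining acute accent
    '"': "\u0308",  # combining diaeresis
}
WELSH_VOWELS = set("aeiouwyAEIOUWY")

def from_ascii_notation(word):
    """Convert e.g. 'tw^r' into the accented spelling, NFC-normalized."""
    out = []
    for ch in word:
        if ch in MARKS and out and out[-1] in WELSH_VOWELS:
            out.append(MARKS[ch])
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))

for sample in ("tw^r", "mw`g", "gw'raidd", 'cyw"res', "py`g", "casa'u"):
    word = from_ascii_notation(sample)
    codes = " ".join(f"U+{ord(c):04X}" for c in word if ord(c) > 0x7F)
    print(f"{sample:10} -> {word:10} {codes}")
```

The printed code points show which accented letters live in row 01 (w and y with circumflex) and which fall in row 1E (w with grave, acute or diaeresis, and y with grave), the row quoted at the top of this message.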
> The Windows standard character set CP1252 extends ISO 8859-1
> by the following 27 characters:
>
[...]
>
> Most of them make perfect sense and are useful extensions, however
> I have no idea what the purpose of the following three is:
>
> 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
This is the guilder sign. Unicode, for whatever reason, doesn't
include an actual guilder/florin sign, but the small f with hook looks
right. This mapping is an approximation. Both the Windows and
Macintosh character sets include the character, so its omission from
Unicode was a surprise to me.
> 0x88 0x02C6 #MODIFIER LETTER CIRCUMFLEX ACCENT
> 0x98 0x02DC #SMALL TILDE
These are to distinguish between the character and the accent. The
circumflex (shift-6 on most US keyboards) is now used for the literal
character (for TeX superscript, regexp inversion...), and so a
distinct character is needed for the diacritic. Similarly, the tilde
is now used for home directories or approximation; a smaller tilde is
needed for using as a diacritic.
-Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
Include them. It is going to be much more painful to omit them,
IMNSHO. However, my understanding is that the Franc symbol isn't in
common use; in fact, I've had French people tell me "what Franc
symbol", pretty much what I'd tell anyone who'd ask me what the symbol
for a Swedish Crown is.
-hpa
--
PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD 1E DF FE 69 EE 35 BD 74
See http://www.zytor.com/~hpa/ for web page and full PGP public key
I am Bahá'í -- ask me about it or see http://www.bahai.org/
"To love another person is to see the face of God." -- Les Misérables
*BZZZZT* Wrong answer! The British Pound (Sterling) symbol has
exactly one cross-stroke, the Italian Lira symbol has two. You
*never* see the other way around, and they are not interchangeable. It
is not like the one or two strokes on the dollar sign!
Can anyone tell me whether the Turkish Lira symbol (a sort of TL monogram,
similar to the TM (trademark) symbol), which exists in the Teletext
character set but not in Unicode, is another currency symbol that is not
used in practice (or just an invention of the Teletext standards
authority)?
--
Stephen Baynes CEng MBCS Stephen...@soton.sc.philips.com
Philips Semiconductors Ltd
Southampton SO15 0DJ +44 (01703) 316431
United Kingdom My views are my own.
Do you use ISO8859-1? Yes if you see © as copyright, ÷ as division and ½ as 1/2.
We write IJ and ij because the glyph is not available. If you look at TeX you
see that it is added because we (the Dutch) wanted and needed it.
<smily on>
We do not have much of a culture, so don't take away the little we have left.
<smily off>
The Graphical Gnome (r...@ktibv.nl)
Sr. Software Engineer IT Department
-----------------------------------------
The Unofficial Delphi Developers FAQ
http://www.gnomehome.demon.nl/uddf/index.htm
The fact that most old typewriting systems could not cope with this
glyph does not mean it is deprecated in the Netherlands. It's a Dutch glyph,
and we are mighty proud of it!
At school, I was taught to write a pound sign with two strokes
(Scotland, mid-70's). I don't write it that way now, for with age
comes laziness.
Peter, there are ways of disagreeing with people that are not so
inflammatory. Always think it possible that you might be mistaken.
--
Stewart C. Russell, Glasgow, Scotland - scr...@enterprise.net
"Hang on... This is the real thing... The truth, my friend,
and nothing but the truth" - Mervyn Peake
http://homepages.enterprise.net/scruss/
> Followup to: <evale...@sktb.demon.co.uk>
> By author: p...@sktb.demon.co.uk
> In newsgroup: comp.std.internat
> >
> > What puzzles me is why Unicode added a Lira symbol. The lira symbol
> > is essentially identical to the pound symbol. A couple of Unicode
> > fonts I've seen give the pound one cross-stroke and the lira two
> > cross-strokes, but that's a matter of aesthetics and font design.
> > Even if there is some inherent national preference one way or the other
> > between UK and Italian typography, typesetters in each country will tend to
> > use the same glyph for both symbols (or, more usually, use the symbol
> > for their national currency and a letter for the other one to avoid
> > confusion).
> >
>
> *BZZZZT* Wrong answer! The British Pound (Sterling) symbol has
> exactly one cross-stroke, the Italian Lira symbol has two.
*BZZZZZZT*. Totally wrong answer! I'm in the UK and have been for
the 40-odd years of my life. My father was a printer, as was my brother,
grandfather, three uncles and various cousins (in case you're interested, my
father was a laserjet). I'm old enough to remember when the two
cross-stroke form was the norm in the UK. In fact I'm old enough that I was
*taught* that the two-stroke form should be used.
> You *never* see the other way around,
I admit that the one-stroke form predominates in the UK these days. But
that is a matter of typographic style, not an absolute rule. Either is
acceptable.
> and they are not interchangeable.
<panto>Oh yes they are</panto>.
Take a look at Whitaker's Almanack in the foreign currency section. It
uses the one-stroke form for pound, punt and lira.
> It is not like the one or two strokes on the dollar sign!
Ah, but it is. There may be national preferences involved and these may
change over time, but one- or two-cross stroke forms are entirely
interchangeable in the UK. Dunno about Italy.
--Paul
>You can also write oe, ue and ae. Does this mean that the o-umlaut, u-umlaut
>and a-umlaut should be removed?? The same applies for the German sharp s: you
>can write it as ss.
These examples are hardly parallel. Apart from spacing considerations,
the IJ glyph is identical in appearance to an I followed by a J.
Obviously the same cannot be said of the o-umlaut which, in any case,
is also required as an o-diaeresis for non-Germanic languages. The
German eszett cannot, in standard German, be replaced by /ss/, as
there exist words which are semantically distinguished by the use of
/ss/ or eszett.
That said, I'm perfectly happy to endorse inclusion of the IJ and ij
digraphs as characters in any font I make for Dutch clients, if they
want them. Most Dutch type designers I know (and I know a _lot_) seem
quite ambivalent about this digraph.
John Hudson
Although the glyphs may be the same or similar, they signify different
things. For much the same reason, the shape A turns up at least twice in
Unicode: once in the Roman alphabet and once in the Cyrillic. (I think
it was John Hudson who mentioned this a while ago.)
Likewise, the ring accent, a superscript zero, and a degree mark all
look about the same, but they are (I hope!) distinct in Unicode.
----
Rodger Whitlock
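Whether such lookalikes are in fact distinct can be checked directly. A small sketch, assuming Python's `unicodedata` module and the character database that ships with it (which may be newer than the Unicode version under discussion); the grouping and the name `lookalikes` are only illustrative.

```python
import unicodedata

# Characters whose glyphs are often identical or nearly so.
lookalikes = {
    "capital A shape": ["\u0041", "\u0391", "\u0410"],
    "small raised circle": ["\u00B0", "\u02DA", "\u2070"],
}

for label, chars in lookalikes.items():
    print(label)
    for ch in chars:
        print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")

# The semantic difference shows up in case mapping: each "A" lowercases
# to the small letter of its own script.
for ch in lookalikes["capital A shape"]:
    low = ch.lower()
    print(f"U+{ord(ch):04X} -> U+{ord(low):04X} {unicodedata.name(low)}")
```

The case-mapping loop is the practical reason for keeping the code points apart: text that mixes scripts would otherwise lowercase incorrectly.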
Actually this symbol is no longer used very much. So omission is no
problem (and it should go away anyhow by 2002). What you mainly see
is one of: NLG, DFL, F, f, fl or simply nothing.
In some fonts, O and 0 look very similar. Still, they are different
characters. Same for A and Greek Alpha and numerous other characters.
--
Claus André Färber <http://www.muc.de/~cfaerber/> Fax: +49_8061_3361
PGP: ID=1024/527CADCD FP=12 20 49 F3 E1 04 9E 9E 25 56 69 A5 C6 A0 C9 DC
I'm afraid you're a bit wrong there, since we TYPE i+j, but WRITE y+umlaut
(mostly, many people even omit the two dots); also, when you use an italics
font, it usually looks better when you have a round y (like Computer Modern)
with two dots instead of i+j (it looks downright awful with italic CM); the
same holds even stronger for script fonts (ever seen the capital IJ that
kids learn at school in the Netherlands? It's neither Y nor I+J). Also, IJ
is not always I+J in roman types, since some designers use a raised smaller
I just above the curl of the J (for capitals that is, in this case ij is
often a round y with dots too). This is NOT a kerning/ligature matter, for
Dutch also knows words like bijectie (bijection) where ij really is i+j,
besides, you'd need a different font just for typesetting Dutch!
Another omission in font/code page design is (well, you can't help that),
that the Dutch can also accentuate ij (and IJ) by putting two acutes on top
of it (like a Hungarian umlaut, omitting the dots on the i and j); now, I
haven't seen that option anywhere yet, not even for j+acute. Note that this
is not even an informal rule, but an official one by the Dutch and Flemish
governments (i+acute+j can be used however, as a last resort).
Add to this that IJ is often treated as a single letter in alphabetization,
you can safely say it is a letter that LOOKS like i+j (in standard typing),
but is quite different.
PS
The reason, I think, that many printers are ambivalent, is because they're
not used to the glyph ij since you can hardly find it anywhere and there are
no computer keyboards (that I know of) that support it either. Probably also
a reason why the /oe/ is infrequent in French (all arguments between
using/not using IJ and OE are interchangeable, I think).
Ruben Prins
Het spijt me, maar betweterigheid zit 'm in de genen.
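The difference between typing i+j and using a single ij character shows up as soon as software changes case. A minimal sketch, assuming Python's string methods and `unicodedata` module; U+0132 and U+0133 are the Unicode code points named LATIN CAPITAL LIGATURE IJ and LATIN SMALL LIGATURE IJ, which are also the two cells (32-33) left out of row 01 in the subset tables at the top of this thread.

```python
import unicodedata

IJ_UPPER, IJ_LOWER = "\u0132", "\u0133"

print(unicodedata.name(IJ_UPPER))
print(unicodedata.name(IJ_LOWER))

# Typed as two letters, naive capitalization only raises the "i".
print("ijs".capitalize())

# Encoded as one character, the whole letter is capitalized.
print((IJ_LOWER + "s").capitalize() == IJ_UPPER + "s")

# Compatibility normalization folds the single character back to i + j,
# so the distinction is lost wherever NFKC is applied.
print(unicodedata.normalize("NFKC", IJ_LOWER) == "ij")

# Canonical normalization leaves it alone.
print(unicodedata.normalize("NFC", IJ_LOWER) == IJ_LOWER)
```

A word such as bijectie, where i and j are separate letters, would simply keep its two plain characters, so no special handling is needed for that case.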
You're right, I'm sorry. Let it suffice to say I wasn't in the right
frame of mind while posting that message.
Anyway, HOWEVER, I gather while to the Brits the dual-stroke £
character may be acceptable (and hence, in Britain this being a
stylistic difference), the same is -- as far as I understand --
distinctly NOT true for the Italians. In Italy you invariably see the
two-stroke version, unless there has been a dramatic change very
recently. The spacing between the two strokes in the Lira symbol is
usually pretty wide; I don't know if that is a stylistic difference or
not.
Either way, I do not believe it is correct to say that they are
interchangeable. At least the Brits permit the single-stroke form.
Presumably because LATIN SMALL LETTER F WITH HOOK was considered an
adequate mapping. I believe there is a usage note in the Unicode
manual saying this is used for the Dutch guilder.
I have found another potential origin for the LATIN SMALL LETTER F
WITH HOOK: The ISO registered character set number 143 for mathematical
symbols (<http://www.cl.cam.ac.uk/~mgk25/ucs/IR-143.pdf>) contains
on position 05/13 a symbol called FUNCTION OF SIGN which looks
very similar to LATIN SMALL LETTER F WITH HOOK.
I think I have a quite decent mathematical background, but I have
never seen a FUNCTION OF SIGN used in any math courses that I
visited or math book that I read. It is also not defined in
ISO 31-11, a standard that covers large parts of the global
mathematical notation. What is it good for and where is this
symbol widely used?
Good fonts for data processing usage should be carefully designed to
make the glyphs easily distinguishable, and standardized simple Unicode
subsets should support this and should avoid homoglyphs wherever
this is feasible.
Terminal fonts usually add a dot or a slash to zeros to make them
distinguishable from Os. In OCR-B, the O looks more like a square
while the 0 looks more like a lozenge: /\
                                       \/
Designing an OCR-B extension for all of Unicode or even for a big
subset such as MES-3 should be a quite challenging task.
: The fact that most old typewriting systems could not cope
: with this glyph does not mean it is deprecated in the
: Netherlands. It's a Dutch glyph, and we are mighty proud of it!
Yes indeed! Most of the time it is considered to be a single letter,
the 'lange ij' ('long ij'). Battus, in his marvelous
'Opperlandse taal en letterkunde', recognizes three alphabets:
Oudhollands, Old Dutch: ... x ij z;
PTT: ... x y z; and
tolerant: ... x y ij z.
The distinction is relevant for pangrams (sentences containing each
letter of the alphabet) and the like.
Some dictionaries and encyclopedias alphabetize the ij after the x,
as do all crossword puzzles.
It is similar to the CH in Hungarian (or was that Czech): there you will
find signs saying
W
E
CH
S
E
L
S
T
U
B
E
!
--
Jeroen Nijhof J.H.B....@aston.ac.uk
Accordion Links http://www-th.phys.rug.nl/~nijhof/accordions.html
In fact, there are (or at least, were) keyboards that support the glyph ij.
I have an obsolete Burroughs B20 system with Dutch keyboards (it's a small
network). There are keys for ij and even f (the italic long f discussed
somewhere else in this thread). The accompanying daisywheel printer, a
standard Diablo 630, has no problems printing the ij since the correct
kerning is present in the driver table.
--
Marc Joosen
> You can also write oe, ue and ae.
As replacements for "u¹, ä, and ö? No, you can't. Here in Finland,
ae and oe are _not_ acceptable replacements for ä and ö - a and o are
much better.
Do you know how much we Finns cry every time we see a reference to the
skier Marja-Liisa Haemaelaeinen (now Kirvesniemi due to marriage)? I
tell you, very much. An "ae" instead of "ä" _hurts_ the eye, and it's
hard to read, too!
Antti-Juhani
¹ Sorry, my keyboard config does not give me fast access to u diaeresis.
--
Antti-Juhani Kaijanaho <ga...@iki.fi> ** <URL:http://www.iki.fi/gaia/> **
All GNU users have more. Most of them have less.
Some of them have most.
> Followup to: <6rcfk1$h92$1...@news.enterprise.net>
> By author: scr...@enterprise.net
> In newsgroup: comp.std.internat
> >
> > h...@transmeta.com (H. Peter Anvin) wrote:
> > >*BZZZZT* Wrong answer! The British Pound (Sterling) symbol has
> > >exactly one cross-stroke, the Italian Lira symbol has two. You
> > >*never* see the other way around, and they are not interchangeable.
> >
> > At school, I was taught to write a pound sign with two strokes
> > (Scotland, mid-70's). I don't write it that way now, for with age
> > comes laziness.
> >
> > Peter, there are ways of disagreeing with people that are not so
> > inflammatory. Always think it possible that you might be mistaken.
>
> You're right, I'm sorry. Let it suffice to say I wasn't in the right
> frame of mind while posting that message.
>
> Anyway, HOWEVER, I gather while to the Brits the dual-stroke £
> character may be acceptable (and hence, in Britain this being a
> stylistic difference), the same is -- as far as I understand --
> distinctly NOT true for the Italians.
And is this something you have been *told* by *experienced* Italian
typographers or is this merely observation? It's very difficult to spot
any usage of the two-stroke pound symbol in the UK these days because
it appears to have gone out of fashion. Unless you were specifically
told otherwise, the same argument may apply to the Italian Lira.
> In Italy you invariably see the two-stroke version, unless there has been
> a dramatic change very recently.
Unless there was a dramatic change a long time ago...
> Either way, I do not believe it is correct to say that they are
> interchangeable. At least the Brits permit the single-stroke form.
Which is something you were adamant we did not until two people
corrected you. Which is why I'd prefer an Italian typographer's comment
on this one...
--Paul
> Dik T. Winter wrote:
> > > > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
> > >
> > > This is the guilder sign. Unicode, for whatever reason, doesn't
> > > include an actual guilder/florin sign, but the small f with hook
> > > looks right.
> >
> > Actually this symbol is no longer used very much. So omission is no
> > problem (and it should go away anyhow by 2002). What you mainly see
> > is one of: NLG, DFL, F, f, fl or simply nothing.
>
> I have found another potential origin for the LATIN SMALL LETTER F
> WITH HOOK: The ISO registered character set number 143 for mathematical
> symbols (<http://www.cl.cam.ac.uk/~mgk25/ucs/IR-143.pdf>) contains
> on position 05/13 a symbol called FUNCTION OF SIGN which looks
> very similar to LATIN SMALL LETTER F WITH HOOK.
>
> I think I have a quite decent mathematical background, but I have
> never seen a FUNCTION OF SIGN used in any math courses that I
> visited or math book that I read.
You're kidding, right? You never came across the notation y = f(x)
to denote that y is some function of x?
> It is also not defined in ISO 31-11, a standard that covers large parts
> of the global mathematical notation. What is it good for and where is this
> symbol widely used?
It's used in just about every aspect of mathematics at secondary level
and beyond. The f is an italic f. Actually, in well-designed fonts,
maths italic is slightly different from text italic in the design and
kerning of various letters (but you need to compare the two side-by-side
for most people to notice it).
--Paul
>Terminal fonts usually add a dot or a slash to zeros to make them
>distinguishable from Os.
The slashes aren't a particularly good idea for us Norwegians and
Danes - we rather like our Ø's (O slash for those without proper
terminal software).
--
- Helge Nareid
Nordmann i utlendighet, Aberdeen, Scotland
>And is this something you have been *told* by *experienced* Italian
>typographers or is this merely observation? It's very difficult to spot
>any usage of the two-stroke pound symbol in the UK these days because
>it appears to have gone out of fashion. Unless you were specifically
>told otherwise, the same argument may apply to the Italian Lira.
Really, this is a moot point. Regardless of appearances, the Italian
Lira sign and the British Sterling sign are semantically distinct and
are separately encoded in Unicode. Even if the two glyphs were
identical, I would still expect an Italian keyboard driver to map the
Lira codepoint, rather than the Sterling.
My recommendation for SECS and even VSECS is that _all_ encoded
European currency signs be included. The debate then becomes which
non-European currency signs should be included. US Dollar? Yen?
John Hudson, Type Director
Really? I know that there were some typewriters that knew the ij (mostly
only lowercase), but not of any computer. But since no Windows/DOS/Unix/OS2
is supporting it, it's pretty useless to have such a computer keyboard
anyway. (And probably they never will--I've got an official Dutch IBM
keyboard, but no IJ or florin/guilder.) Sob, snivel :(
I guess you can say I'm a little hooked on that letter, ah well...
Ruben Prins
WIJ EISEN IJ'S!
[vrij naar Annie M.G. Schmidt; for non-Dutch/Flemish readers:
we "demand IJs", but "ijs" means "icecream" as well]
That's why we prefer backslashed O's for zeros (quite common practice,
although not global, unfortunately)
Tor
On Mon, 17 Aug 1998 18:36:48 +0100, p...@sktb.demon.co.uk (Paul L.
Allen) wrote:
>> In article <35D84110...@cl.cam.ac.uk> Markus Kuhn <Marku...@cl.cam.ac.uk> writes:
>> > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
>Certainly the Dutch Guilder/Gulden/Florin symbol.
It should be noted that, while U+0192 has become a de facto standard
codepoint for the florin sign (i.e. Dutch guilder), the lowercase
letter f with hook is actually a nasalised consonant in orthographies
for a number of African languages. As such, it is paired with U+0191,
the uppercase letter F with hook.
This is a typical Unicode implementation mess: reliance on a single
codepoint to map two semantically distinct characters. In this case,
the characters are not only semantically distinct but graphically
incompatible. The florin sign is traditionally slanted to the right,
in the manner of an italic or script f. Of course, whether the hooked
f as used in African languages is slanted depends on whether the font
is roman or italic.
I'm currently working on a font which supports some 300 African
languages and also includes a florin sign. I have to include two
distinct glyphs, only one of which I can encode. Because U+0192 is
used as a de facto codepoint for the florin sign, I've opted to leave
the African hooked f unencoded. This means that this important letter
will have to be accessed through a complicated 'African alternate
forms' glyph substitution routine.
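The double role of U+0192 described above is visible in the character properties themselves. A short sketch, assuming Python's `unicodedata` module and its bundled character database (possibly newer than the tables discussed here): the code point is classified as a lowercase letter with an uppercase partner, not as a currency symbol.

```python
import unicodedata

f_hook = "\u0192"

# General category "Ll" means lowercase letter; currency signs are "Sc".
print(unicodedata.name(f_hook), unicodedata.category(f_hook))

# Case mapping pairs it with U+0191, as a letter would be.
upper = f_hook.upper()
print(f"upper: U+{ord(upper):04X} {unicodedata.name(upper)}")

# For contrast, a character that is encoded as a currency sign.
euro = unicodedata.lookup("EURO SIGN")
print(f"U+{ord(euro):04X} {unicodedata.category(euro)} upper unchanged: {euro.upper() == euro}")
```

Software that selects currency symbols by general category would therefore not find the florin at this code point.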
That's no excuse! All abbreviations you mentioned are ugly compared to the
elegant ƒ. But I know there's no point in a discussion, since most people
don't care or don't know where to find the symbol.
It's a bit like omitting the trema (umlaut) on vowels, neither right nor
aesthetically pleasing. Well, you can differ in opinion about "right" for ƒ,
since on coins of, say, one guilder it says 1G and not ƒ1 (or any of the
above).
Ruben Prins
> On Wed, 19 Aug 1998 18:51:31 +0100, p...@sktb.demon.co.uk (Paul L.
> Allen) wrote:
>
> >And is this something you have been *told* by *experienced* Italian
> >typographers or is this merely observation? It's very difficult to spot
> >any usage of the two-stroke pound symbol in the UK these days because
> >it appears to have gone out of fashion. Unless you were specifically
> >told otherwise, the same argument may apply to the Italian Lira.
>
> Really, this is a moot point.
Not entirely.
> Regardless of appearances, the Italian Lira sign and the British Sterling
> sign are semantically distinct and are separately encoded in Unicode.
Uh-huh. I know that Unicode started out by assigning different code-points
to identical glyphs with different semantics, but they abandoned that
particular idea when they started dealing with Japanese/Chinese/Korean/etc.
If you *were* right that the Lira and Pound Sterling should have different
code-points even if the glyphs were identical because they have different
semantics, then we would need separate characters for each of the following:
Pound: Cypriot, Egyptian, Falkland, Gibraltar, Lebanese, St Helena,
Sterling [UK], Sudanese, Syrian.
Punt [name is "Irishization" of pound]: Irish.
Colon: Costa Rican, El Salvadorian.
Dollar: Australian, Bahamian, Barbados, Belize, Bermudian, Brunei,
Canadian, Cayman Islands, East Caribbean, Fijian, Guyanan, Hong Kong,
Jamaican, Liberian, Malaysian, Namibian, New Zealand, Singapore,
Solomon Islands, Taiwan, Trinidad and Tobago, United States (who?),
Zimbabwe.
Guilder: Aruban, Dutch, Netherlands Antilles, Surinam
Franc: Central African, Pacific, Belgian, Burkina Faso, Burundi,
Comorian, Djibouti, French, Guinea, Luxembourg, Malagasy, Malian,
Rwandan, Swiss, West African.
Lira: Italian, Maltese, Turkish.
Peseta: Andorran, Spanish.
Rupee: Indian, Mauritius, Nepalese, Pakistani, Seychelles, Sri Lankan.
Won: North Korean, South Korean.
Notes on the above:
1) All of the above are theoretically independent currencies. In
practice some are tied to a 1:1 exchange ratio with a parent currency
(e.g. Falkland Pound is tied to Pound Sterling) and are effectively
the same currency with different banknotes.
2) I don't know how many of those in the list actually use the symbol
that is assigned to a currency of that name in the Unicode table. Most
of those with the Pound use the Pound Sterling symbol; most of those
with the Dollar use the Dollar symbol.
3) There's a Bengali Rupee symbol (and a Bengali Rupee mark) as well
as the "ordinary" Rupee symbol. Perhaps somebody knows why the Bengalis
apparently need a different symbol for the Indian Rupee.
4) There may well be other currency symbols in use around the world
which Unicode has not yet assigned codepoints to - those are the ones
that I could find in the Unicode tables.
So let's at least have a little consistency here. And don't forget that
if Spain splits into separate countries we will need several more Peseta
symbols to give each one their own semantic meaning. Oh, and if Canada
becomes part of the US (as sometimes looks likely) then we will have to
scrap one of the 21 new dollar symbols (that we *must* have, according to
your theory) as no longer meaningful.
> Even if the two glyphs were identical, I would still expect an Italian
> keyboard driver to map the Lira codepoint, rather than the Sterling.
So what does a Maltese or Turkish keyboard driver map to? By your
principle, it is equally wrong for it to map to the Italian Lira as for it
to map to the Pound Sterling symbol. As for all those people in various
countries who just type $ when talking of their currency when that
code-point belongs solely to the US...
> My recommendation for SECS and even VSECS is that _all_ encoded
> European currency signs be included. The debate then becomes which
> non-European currency signs should be included. US Dollar? Yen?
See above for other currency symbols. Not only do you have to decide
who you include and who you exclude, but also whether or not you give
them a separate code-point when they use the same glyph as a different
currency.
Which brings me back to my original question: I wonder why Unicode bothered
to give the Italian Lira a separate code-point. Let me expand on that: I
wonder why Unicode bothered to give the Italian Lira a separate code-point
but did not do so for the other 11 countries that use the same glyph for
their currency symbol (whether the currency is called pound, punt or lira).
It all seems to hinge upon whether or not the two-stroke form is merely
a matter of current typographical taste in Italy or if, as suggested, the
one-stroke form is truly unacceptable. I wonder what the Maltese and
Turks think about this one...
--Paul
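For reference, the currency characters named in this discussion can be looked up by their Unicode names. A minimal sketch, assuming Python's `unicodedata` module and its bundled database; the list `names` is only a selection of the signs mentioned above, not a complete inventory.

```python
import unicodedata

names = [
    "DOLLAR SIGN", "POUND SIGN", "COLON SIGN", "FRENCH FRANC SIGN",
    "LIRA SIGN", "PESETA SIGN", "RUPEE SIGN", "WON SIGN",
    "BENGALI RUPEE MARK", "BENGALI RUPEE SIGN",
]

signs = {name: unicodedata.lookup(name) for name in names}
for name, ch in signs.items():
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}")

# Pound and lira are separate code points, and neither canonical nor
# compatibility normalization maps one onto the other.
pound, lira = signs["POUND SIGN"], signs["LIRA SIGN"]
print(pound != lira)
print(unicodedata.normalize("NFKC", lira) == lira)
```

Each of these names denotes a sign rather than a particular country's currency, so the table does not settle which code point a Maltese or Turkish keyboard driver ought to produce.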
> Presumably because LATIN SMALL LETTER F WITH HOOK was considered an
> adequate mapping. I believe there is a usage note in the Unicode
> manual saying this is used for the Dutch guilder.
0192 [f] LATIN SMALL LETTER F WITH HOOK
= LATIN SMALL LETTER SCRIPT F
= Florin currency symbol (Dutch)
= function symbol
HTH,
Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
There is no need to introduce a special character for that. The
lowercase f is enough for that purpose.
It is really only a stylistic issue whether to set this f in another
font for mathematical equations.
> cc. Marku...@cl.cam.ac.uk
>
>
> On Mon, 17 Aug 1998 18:36:48 +0100, p...@sktb.demon.co.uk (Paul L.
> Allen) wrote:
>
> >> In article <35D84110...@cl.cam.ac.uk> Markus Kuhn <Marku...@cl.cam.ac.uk> writes:
> >> > 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
>
> >Certainly the Dutch Guilder/Gulden/Florin symbol.
>
> It should be noted that, while U+0192 has become a de facto standard
> codepoint for the florin sign (i.e. Dutch guilder), the lowercase
> letter f with hook is actually a nasalised consonant in orthographies
> for a number of African languages. As such, it is paired with U+0191,
> the uppercase letter F with hook.
I appreciate the background which explains why it isn't flagged as a
currency symbol. However, it appears that as far as Unicode are concerned
it is a de jure code-point for the florin sign because their technical
notes say that's what it is.
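As a rough illustration of the point above, the properties recorded for
U+0192 can be inspected with Python's standard unicodedata module. This is
only a sketch: the variable names are made up for the example, and the
results depend on the Unicode data shipped with the Python build in use.

```python
import unicodedata

f_hook = "\u0192"  # LATIN SMALL LETTER F WITH HOOK

name = unicodedata.name(f_hook)
category = unicodedata.category(f_hook)           # general category of f with hook
pound_category = unicodedata.category("\u00a3")   # POUND SIGN, for comparison
upper = f_hook.upper()                            # case partner of f with hook

print(f"U+{ord(f_hook):04X} {name} [{category}]")
print(f"POUND SIGN category: {pound_category}")
print(f"uppercase: U+{ord(upper):04X} {unicodedata.name(upper)}")
```

The character is classified as a lowercase letter (Ll) paired with U+0191,
whereas the pound sign carries the currency-symbol category (Sc).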
> This is a typical Unicode implementation mess: reliance on a single
> codepoint to map two semantically distinct characters. In this case,
> the characters are not only semantically distinct but graphically
> incompatible.
I can live with two or more characters with different semantics mapping
to the same glyph. I can live with characters that have slightly differing
national tastes for what is essentially same glyph (national versions of
fonts solve this one). But when there are two distinct uses with
incompatible glyphs then it's wrong to use the same code-point.
--Paul
> Paul L. Allen <p...@sktb.demon.co.uk> schrieb:
> > You're kidding, right? You never came across the notation y = f(x)
> > to denote that y is some function of x?
>
> There is no need to introduce a special character for that. The
> lowercase f is enough for that purpose.
Correction. The lower-case *italic* f is *close enough* that *most* people
*wouldn't notice* the difference. You'll have seen others argue in the past
that Unicode has assigned different code-points to identical glyphs
with different semantics.
> It is really only a stylistic issue whether to set this f in another
> font for mathematical equations.
Tell that to mathematicians. Tell that to Donald Knuth. Tell that to
all the TeX users who know that maths-italic f is decidedly different
from ordinary italic f, which is itself different from Roman (ordinary)
lower-case f.
--Paul
Probably because the Italians asked them to. A lot of stuff in Unicode
seems to be there because national standards organisations insisted on
its inclusion in ISO/IEC 10646.
Your points about multiplying glyphs are well taken, and I should have
clarified my position. I think that when there are recognisable, even
if inconsistent, preferences of form, it is very awkward to have only
one codepoint available if one is making a font for both markets.
I prefer either dotted 0's, or (preferred) a thinner/more rounded
form.
>
> In article <6-FSZ...@faerber.muc.de>
> claus+...@faerber.muc.de (Claus André Färber) writes:
>
> > Paul L. Allen <p...@sktb.demon.co.uk> schrieb:
> > > You're kidding, right? You never came across the notation y = f(x)
> > > to denote that y is some function of x?
> >
> > There is no need to introduce a special character for that. The
> > lowercase f is enough for that purpose.
>
> Correction. The lower-case *italic* f is *close enough* that *most* people
> *wouldn't notice* the difference. You'll have seen others argue in the past
> that Unicode has assigned different code-points to identical glyphs
> with different semantics.
It seems to me that you are confusing characters with glyphs. If there
were to be a separate "function character" or "function symbol", it
would be a good idea to include it in Unicode. I have never heard of such
a symbol.
A related example is the integral sign, which was originally a long s,
but then evolved to its own symbol.
> > It is really only a stylistic issue whether to set this f in another
> > font for mathematical equations.
>
> Tell that to mathematicians. Tell that to Donald Knuth. Tell that to
> all the TeX users who know that maths-italic f is decidedly different
> from ordinary italic f, which is itself different from Roman (ordinary)
> lower-case f.
But what you mention are indeed stylistic issues. With your
reasoning, it would be necessary to have a special math version of
every character, since math italic and text italic need not be
identical for any letter, and the semantics are indeed different.
Furthermore, IMHO it would be a mess if there was to be a separate f,
distinct from the math italic f, meaning "the function of".
/Lars
Of course I have seen f(x) = g'(x) etc., but the function was
always represented by an italics letter f, g, etc. and never by
an italics letter f with hook or by any other special symbol that
was not just a letter! I've never seen a special symbol that would
deserve a name such as FUNCTION OF SIGN. Mathematicians use any
Latin and Greek letter to denote functions and variables, and they
just set them in normal italics font with a kerning that makes
it clear that the symbols are individual entities and not part
of words.
> Correction. The lower-case *italic* f is *close enough* that *most* people
> *wouldn't notice* the difference.
The trouble with typographers is that they constantly think they have
discovered new characters where there actually is no new one. The
mathematical variables and functions are normal italic letters.
The OHM SIGN and KELVIN SIGN are just the capital letters omega and
K. The Angstroem sign is just an A WITH RING ABOVE. It is
complete nonsense that separate code points have been
introduced for these. Knuth used only a separate cmmi (math italics)
font for the formula symbols, because this was a convenient hack to
store the math-specific kerning information (variables in formulas
are spaced wider apart than letters in italic words, but they are
still the same characters).
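The claim that these unit signs are just ordinary letters can be checked
against Unicode normalization. The sketch below uses Python's standard
unicodedata module (the dictionary and its labels are made up for the
example) to show that canonical normalization (NFC) replaces each sign
with the corresponding letter.

```python
import unicodedata

# Unit signs that have their own code points in Unicode.
signs = {
    "OHM SIGN": "\u2126",
    "KELVIN SIGN": "\u212a",
    "ANGSTROM SIGN": "\u212b",
}

# NFC applies the canonical mappings defined for each character.
normalized = {label: unicodedata.normalize("NFC", ch) for label, ch in signs.items()}

for label, ch in normalized.items():
    print(f"{label} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

After normalization the three signs are indistinguishable from GREEK
CAPITAL LETTER OMEGA, LATIN CAPITAL LETTER K and LATIN CAPITAL LETTER A
WITH RING ABOVE.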
> Tell that to mathematicians. Tell that to Donald Knuth. Tell that to
> all the TeX users who know that maths-italic f is decidedly different
> from ordinary italic f, which is itself different from Roman
> (ordinary) lower-case f.
All of these are Latin letters f, just in different font styles
(roman, italic text, italic math). No need to encode separate
characters here. An f stays an f.
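For what it is worth, later versions of Unicode did add separately coded
mathematical italic letters, but they carry a compatibility mapping back
to the plain letters. The sketch below assumes a Python build whose
Unicode data includes the Mathematical Alphanumeric Symbols block; the
variable names are made up for the example.

```python
import unicodedata

math_f = "\U0001d453"   # italic f from the Mathematical Alphanumeric Symbols block
f_hook = "\u0192"       # LATIN SMALL LETTER F WITH HOOK

math_f_name = unicodedata.name(math_f)

# Compatibility normalization (NFKC) folds styled variants to the base letter.
math_f_folded = unicodedata.normalize("NFKC", math_f)
f_hook_folded = unicodedata.normalize("NFKC", f_hook)

print(math_f_name, "->", math_f_folded)
print(unicodedata.name(f_hook), "->", unicodedata.name(f_hook_folded))
```

Under NFKC the mathematical italic f becomes an ordinary f, while the f
with hook stays a distinct letter.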
The slash through zeros in terminal fonts is usually clipped to the
inside of the circle, whereas the stroke of the O WITH STROKE also
extends outside it.
There are two more Unicode characters that can be mixed up with the Ø:
DIAMETER SIGN (U+2300, a circle with slash)
EMPTY SET (U+2205, a zero with slash)
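A small sketch, using Python's standard unicodedata module, that lists the
three look-alike characters and checks that normalization does not merge
them. The list and variable names are made up for the example.

```python
import unicodedata

# O WITH STROKE, DIAMETER SIGN and EMPTY SET, in that order.
lookalikes = ["\u00d8", "\u2300", "\u2205"]

names = [unicodedata.name(ch) for ch in lookalikes]

# Even compatibility normalization (NFKC) keeps the three apart.
folded = {unicodedata.normalize("NFKC", ch) for ch in lookalikes}

for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X} {name}")
print("distinct after NFKC:", len(folded))
```

Software that compares text therefore has to treat the three as unrelated
characters, however similar the glyphs look in a low-resolution font.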
The trema is _no_ umlaut. A trema indicates that two vowels are
pronounced separately, i.e. not as a diphthong. A trema consists of two
_dots_. You probably can omit them, as most people know to pronounce the
vowels that way.
An umlaut, however, makes the vowel a different vowel. The first
syllable in Färber is pronounced like "fair", while in Farber it would
be pronounced like "far". The umlaut originally consisted of two
strokes, not dots.
> > I have found another potential origin for the LATIN SMALL LETTER F
> > WITH HOOK: The ISO registered character set number 143 for mathematical
> > symbols (<http://www.cl.cam.ac.uk/~mgk25/ucs/IR-143.pdf>) contains
> > on position 05/13 a symbol called FUNCTION OF SIGN which looks
> > very similar to LATIN SMALL LETTER F WITH HOOK.
> >
> > I think I have a quite decent mathematical background, but I have
> > never seen a FUNCTION OF SIGN used in any math courses that I
> > visited or math book that I read.
>
> You're kidding, right? You never came across the notation y = f(x)
> to denote that y is some function of x?
Ditto y = g(x) , y = h(x) ...
I don't think there is anything particularly special about 'f', except it
is the most common choice, just as 'x' and 'y' are the most common choice
for the variables.
--
Stephen Baynes CEng MBCS Stephen...@soton.sc.philips.com
Philips Semiconductors Ltd
Southampton SO15 0DJ +44 (01703) 316431
United Kingdom My views are my own.
Do you use ISO8859-1? Yes if you see © as copyright, ÷ as division and ½ as 1/2.
Now you know how we feel about the ij.
I was only thinking about the German characters. My Finnish is rather rusty.
But I agree I was not thinking as globally as I should. I made the same mistake
as the committee that says "remove the ij". Sorry.
The Graphical Gnome (r...@ktibv.nl)
Sr. Software Engineer IT Department
-----------------------------------------
The Unofficial Delphi Developers FAQ
http://www.gnomehome.demon.nl/uddf/index.htm
Is it Nlg 60 or Nlg 160?
On my screen the one and the l are slightly different, but it is asking for
trouble.
However
> Knuth used only a separate cmmi (math italics)
> font for the formula symbols, because this was a convenient hack to
> store the math-specific kerning information (variables in formulas
> are spaced wider apart than letters in italic words, but they are
> still the same characters).
cmmi does not differ from the normal text only in the kerning. The
glyphs have different shapes; however, this is still only a stylistic
difference: an f is still an f.
David
> That's no excuse! All abbreviations you mentioned are ugly compared to the
> elegant É. But I know there's no point in a discussion, since most people
> don't care or don't know where to find the symbol.
Incidentally, I researched about the guilder sign some time ago,
and found out that the authoritative answer (by Karel Trebus) is
to use a plain, straight letter f (with no abbreviation dot behind)
for Dutch Guilders.
--J"org Knappen
>
> Ruben Prins
[On whether the pound and lira symbols should be regarded as separate symbols]
>
> It all seems to hinge upon whether or not the two-stroke form is merely
> a matter of current typographical taste in Italy or if, as suggested, the
> one-stroke form is truly unacceptable. I wonder what the Maltese and
> Turkish think about this one...
>
Well, in the teletext character set, the Turkish lira has its own very different
symbol 'TL'. I have never managed to verify whether it is actually used.
On the other hand, things like NLG and FF do not have one.
Which leads to another question: when does a composite symbol become regarded
as a symbol in its own right? 'TM' for trademark seems to have become one, for example.
Why? Why not just use a small 'T' and a small 'M'? I assume for pragmatic reasons: it
was easier to add a single TM symbol to character sets than a whole alphabet
of small letters.
It seems unlikely that Spain could split soon enough (less than 4 years)
to avoid having the advent of the euro make the peseta a moot point.
----------------------------------------------------------------------
Bob Goudreau Data General Corporation
goud...@dg-rtp.dg.com 62 Alexander Drive
+1 919 248 6231 Research Triangle Park, NC 27709, USA
I don't like the dotted one much, but I agree that the thinner/more rounded
form is even better than the backslashed O form (the latter was good back
in the days when screen letters had low pixel resolution, which shouldn't
be a problem any longer).
Tor
Less than 4 _months_.
The battle to distinguish 0 and O is barely won. I think most typewriters
these days do distinguish them. But ask people to read out the digits
1 0 3, many will say "one oh three" rather than "one zero three". Or look
at the US line 21 subtitling standard which, if I recall correctly, does
not distinguish them.
> Do you use ISO8859-1? Yes if you see © as copyright, ÷ as division and
> ½ as 1/2.
I don't believe that your diagnostic works. These characters appear to
be the same in iso-8859-9 and iso-8859-13, quite apart from the
question of distinguishing from, say, windows-1254.
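The objection can be checked mechanically. The sketch below decodes the
three byte values in question with several of Python's built-in codecs;
the inclusion of ISO 8859-2 as a contrasting case is an addition for
illustration and was not part of the original diagnostic.

```python
# Bytes 0xA9, 0xF7, 0xBD: copyright, division and one-half in ISO 8859-1.
raw = b"\xa9\xf7\xbd"
expected = "\u00a9\u00f7\u00bd"

# Character sets named in the discussion above.
codecs_to_try = ["latin_1", "iso8859_9", "iso8859_13", "cp1254"]
decoded = {name: raw.decode(name) for name in codecs_to_try}

# A contrasting set (an assumption added here) where the test does fail.
latin2_text = raw.decode("iso8859_2")

for name, text in decoded.items():
    print(name, text == expected)
print("iso8859_2", latin2_text == expected)
```

All four character sets named in the discussion decode the bytes to the
same three characters, so the signature test cannot tell them apart; it
only rules out sets such as ISO 8859-2 that differ at those positions.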
best regards
: > It seems unlikely that Spain could split soon enough (less than 4 years)
: > to avoid having the advent of the euro make the peseta a moot point.
:
: Less than 4 _months_.
Sorry, but that's incorrect. The peseta will still exist in 4
months, and for several years after that, so any need to write
peseta symbols now will still be valid in 2001. It's not until
after the peseta (and mark, etc.) is completely phased out several
years later (mid-2002, IIRC) that it will become a moot point.
It's a stylistic difference because you need different shapes and kerning
to set maths well. But it's also a *semantic* difference. And there are
some who would argue that if the character has different semantics then
Unicode should assign it a different code-point even if the glyph is
identical. In fact Unicode have done this for a great many characters
where it makes a good deal less sense.
--Paul
> claus+...@faerber.muc.de (=?ISO-8859-1?Q?Claus_Andr=E9_F=E4rber?=) wrote:
>
> : > It seems unlikely that Spain could split soon enough (less than 4 years)
> : > to avoid having the advent of the euro make the peseta a moot point.
> :
> : Less than 4 _months_.
>
> Sorry, but that's incorrect. The peseta will still exist in 4
> months, and for several years after that, so any need to write
> peseta symbols now will still be valid in 2001. It's not until
> after the peseta (and mark, etc.) is completely phased out several
> years later (mid-2002, IIRC) that it will become a moot point.
Long, long, *long* after that for any currency symbol phased out. Or
do you think that there will never be a desire to refer to historical
matters and use historical symbols?
--Paul
The Dutch language has a bit of an image problem; why else is there more
focus on Scandinavian languages while there are more speakers of Dutch in
Europe than of all Scandinavian languages added up? Getverderrie. (Just try
to translate that!)
Ruben Prins
: In article <6rk1se$g...@dg-rtp.dg.com>
Certainly not; such references will still be necessary. But, you
seem to be changing the context. I was speaking solely of the
hypothetical need for a separate *new* Catalan (or Basque, Galician,
etc.) peseta symbol (distinct from the existing Spanish peseta symbol)
which would presumably arise only if Spain split apart politically
*before* euro notes and coins started appearing (thus rendering moot
the need for separate national currencies or currency symbols for
the new nations). Remember, this was a topic that you *yourself*
broached in the article to which I originally responded:
|Paul L. Allen (p...@sktb.demon.co.uk) wrote:
|: So let's at least have a little consistency here. And don't forget that
|: if Spain splits into separate countries we will need several more Peseta
|: symbols to give each one their own semantic meaning.
So I think my point remains valid: the possible time window for
needing new peseta symbols remains quite small, but not as small
as Claus contended. Realistically, I see no possibility at all that
such a thing could happen by 2002.
> p...@sktb.demon.co.uk (Paul L. Allen) wrote:
>
> : In article <6rk1se$g...@dg-rtp.dg.com>
> : goud...@dg-rtp.dg.com (Bob Goudreau) writes:
> :
> : > claus+...@faerber.muc.de (=?ISO-8859-1?Q?Claus_Andr=E9_F=E4rber?=)
> : > wrote:
> : >
> : > : > It seems unlikely that Spain could split soon enough (less than 4
> : > : > years) to avoid having the advent of the euro make the peseta a
> : > : > moot point.
> : > :
> : > : Less than 4 _months_.
> : >
> : > Sorry, but that's incorrect. The peseta will still exist in 4
> : > months, and for several years after that, so any need to write
> : > peseta symbols now will still be valid in 2001. It's not until
> : > after the peseta (and mark, etc.) is completely phased out several
> : > years later (mid-2002, IIRC) that it will become a moot point.
> :
> : Long, long, *long* after that for any currency symbol phased out. Or
> : do you think that there will never be a desire to refer to historical
> : matters and use historical symbols?
>
> Certainly not; such references will still be necessary. But, you
> seem to be changing the context.
No, I was responding to what you said.
> I was speaking solely of the hypothetical need for a separate *new* Catalan
> (or Basque, Galician, etc.) peseta symbol (distinct from the existing
> Spanish peseta symbol) which would presumably arise only if Spain split
> apart politically
That was not clear from what you said:
The peseta will still exist in 4 months, and for several years after
that, so any need to write peseta symbols now will still be valid in
2001.
No mention of different flavours of peseta, just the already existing
one. And just to make sure I didn't misinterpret you, the next
sentence:
It's not until after the peseta (and mark, etc.) is completely phased
out several years later (mid-2002, IIRC)
So here you're talking about existing currencies, not new ones.
that it will become a moot point.
In the context of the rest of the material you quoted, and your own
sentences, that "moot point" appears to refer to the need to represent
any existing pre-Euro currency symbols.
> *before* euro notes and coins started appearing (thus rendering moot
> the need for separate national currencies or currency symbols for
> the new nations). Remember, this was a topic that you *yourself*
> broached in the article to which I originally responded:
Umm, but I brought that point up for those who were insisting that each
country that used a particular currency symbol needed a separate code-point
because it was semantically different. I brought it up to show the
absurdity of their stance. You seemed to be arguing that there would
be no further need for peseta/pound/lira symbols (irrespective of
country bifurcations) after the currencies themselves are phased out
in 2002. This is not the case.
> So I think my point remains valid: the possible time window for
> needing new peseta symbols remains quite small, but not as small
> as Claus contended.
That remains valid, although that didn't appear to be the point you
were making.
> Realistically, I see no possibility at all that such a thing could happen
> by 2002.
Really? Scotland devolves into a separate country shortly. Scots banks
have long issued their own notes which are accepted as legal tender in
Scotland and usually acceptable (but not strictly legal tender) in
England. There is a possibility that they might demand a currency split.
I doubt they would, but stranger things have happened. At the moment, the
devolution plans give Westminster the power over such matters, but who is
to say what might happen 18 months from now when a new Scottish Parliament
wants more say in home affairs? The speed with which the division between
the Germanies dissolved and later the Soviet Union split asunder should
tell you that there are no certainties...
--Paul
As far as I know, there are no notes that are legal tender in Scotland.
[Legal tender has a precise meaning in law, it is a set of rules on
how a payment can be made up and if met the recipient must accept the
payment. For example there is a limit on how much copper coinage you
can include, and the amount offered must equal (not exceed) the amount
to be paid (so you can't offer more and insist the recipient gives
change or even keeps the change). Legal tender is distinct from
legally recognized or generally acceptable money.]
--
Stephen Baynes CEng MBCS Stephen...@soton.sc.philips.com
Philips Semiconductors Ltd
Southampton SO15 0DJ +44 (01703) 316431
United Kingdom My views are my own.
> As far as I know, there are no notes that are legal tender in Scotland.
Correct, though what it's got to do with any of these usenet groups
is unclear.
Due to some weird anomaly, Bank of England (only!) ten shilling and one
pound notes were legal tender in Scotland, but they were withdrawn from
circulation at different times in the past. So it's now true that no
banknotes are "legal tender" in the technical sense in Scotland.
(You presumably realise that historically the Pound Scots was not the
same as the English Pound or Pound Sterling?)
: In article <6sjqdr$a...@dg-rtp.dg.com>
: goud...@dg-rtp.dg.com (Bob Goudreau) writes:
:
: > p...@sktb.demon.co.uk (Paul L. Allen) wrote:
: >
: > : In article <6rk1se$g...@dg-rtp.dg.com>
: > : goud...@dg-rtp.dg.com (Bob Goudreau) writes:
: > :
: > : > claus+...@faerber.muc.de (=?ISO-8859-1?Q?Claus_Andr=E9_F=E4rber?=)
: > : > wrote:
: > : >
: > : > : > It seems unlikely that Spain could split soon enough (less than 4
: > : > : > years) to avoid having the advent of the euro make the peseta a
: > : > : > moot point.
: > : > :
: > : > : Less than 4 _months_.
: > : >
: > : > Sorry, but that's incorrect. The peseta will still exist in 4
: > : > months, and for several years after that, so any need to write
: > : > peseta symbols now will still be valid in 2001. It's not until
: > : > after the peseta (and mark, etc.) is completely phased out several
: > : > years later (mid-2002, IIRC) that it will become a moot point.
: > :
: > : Long, long, *long* after that for any currency symbol phased out. Or
: > : do you think that there will never be a desire to refer to historical
: > : matters and use historical symbols?
: >
: > Certainly not; such references will still be necessary. But, you
: > seem to be changing the context.
:
: No, I was responding to what you said.
... which was in the context that you yourself originated (but which
got snipped in Claus's original reply) regarding a hypothetical
breakup of the Spanish peseta into multiple new pesetas for the
successor states.
: > I was speaking solely of the hypothetical need for a separate *new* Catalan
: > (or Basque, Galician, etc.) peseta symbol (distinct from the existing
: > Spanish peseta symbol) which would presumably arise only if Spain split
: > apart politically
:
: That was not clear from what you said:
:
: The peseta will still exist in 4 months, and for several years after
: that, so any need to write peseta symbols now will still be valid in
: 2001.
:
: No mention of different flavours of peseta, just the already existing
: one.
Again, go back and read what was snipped out of the earlier articles
in the thread.
: And just to make sure I didn't misinterpret you, the next sentence:
:
: It's not until after the peseta (and mark, etc.) is completely phased
: out several years later (mid-2002, IIRC)
:
: So here you're talking about existing currencies, not new ones.
Talking about them, but not in the way you seem to think. The point
was that notes and coins of the existing currencies will still
be in circulation until 2002 (at which point they will phase out in
favor of the euro), so any derivative "successor" currencies (such as
a putative Catalan peseta, or, say, a Bavarian mark) would only have a
brief opportunity to be created before the whole issue of separate
new currencies became a moot point.
: that it will become a moot point.
:
: In the context of the rest of the material you quoted, and your own
: sentences, that "moot point" appears to refer to the need to represent
: any existing pre-Euro currency symbols.
Nope -- it refers to the need to represent new symbols for "successor"
currencies such as a Catalan peseta.
: > *before* euro notes and coins started appearing (thus rendering moot
: > the need for separate national currencies or currency symbols for
: > the new nations). Remember, this was a topic that you *yourself*
: > broached in the article to which I originally responded:
:
: Umm, but I brought that point up for those who were insisting that each
: country that used a particular currency symbol needed a separate code-point
: because it was semantically different. I brought it up to show the
: absurdity of their stance.
... and my subsequent comment was to show the absurdity of the chance
of the peseta breakup scenario ever happening before the euro renders
the point moot.
: You seemed to be arguing that there would
: be no further need for peseta/pound/lira symbols (irrespective of
: country bifurcations) after the currencies themselves are phased out
: in 2002. This is not the case.
And it was not what I was arguing, so I guess we agree.
: > So I think my point remains valid: the possible time window for
: > needing new peseta symbols remains quite small, but not as small
: > as Claus contended.
:
: That remains valid, although that didn't appear to be the point you
: were making.
My apologies for not making it more clearly.
: > Realistically, I see no possibility at all that such a thing could happen
: > by 2002.
:
: Really? Scotland devolves into a separate country shortly.
Really? Tell me, what foreign policy powers will the new Scottish
Parliament have, exactly? I know that the SNP wants true independence,
but if the powers of the forthcoming Parliament (which hasn't yet even
been elected) are those of a "separate country", then the US has
already devolved into 50 "separate countries"! Our state legislatures
already have far more taxing, spending and legislative powers in their
respective territories than the new Scottish Parliament will have in
Scotland. (What's the level of "Tartan tax" being mooted? -- a bare
few pence per pound, IIRC, easily beaten by many state income and
sales taxes.)
: Scots banks
: have long issued their own notes which are accepted as legal tender in
: Scotland and usually acceptable (but not strictly legal tender) in
: England. There is a possibility that they might demand a currency split.
: I doubt they would, but stranger things have happened. At the moment, the
: devolution plans give Westminster the power over such matters, but who is
: to say what might happen 18 months from now when a new Scottish Parliament
: wants more say in home affairs?
None of which really addresses my point, as the UK is not Spain, and
is not subject to the 2002 introduction of the euro. But I think your
own example disproves your contention about the rapidity required for
the Spanish scenario to happen in the next 46 months. How long has the
Scottish devolution process taken? First, Labour had to promise in its
election manifesto (written in, what, 1995-6?) to hold the referendum.
Then it had to win the general election, pass the legislation enabling
the referendum, and allow a decent interval for the referendum campaign
itself to take place. Even after all that, the new Scottish Parliament
won't actually be seated until, what, sometime next year? All this
for a legislature with powers far short of independence, and look how
long it's taking!
: The speed with which the division between the Germanies dissolved
: and later the Soviet Union split asunder should tell you that there
: are no certainties...
I'm actually quite certain that the current Spanish state is nothing
like the two examples you cited, which were states held together only
by totalitarian force, and which collapsed suddenly when that force
was removed. A sudden breakup of Spain in less than four years is
something I'm quite willing to bet against, if you want to wager
with me :-).
I generally respond to what's in a post. I read far too much news to
memorise every detail of every post and I expire posts very quickly -
usually immediately after reading.
> Again, go back and read what was snipped out of the earlier articles
> in the thread.
Nope. Make your point clearly rather than rely on previous articles
which people may no longer have.
> : And just to make sure I didn't misinterpret you, the next sentence:
> :
> : It's not until after the peseta (and mark, etc.) is completely phased
> : out several years later (mid-2002, IIRC)
> :
> : So here you're talking about existing currencies, not new ones.
>
> Talking about them, but not in the way you seem to think. The point
> was that notes and coins of the existing currencies will still
> be in circulation until 2002 (at which point they will phase out in
> favor of the euro), so any derivative "successor" currencies (such as
> a putative Catalan peseta, or, say, a Bavarian mark) would only have a
> brief opportunity to be created before the whole issue of separate
> new currencies became a moot point.
But the issue of needing symbols for whatever currencies exist at
that point, whether any of them have been created between now and
then, will continue for far longer.
> : > Realistically, I see no possibility at all that such a thing could happen
> : > by 2002.
> :
> : Really? Scotland devolves into a separate country shortly.
>
> Really? Tell me, what foreign policy powers will the new Scottish
> Parliament have, exactly?
Bugger-all. But it gets to think of itself as somewhat separate.
> I know that the SNP wants true independence,
And so do a lot of the people. By some estimates it will happen very
quickly.
> None of which really addresses my point, as the UK is not Spain, and
> is not subject to the 2002 introduction of the euro. But I think your
> own example disproves your contention about the rapidity required for
> the Spanish scenario to happen in the next 46 months. How long has the
> Scottish devolution process taken? First, Labour had to promise in its
> election manifesto (written in, what, 1995-6?)
If you're starting from first principles, since somebody stole the
Stone of Destiny, long before the Tory referendum on the matter or the
Labour promises.
> to hold the referendum. Then it had to win the general election, pass
> the legislation enabling the referendum, and allow a decent interval for
> the referendum campaign itself to take place.
As political things go, the time between Labour winning the election and
the new parliament coming into being is very fast.
> Even after all that, the new Scottish Parliament won't actually be seated
> until, what, sometime next year?
Yeah.
> All this for a legislature with powers far short of independence, and
> look how long it's taking!
I'm surprised it's going so quickly.
--Paul
>: In article <6rk1se$g...@dg-rtp.dg.com>
>: goud...@dg-rtp.dg.com (Bob Goudreau) writes:
>:> claus+...@faerber.muc.de (=?ISO-8859-1?Q?Claus_Andr=E9_F=E4rber?=)
>:> wrote:
>:>:> It seems unlikely that Spain could split soon enough (less than 4
>:>:> years) to avoid having the advent of the euro make the peseta a moot
>:>:> point.
>:>:
>:>: Less than 4 _months_.
>:>
>:> Sorry, but that's incorrect. The peseta will still exist in 4
>:> months, and for several years after that, so any need to write
>:> peseta symbols now will still be valid in 2001.
Actually, that's incorrect too. The Peseta will cease to exist as an
independent currency on 1999-01-01. From then on, it will be just a
subunit of the Euro.
So the decision would have to be made within 4 months. It makes sense
neither to introduce new currency symbols for currencies whose fate
will be sealed within 3 years (if there is not already a Peseta symbol)
nor to introduce new currencies (i.e. subunits of the Euro) for only a
few years (if Spain were to split into multiple countries).
So the timeframe for events to happen that would make a new Peseta
symbol necessary is actually the 4 months.
Sorry, you're still in error...
: The Peseta will cease to exist as an independent currency on
: 1999-01-01. From then on, it will be just a subunit of the Euro.
"Independence" is irrelevant. (By that argument, no one should ever
need to refer to the Argentine Peso, which the Argentine currency board
by law fixes 1-to-1 to the US dollar.) What matters is that, even
in 1999, Spanish pesetas will continue to circulate as legal tender
currency in Spain, and prices will still be marked in pesetas. Indeed,
euro notes and coins won't even appear until years later, so there's
really no alternative to the peseta in the meantime. In the unlikely
(except to Paul Allen) event that Spain breaks up into multiple
independent countries before 2002, even the new countries would be
unable to use the euro as their currency before then, as the European
Central Bank isn't supposed to start issuing the new currency before
that year. If the new nations wanted a non-Spanish currency, they'd
have to either use some other country's, or else issue their own
interim currency (which I suppose they could call "euro" if they wanted
-- it just wouldn't be a "real" euro that other countries would accept).
: So the decision would have to be made within 4 months. It makes sense
: neither to introduce new currency symbols for currencies whose fate
: will be sealed within 3 years (if there is not already a Peseta symbol)
: nor to introduce new currencies (i.e. subunits of the Euro) for only a
: few years (if Spain were to split into multiple countries).
I quite agree that it would make no sense to introduce a new currency
fated for death in only a few years.
: So the timeframe for events to happen that would make a new Peseta
: symbol necessary is actually the 4 months.
Nope; it's really almost 40 months (between now and Jan. 1, 2002),
since physical euros won't be available until then. The exchange-rate
freeze of early 1999 is not the important date; the issuance of
actual euro coins and notes is what matters.
: > : > Certainly not; such references will still be necessary. But, you
: > : > seem to be changing the context.
: > :
: > : No, I was responding to what you said.
: >
: > ... which was in the context that you yourself originated (but which
: > got snipped in Claus's original reply) regarding a hypothetical
: > breakup of the Spanish peseta into multiple new pesetas for the
: > successor states.
:
: I generally respond to what's in a post. I read far too much news to
: memorise every detail of every post and I expire posts very quickly -
: usually immediately after reading.
:
: > Again, go back and read what was snipped out of the earlier articles
: > in the thread.
:
: Nope. Make your point clearly rather than rely on previous articles
: which people may no longer have.
May I diffidently suggest that if you can't be bothered to pay
attention to threads which you yourself initiate, you might
want to reconsider raising them in the first place?
: > Talking about them, but not in the way you seem to think. The point
: > was that notes and coins of the existing currencies will still
: > be in circulation until 2002 (at which point they will phase out in
: > favor of the euro), so any derivative "successor" currencies (such as
: > a putative Catalan peseta, or, say, a Bavarian mark) would only have a
: > brief opportunity to be created before the whole issue of separate
: > new currencies became a moot point.
:
: But the issue of needing symbols for whatever currencies exist at
: that point, whether any of them have been created between now and
: then, will continue for far longer.
I'm not arguing otherwise. The whole point I was making was that
the window of opportunity to create new currencies in Spain "between
now and then" is so small as to make the issue unthinkable. It
jest ain't gonna happen in real life.
: As political things go, the time between Labour winning the election and
: the new parliament coming into being is very fast.
:
: > Even after all that, the new Scottish Parliament won't actually be seated
: > until, what, sometime next year?
:
: Yeah.
:
: > All this for a legislature with powers far short of independence, and
: > look how long it's taking!
:
: I'm surprised it's going so quickly.
And that's my point -- if this multi-year movement to something
not yet close to independence is what you call "quick", then any breakup
of Spain before 2002 would have to be positively superluminal in
comparison :-).
> In the unlikely (except to Paul Allen) event that Spain breaks up into
> multiple independent countries before 2002
I assume that you have evidence to back up your *lie* that I have ever
said it *likely* that Spain will break up into independent countries
before 2002. If not, I hope you have the grace to apologise.
--Paul
> p...@sktb.demon.co.uk (Paul L. Allen) wrote:
>
> : > : > Certainly not; such references will still be necessary. But, you
> : > : > seem to be changing the context.
> : > :
> : > : No, I was responding to what you said.
> : >
> : > ... which was in the context that you yourself originated (but which
> : > got snipped in Claus's original reply) regarding a hypothetical
> : > breakup of the Spanish peseta into multiple new pesetas for the
> : > successor states.
> :
> : I generally respond to what's in a post. I read far too much news to
> : memorise every detail of every post and I expire posts very quickly -
> : usually immediately after reading.
> :
> : > Again, go back and read what was snipped out of the earlier articles
> : > in the thread.
> :
> : Nope. Make your point clearly rather than rely on previous articles
> : which people may no longer have.
>
> May I diffidently suggest that if you can't be bothered to pay
> attention to threads which you yourself initiate, you might
> want to reconsider raising them in the first place?
You can suggest whatever you wish. And I shall continue to respond
to the points you make in posts and the matter you include in those
posts in support of the points you make. If the matter is not there
or you do not explicitly draw attention to it, I assume you consider it
irrelevant to the point you are making.
Or, to put it simply: I'm not going to read your mind or guess your
intentions. If you say "A" then I respond to "A". If what you really mean
is "A, bearing in mind B that was mentioned earlier" then say it.
I'd also suggest you don't try shifting the blame for your own mistakes
and omissions onto others.
--Paul
Sigh. Relax, Paul. It was a *joke*. You may want to try switching
to decaf... :-)
> Paul L. Allen (p...@sktb.demon.co.uk) wrote:
> : In article <6sp02c$a...@dg-rtp.dg.com>
> : goud...@dg-rtp.dg.com (Bob Goudreau) writes:
>
> : > In the unlikely (except to Paul Allen) event that Spain breaks up into
> : > multiple independent countries before 2002
>
> : I assume that you have evidence to back up your *lie* that I have ever
> : said it *likely* that Spain will break up into independent countries
> : before 2002. If not, I hope you have the grace to apologise.
>
> Sigh. Relax, Paul. It was a *joke*. You may want to try switching
> to decaf... :-)
Jokes of that form make use of the words "maybe", "perhaps", "allegedly" or
similar circumlocutions to avoid being treated as libellous. On usenet,
smilies are also used. Your response shows that you understand these
conventions when need be, but not apologies. Oh well, I suppose that's
as close as I'll get to an apology from you.
--Paul
My first impression was that you were either too obtuse or too
humorless to get the joke, which is why I subsequently included a
smiley in the followup message. But, after reading your obvious
jape about libel (!), and realizing the irony of being pressed for
an apology by the same sensitive soul who recently called another
comp.std.internat poster a "clueless fuckwit", I must pay
tribute to your keen and subtle sense of humor after all.