I want to display the lowercase 'i' without its dot on my
web page. If anyone knows the code for it, please tell me. I also
wonder whether this dotless i is used in any writing system. If
so, which ones?
Thanks in advance for your help.
Fulio
The dotless i is used in Turkish and is at Unicode position U+0131 (305
decimal).
Thanks a lot for the information.
Fulio
Correct. If you wish to check this and other Unicode characters, here
is an index to the various code charts.
http://rudhar.com/lingtics/uniclnks.htm and thence
http://unicode.org/charts/PDF/U0100.pdf
Knowing that the character is U+0131 (305 decimal), you can represent
it in a webpage as "&#305;" (without the quotes) or as "&#x131;",
as explained here: http://rudhar.com/sfreview/unigglen.htm .
See also: http://rudhar.com/sfreview/html_en/entities.htm and
http://www.cs.vassar.edu/CES/sgml/ISOlat1
http://www.cs.vassar.edu/CES/sgml/ISOlat2
which mentions:
<!ENTITY inodot SDATA "[inodot]"--=small i without dot-->
and also
<!ENTITY Idot SDATA "[Idot ]"--=capital I, dot above-->
So as an alternative to the hardly readable &#305; etc. you can also
use &inodot; etc. in HTML.
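For anyone who wants to check these references, here is a minimal Python sketch (Python is my choice, not something the linked pages use). It derives the decimal and hexadecimal numeric references from the code point and decodes them again with the standard html module. The named references are taken from the HTML5 list as Python knows it; whether an older browser accepts them is an assumption I have not tested.

```python
import html

DOTLESS_I = "\u0131"     # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Build the numeric character references from the code point itself.
decimal_ref = f"&#{ord(DOTLESS_I)};"   # decimal form
hex_ref = f"&#x{ord(DOTLESS_I):X};"    # hexadecimal form

# Decode the numeric and named references back into characters.
decoded = [html.unescape(ref) for ref in (decimal_ref, hex_ref, "&inodot;")]
decoded_cap = html.unescape("&Idot;")

print(decimal_ref, hex_ref, decoded, decoded_cap)
```

All three spellings decode to the same character, so the choice between them is purely a matter of readability in the page source.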
This uppercase I with a dot is also used in Turkish. The normal
dotless I is their uppercase version of the Turkish special character
dotless i. That means they also need an uppercase version of the dotted i.
They use it in the name Istambul, for example.
--
Ruud Harmsen
http://rudhar.com
> I wanted to present the lowercase 'i' but without the top dot in my
> web page. If anyone knows the code of it, please teach me.
Look at the source text of
http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html
to find
iı
--
I used to believe in reincarnation in a former life.
istanbul
: --
: Ruud Harmsen
: http://rudhar.com
>
> : This uppercase I with a dot is also used in Turkish. The normal
> : dotless I is their uppercase version of the Turkish special character
> : dotless i. That means they also an uppercase version for the dotted i.
> : They use it in the name Istambul, for example.
>
> istanbul
İstanbul.
!stanbul. Can we all play?
Nigel
--
ScriptMaster language resources (Chinese/Modern & Classical Greek/IPA/
Persian/Russian/Turkish):
http://www.elgin.free-online.co.uk
Istanbul was Constantinople
Now it's Istanbul, not Constantinople
Been a long time gone, Oh Constantinople
Now it's Turkish delight on a moonlit night
Every gal in Constantinople
Lives in Istanbul, not Constantinople
So if you've a date in Constantinople
She'll be waiting in Istanbul
(They Might Be Giants)
Well, actually, by Jimmy Kennedy and Nat Simon. Big 1953 hit record by
the Four Lads. But it's nice to see the young folks remember some of
the old songs...
Ross Clark
Thanks, I had a suspicion that the TMBG version was a cover version but
I've never seen it ascribed to anyone else.
Your newsreader failed to interpret the directive
Content-Type: text/plain; charset=UTF-8; format=flowed
in the header of Harlan's post correctly.
It's a reminder to get a better newsreader. :-)
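To illustrate what happens when that header is ignored, here is a small Python sketch. The body text is an assumed example, not the bytes of the actual post, and the Latin-1 fallback is only my guess at what a careless newsreader might do.

```python
from email.message import Message

# Rebuild just the header in question on an otherwise empty message.
msg = Message()
msg["Content-Type"] = "text/plain; charset=UTF-8; format=flowed"

# Assumed body: the word with a capital dotted I, encoded as declared.
raw_body = "\u0130stanbul.".encode("utf-8")

declared = msg.get_content_charset()  # charset parameter, lowercased
good = raw_body.decode(declared)      # honours the header
bad = raw_body.decode("latin-1")      # ignores the header

print(declared, good, bad)
```

The two bytes that UTF-8 uses for the dotted capital I come out as two separate Latin-1 characters in the second decoding, which is the typical sign of this mistake.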
pjk
> !stanbul. Can we all play?
إstanbul (same pronunciation, too.)
Beat that!
Marc
إstanbul
No, I think he got it okay. His joke was to use an upside down
exclamation point - like an upside down version of the capital dotted
I used in the previous message.
Marc
Whatever.
It was supposed to be an alif with the hamza at the bottom.
Marc
It looked all right here (but will it now?):
First attempt:
>>>>> إstanbul (same pronunciation, too.)
>>>>>
>>>>> Beat that!
Second attempt:
>>>> إstanbul
>>>>
>>>> (See if that works...)
Third attempt:
>>> I can't believe encoding is screwing up my masterpiece
>>>
>>> إstanbul
--
Trond Engen
- would have expected Åstanbul to be somewhere in the Eastern Norwegian
woodlands
> Organization: http://groups.google.com
> User-Agent: G2/1.0
>
> It was supposed to be an alif with the hamza at the bottom.
Google Groups is severely broken. The message
<news:Pine.GSO.4.44.06080...@s5b004.rrzn.uni-hannover.de>
has charset=ISO-8859-1 and therefore contains only Latin-1
characters. However, stupid Google shows Arabic letters in
http://groups.google.com/group/sci.lang/msg/eb55255e1925350f
--
Solipsists of the world - unite!
> Solipsists of the world - unite!
Yeah, unfortunately, though, you can't use gmail as a newsreader, and
I don't want the hassle of using a totally separate newsreader.
Also, the interface of gmail is good.
If only they'd make all messages unicode...
Marc
On a side note: is it the only case where the capitalization of a
lowercase letter depends on the language?
--
The nice thing about standards is that you have so many to choose
from; furthermore, if you do not like any of them, you can just
wait for next year's model.
Andrew Tanenbaum, _Computer Networks_ (1981), p. 168.
Oliver> On a side note: is it the only case where the
Oliver> capitalization of a lowercase letter depends on the
Oliver> language?
No. There are many other cases.
e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
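A quick Python sketch of these two cases, for what it is worth. Python's built-in case mapping is language-neutral, so it expands the German letter but knows nothing about Dutch; dutch_capitalize below is a hypothetical helper of mine, not a library function, and it only looks at the first two letters of a word.

```python
def dutch_capitalize(word: str) -> str:
    """Hypothetical helper: capitalize a word, treating initial 'ij' as a unit."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]

german = "stra\u00dfe".upper()       # one lowercase letter becomes two capitals
plain = "ijsbeer".capitalize()       # language-neutral: only the first letter
dutch = dutch_capitalize("ijsbeer")  # Dutch convention: both letters

print(german, plain, dutch)
```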
--
Lee Sau Dan 李守敦 ~{@nJX6X~}
E-mail: dan...@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
That's true, there are no global 100% reliable rules of capitalization.
Compare the Dutch "ij"/"IJ" with a Czech letter "ch".
"ch" is always capitalized as "Ch".
The capitalization rules for letters with diacritics are also
heavily language dependent.
pjk
> On a side note: is it the only case where the capitalization of a
> lowercase letter depends on the language?
At least the Turkish system seems logical: the key on the keyboard
marked (dotless) "I" produces either a dotless lowercase "i" or a
dotless UC "I". I suppose that if the designers of the modern Turkish
alphabet had been entirely consistent they would have removed the dot
from LC "j": but of course there's no ambiguity in the j/J pair they
retained from other Latin alphabets.
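In software the same pairing has to be spelled out, because default case conversion assumes the non-Turkish pairs. A minimal sketch in Python, assuming that only the two i pairs need special treatment; turkish_upper and turkish_lower are names I made up for illustration.

```python
# Assumption: only the two Turkish-specific pairs need special handling;
# everything else follows Python's default, language-neutral case mapping.
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})  # dotted stays dotted, dotless stays dotless
_TR_LOWER = str.maketrans({"\u0130": "i", "I": "\u0131"})

def turkish_upper(text: str) -> str:
    """Uppercase with Turkish i pairing (hypothetical helper)."""
    return text.translate(_TR_UPPER).upper()

def turkish_lower(text: str) -> str:
    """Lowercase with Turkish i pairing (hypothetical helper)."""
    return text.translate(_TR_LOWER).lower()

print(turkish_upper("istanbul"), turkish_lower("ISI"))
print("istanbul".upper(), "ISI".lower())  # default mapping, for comparison
```

The translate step has to run before the default conversion, otherwise the dot information is already lost by the time the Turkish pairs are looked at.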
Years ago I had my manual typewriter adapted to cope with Turkish.
The simplest method, suggested by the technician in the workshop, was
to add a key for dotted "i" (LC/UC) & physically remove the now-
redundant LC dot from the existing "I" key with a file. (Note for
geeks: a file is a rasping tool.)
[Test for Google Groups (doomed to failure?): dotless ı dotted İ. My
browser encoding is set to UTF-8; but clever Google may know better.]
> [Test for Google Groups (doomed to failure?): dotless ý dotted Ý. My
> browser encoding is set to UTF-8; but clever Google may know better.]
Yup, failed again!
Nigel
But I saw those right in the parent message. My newsreader Gnus said
the message was "MIME/Ltn-5" (ISO-8859-9) but your follow-up was in the
regular ISO-8859-1/Latin-1.
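That mismatch is easy to reproduce. A small Python sketch, assuming the test line went out as ISO-8859-9 and was then read back as ISO-8859-1:

```python
# The two Turkish letters from the test, written with escapes to be unambiguous.
sent = "dotless \u0131 dotted \u0130"

wire = sent.encode("iso8859_9")     # bytes as sent in Latin-5
misread = wire.decode("iso8859_1")  # same bytes read as Latin-1

print(misread)
```

The two code pages differ in only a handful of positions, and the Turkish i letters sit exactly where Latin-1 has y with acute, so everything else in the line survives untouched.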
Here's my test:
ı dotless i
%GÄ° %@ dotted I ... er I guess my font hasn't got that character as I
get two escaped codes here.
Toni
> (Note for
> geeks: a file is a rasping tool.)
Do you have a URL where I could download one of those?
Marc
> No. There are many other cases.
> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
That's incorrect, by the way. I+J/i+j is a single letter: Ĳ/ĳ
That oughta get through Google Groups, right?
Marc
>On Mar 14, 7:58 pm, LEE Sau Dan <dan...@informatik.uni-freiburg.de>
>wrote:
>
>> No. There are many other cases.
>> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
>
>That's incorrect, by the way. I+J/i+j is a single letter: Ĳ/ĳ
No, it's not. Or rather: that is heavily disputed, so much so that in
nl.taal you are considered a troll if you even mention the subject.
My stance on the matter is here:
http://rudhar.com/lingtics/nlij_en.htm
No it isn't :-) Sometimes you capitalize it as "CH", e.g. when writing in ALL CAPS.
Do you know that "officially", according to the codified Slovak orthography
rules, "ch" should be capitalized only as "CH"? Nobody thought about the
issue through all the editions of the Rules, and now that I brought this
forward, "Ch" is going to be put into the next edition. Finally. Maybe. Not
that I care, but there are people who take the Rules as an unquestionable
holy scripture. I usually slap them with the book when they write "Ch" :-)
>
> The capitalization rules for letters with diacritics are also
> heavily language dependent.
And there is also the Greek ς/σ → Σ
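The sigma case runs the other way round: one capital, two lowercase forms, chosen by position in the word rather than by language. A short Python sketch (the word is just an example I picked):

```python
FINAL_SIGMA = "\u03c2"   # lowercase sigma used at the end of a word
MEDIAL_SIGMA = "\u03c3"  # lowercase sigma used elsewhere
CAPITAL_SIGMA = "\u03a3"

# Both lowercase forms share a single capital.
uppercased = (FINAL_SIGMA.upper(), MEDIAL_SIGMA.upper())

# Lowercasing a whole word picks the final form at the end of the word...
word = "\u039f\u0394\u039f\u03a3"  # four capitals ending in sigma
lowered_word = word.lower()

# ...but an isolated capital sigma becomes the medial form.
lowered_single = CAPITAL_SIGMA.lower()

print(uppercased, lowered_word, lowered_single)
```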
--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
It was sent correctly, but Google can't interpret its own doings!
--
WinErr 008: Erroneous error. Nothing is wrong.
>>>>>> "Oliver" == Oliver Cromm <lispa...@yahoo.de> writes:
>
> Oliver> On a side note: is it the only case where the
> Oliver> capitalization of a lowercase letter depends on the
> Oliver> language?
>
> No. There are many other cases.
> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
I didn't ask for just any kind of unusual capitalization rule. I am
interested in whether there is another letter or character besides "i" that
has two different capitalizations, depending on which language we're in. The
opposite, one capital letter having two lowercase counterparts, would be
close enough, but your examples aren't.
--
XML combines all the inefficiency of text-based formats with most of the
unreadability of binary formats. -- Oren Tirosh, comp.lang.python
Whichever way, it's not an example of what I was looking for. It is at
best a case that a letter is capitalized in a context where it's not in
other languages, but not an example for two competing capitalized forms
of the same letter. It will result in a language-dependent
capitalization of a word or phrase, but not of a letter.
--
'Ah yes, we got that keyboard from Small Gods when they threw out their
organ. Unfortunately for complex theological reasons they would only
give us the white keys, so we can only program in C'.
Colin Fine in sci.lang
>* Ruud Harmsen wrote:
>
>> Sat, 15 Mar 2008 07:15:15 -0700 (PDT): Marc <marc....@gmail.com>: in
>> sci.lang:
>>
>>>On Mar 14, 7:58 pm, LEE Sau Dan <dan...@informatik.uni-freiburg.de>
>>>wrote:
>>>
>>>> No. There are many other cases.
>>>> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
>>>
>>>That's incorrect, by the way. I+J/i+j is a single letter: Ĳ/ĳ
>>
>> No, it's not. Or rather: that is heavily disputed, so much so that in
>> nl.taal you are considered a troll if you even mention the subject.
>>
>> My stance on the matter is here:
>> http://rudhar.com/lingtics/nlij_en.htm
>
>Whichever way, it's not an example of what I was looking for. It is at
>best a case that a letter is capitalized in a context where it's not in
>other languages, but not an example for two competing capitalized forms
>of the same letter. It will result in a language-dependent
>capitalization of a word or phrase, but not of a letter.
I don't understand. Of a letter or two of them, but not just a word or
phrase. This combination ij is frequent in Dutch.
Oh, blast! Of course you're right. I shouldn't have said "always".
The capitalization I considered was the kind used at the beginning
of a sentence or a proper name; or in a heading where only the leading
letters of each word are capitalized.
> Do you know that "oficially", according to the codified Slovak orthography
> rules, "ch" should be capitalized only as "CH"? Nobody thought about the
> issue through all the editions of the Rules, and now that I brought this
> forward, "Ch" is going to be put into the next edition. Finally. Maybe. Not
> that I care, but there are people who take the Rules as an unquestionable
> holy scripture. I usually slap them with the book when they write "Ch" :-)
I don't have a Slovak dictionary. How do Slovak dictionaries treat
letters with diacritics? My elderly Czech-English dictionary sorts
words beginning with palatalized letters into separate sections
under their own headings each immediately after the corresponding
plain letter. That is as one would normally expect. However, for the
sorting purposes the letters inside the words are treated as if they
did not have diacritics at all. The words end up sorted according to
alphabetic value of the letters that follow them.
pjk
Well, in that case, the letter "ch" is good example of what you are
looking for. In Czech, it's normally capitalized as "Ch" and as
Radovan says in Slovak it's capitalized as "CH".
(However, not for much longer :-)
In both languages "ch" is strictly a single letter, not digraph.
pjk
If I understood correctly, you should write CHlad instead of Chlad (cold)
at the beginning of a sentence?
I would say that visually it looks much better as "Ch"... doesn't it?
DV
No, but I keep getting spam advertising them.
> On a side note: is it the only case where the capitalization of a
> lowercase letter depends on the language?
Does anyone know what happens when you capitalize entire phrases in
Irish (eg Banc na hÉireann)? My impression is that the LC "h" is
preserved: if so, there's an example for you.
> Sat, 15 Mar 2008 16:58:17 -0400: Oliver Cromm <lispa...@yahoo.de>:
> in sci.lang:
>
>>* Ruud Harmsen wrote:
>>
>>> Sat, 15 Mar 2008 07:15:15 -0700 (PDT): Marc <marc....@gmail.com>: in
>>> sci.lang:
>>>
>>>>On Mar 14, 7:58 pm, LEE Sau Dan <dan...@informatik.uni-freiburg.de>
>>>>wrote:
>>>>
>>>>> No. There are many other cases.
>>>>> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
>>>>
>>>>That's incorrect, by the way. I+J/i+j is a single letter: Ĳ/ĳ
>>>
>>> No, it's not. Or rather: that is heavily disputed, so much so that in
>>> nl.taal you are considered a troll if you even mention the subject.
>>>
>>> My stance on the matter is here:
>>
>>Whichever way, it's not an example of what I was looking for. It is at
>>best a case that a letter is capitalized in a context where it's not in
>>other languages, but not an example for two competing capitalized forms
>>of the same letter. It will result in a language-dependent
>>capitalization of a word or phrase, but not of a letter.
>
> I don't understand. Of a letter or two of them, but not just a word or
> phrase.
Wow, simple words can be difficult.
When you capitalize "ijsbeer", which letter has a capitalized form that
is different from another language?
--
Oliver C.
...
>
> I don't have a Slovak dictionary. How do Slovak dictionaries treat
> letters with diacritics? My elderly Czech-English dictionary sorts
> words beginning with palatalized letters into separate sections
> under their own headings each immediately after the corresponding
> plain letter. That is as one would normally expect.
ä, č, dz, dž, ch, ô, š, ž have their own sections, the rest have not.
Some dictionaries, however, keep ä, dz, dž and ô inside a, d, o sections. It
does not matter much, since dz and dž are at the end of the section anyway,
and hardly any words start with ä. However, words beginning with ô- come
after those beginning with ož-, which might be confusing.
> However, for the sorting purposes the letters inside the words are treated
> as if they did not have diacritics at all. The words end up sorted
> according to alphabetic value of the letters that follow them.
Collation order is double-keyed: first, you sort the entries disregarding
the acute accent and the háček in ďťňľ (but sorting čšž after the corresponding háčekless
letters). Then you do a second pass and put the letters with an acute accent and ďťňľ
after the accentless ones, where possible. So you'd get something like this:
asa asá ása ásá asb ásb aša ašá ašb ata atá aťa aťá áta atb
Though not everyone adheres strictly to this scheme :-)
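A rough Python sketch of that two-level key, in case it helps. It is only meant to reproduce the example list above: it ignores the digraphs ch, dz and dž, and it treats every diacritic other than the háček on c, s, z as a second-level difference, which is a simplification of mine (ä and ô would need their own handling). slovak_key is a made-up name.

```python
import unicodedata

CARON = "\u030c"  # combining hacek

def slovak_key(word: str):
    """Two-level sort key: base letters first, accents as a tie-breaker."""
    primary, secondary = [], []
    for ch in unicodedata.normalize("NFC", word.lower()):
        base, *marks = unicodedata.normalize("NFD", ch)
        if CARON in marks and base in "csz":
            # c/s/z with hacek count as separate letters right after c, s, z.
            primary.append((base, 1))
            secondary.append(0)
        else:
            # Other marked letters sort with their base letter at the first level
            # and after the unmarked letter at the second level.
            primary.append((base, 0))
            secondary.append(1 if marks else 0)
    return tuple(primary), tuple(secondary)

words = "asa asá ása ásá asb ásb aša ašá ašb ata atá aťa aťá áta atb".split()
resorted = sorted(reversed(words), key=slovak_key)
print(resorted)
```

Comparing the whole first-level tuple before looking at any accent is what makes this "double-keyed": an accent only matters between words that are otherwise identical.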
I am pretty sure the official (as per technical standards) Czech collation is
almost identical.
I wouldn't be surprised if the official rules of Czech orthography
forgot to mention "Ch" as well :-)
The capitalization is such an obvious and straightforward issue that no one
ever thought about writing it down into the book, and so "Ch" just did not
get there... I was not aware of it either, until I found out about the
Dutch IJ.
(For Oliver: everyone happily uses Ch, unless writing in all caps, when CH is used.
As one would expect.)
For Oliver too:
There is also the Azeri ə capitalized as Ə, and the Nigerian ǝ capitalized
as Ǝ. They, however, got different code points in Unicode (but I vaguely recall
that the Turkish i and non-Turkish i got different code points in some encoding...
but it was not ISO_8859-9).
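The separate code points are easy to see from Python, whose default case mapping follows the Unicode character database. A small sketch; the escapes are there only so that the two look-alike lowercase letters cannot be confused in the source:

```python
import unicodedata

SCHWA = "\u0259"     # the lowercase letter used in Azeri
TURNED_E = "\u01dd"  # the similar-looking lowercase letter

# Each lowercase letter has its own capital, so no language tag is needed.
capitals = {lower: lower.upper() for lower in (SCHWA, TURNED_E)}

for lower, upper in capitals.items():
    print(f"U+{ord(lower):04X} {unicodedata.name(lower)} -> "
          f"U+{ord(upper):04X} {unicodedata.name(upper)}")
```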
>>
>> Do you know that "oficially", according to the codified Slovak orthography
>> rules, "ch" should be capitalized only as "CH"?
>
> If I understood well, you should write CHlad instead of Chlad (cold)
> in the beginning of a sentence?
Yes, but nobody ever writes like this. Indeed, it would be considered a typo
if seen in a text, and a mistake if e.g. a schoolchild was to write in this
way.
>
> I would say, visually it looks much better as "Ch"... is it not?
To me, it looks horrible, but I'd say it is just a matter of what one is used to.
>The capitalization is such an obvious and straightforward issue that no one
>ever thought about writing it down into the book, and so "Ch" just did not
>get there... I was not aware of it either, until I found out about the
>Dutch IJ.
Klingon!
garabik-news-2005-05> I wouldn't be surprised if the official
garabik-news-2005-05> rules of Czech orthography forgot to mention
garabik-news-2005-05> "Ch" as well :-)
garabik-news-2005-05> The capitalization is such an obvious and
garabik-news-2005-05> straightforward issue that no one ever
garabik-news-2005-05> thought about writing it down into the book,
garabik-news-2005-05> and so "Ch" just did not get there...
No. Like the grammar of a language, the capitalization rules may
appear "simple and straight-forward" to the native speakers and fluent
speakers, but they are tricky enough to trip new learners and computer
algorithms over and over.
Capitalization has been a non-trivial issue for l10n of software.
People building i18n libraries and writing internationalized software
need to understand the issues and pay special attention when coding
their software.
garabik-news-2005-05> I was not aware of it either, until I found
garabik-news-2005-05> out about the Dutch IJ.
And you may not be aware of how compulsory tense marking can be a very
difficult feature for some learners on the other side of the globe.
I happen to have Akademická Pravidla C^eského Pravopisu (my capitalization
according to English language rules :-) issued by Academia Praha in 1993.
I have quickly looked through the relevant chapter and it seems they indeed do
not bother to deal with "ch/Ch/CH" explicitly. However, on page 40 while
they deal with proper names they include an example with "ch" and "Ch".
"chrám sv. Víta" is any church of St. Vitus, while "Chrám sv. Víta" is
_the one and only_ church (cathedral) of St. Vitus at Hrad^any in Prague.
pjk
You are missing the point. The Czech and Slovak rulebooks
of orthography are written explicitly for native or fluent speakers.
The great majority of chapters deal with unusual and esoteric
issues which a foreign learner will hardly ever be faced with.
The foreign beginner learners are usually introduced to charts
of capitalized and non-capitalized alphabets in the first lesson.
Apart from many letters with diacritics, several letters are
handwritten differently than equivalent letters in other
languages. In case of "ch" all they are probably told is that
it is a single letter and that it looks like "ch" or "Ch".
>Capitalization has been a non-trivial issue for l10n of software.
>People building i18n libraries and writing internationalized software
>need to understand the issues and pay special attention when coding
>their software.
>
>
> garabik-news-2005-05> I was not aware of it either, until I found
> garabik-news-2005-05> out about the Dutch IJ.
>
>And you may not be aware of how compulsory tense marking can be a very
>difficult feature for some learners on the other side of the globe.
Well, that's neither here nor there. You may be equally unaware of
how substantially Czech and Slovak verb tense marking in multi-verb
sentences differs from that of either English or Chinese.
You are not the only one who has had a hard time learning
to use English tenses. :-)
pjk
Oh, yes, I agree. What I was describing can be found in Dr. Alois
C^ermák's highly idiosyncratic Cz-E and E-Cz dictionary printed in
Třebíč in 1940. It contains a lot of esoteric Cz dialectal words while
many relatively common words are not included. The English quite
often feels early Victorian. I suspect he based it on some mid 19th
century dictionary.
pjk
P.S.
This is my cut&paste tool: "příliš žluťoučký kůň úpěl ďábelské ódy"
The sentence contains all fifteen Cz letters with diacritics "říšžťčýůňúěďáéó",
each of them only once.
Is there a Slovak equivalent?
Recently I realized that one also needs a capitalized version of "říšžťčýůňúěďáéó"
For example, I can't type "T" or "D" with a hacek, or "U" with a krouzek.
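A small Python sketch that checks the claim about the sentence and also produces the capitalized set. It assumes that "letter with a diacritic" means any character that Unicode decomposes into a base letter plus a combining mark, which holds for these fifteen.

```python
import unicodedata
from collections import Counter

PANGRAM = "příliš žluťoučký kůň úpěl ďábelské ódy"

def has_diacritic(ch: str) -> bool:
    """True if the character decomposes into a base letter plus mark(s)."""
    return len(unicodedata.normalize("NFD", ch)) > 1

# Count every accented letter in the sentence, in order of first appearance.
counts = Counter(
    ch for ch in unicodedata.normalize("NFC", PANGRAM) if has_diacritic(ch)
)

# The same letters in uppercase, ready for cut and paste.
capitals = "".join(counts).upper()

print(len(counts), set(counts.values()), capitals)
```

The uppercase string includes the capitals with háček and kroužek that are awkward to type directly.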
[...]
> For Oliver too:
> There is also the Azeri ə capitalized as Ə, and the Nigerian ǝ
> capitalized
> as Ǝ.
What Nigerian language is that? None of the ones I'm familiar with has
<ǝ>, though of course there's a few hundred languages in Nigeria that
I'm not familiar with.
[...]
John.
> <ǝ>, though of course there's a few hundred languages in Nigeria that
"there's", not "there are"? Is that standard English?
Curiously,
Joachim
Yes, though some pedants don't like it. Both are in common use.
J.
Tue, 01 Apr 2008 17:28:42 +0200: Joachim Pense <sn...@pense-mainz.eu>:
in sci.lang:
>"there's", not "there are"? Is that standard English?
(Deliberately not looking at John's answer yet.)
Both are possible, but "there's" is a bit more informal.
--
Ruud Harmsen
http://rudhar.com/index/whatsnew.htm