Look for code of a special character

Fulio Open

Mar 12, 2008, 10:50:46 PM

Hello,

I wanted to present the lowercase 'i' but without the top dot in my
web page. If anyone knows the code of it, please teach me. I also
wonder if this i without the dot is used in any writing system. If
yes, what are they?

Thanks in advance for your help.

Fulio

Harlan Messinger

Mar 12, 2008, 11:17:28 PM

Fulio Open wrote:
> Hello,
>
> I wanted to present the lowercase 'i' but without the top dot in my
> web page. If anyone knows the code of it, please teach me. I also
> wonder if this i without the dot is used in any writing system. If
> yes, what are they?

The dotless i is used in Turkish and is at Unicode position U+0131 (305
decimal).
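
As a quick check of the code point given above, the following is a minimal Python sketch (the variable names are illustrative and not from the thread). It looks up U+0131 and builds the two numeric character references that a web page could use, plus the bytes the character occupies in a UTF-8 file.

```python
import unicodedata

DOTLESS_I = "\u0131"  # U+0131, decimal 305

# Official Unicode name and decimal value of the code point.
name = unicodedata.name(DOTLESS_I)
decimal = ord(DOTLESS_I)

# Numeric character references usable in HTML source.
decimal_ref = f"&#{decimal};"
hex_ref = f"&#x{decimal:X};"

# Bytes of the character when the page is saved as UTF-8.
utf8_bytes = DOTLESS_I.encode("utf-8")

print(name, decimal_ref, hex_ref, utf8_bytes)
```

Either reference should render as the dotless i regardless of the page's own encoding, since numeric references always refer to Unicode code points.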

flz

Mar 13, 2008, 12:33:18 AM

On Mar 13, 11:17 am, Harlan Messinger

Thanks a lot for the information.

Fulio

Ruud Harmsen

Mar 13, 2008, 4:18:44 AM

Wed, 12 Mar 2008 23:17:28 -0400: Harlan Messinger
<hmessinger...@comcast.net>: in sci.lang:

Correct. If you wish to check this and other Unicode characters, here
is an index to the various code charts:
http://rudhar.com/lingtics/uniclnks.htm and thence
http://unicode.org/charts/PDF/U0100.pdf

Knowing that the character is U+0131 (305 decimal), you can represent
it in a webpage as &#x131; (hexadecimal) or as &#305; (decimal),
as explained here: http://rudhar.com/sfreview/unigglen.htm .

See also: http://rudhar.com/sfreview/html_en/entities.htm and
http://www.cs.vassar.edu/CES/sgml/ISOlat1
http://www.cs.vassar.edu/CES/sgml/ISOlat2
which mentions:
<!ENTITY inodot SDATA "[inodot]"--=small i without dot-->
and also
<!ENTITY Idot SDATA "[Idot ]"--=capital I, dot above-->

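So as an alternative to the hardly readable &#305; etc. you can also
use &inodot; etc. in HTML. Note that these names come from the ISO
entity sets: HTML 4 does not define them, so only parsers that follow
the HTML5 list of named character references will recognize them.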
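
To see how the numeric and named forms relate, here is a small Python sketch. It assumes that Python's `html.unescape` follows the HTML5 list of named character references, and that this list contains `inodot` and `Idot`; HTML 4 parsers would not know those two names.

```python
import html

# Two numeric references and two named ones for the Turkish i letters.
references = ["&#305;", "&#x131;", "&inodot;", "&Idot;"]

# html.unescape resolves each reference to the character it stands for.
decoded = [html.unescape(ref) for ref in references]

for ref, char in zip(references, decoded):
    print(ref, "->", char, f"U+{ord(char):04X}")
```

The numeric forms are the safer choice for older browsers, since they do not depend on which entity names a parser happens to know.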

This uppercase I with a dot is also used in Turkish. The normal
dotless I is their uppercase version of the Turkish special character
dotless i. That means they also need an uppercase version for the dotted i.
They use it in the name Istambul, for example.
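
The two Turkish pairs (i/İ and ı/I) can be sketched in Python as follows. Python's built-in `str.upper` and `str.lower` are language-neutral, so the helper names `turkish_upper` and `turkish_lower` are hypothetical functions written for this illustration, not part of any library.

```python
def turkish_upper(text):
    # Map the two Turkish i's to their own capitals before the generic pass.
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text):
    # Map the two Turkish capitals to their own small letters first.
    return text.replace("İ", "i").replace("I", "ı").lower()


print("istanbul".upper())         # language-neutral mapping loses the dot
print(turkish_upper("istanbul"))  # Turkish mapping keeps it
print(turkish_lower("ISPARTA"))   # dotless capital gives dotless small letter
```

Production code would normally rely on a locale-aware library rather than string replacement; the sketch only makes the pairing of the four letters explicit.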

--
Ruud Harmsen
http://rudhar.com

Andreas Prilop

Mar 13, 2008, 12:11:13 PM

On Wed, 12 Mar 2008, Fulio Open wrote:

> I wanted to present the lowercase 'i' but without the top dot in my
> web page. If anyone knows the code of it, please teach me.

Look at the source text of
http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html
to find
iı

--
I used to believe in reincarnation in a former life.

Yusuf B Gursey

Mar 13, 2008, 1:22:27 PM

In sci.lang Ruud Harmsen <realema...@rudhar.com.invalid> wrote in <fooht3dpcggbh5nd7...@4ax.com>:
: Wed, 12 Mar 2008 23:17:28 -0400: Harlan Messinger
: <hmessinger...@comcast.net>: in sci.lang:

istanbul

: --
: Ruud Harmsen
: http://rudhar.com

Harlan Messinger

Mar 13, 2008, 2:58:07 PM

Yusuf B Gursey wrote:
> In sci.lang Ruud Harmsen <realema...@rudhar.com.invalid> wrote in <fooht3dpcggbh5nd7...@4ax.com>:

>

> : This uppercase I with a dot is also used in Turkish. The normal
> : dotless I is their uppercase version of the Turkish special character
> : dotless i. That means they also an uppercase version for the dotted i.
> : They use it in the name Istambul, for example.
>
> istanbul

İstanbul.

Nigel Greenwood

Mar 13, 2008, 4:32:57 PM

On Mar 13, 6:58 pm, Harlan Messinger

<hmessinger.removet...@comcast.net> wrote:
> Yusuf B Gursey wrote:

> > In sci.lang Ruud Harmsen <realemailons...@rudhar.com.invalid> wrote in <fooht3dpcggbh5nd79fpb4iofaflt5b...@4ax.com>:

>
> > : This uppercase I with a dot is also used in Turkish. The normal
> > : dotless I is their uppercase version of the Turkish special character
> > : dotless i. That means they also an uppercase version for the dotted i.
> > : They use it in the name Istambul, for example.
>
> > istanbul
>
> İstanbul.

!stanbul. Can we all play?

Nigel

--
ScriptMaster language resources (Chinese/Modern & Classical Greek/IPA/
Persian/Russian/Turkish):
http://www.elgin.free-online.co.uk

Harlan Messinger

Mar 13, 2008, 4:45:35 PM

Nigel Greenwood wrote:
> On Mar 13, 6:58 pm, Harlan Messinger
> <hmessinger.removet...@comcast.net> wrote:
>> Yusuf B Gursey wrote:
>>> In sci.lang Ruud Harmsen <realemailons...@rudhar.com.invalid> wrote in <fooht3dpcggbh5nd79fpb4iofaflt5b...@4ax.com>:
>>> : This uppercase I with a dot is also used in Turkish. The normal
>>> : dotless I is their uppercase version of the Turkish special character
>>> : dotless i. That means they also an uppercase version for the dotted i.
>>> : They use it in the name Istambul, for example.
>>> istanbul
>> İstanbul.
>
> !stanbul. Can we all play?

Istanbul was Constantinople
Now it's Istanbul, not Constantinople
Been a long time gone, Oh Constantinople
Now it's Turkish delight on a moonlit night
Every gal in Constantinople
Lives in Istanbul, not Constantinople
So if you've a date in Constantinople
She'll be waiting in Istanbul

(They Might Be Giants)

benl...@ihug.co.nz

Mar 13, 2008, 5:23:36 PM

On Mar 14, 9:45 am, Harlan Messinger

<hmessinger.removet...@comcast.net> wrote:
> Nigel Greenwood wrote:
> > On Mar 13, 6:58 pm, Harlan Messinger
> > <hmessinger.removet...@comcast.net> wrote:
> >> Yusuf B Gursey wrote:
> >>> In sci.lang Ruud Harmsen <realemailons...@rudhar.com.invalid> wrote in <fooht3dpcggbh5nd79fpb4iofaflt5b...@4ax.com>:
> >>> : This uppercase I with a dot is also used in Turkish. The normal
> >>> : dotless I is their uppercase version of the Turkish special character
> >>> : dotless i. That means they also an uppercase version for the dotted i.
> >>> : They use it in the name Istambul, for example.
> >>> istanbul

> >> İstanbul.

>
> > !stanbul. Can we all play?
>
> Istanbul was Constantinople
> Now it's Istanbul, not Constantinople
> Been a long time gone, Oh Constantinople
> Now it's Turkish delight on a moonlit night
> Every gal in Constantinople
> Lives in Istanbul, not Constantinople
> So if you've a date in Constantinople
> She'll be waiting in Istanbul
>
> (They Might Be Giants)

Well, actually, by Jimmy Kennedy and Nat Simon. Big 1953 hit record by
the Four Lads. But it's nice to see the young folks remember some of
the old songs...

Ross Clark

Harlan Messinger

Mar 13, 2008, 5:27:06 PM

benl...@ihug.co.nz wrote:
> On Mar 14, 9:45 am, Harlan Messinger
> <hmessinger.removet...@comcast.net> wrote:
>> Nigel Greenwood wrote:
>>> On Mar 13, 6:58 pm, Harlan Messinger
>>> <hmessinger.removet...@comcast.net> wrote:
>>>> Yusuf B Gursey wrote:
>>>>> In sci.lang Ruud Harmsen <realemailons...@rudhar.com.invalid> wrote in <fooht3dpcggbh5nd79fpb4iofaflt5b...@4ax.com>:
>>>>> : This uppercase I with a dot is also used in Turkish. The normal
>>>>> : dotless I is their uppercase version of the Turkish special character
>>>>> : dotless i. That means they also an uppercase version for the dotted i.
>>>>> : They use it in the name Istambul, for example.
>>>>> istanbul

>>>> İstanbul.

>>> !stanbul. Can we all play?
>> Istanbul was Constantinople
>> Now it's Istanbul, not Constantinople
>> Been a long time gone, Oh Constantinople
>> Now it's Turkish delight on a moonlit night
>> Every gal in Constantinople
>> Lives in Istanbul, not Constantinople
>> So if you've a date in Constantinople
>> She'll be waiting in Istanbul
>>
>> (They Might Be Giants)
>
> Well, actually, by Jimmy Kennedy and Nat Simon. Big 1953 hit record by
> the Four Lads. But it's nice to see the young folks remember some of
> the old songs...

Thanks, I had a suspicion that the TMBG version was a cover version but
I've never seen it ascribed to anyone else.

Paul J Kriha

Mar 14, 2008, 12:28:57 AM

"Nigel Greenwood" <ndsg...@yahoo.co.uk> wrote in message
news:0955499a-f281-41c1...@i29g2000prf.googlegroups.com...

>On Mar 13, 6:58 pm, Harlan Messinger
><hmessinger.removet...@comcast.net> wrote:
>> Yusuf B Gursey wrote:
>> > In sci.lang Ruud Harmsen <realemailons...@rudhar.com.invalid> wrote in
><fooht3dpcggbh5nd79fpb4iofaflt5b...@4ax.com>:
>>
>> > : This uppercase I with a dot is also used in Turkish. The normal
>> > : dotless I is their uppercase version of the Turkish special character
>> > : dotless i. That means they also an uppercase version for the dotted i.
>> > : They use it in the name Istambul, for example.
>>
>> > istanbul
>>
>> İstanbul.
>
>!stanbul. Can we all play?
>
>Nigel

Your newsreader failed to correctly interpret the directive
Content-Type: text/plain; charset=UTF-8; format=flowed
included in the header of Harlan's post.

It's a reminder to get a better newsreader. :-)

pjk

Marc

Mar 14, 2008, 10:02:43 AM

On Mar 13, 3:32 pm, Nigel Greenwood <ndsg_m...@yahoo.co.uk> wrote:

> !stanbul. Can we all play?

إstanbul (same pronunciation, too.)

Beat that!

Marc

Mar 14, 2008, 10:04:24 AM

On Mar 14, 9:02 am, Marc <marc.ad...@gmail.com> wrote:

إstanbul

(See if that works...)

Marc

Mar 14, 2008, 10:05:25 AM

I can't believe encoding is screwing up my masterpiece

إstanbul

Marc

Mar 14, 2008, 10:07:02 AM

On Mar 13, 11:28 pm, "Paul J Kriha"

> Your newsreader failed to correctly interpret directive
> Content-Type: text/plain; charset=UTF-8; format=flowed
> included in Harlan's post's header.
>
> It's a reminder to get a better newsreader. :-)

No, I think he got it okay. His joke was to use an upside down
exclamation point - like an upside down version of the capital dotted
I used in the previous message.

Marc

Mar 14, 2008, 10:07:48 AM

On Mar 14, 9:05 am, Marc <marc.ad...@gmail.com> wrote:
> I can't believe encoding is screwing up my masterpiece
>

> Åstanbul

Whatever.

It was supposed to be an alif with the hamza at the bottom.
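
One plausible explanation for the "Å", which nothing in the thread confirms, is that the alif with hamza below was sent as a single ISO-8859-6 (Arabic) byte and then displayed as if it were ISO-8859-1 (Latin-1). A short Python sketch of that assumption:

```python
ALEF_HAMZA_BELOW = "\u0625"  # ARABIC LETTER ALEF WITH HAMZA BELOW

# Encode the letter as one byte in the Arabic 8-bit charset...
raw = ALEF_HAMZA_BELOW.encode("iso-8859-6")

# ...and decode that same byte as if it were Latin-1.
misread = raw.decode("iso-8859-1")

print(raw, misread + "stanbul")
```

Under this assumption the byte value never changes; only the charset label used to interpret it does.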

Marc

Trond Engen

Mar 14, 2008, 10:30:37 AM

Marc skreiv:

It looked all right here (but will it now?):

First attempt:

>>>>> إstanbul (same pronunciation, too.)
>>>>>
>>>>> Beat that!

Second attempt:

>>>> إstanbul
>>>>
>>>> (See if that works...)

Third attempt:

>>> I can't believe encoding is screwing up my masterpiece
>>>

>>> إstanbul

--
Trond Engen
- would have expected Åstanbul to be somewhere in the Eastern Norwegian
woodlands

Andreas Prilop

Mar 14, 2008, 11:56:25 AM

On Fri, 14 Mar 2008, Marc wrote:

> Organization: http://groups.google.com
> User-Agent: G2/1.0

>
> It was supposed to be an alif with the hamza at the bottom.

Google Groups is severely broken. The message
<news:Pine.GSO.4.44.06080...@s5b004.rrzn.uni-hannover.de>
has charset=ISO-8859-1 and contains therefore only Latin-1
characters. However, stupid Google shows Arabic letters in
http://groups.google.com/group/sci.lang/msg/eb55255e1925350f

--
Solipsists of the world - unite!

Marc

Mar 14, 2008, 12:59:31 PM

On Mar 14, 10:56 am, Andreas Prilop <aprilop2...@trashmail.net> wrote:

> Solipsists of the world - unite!

Yeah, unfortunately, though, you can't use gmail as a newsreader, and
I don't want the hassle of using a totally separate newsreader.

Also, the interface of gmail is good.

If only they'd make all messages unicode...

Marc

Oliver Cromm

Mar 14, 2008, 7:27:33 PM

On a side note: is it the only case where the capitalization of a
lowercase letter depends on the language?

--
The nice thing about standards is that you have so many to choose
from; furthermore, if you do not like any of them, you can just
wait for next year's model.
Andrew Tanenbaum, _Computer Networks_ (1981), p. 168.

LEE Sau Dan

Mar 14, 2008, 8:58:33 PM

>>>>> "Oliver" == Oliver Cromm <lispa...@yahoo.de> writes:

Oliver> On a side note: is it the only case where the
Oliver> capitalization of a lowercase letter depends on the
Oliver> language?

No. There are many other cases.
e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
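
Both cases can be tried in Python. The built-in `str.upper` already expands "ß" to "SS", while `str.capitalize` knows nothing about the Dutch "ij"; `dutch_capitalize` below is a hypothetical helper written only for this illustration.

```python
def dutch_capitalize(word):
    # Treat a leading "ij" as one unit so that both letters are capitalized.
    if word.lower().startswith("ij"):
        return "IJ" + word[2:].lower()
    return word.capitalize()


print("straße".upper())             # one letter becomes two
print("ijsbeer".capitalize())       # generic rule: only the first letter
print(dutch_capitalize("ijsbeer"))  # Dutch rule: both letters
```

Note that the "ß" expansion applies in every language, whereas the "IJ" rule has to be switched on for Dutch text specifically.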

--
Lee Sau Dan 李守敦 ~{@nJX6X~}

E-mail: dan...@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

Paul J Kriha

Mar 15, 2008, 1:44:56 AM

"LEE Sau Dan" <dan...@informatik.uni-freiburg.de> wrote in message
news:87y78kv...@informatik.uni-freiburg.de...

> >>>>> "Oliver" == Oliver Cromm <lispa...@yahoo.de> writes:
>
> Oliver> On a side note: is it the only case where the
> Oliver> capitalization of a lowercase letter depends on the
> Oliver> language?
>
> No. There are many other cases.
> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".

That's true, there are no global 100% reliable rules of capitalization.

Compare the Dutch "ij"/"IJ" with a Czech letter "ch".
"ch" is always capitalized as "Ch".

The capitalization rules for letters with diacritics are also
heavily language dependent.

pjk

Nigel Greenwood

Mar 15, 2008, 7:36:42 AM

On Mar 14, 11:27 pm, Oliver Cromm <lispamat...@yahoo.de> wrote:

> On a side note: is it the only case where the capitalization of a
> lowercase letter depends on the language?

At least the Turkish system seems logical: the key on the keyboard
marked (dotless) "I" produces either a dotless lowercase "i" or a
dotless UC "I". I suppose that if the designers of the modern Turkish
alphabet had been entirely consistent they would have removed the dot
from LC "j": but of course there's no ambiguity in the j/J pair they
retained from other Latin alphabets.

Years ago I had my manual typewriter adapted to cope with Turkish.
The simplest method, suggested by the technician in the workshop, was
to add a key for dotted "i" (LC/UC) & physically remove the now-
redundant LC dot from the existing "I" key with a file. (Note for
geeks: a file is a rasping tool.)

[Test for Google Groups (doomed to failure?): dotless ı dotted İ. My
browser encoding is set to UTF-8; but clever Google may know better.]

Nigel Greenwood

Mar 15, 2008, 7:38:30 AM

On Mar 15, 11:36 am, Nigel Greenwood <ndsg_m...@yahoo.co.uk> wrote:

> [Test for Google Groups (doomed to failure?): dotless ý dotted Ý. My

> browser encoding is set to UTF-8; but clever Google may know better.]

Yup, failed again!

Nigel

Toni Keskitalo

Mar 15, 2008, 9:08:03 AM

Nigel Greenwood <ndsg...@yahoo.co.uk> writes:

But I saw those right in the parent message. My newsreader Gnus said
the message was "MIME/Ltn-5" (ISO-8859-9) but your follow-up was in the
regular ISO-8859-1/Latin-1.
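
The charset mismatch described here can be reproduced with a short Python sketch: the same bytes read as ISO-8859-9 give the Turkish letters, and read as ISO-8859-1 give the "ý" and "Ý" seen in the follow-up. The last line shows the related effect of reading UTF-8 bytes as Latin-1.

```python
text = "dotless ı dotted İ"

# Bytes as sent in the Turkish 8-bit charset (Latin-5).
raw = text.encode("iso-8859-9")

# The same bytes displayed with the wrong charset label (Latin-1).
misread = raw.decode("iso-8859-1")

# UTF-8 bytes of the dotted capital I, displayed as Latin-1.
utf8_misread = "İ".encode("utf-8").decode("iso-8859-1")

print(misread)
print(utf8_misread)
```

In both cases the bytes are intact; decoding them again with the correct charset recovers the original text.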

Here's my test:
ı dotless i
%GÄ° %@ dotted I ... er I guess my font hasn't got that character as I
get two escaped codes here.

Toni

Marc

Mar 15, 2008, 10:12:50 AM

On Mar 15, 6:36 am, Nigel Greenwood <ndsg_m...@yahoo.co.uk> wrote:

> (Note for
> geeks: a file is a rasping tool.)

Do you have a URL where I could download one of those?

Marc

Mar 15, 2008, 10:15:15 AM

On Mar 14, 7:58 pm, LEE Sau Dan <dan...@informatik.uni-freiburg.de>
wrote:

> No. There are many other cases.
> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".

That's incorrect, by the way. I+J/i+j is a single letter: Ĳ/ĳ

That oughta get through Google Groups, right?

Marc

Ruud Harmsen

Mar 15, 2008, 10:29:39 AM

Sat, 15 Mar 2008 07:15:15 -0700 (PDT): Marc <marc....@gmail.com>: in
sci.lang:

>On Mar 14, 7:58 pm, LEE Sau Dan <dan...@informatik.uni-freiburg.de>
>wrote:
>
>> No. There are many other cases.

>> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
>
>That's incorrect, by the way. I+J/i+j is a single letter: Ĳ/ĳ

No, it's not. Or rather: that is heavily disputed, so much so that in
nl.taal you are considered a troll if you even mention the subject.

My stance on the matter is here:
http://rudhar.com/lingtics/nlij_en.htm

garabik-ne...@kassiopeia.juls.savba.sk

Mar 15, 2008, 10:36:05 AM

Paul J Kriha <paul.nos...@paradise.net.nz> wrote:
> "LEE Sau Dan" <dan...@informatik.uni-freiburg.de> wrote in message
> news:87y78kv...@informatik.uni-freiburg.de...
>> >>>>> "Oliver" == Oliver Cromm <lispa...@yahoo.de> writes:
>>
>> Oliver> On a side note: is it the only case where the
>> Oliver> capitalization of a lowercase letter depends on the
>> Oliver> language?
>>
>> No. There are many other cases.
>> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
>
> That's true, there are no global 100% reliable rules of capitalization.
>
> Compare the Dutch "ij"/"IJ" with a Czech letter "ch".
> "ch" is always capitalized as "Ch".

No it isn't :-) Sometimes you capitalize it as "CH", e.g. when writing in ALL CAPS.

Do you know that "officially", according to the codified Slovak orthography
rules, "ch" should be capitalized only as "CH"? Nobody thought about the
issue through all the editions of the Rules, and now that I brought this
forward, "Ch" is going to be put into the next edition. Finally. Maybe. Not
that I care, but there are people who take the Rules as an unquestionable
holy scripture. I usually slap them with the book when they write "Ch" :-)

>
> The capitalization rules for letters with diacritics are also
> heavily language dependent.

And there is also the Greek ς/σ → Σ
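
The Greek case runs in the opposite direction: two small letters share one capital, so lowercasing has to look at the position in the word. A brief Python sketch, relying on the final-sigma handling built into `str.lower` in current Python 3 versions:

```python
# Both small sigmas map to the same capital letter.
medial_upper = "σ".upper()
final_upper = "ς".upper()

# Lowercasing a whole word picks the final form at the end of the word.
word_lower = "ΟΔΟΣ".lower()

print(medial_upper, final_upper, word_lower)
```

Unlike the Turkish i, this rule depends on context within the word, not on the language of the text.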

--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Oliver Cromm

Mar 15, 2008, 4:38:14 PM

* Nigel Greenwood wrote:

It was sent correctly, but Google can't interpret its own doings!

--
WinErr 008: Erroneous error. Nothing is wrong.

Oliver Cromm

Mar 15, 2008, 4:52:35 PM

* /r LEE Sau Dan wrote:

>>>>>> "Oliver" == Oliver Cromm <lispa...@yahoo.de> writes:
>
> Oliver> On a side note: is it the only case where the
> Oliver> capitalization of a lowercase letter depends on the
> Oliver> language?
>
> No. There are many other cases.
> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".

I didn't ask for just any kind of unusual capitalization rule. I am
interested in whether there is another letter or character besides "i" that
has two different capitalizations, depending on which language we're in. The
opposite, one capital letter having two lower case counterparts, would be
close enough, but your examples aren't.

--
XML combines all the inefficiency of text-based formats with most of the
unreadability of binary formats. -- Oren Tirosh, comp.lang.python

Oliver Cromm

Mar 15, 2008, 4:58:17 PM

* Ruud Harmsen wrote:

Whichever way, it's not an example of what I was looking for. It is at
best a case where a letter is capitalized in a context where it's not in
other languages, but not an example of two competing capitalized forms
of the same letter. It will result in a language-dependent
capitalization of a word or phrase, but not of a letter.

--
'Ah yes, we got that keyboard from Small Gods when they threw out their
organ. Unfortunately for complex theological reasons they would only
give us the white keys, so we can only program in C'.
Colin Fine in sci.lang

Ruud Harmsen

Mar 15, 2008, 5:56:13 PM

Sat, 15 Mar 2008 16:58:17 -0400: Oliver Cromm <lispa...@yahoo.de>:
in sci.lang:

>* Ruud Harmsen wrote:
>
>> Sat, 15 Mar 2008 07:15:15 -0700 (PDT): Marc <marc....@gmail.com>: in
>> sci.lang:
>>
>>>On Mar 14, 7:58 pm, LEE Sau Dan <dan...@informatik.uni-freiburg.de>
>>>wrote:
>>>
>>>> No. There are many other cases.
>>>> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
>>>
>>>That's incorrect, by the way. I+J/i+j is a single letter: Ĳ/ĳ
>>
>> No, it's not. Or rather: that is heavily disputed, so much so that in
>> nl.taal you are considered a troll if you even mention the subject.
>>
>> My stance on the matter is here:
>> http://rudhar.com/lingtics/nlij_en.htm
>
>Whichever way, it's not an example of what I was looking for. It is at
>best a case that a letter is capitalized in a context where it's not in
>other languages, but not an example for two competing capitalized forms
>of the same letter. It will result in a language-dependent
>capitalization of a word or phrase, but not of a letter.

I don't understand. Of a letter or two of them, but not just a word or
phrase. This combination ij is frequent in Dutch.

Paul J Kriha

Mar 15, 2008, 11:15:18 PM

<garabik-ne...@kassiopeia.juls.savba.sk> wrote in message
news:frgmsl$gqe$1...@ns.felk.cvut.cz...

> Paul J Kriha <paul.nos...@paradise.net.nz> wrote:
> > "LEE Sau Dan" <dan...@informatik.uni-freiburg.de> wrote in message
> > news:87y78kv...@informatik.uni-freiburg.de...
> >> >>>>> "Oliver" == Oliver Cromm <lispa...@yahoo.de> writes:
> >>
> >> Oliver> On a side note: is it the only case where the
> >> Oliver> capitalization of a lowercase letter depends on the
> >> Oliver> language?
> >>
> >> No. There are many other cases.
> >> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
> >
> > That's true, there are no global 100% reliable rules of capitalization.
> >
> > Compare the Dutch "ij"/"IJ" with a Czech letter "ch".
> > "ch" is always capitalized as "Ch".
>
> No it isn't :-) Sometimes you capitalize it as "CH", e.g. when writing in ALL CAPS.

Oh, blast! Of course you're right. I shouldn't have said "always".
The capitalization I considered was the kind used at the beginning
of a sentence or a proper name; or in a heading where only the leading
letters of each word are capitalized.

> Do you know that "oficially", according to the codified Slovak orthography
> rules, "ch" should be capitalized only as "CH"? Nobody thought about the
> issue through all the editions of the Rules, and now that I brought this
> forward, "Ch" is going to be put into the next edition. Finally. Maybe. Not
> that I care, but there are people who take the Rules as an unquestionable
> holy scripture. I usually slap them with the book when they write "Ch" :-)

I don't have a Slovak dictionary. How do Slovak dictionaries treat
letters with diacritics? My elderly Czech-English dictionary sorts
words beginning with palatalized letters into separate sections
under their own headings each immediately after the corresponding
plain letter. That is as one would normally expect. However, for the
sorting purposes the letters inside the words are treated as if they
did not have diacritics at all. The words end up sorted according to
alphabetic value of the letters that follow them.

pjk

Paul J Kriha

Mar 15, 2008, 11:22:52 PM

"Oliver Cromm" <lispa...@yahoo.de> wrote in message
news:x6vogekt297g$.dlg@mid.crommatograph.info...

> * /r LEE Sau Dan wrote:
>
> >>>>>> "Oliver" == Oliver Cromm <lispa...@yahoo.de> writes:
> >
> > Oliver> On a side note: is it the only case where the
> > Oliver> capitalization of a lowercase letter depends on the
> > Oliver> language?
> >
> > No. There are many other cases.
> > e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
>
> I didn't ask for any kind of unusual capitalization rule, I am
> interested if there is another letter or character than "i" that has two
> different capitalizations, depending on which language we're in. The
> opposite, one capital letter having two lower case counterparts would be
> close enough, but your examples aren't.

Well, in that case, the letter "ch" is a good example of what you are
looking for. In Czech, it's normally capitalized as "Ch" and, as
Radovan says, in Slovak it's capitalized as "CH".
(However, not for much longer :-)

In both languages "ch" is strictly a single letter, not a digraph.

pjk

Dušan Vukotić

Mar 16, 2008, 4:47:23 AM

On Mar 15, 3:36 pm, garabik-news-2005...@kassiopeia.juls.savba.sk
wrote:

> Paul J Kriha <paul.nospam.kr...@paradise.net.nz> wrote:
>
> > "LEE Sau Dan" <dan...@informatik.uni-freiburg.de> wrote in message
> >news:87y78kv...@informatik.uni-freiburg.de...

> >> >>>>> "Oliver" == Oliver Cromm <lispamat...@yahoo.de> writes:
>
> >> Oliver> On a side note: is it the only case where the
> >> Oliver> capitalization of a lowercase letter depends on the
> >> Oliver> language?
>
> >> No. There are many other cases.
> >> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
>
> > That's true, there are no global 100% reliable rules of capitalization.
>
> > Compare the Dutch "ij"/"IJ" with a Czech letter "ch".
> > "ch" is always capitalized as "Ch".
>
> No it isn't :-) Sometimes you capitalize it as "CH", e.g. when writing in ALL CAPS.
>
> Do you know that "oficially", according to the codified Slovak orthography
> rules, "ch" should be capitalized only as "CH"?

If I understood correctly, you should write CHlad instead of Chlad (cold)
at the beginning of a sentence?

I would say, visually it looks much better as "Ch"... does it not?

DV

Nigel Greenwood

Mar 16, 2008, 6:23:03 AM

No, but I keep getting spam advertising them.

Nigel Greenwood

Mar 16, 2008, 6:32:50 AM

On Mar 14, 11:27 pm, Oliver Cromm <lispamat...@yahoo.de> wrote:

> On a side note: is it the only case where the capitalization of a
> lowercase letter depends on the language?

Does anyone know what happens when you capitalize entire phrases in
Irish (eg Banc na hÉireann)? My impression is that the LC "h" is
preserved: if so, there's an example for you.
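
If that impression is right, a generic uppercase routine gets Irish wrong. The Python sketch below uses a hypothetical helper, `irish_upper`, which keeps a prefixed small letter in front of a capital vowel. The set of prefix letters ("h", "n", "t") is an assumption made for this example; only "h" is mentioned above.

```python
VOWELS = "AEIOUÁÉÍÓÚ"
PREFIXES = "hnt"  # assumed set of prefixed letters; the post only mentions "h"


def irish_upper(text):
    words = []
    for word in text.split(" "):
        # A small prefix letter directly before a capital vowel stays small.
        if len(word) > 1 and word[0] in PREFIXES and word[1] in VOWELS:
            words.append(word[0] + word[1:].upper())
        else:
            words.append(word.upper())
    return " ".join(words)


print("Banc na hÉireann".upper())       # generic rule capitalizes the h
print(irish_upper("Banc na hÉireann"))  # sketch keeps the h small
```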

Oliver Cromm

Mar 16, 2008, 11:47:55 AM

* /r Ruud Harmsen wrote:

> Sat, 15 Mar 2008 16:58:17 -0400: Oliver Cromm <lispa...@yahoo.de>:
> in sci.lang:
>
>>* Ruud Harmsen wrote:
>>
>>> Sat, 15 Mar 2008 07:15:15 -0700 (PDT): Marc <marc....@gmail.com>: in
>>> sci.lang:
>>>
>>>>On Mar 14, 7:58 pm, LEE Sau Dan <dan...@informatik.uni-freiburg.de>
>>>>wrote:
>>>>
>>>>> No. There are many other cases.
>>>>> e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
>>>>
>>>>That's incorrect, by the way. I+J/i+j is a single letter: Ĳ/ĳ
>>>
>>> No, it's not. Or rather: that is heavily disputed, so much so that in
>>> nl.taal you are considered a troll if you even mention the subject.
>>>
>>> My stance on the matter is here:
>>

>>Whichever way, it's not an example of what I was looking for. It is at
>>best a case that a letter is capitalized in a context where it's not in
>>other languages, but not an example for two competing capitalized forms
>>of the same letter. It will result in a language-dependent
>>capitalization of a word or phrase, but not of a letter.
>
> I don't understand. Of a letter or two of them, but not just a word or
> phrase.

Wow, simple words can be difficult.

When you capitalize "ijsbeer", which letter has a capitalized form that
is different from another language?

--
Oliver C.

garabik-ne...@kassiopeia.juls.savba.sk

Mar 16, 2008, 1:09:05 PM

Paul J Kriha <paul.nos...@paradise.net.nz> wrote:

...

>
> I don't have a Slovak dictionary. How do Slovak dictionaries treat
> letters with diacritics? My elderly Czech-English dictionary sorts
> words beginning with palatalized letters into separate sections
> under their own headings each immediately after the corresponding
> plain letter. That is as one would normally expect.

ä, č, dz, dž, ch, ô, š, ž have their own sections, the rest have not.
Some dictionaries, however, keep ä, dz, dž and ô inside a, d, o sections. It
does not matter much, since dz and dž are at the end of the section anyway,
and hardly any words start with ä. However, words beginning with ô- come
after those beginning with ož-, which might be confusing.

> However, for the sorting purposes the letters inside the words are treated
> as if they did not have diacritics at all. The words end up sorted
> according to alphabetic value of the letters that follow them.

Collation order is double-keyed: first, you sort the entries disregarding
the acute accent and the háček in ďťňľ (but sorting čšž after the corresponding háčekless
letters). Then you do a second pass and put the letters with an acute accent and ďťňľ
after the accentless ones, if possible. So you'd get something like this:
asa asá ása ásá asb ásb aša ašá ašb ata atá aťa aťá áta atb
Though not everyone adheres strictly to this scheme :-)
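
The double-keyed scheme described above can be sketched in Python as a two-part sort key. This is a simplified illustration, not the official standard: the function name `slovak_key` is hypothetical, the digraphs ch, dz and dž are ignored, and ä and ô are treated like č, š and ž as distinct letters that follow their base letter.

```python
import unicodedata

# Letters that count as separate letters, sorted right after their base letter.
DISTINCT = {"ä": "a", "č": "c", "ô": "o", "š": "s", "ž": "z"}


def slovak_key(word):
    primary, secondary = [], []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in DISTINCT:
            # First key: a distinct letter ranks just after its base letter.
            primary.append((DISTINCT[ch], 1))
            secondary.append(0)
        else:
            # First key ignores the accent; second key records its presence.
            base = unicodedata.normalize("NFD", ch)[0]
            primary.append((base, 0))
            secondary.append(int(base != ch))
    return primary, secondary


sample = "asa asá ása ásá asb ásb aša ašá ašb ata atá aťa aťá áta atb".split()
resorted = sorted(reversed(sample), key=slovak_key)
print(" ".join(resorted))
```

Sorting on the pair compares the first key across the whole word before any accent is consulted, which is what makes "áta" land before "atb".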

I am pretty sure the official (as per technical standards) Czech collation is
almost identical.

garabik-ne...@kassiopeia.juls.savba.sk

Mar 16, 2008, 1:23:49 PM

Paul J Kriha <paul.nos...@paradise.net.nz> wrote:

I wouldn't be surprised if the official rules of Czech orthography
forgot to mention "Ch" as well :-)

The capitalization is such an obvious and straightforward issue that no one
ever thought about writing it down into the book, and so "Ch" just did not
get there... I was not aware of it either, until I found out about the
Dutch IJ.

(For Oliver: everyone happily uses Ch, unless writing in all caps, when CH is used.
As one would expect.)

For Oliver too:
There is also the Azeri ə capitalized as Ə, and the Nigerian ǝ capitalized
as Ǝ. They, however, got different codepoints in Unicode (but I vaguely recall
that the Turkish i and non-Turkish i got different codepoints in some encoding...
but it was not ISO_8859-9).
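
Because the two look-alike letters have separate code points, a language-neutral case mapping can keep them apart. A short Python check (the variable names are illustrative):

```python
import unicodedata

SCHWA = "\u0259"     # small schwa, the letter used in Azeri
TURNED_E = "\u01dd"  # small turned e, the letter used in Nigerian orthographies

# Each small letter has its own capital, so no language tag is needed.
for small in (SCHWA, TURNED_E):
    capital = small.upper()
    print(f"U+{ord(small):04X} -> U+{ord(capital):04X}", unicodedata.name(capital))
```

This is the approach Unicode did not take for the Turkish i, where the plain Latin "i" and "I" are shared with other languages.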

garabik-ne...@kassiopeia.juls.savba.sk

Mar 16, 2008, 1:27:31 PM

Dušan Vukotić <dusan....@gmail.com> wrote:
> On Mar 15, 3:36 pm, garabik-news-2005...@kassiopeia.juls.savba.sk
> wrote:

...

>>
>> Do you know that "oficially", according to the codified Slovak orthography
>> rules, "ch" should be capitalized only as "CH"?
>
> If I understood well, you should write CHlad instead of Chlad (cold)
> in the beginning of a sentence?

Yes, but nobody ever writes like this. Indeed, it would be considered a typo
if seen in a text, and a mistake if e.g. a schoolchild was to write in this
way.

>
> I would say, visually it looks much better as "Ch"... is it not?

To me, it looks horrible, but I'd say it is just a matter of being used to it.

Ruud Harmsen

Mar 16, 2008, 4:18:06 PM

Sun, 16 Mar 2008 17:23:49 +0000 (UTC):
garabik-ne...@kassiopeia.juls.savba.sk: in sci.lang:

>The capitalization is such an obvious and straightforward issue that no one
>ever thought about writing it down into the book, and so "Ch" just did not
>get there... I was not aware of it either, until I found out about the
>Dutch IJ.

Klingon!

LEE Sau Dan

Mar 16, 2008, 7:54:26 PM

>>>>> "garabik-news-2005-05" == garabik-news-2005-05 <garabik-ne...@kassiopeia.juls.savba.sk> writes:

garabik-news-2005-05> I wouldn't be surprised if the official
garabik-news-2005-05> rules of Czech orthography forgot to mention
garabik-news-2005-05> "Ch" as well :-)

garabik-news-2005-05> The capitalization is such an obvious and
garabik-news-2005-05> straightforward issue that no one ever
garabik-news-2005-05> thought about writing it down into the book,
garabik-news-2005-05> and so "Ch" just did not get there...

No. Like the grammar of a language, the capitalization rules may
appear "simple and straight-forward" to the native speakers and fluent
speakers, but they are tricky enough to trip new learners and computer
algorithms over and over.

Capitalization has been a non-trivial issue for l10n of software.
People building i18n libraries and writing internationalized software
need to understand the issues and pay special attention when coding
their software.

garabik-news-2005-05> I was not aware of it either, until I found
garabik-news-2005-05> out about the Dutch IJ.

And you may not be aware of how compulsory tense marking can be a very
difficult feature for some learners on the other side of the globe.

Paul J Kriha

Mar 17, 2008, 3:12:57 AM

<garabik-ne...@kassiopeia.juls.savba.sk> wrote in message
news:frjl35$1uqh$1...@ns.felk.cvut.cz...

> Paul J Kriha <paul.nos...@paradise.net.nz> wrote:
> > "Oliver Cromm" <lispa...@yahoo.de> wrote in message
> > news:x6vogekt297g$.dlg@mid.crommatograph.info...
> >> * /r LEE Sau Dan wrote:
> >>
> >> >>>>>> "Oliver" == Oliver Cromm <lispa...@yahoo.de> writes:
> >> >
> >> > Oliver> On a side note: is it the only case where the
> >> > Oliver> capitalization of a lowercase letter depends on the
> >> > Oliver> language?
> >> >
> >> > No. There are many other cases.
> >> > e.g. Dutch "ij" --> "IJ", German "ß" -> "SS".
> >>
> >> I didn't ask for any kind of unusual capitalization rule, I am
> >> interested if there is another letter or character than "i" that has two
> >> different capitalizations, depending on which language we're in. The
> >> opposite, one capital letter having two lower case counterparts would be
> >> close enough, but your examples aren't.
> >
> > Well, in that case, the letter "ch" is good example of what you are
> > looking for. In Czech, it's normally capitalized as "Ch" and as
> > Radovan says in Slovak it's capitalized as "CH".
> > (However, not for much longer :-)
>
> I wouldn't be surprised if the official rules of Czech orthography
> forgot to mention "Ch" as well :-)

I happen to have Akademická Pravidla Českého Pravopisu (my capitalization
according to English language rules :-) issued by Academia Praha in 1993.
I have quickly looked through the relevant chapter and it seems they indeed do
not bother to deal with "ch/Ch/CH" explicitly. However, on page 40, where
they deal with proper names, they include an example with "ch" and "Ch":
"chrám sv. Víta" is any church of St. Vitus, while "Chrám sv. Víta" is
_the one and only_ church (cathedral) of St. Vitus at Hradčany in Prague.

pjk

Paul J Kriha

Mar 17, 2008, 3:40:36 AM

"LEE Sau Dan" <dan...@informatik.uni-freiburg.de> wrote in message
news:87y78im...@informatik.uni-freiburg.de...
>>>>>> "garabik-news-2005-05" == garabik-news-2005-05 <garabik-news-2005-0...@kassiopeia.juls.savba.sk> writes:
>
> garabik-news-2005-05> I wouldn't be surprised if the official
> garabik-news-2005-05> rules of Czech orthography forgot to mention
> garabik-news-2005-05> "Ch" as well :-)
>
> garabik-news-2005-05> The capitalization is such an obvious and
> garabik-news-2005-05> straightforward issue that no one ever
> garabik-news-2005-05> thought about writing it down into the book,
> garabik-news-2005-05> and so "Ch" just did not get there...
>
>No. Like the grammar of a language, the capitalization rules may
>appear "simple and straight-forward" to the native speakers and fluent
>speakers, but they are tricky enough to trip new learners and computer
>algorithms over and over.

You are missing the point. The Czech and Slovak rulebooks
of orthography are written explicitly for native or fluent speakers.
The great majority of chapters deal with unusual and esoteric
issues which a foreign learner will hardly ever be faced with.

Foreign beginners are usually introduced to charts
of capitalized and non-capitalized alphabets in the first lesson.
Apart from many letters with diacritics, several letters are
handwritten differently from the equivalent letters in other
languages. In the case of "ch", all they are probably told is that
it is a single letter and that it looks like "ch" or "Ch".

>Capitalization has been a non-trivial issue for l10n of software.
>People building i18n libraries and writing internationalized software
>need to understand the issues and pay special attention when coding
>their software.
>
>
> garabik-news-2005-05> I was not aware of it either, until I found
> garabik-news-2005-05> out about the Dutch IJ.
>
>And you may not be aware of how compulsory tense marking can be a very
>difficult feature for some learners on the other side of the globe.

Well, that's neither here nor there. You may be equally unaware of
how substantially Czech and Slovak verb tense marking in
multi-verb sentences differs from that of either English or
Chinese.

You are not the only one who has had a hard time learning
to use English tenses. :-)

pjk

Paul J Kriha

Mar 17, 2008, 4:07:21 AM

<garabik-ne...@kassiopeia.juls.savba.sk> wrote in message
news:frjk7h$1uh7$1...@ns.felk.cvut.cz...

> Paul J Kriha <paul.nos...@paradise.net.nz> wrote:
>
> ...
> >
> > I don't have a Slovak dictionary. How do Slovak dictionaries treat
> > letters with diacritics? My elderly Czech-English dictionary sorts
> > words beginning with palatalized letters into separate sections
> > under their own headings each immediately after the corresponding
> > plain letter. That is as one would normally expect.
>
> ä, č, dz, dž, ch, ô, š, ž have their own sections, the rest have not.
> Some dictionaries, however, keep ä, dz, dž and ô inside a, d, o sections. It
> does not matter much, since dz and dž are at the end of the section anyway,
> and hardly any words start with ä. However, words beginning with ô- come
> after those beginning with ož-, which might be confusing.
>
> > However, for the sorting purposes the letters inside the words are treated
> > as if they did not have diacritics at all. The words end up sorted
> > according to alphabetic value of the letters that follow them.
>
> Collation order is double-keyed: first, you sort the entries disregarding
> the acute accent and háček in ďťňľ (but sorting the čšž after the corresponding háčekless
> letters). Then you do a second pass and put the letters with acute accent and ďťňľ
> after those accentless, if possible. So you'd get something like this:
> asa asá ása ásá asb ásb aša ašá ašb ata atá aťa aťá áta atb
> Though not everyone adheres strictly to this scheme :-)
>
> I am pretty sure the official (as per technical standards) Czech collation is
> almost identical.

Oh, yes, I agree. What I was describing can be found in Dr. Alois
Čermák's highly idiosyncratic Cz-E and E-Cz dictionary printed in
Třebíč in 1940. It contains a lot of esoteric Cz dialectal words while
many relatively common words are not included. The English quite
often feels early Victorian. I suspect he based it on some mid 19th
century dictionary.
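
The double-keyed ordering quoted above can be sketched in Python. This is only an illustration under stated assumptions: the tables `FOLD` and `AFTER` and the function `sort_key` are made up for this example, they cover just the letters needed for the sample list, and the digraphs (ch, dz, dž) as well as ä and ô are ignored.

```python
import unicodedata

# First pass: these letters are treated as their plain counterparts.
FOLD = {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u", "ý": "y",
        "ď": "d", "ť": "t", "ň": "n", "ľ": "l"}

# These letters are separate letters that sort right after the plain one.
AFTER = {"č": "c", "š": "s", "ž": "z"}

def primary(ch):
    """Rank of one letter in the first pass."""
    ch = FOLD.get(ch, ch)
    if ch in AFTER:
        return (AFTER[ch], 1)   # e.g. š comes after s but before t
    return (ch, 0)

def sort_key(word):
    """First key ignores the folded diacritics, second key breaks ties."""
    word = unicodedata.normalize("NFC", word)
    first = tuple(primary(ch) for ch in word)
    second = tuple(1 if ch in FOLD else 0 for ch in word)
    return (first, second)

words = "atb áta aťá aťa atá ata ašb ašá aša ásb asb ásá ása asá asa".split()
print(" ".join(sorted(words, key=sort_key)))
```

Sorting on the tuple `(first, second)` does both passes at once: the second key is only consulted when two words are identical in the first. On the sample list it gives the same sequence as in the quoted example.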

pjk

P.S.
This is my cut&paste tool: "příliš žluťoučký kůň úpěl ďábelské ódy"
The sentence contains all fifteen Cz letters with diacritics "říšžťčýůňúěďáéó",
each of them only once.
Is there a Slovak equivalent?

Recently I realized that one also needs a capitalized version of "říšžťčýůňúěďáéó".
For example, I can't type "T" or "D" with a háček, or "U" with a kroužek.
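
For anyone who wants to check the sentence or get the capitals without typing them, here is a small Python sketch. It assumes the text is precomposed (NFC) Unicode; the variable names are just for this example.

```python
import unicodedata
from collections import Counter

sentence = unicodedata.normalize("NFC", "příliš žluťoučký kůň úpěl ďábelské ódy")
letters = unicodedata.normalize("NFC", "říšžťčýůňúěďáéó")

# Collect the non-ASCII letters of the sentence in order of appearance.
found = "".join(ch for ch in sentence if ord(ch) > 127)
counts = Counter(found)

print(found == letters)                       # same letters, same order
print(all(n == 1 for n in counts.values()))   # each one occurs only once
print(len(counts))                            # how many distinct letters
print(letters.upper())                        # the capitalized version
```

The last line gives a string that can be pasted from just like the lowercase one.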

John Atkinson

Apr 1, 2008, 9:59:39 AM

<garabik-ne...@kassiopeia.juls.savba.sk> wrote ...

[...]

> For Oliver too:
> There is also the Azeri ə capitalized as Ə, and the Nigerian ǝ
> capitalized as Ǝ.

What Nigerian language is that? None of the ones I'm familiar with has
<ǝ>, though of course there's a few hundred languages in Nigeria that
I'm not familiar with.
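
Whatever the language is, the two pairs are at least distinct in Unicode, even though the lowercase glyphs look alike. A small Python check, assuming the quoted characters are U+0259 and U+01DD (that is how they appear in the text above):

```python
import unicodedata

schwa = "\u0259"     # the Azeri letter from the quote
turned_e = "\u01dd"  # the "Nigerian" letter from the quote

for ch in (schwa, turned_e):
    cap = ch.upper()
    # Show each lowercase letter, its capital, and both code points.
    print(f"U+{ord(ch):04X} {ch} -> U+{ord(cap):04X} {cap}",
          unicodedata.name(ch), "/", unicodedata.name(cap))
```

So software has to keep the two lowercase letters apart by code point, since it cannot tell from the shape which capital is wanted.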

[...]

John.

Joachim Pense

Apr 1, 2008, 11:28:42 AM

John Atkinson wrote:

> <ǝ>, though of course there's a few hundred languages in Nigeria that

"there's", not "there are"? Is that standard English?

Curiously,
Joachim

John Atkinson

Apr 1, 2008, 12:28:38 PM

"Joachim Pense" <sn...@pense-mainz.eu> wrote...

> John Atkinson wrote:
>
>> <ǝ>, though of course there's a few hundred languages in Nigeria that
>
> "there's", not "there are"? Is that standard English?

Yes, though some pedants don't like it. Both are in common use.

J.

Ruud Harmsen

Apr 2, 2008, 4:57:14 AM

>John Atkinson wrote:
>> <ǝ>, though of course there's a few hundred languages in Nigeria that

Tue, 01 Apr 2008 17:28:42 +0200: Joachim Pense <sn...@pense-mainz.eu>:
in sci.lang:

>"there's", not "there are"? Is that standard English?

(Deliberately not looking at John's answer yet.)

Both are possible, but "there's" is a bit more informal.
--
Ruud Harmsen
http://rudhar.com/index/whatsnew.htm