So, my question for those working in French (or other languages
which make extensive use of accented characters) is, does it
really make a difference _in_practice_ to do all the relevant
character conversions throughout the page text? The pages are
all UTF-8 in this particular case.
Some sample pages for background:
http://www.chem.utoronto.ca/IChO.Ontario/general.htm
http://www.chem.utoronto.ca/IChO.Ontario/fr/generalites.html
> So, my question for those working in French (or other languages
> which make extensive use of accented characters) is, does it
> really make a difference _in_practice_ to do all the relevant
> character conversions throughout the page text? The pages are
> all UTF-8 in this particular case.
No. Use UTF8 as the encoding for HTML pages, because they already use
Unicode as the implied character set (HTML always does).
Using ISO-8859-* as an encoding is what causes the problem; that's
where the mismatch between the characters in the served document and
the implied document character set arises.
Make sure that you use UTF8, not UTF16 or UTF8Y. Some tools,
especially Microsoft's, will tend to select UTF16 if you choose the
first "Unicode" option that you see (look for a specific UTF8).
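One way to check what a tool actually saved is to look at the first bytes of the file. The sketch below is only an illustration in Python: `sniff_bom` is a hypothetical helper name, and a file without a byte-order mark may still be UTF-8, so a `None` result proves nothing by itself.

```python
import codecs

# Byte-order marks, UTF-32 first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data: bytes):
    """Return the encoding announced by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


sample = "généralités"
print(sniff_bom(sample.encode("utf-16")))  # the "utf-16" codec writes a BOM
print(sniff_bom(sample.encode("utf-8")))   # plain UTF-8 carries no BOM
print(len(sample.encode("utf-8")), len(sample.encode("utf-16-le")))
```

The last line prints the byte counts for the same word in both encodings, which is another quick hint that a file was saved as UTF-16 by accident.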
Yup, no need to do that. As you say it's very ponderous. Life is not
supposed to be that difficult.
> So, my question for those working in French (or other languages
> which make extensive use of accented characters) is, does it
> really make a difference _in_practice_ to do all the relevant
> character conversions throughout the page text? The pages are
> all UTF-8 in this particular case.
Just leave them as UTF-8 and make sure your server is configured to send
the correct Content-Type (probably "text/html; charset=UTF-8").
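If you want to check what a header value actually declares, the standard library can parse it. This is a small sketch; `charset_from_content_type` is a made-up helper name, and it only inspects a string you pass in rather than contacting any server.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lower-cases the value and returns None if absent.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```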
> Use UTF8 as the encoding for HTML pages,
The hyphen in "UTF-8" is important.
--
In memoriam Alan J. Flavell
http://www.alanflavell.org.uk/charset/
> I am finding that converting all the
> accented characters (e-acute, etc.) to entities (&eacute;)
> time-consuming even with search-and-replace, while it greatly
> reduces source code legibility when trying to make the
> inevitable corrections.
>
> http://www.chem.utoronto.ca/IChO.Ontario/fr/generalites.html
With character references ( &ucirc; &#251; ), you can find wrong
letters easily. In your above text, you have "u with inverted breve"
U+0217 instead of "û".
You may want to read
http://www.alanflavell.org.uk/charset/checklist
http://niwo.mnsys.org/saved/~flavell/charset/checklist.html
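A look-alike letter can also be found in literal UTF-8 text by listing every non-ASCII character with its code point and Unicode name. The following Python sketch assumes nothing about the page itself; `non_ascii_inventory` and the sample string are hypothetical.

```python
import unicodedata
from collections import Counter


def non_ascii_inventory(text: str):
    """Count each non-ASCII character and label it with code point and name."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in sorted(counts.items())
    ]


# Hypothetical sample: the second word uses U+0217 where U+00FB was meant.
sample = "ao\u00fbt ao\u0217t"
for code, name, n in non_ascii_inventory(sample):
    print(code, name, n)
```

Any name in the listing that you do not expect in French text is a candidate for a wrong letter.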
> On Tue, 1 Sep 2009, Andy Dingley wrote:
>
> > Use UTF8 as the encoding for HTML pages,
>
> The hyphen in "UTF-8" is important.
I would imagine so!
Follow-up questions: for the document type declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
Is there an equivalent "-//W3C//DTD HTML 4.01//FR" ?
If so, what is the effect of using it, and is it
preferable to using <html lang="fr"> ?
> Just leave them as UTF-8
Right. You could use ISO-8859-1 as well, using character or entity
references for a few characters only, but it's 2009 now and the Web (though,
alas, not the world) is reasonably safe for UTF-8.
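For the ISO-8859-1 route, the few characters outside the encoding do not have to be converted by hand. As a rough sketch in Python (the function name is made up), the built-in `xmlcharrefreplace` error handler turns only the unencodable characters into numeric character references:

```python
def to_latin1_with_references(text: str) -> bytes:
    """Encode as ISO-8859-1, using numeric character references where needed."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# The curly apostrophe (U+2019) and the oe ligature (U+0153) are not in
# ISO-8859-1; the accented letters are and stay as single bytes.
print(to_latin1_with_references("L\u2019\u0153uvre d\u00e9j\u00e0 vue"))
```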
> and make sure your server is configured to
> send the correct Content-Type (probably "text/html; charset=UTF-8").
That would be recommendable, but it really isn't mandatory, as long as the
document contains a suitable meta tag (as it does) and the server headers do
not contain conflicting information (as they don't, as they leave the
encoding unspecified).
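The precedence described here can be sketched as follows: a charset in the HTTP header wins, otherwise a meta element near the start of the document is used. This is a simplified Python illustration with hypothetical helper names, not a full implementation of what browsers do.

```python
from email.message import Message
from html.parser import HTMLParser


def _header_charset(value):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


class _MetaCharset(HTMLParser):
    """Remember the first charset declared by a meta element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset:
            return
        a = dict(attrs)
        if a.get("charset"):
            self.charset = a["charset"].lower()
        elif (a.get("http-equiv") or "").lower() == "content-type":
            self.charset = _header_charset(a.get("content"))


def effective_charset(http_content_type, html_bytes: bytes):
    """Return (charset, source); the HTTP header takes precedence over meta."""
    from_header = _header_charset(http_content_type)
    if from_header:
        return from_header, "http header"
    parser = _MetaCharset()
    # Only the start of the document is scanned, decoded leniently as ASCII.
    parser.feed(html_bytes[:1024].decode("ascii", errors="replace"))
    if parser.charset:
        return parser.charset, "meta element"
    return None, "unspecified"


page = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
print(effective_charset("text/html", page))
print(effective_charset("text/html; charset=ISO-8859-1", page))
```

The second call shows the conflicting case: the header value is used even though the document itself says UTF-8.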
This still leaves some problems, since the French language has spacing rules
for punctuation and there is no direct equivalent to them in Unicode. But
this is not very important, as most French pages don't even try to be clever
in the sense of using "espace fine insécable". Yet, if you wish to do things
better, you might check my
http://www.cs.tut.fi/~jkorpela/html/french.html
which is rather dusty as regards accented characters, but the spacing
issue persists.
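For anyone who does want to try, here is a minimal Python sketch of one possible approach. It is an assumption of mine, not something the linked page prescribes: it puts U+202F NARROW NO-BREAK SPACE before ; : ! ? when the mark follows a word and ends a token, and it ignores guillemets and other cases entirely.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, used here as an approximation

# A word character, an optional ordinary or no-break space, then one of the
# "high" punctuation marks, followed by whitespace or the end of the text.
_HIGH_PUNCT = re.compile(r"(?<=\w)[ \u00a0]?([;:!?])(?=\s|$)")


def french_spacing(text: str) -> str:
    """Insert a narrow no-break space before ; : ! ? at the end of a word."""
    return _HIGH_PUNCT.sub(lambda m: NNBSP + m.group(1), text)


print(repr(french_spacing("Quoi ? Non!")))
print(repr(french_spacing("voir http://www.cs.tut.fi/~jkorpela/html/french.html")))
```

The lookahead keeps the colon in a URL untouched, because it is followed by a slash rather than by whitespace.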
There are also finer typography issues. For example, an expression like
"L'examen" (whether written with ASCII apostrophe here or correctly using a
curly apostrophe as on the page mentioned) looks rather poor on a closer
look, due to excessive spacing. Web browsers don't do automatic kerning, so
the apostrophe appears to the right of "L", as opposed to being moved to
the left as in good old typography. Personally, I normally would not bother,
as it's too much trouble to fix such things, but for crucial texts like
headings or very prominent phrases, I might consider using a little piece of
CSS to tune the spacing, along with ideas presented in
http://www.cs.tut.fi/~jkorpela/www/letter-spacing.html
--
Yucca, http://www.cs.tut.fi/~jkorpela/
> Is there an equivalent "-//W3C//DTD HTML 4.01//FR" ?
No.
> If so, what is the effect of using it, and is it
> preferable to using <html lang="fr"> ?
That's completely different. "DTD HTML 4.01//FR" would mean
that HTML had been translated into French. So you would write
<titre> instead of <title>, etc.
--
¹ superscript 1 ¼ fraction 1/4 Ð D stroke ð d stroke
² superscript 2 ½ fraction 1/2 Þ Thorn þ thorn
³ superscript 3 ¾ fraction 3/4 Ý Y acute ý y acute
× multiply sign ¦ broken bar
> Follow-up questions: for the document type declaration:
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
> "http://www.w3.org/TR/html4/strict.dtd">
>
> Is there an equivalent "-//W3C//DTD HTML 4.01//FR" ?
No. That would mean that the document type definition (DTD) is in French,
which typically means that the comments in the formal description of syntax
are French. That's an imaginable possibility, but there is no such version
of the DTD.
> If so, what is the effect of using it,
It depends. Browsers may then treat the doctype declaration as nonstandard
(as it is), triggering Quirks Mode, which sometimes implies nothing but
often changes visual rendering in a multitude of ways, sometimes messing up
everything; see
http://www.cs.tut.fi/~jkorpela/quirks-mode.html
> and is it
> preferable to using <html lang="fr"> ?
Certainly not. The language of the DTD (syntax description of markup
language) has absolutely nothing to do with the language of documents
written using the syntax.
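To make the distinction concrete, the doctype and the language of the content can be read separately from the same document. The Python sketch below uses the standard `html.parser` module; `describe` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _LangFinder(HTMLParser):
    """Record the doctype declaration and the lang attribute of <html>."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.lang = None

    def handle_decl(self, decl):
        if decl.lower().startswith("doctype"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def describe(html_text: str):
    """Return (doctype declaration, document language) as two separate facts."""
    finder = _LangFinder()
    finder.feed(html_text)
    return finder.doctype, finder.lang


page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
    '<html lang="fr"><title>G\u00e9n\u00e9ralit\u00e9s</title></html>'
)
doctype, lang = describe(page)
print(doctype)
print(lang)
```

The "EN" in the public identifier stays the same for a French page; only the `lang` value says anything about the text.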
--
Yucca, http://www.cs.tut.fi/~jkorpela/
Boy, that page has a few problems. Here are some tips to help you
write better HTML code:
1. You've forgotten to close your li tags. Yes, I know they are
optional, but it's sloppy nonetheless!
2. Some of your HTML tags and attributes are capitalized, while others
are in lower case. Pick one case and stick to it (preferably lower
case!)
3. Consider using utf-8 charset instead of iso-8859-1.
4. Why do your blockquotes contain p and div elements?
5. Netscape 4 is well over eleven years old; I strongly doubt if
anyone still uses it.
6. You really overuse italics and double quotes. Bad habit!
7. If you learn CSS, then you can make your page look much nicer!
There are many tutorials available on the web. Try using a search
engine such as Google or Bing to find them.
Keep practicing and you'll get the hang of it before long, I'm sure!
> Boy, that page has a few problems.
Girl, boy, or other entity, you make your problems loud and clear by
fullquoting a message.
> Here are some tips to help you
> write better HTML code:
You didn't provide a URL of your own web site, yet felt qualified to give
advice in HTML authoring. I'm not surprised.
> 1. You've forgotten to close your li tags. Yes, I know they are
> optional, but it's sloppy nonetheless!
You're boring.
> 2. Some of your HTML tags and attributes are capitalized, while others
> are in lower case. ?
You're more boring.
> 3. Consider using utf-8 charset instead of iso-8859-1.
Why? You don't have a clue about "charset", do you?
> 4. Why do your blockquotes contain p and div elements?
You don't know much about HTML, do you? Oh well, you said it already.
Thanks for playing. Please do not change your From field with a fake "name"
before you have a clue. TIA!
--
Yucca, http://www.cs.tut.fi/~jkorpela/
More recent releases of Firefox and Opera (at least) do do kerning. I
don't see much effect with L'examen, but try a word like "Avid" with a
big font size and you can see the bounding boxes of the A and v overlap.
> You could use ISO-8859-1 as well, using character or entity
> references for a few characters only, but it's 2009 now and the Web
> (though, alas, not the world) is reasonably safe for UTF-8.
The advantage of an 8-bit character set is that you cannot
write a wrong letter so easily as the OP had done. I don’t
understand how the OP could insert “u with inverted breve” for
a plain “û” – but he did.
The 8-bit character set Windows-1252
http://www.user.uni-hannover.de/nhtcapri/west-european.win.html
contains all necessary characters for French, including Œ œ
ligatures, dashes, apostrophe and € sign.
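The coverage claim is easy to check with Python's built-in codecs. This is only a sketch; `missing_from` is a made-up name and the sample string is a small selection of characters, not a complete French repertoire.

```python
def missing_from(text: str, encoding: str):
    """Return the distinct characters of text that the encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Accented letters, OE ligatures, curly apostrophe, en dash and euro sign.
french = "\u00e9 \u00fb \u00e7 \u0152 \u0153 \u2019 \u2013 \u20ac"
print(missing_from(french, "cp1252"))
print(missing_from(french, "iso-8859-1"))
print(missing_from("\u0217", "cp1252"))  # u with inverted breve
```

With Windows-1252 as the target, a stray U+0217 cannot be encoded at all, which is the safety net described above.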
--
In memoriam Alan J. Flavell
http://groups.google.co.uk/groups/search?q=author:Alan.J.Flavell
> On Tue, 1 Sep 2009, Jukka K. Korpela wrote:
>
> > You could use ISO-8859-1 as well, using character or entity
> > references for a few characters only, but it's 2009 now and the Web
> > (though, alas, not the world) is reasonably safe for UTF-8.
>
> The advantage of an 8-bit character set is that you cannot
> write a wrong letter so easily as the OP had done. I don’t
> understand how the OP could insert “u with inverted breve” for
> a plain “û” – but he did.
I did? Where?
> Andreas Prilop <prilo...@trashmail.net> wrote:
>
>> The advantage of an 8-bit character set is that you cannot
>> write a wrong letter so easily as the OP had done. I don¹t
>> understand how the OP could insert ³u with inverted breve² for
>> a plain ³û² ? but he did.
This is a lie.
I did not include the superscripts ¹ (1), ² (2), ³ (3) in my
posting.
I did not write a question mark (?) after the letter û.
> I did? Where?
You corrected your page
http://www.chem.utoronto.ca/IChO.Ontario/fr/generalites.html
on
Wed, 02 Sep 2009 15:52:32 GMT
Never mind - I found them. The problem seems to have originated
with MS Word: the file I was sent (PC) with the text showed up with
non-printing characters for û (Mac), at least until I changed the font.
I couldn't actually see the difference between the two forms of
accented 'u's until I zoomed the page about five times in Firefox,
or set the view to 200% in Word. Maybe it's time to move up to a 21"
screen... I just dropped the settings on my 17" display from 1152x720
to 1024x640 (which makes the accent differences easier to see), but
I don't like the loss of screen real-estate very much. 8(
In my experience (a pair of 24" screens) you end up with more characters
of about the same size.
This is wonderful for me; I can write code with comments on each line
that are sufficiently descriptive that I can remember what my code does
when I come back to it a week later.
Actually, my new displays have a higher DPI than my previous 19"
display, so I moved to smaller fonts, in general.
--
Steve Swift
http://www.swiftys.org.uk/swifty.html
http://www.ringers.org.uk
> On Wed, 2 Sep 2009, David Stone wrote:
>
> > Andreas Prilop <prilo...@trashmail.net> wrote:
> >
> >> The advantage of an 8-bit character set is that you cannot
> >> write a wrong letter so easily as the OP had done. I don1t
> >> understand how the OP could insert 3u with inverted breve2 for
> >> a plain 3û2 ? but he did.
>
> This is a lie.
> I did not include the superscripts 1 (1), 2 (2), 3 (3) in my
> posting.
> I did not write a question mark (?) after the letter û.
That is really odd. When I look at your post
<Pine.LNX.4.64.09...@sarge.rrzn.uni-hannover.de>
I can see the single curly right apostrophe in "don't", and double
curly quotes around "u with inverted breve" and "û", as well as
an n-dash. They were still there when I composed my reply also.
I thought I had fixed the character encoding issue with MT-NW,
but apparently not? If you can suggest what in my news client
might need poking, I would quite happily do it!
> David Stone wrote:
> > Maybe it's time to move up to a 21" screen...
>
> In my experience (a pair of 24" screens) you end up with more characters
> of about the same size.
Not if you also adjust the display settings!
> This is wonderful for me; I can write code with comments on each line
> that are sufficiently descriptive that I can remember what my code does
> when I come back to it a week later.
>
> Actually, my new displays have a higher DPI than my previous 19"
> display, so I moved to smaller fonts, in general.
My current monitor also has higher DPI than the older one, but my eyes
continue to grow less tolerant of smaller fonts...
> I thought I had fixed the character encoding issue with MT-NW,
> but apparently not? If you can suggest what in my news client
> might need poking, I would quite happily do it!
I'm not sure what happens when you select charset=UTF-8
for replying.
¹ superscript 1 ¼ fraction 1/4 Ð D stroke ð d stroke
² superscript 2 ½ fraction 1/2 Þ Thorn þ thorn
³ superscript 3 ¾ fraction 3/4 Ý Y acute ý y acute
× multiply sign ¦ broken bar Œ OE œ oe
‘ quote 6 “ quote 66 — em dash € euro
’ quote 9 ” quote 99 – en dash
If you don't get the right characters even with charset=UTF-8,
then there is no way, I suspect.
Certain Macintosh newsreaders are far behind the times.
> On Wed, 2 Sep 2009, David Stone wrote:
>
> > I thought I had fixed the character encoding issue with MT-NW,
> > but apparently not? If you can suggest what in my news client
> > might need poking, I would quite happily do it!
>
> I'm not sure what happens when you select charset=UTF-8
> for replying.
>
> ¹ superscript 1 ¼ fraction 1/4 Ð D stroke ð d stroke
> ² superscript 2 ½ fraction 1/2 Þ Thorn þ thorn
> ³ superscript 3 ¾ fraction 3/4 � Y acute � y acute
> ₩~ multiply sign › broken bar � OE � oe
> � quote 6 � quote 66 � em dash � euro
> � quote 9 � quote 99 � en dash
>
> If you don't get the right characters even with charset=UTF-8,
> then there is no way I suspect.
Looks like I had replies formatted to use Western (ISO Latin 1)
but "Use article's character set for reply" was checked, so it
should have parsed the response as UTF-8.
I've now switched it to force UTF-8 for all replies, to see what
happens. (A test post in a non-test newsgroup? Oh the horror!)
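What a mismatch between the encoding written and the encoding announced does to punctuation and accented letters can be reproduced in a few lines. This Python sketch is only an illustration of the general mechanism; it does not claim to model what any particular newsreader does, and `misread` is a hypothetical name.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text one way and decode the bytes as if labelled another way."""
    return text.encode(written_as).decode(read_as, errors="replace")


# A curly apostrophe written as UTF-8 but read as Windows-1252
# turns into three unrelated characters.
print(misread("don\u2019t", "utf-8", "cp1252"))

# A Latin-1 "u circumflex" read as UTF-8 is an invalid byte sequence
# and comes out as U+FFFD, the replacement character.
print(misread("\u00fb", "iso-8859-1", "utf-8"))
```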
> Certain Macintosh newsreaders are far behind the times.
Or the way preference settings are presented is not exactly
intuitive or easy to navigate...
> More recent releases of Firefox and Opera (at least) do do kerning.
I can't see any kerning effects on Firefox 3 or Opera 10.
> I don't see much effect with L'examen, but try a word like "Avid"
> with a big font size and you can see the bounding boxes of the A and
> v overlap.
I can't see anything like that. Can you please provide a URL (preferably to
a page with font face set, since font may matter a lot)?
--
Yucca, http://www.cs.tut.fi/~jkorpela/
http://www.tidraso.co.uk/misc/kerning.html
with screenshots. I drew the pink lines on afterwards with the GIMP. I
don't know what exact fonts those are-- whatever you get by default for
serif and sans-serif on SUSE 11.
This is Opera 9.51 and Firefox 3.0.
In FF 3.5.2 (on my Mac at least), style="font-family: serif" which
translates to Century Schoolbook, there is no overlap. There should be,
because so much space between the serif A and v looks odd (due to
the serifs on the feet of A and peaks of v). I wonder what Mac FF 3 did,
probably what you are observing with your browser. Not sure exactly
*how* perfectly cross browser FF is, I know it is pretty good!
Nor in Mac Opera 9.64 (with Times) do I see this kerning.
--
dorayme
Exactly, that's why I chose that word.
> I wonder what Mac FF 3 did, probably what you are observing with your
> browser. Not sure exactly *how* perfectly cross browser FF is, I know
> it is pretty good!
>
> Nor in Mac Opera 9.64 (with Times) do I see this kerning.
It may be the fonts rather than the browser, and not only that, but some
browsers may use a font renderer that's part of the platform, so the
same font on different OSes may look different.
> On 2009-09-04, dorayme <dorayme...@optusnet.com.au> wrote:
>>> http://www.tidraso.co.uk/misc/kerning.html
>> In FF 3.5.2 (on my Mac at least), style="font-family: serif" which
>> translates to Century Schoolbook, there is no overlap.
Works fine here (the Firefox defaults for serif and sans-serif have been
Times and Helvetica as long as I can remember).
Firefox 3.5.2 Snow Leopard.
>> Nor in Mac Opera 9.64 (with Times) do I see this kerning.
>
> It may be the fonts rather than the browser, […]
That’s quite possible; e.g., on my system the kerning in the sans-serif
test is barely noticeable with Helvetica, but it is with e.g. Helvetica
Neue. Even the serif variant needs a second look with Times (simply not
satisfactory), while it looks just right with e.g. Arno Pro.
> I can't see any kerning effects on Firefox 3 or Opera 10.
Try with the string .V.
Usually V. kerns but .V does not kern.
--
Solipsists of the world - unite!
>> I can't see any kerning effects on Firefox 3 or Opera 10.
>
> http://www.tidraso.co.uk/misc/kerning.html
>
> This is Opera 9.51 and Firefox 3.0.
I can see kerning with Konqueror 4, too. My test string is
.V.
--
I used to believe in reincarnation in a former life.
That's a good test, because the asymmetry is immediately obvious if
you've got kerning.
Right, I changed the preference to Times for serif (on a Mac running
Tiger) and I am getting kerning on my FF now. But my Opera 9.64 (still)
shows none and from a check of preferences (but it is all a bit more
obscure) it does seem to be Times there too.
--
dorayme
It may be and it may not be: on my FF, as I posted a moment ago, using
Times and looking at your URL, I got *clear* kerning. But using
Andreas's test, it was not clear, it was touch and go, the vertical red
line just squeezing through.
<http://dorayme.netweaver.com.au/kerning/kerningTest.html>
--
dorayme
> On Thu, 3 Sep 2009, Jukka K. Korpela wrote:
>
>> I can't see any kerning effects on Firefox 3 or Opera 10.
>
> Try with the string .V.
>
> Usually V. kerns but .V does not kern.
This discussion has turned from French-language authoring to kerning, so I
have changed the heading and I'm also trying to move the discussion to
c.i.w.a.stylesheets.
The change of topic is mainly my fault - I mentioned the following:
"There are also finer typography issues. For example, an expression like
"L'examen" (whether written with ASCII apostrophe here or correctly using a
curly apostrophe as on the page mentioned) looks rather poor on a closer
look, due to excessive spacing. Web browsers don't do automatic kerning, so
the apostrophe appears to the right of "L", as opposed to being moved to
the left as in good old typography."
Then we have had discussions about automatic kerning, with varying
observations, and apparently more or less everybody is both right and wrong.
Firefox 3 may do some kerning under some conditions, and maybe Opera too,
but the effects are small and their presence may depend on many factors.
The page
http://opentype.info/blog/2008/06/14/kerning-and-opentype-features-in-firefox-3/
describes typographic effects on Firefox, including kerning, ligatures, and
contextual forms, in a suggestive but vague form, whereas the page
https://developer.mozilla.org/en/CSS/text-rendering
describes a proprietary CSS property text-rendering, with the odd-looking
value optimizeLegibility, as follows:
"One very visible effect is: optimizeLegibility enables ligatures (ff, fi,
fl etc.) in text smaller than 20px for some fonts (e.g. Microsoft's Calibri,
Candara, Constantia and Corbel or the DejaVu font family)."
However, my Firefox 3.5.2 (on Vista) does not seem to care about that
property. Instead, it by default applies some typography features -
ligatures and kerning - if I use, say, 18px Constantia. One can easily see
that fi is rendered as a ligature then (unlike on IE), and there is some
kerning e.g. in "Va" and "VA" (the letters are closer to each other than on
IE), but not for "V.", perhaps because this pair is not described in the
font properties.
This apparently happens for some fonts only, but there does not seem to be
any font-size limit.
As a whole, occasional and undocumented kerning for a few fonts probably
generates more harm than useful effects.
Is there some CSS property that actually affects Firefox 3 regarding kerning
and ligatures?
To create more confusion, if I try to defeat the Firefox 3 behavior when I
_don't_ want a ligature, I cannot use the obvious (to Unicode-aware people)
approach: instead of fi, write f&zwnj;i. The zero-width non-joiner
character should prevent ligature behavior, and it does, but it also turns
the font of the letter after it to something unexpected! This is easy to see
using e.g.
<span style="font: 32pt Constantia">fi<br>
f&zwnj;i</span>
The renderings are different, but in too odd a way - in the latter, the "i"
appears in some sans-serif font! The effect is easier to see if you test
just
<span style="font: 32pt Constantia">&zwnj;ii<br>
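Because the zero-width non-joiner is invisible in most editors, it can help to generate and inspect such test strings programmatically. A small Python sketch, with a hypothetical helper name; which letter pairs actually form ligatures depends on the font, so the pattern used here is an assumption.

```python
import html
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, the character behind &zwnj;


def break_ligatures(text: str) -> str:
    """Insert a ZWNJ after every 'f' that is followed by f, i or l."""
    return re.sub(r"f(?=[fil])", "f" + ZWNJ, text)


plain = "fi"
guarded = break_ligatures(plain)
print(len(plain), len(guarded))              # the strings look alike but differ in length
print([f"U+{ord(c):04X}" for c in guarded])  # make the invisible character visible
print(html.unescape("f&zwnj;i") == guarded)  # the entity decodes to the same string
```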
--
Yucca, http://www.cs.tut.fi/~jkorpela/
>>> Try with the string .V.
>
> It may be and it may not be: on my FF, as I posted a moment ago, using
> Times and looking at your URL, I got *clear* kerning. But using
> Andreas's test, it was not clear, it was touch and go, the vertical red
> line just squeezing through.
> <http://dorayme.netweaver.com.au/kerning/kerningTest.html>
Apparently there are different versions of Times and Helvetica with
different kerning. In Adobe's versions, V. kerns much more than Av
http://www.google.co.uk/search?q=%22KPX+A+v%22+%22KPX+V+period%22+Helvetica
http://www.google.co.uk/search?q=%22KPX+A+v%22+%22KPX+V+period%22+Times
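The search strings above refer to the KPX lines in Adobe Font Metrics (AFM) files, where each line names two glyphs and a horizontal adjustment. To compare two versions of a font yourself, a small parser is enough. This is a Python sketch with a hypothetical function name, and the numbers in the sample are placeholders, not values from any real font.

```python
def parse_kpx(afm_text: str):
    """Collect kerning pairs from the KPX lines of an AFM file."""
    pairs = {}
    for line in afm_text.splitlines():
        fields = line.split()
        # Expected shape: KPX <first glyph name> <second glyph name> <x adjustment>
        if len(fields) == 4 and fields[0] == "KPX":
            pairs[(fields[1], fields[2])] = float(fields[3])
    return pairs


# Hypothetical excerpt; the adjustments are invented for illustration only.
sample = """StartKernPairs 2
KPX A v -10
KPX V period -50
EndKernPairs"""

pairs = parse_kpx(sample)
print(pairs[("A", "v")], pairs[("V", "period")])
print(("period", "V") in pairs)  # pairs are ordered: ".V" is a different entry
```

The last line reflects why the string .V. is a useful test: a font may list "V period" without listing "period V".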
Boy, what a pathetic comeback, even for a Finnjävel. You can't defend
your web site, so you resort to childish attacks.
Here's my web site: www.JukkaKorpelaIsATurd.com
Äitisi nai poroja!
> On Sep 1, 4:39 pm, "Jukka K. Korpela" <jkorp...@cs.tut.fi> wrote:
>> fp90210 wrote:
>>> Boy, that page has a few problems.
>>
>> Girl, boy, or other entity, you make your problems loud and clear by
>> fullquoting a message.
>>
>>> Here are some tips to help you
>>> write better HTML code:
>>
>> You didn't provide a URL of your own web site, yet felt qualified to give
>> advice in HTML authoring. I'm not surprised.
>>
>>> 1. You've forgotten to close your li tags. Yes, I know they are
>>> optional, but it's sloppy nonetheless!
>>
>> You're boring.
>
> Boy, what a pathetic comeback, even for a Finnjävel. You can't defend
> your web site, so you resort to childish attacks.
>
> Here's my web site: www.JukkaKorpelaIsATurd.com
Network Error
An error occurred while accessing "www.jukkakorpelaisaturd.com".
Maybe the domain name is not valid or there's a typo in the internet address.
Why am I not surprised?
--
athel