
IE7 and the non-breaking hyphen


Andreas Prilop

Sep 19, 2008, 10:18:44 AM
It seems to me that Internet Explorer 7 (as opposed to IE6)
displays the non-breaking-hyphen U+2011 or &#8209;
as some sort of en-dash rather than a hyphen.

See U+2011 #8209 at
http://www.alanflavell.org.uk/unicode/unidata20.html#x2010
http://niwo.mnsys.org/saved/~flavell/unicode/unidata20.html
http://www.cs.tut.fi/~jkorpela/dashes.html#unidash

Anyone else?


Now for the complete mystery:

In IE7, select
Tools > Internet Options > General > Accessibility >
[X] Ignore font styles specified on webpages

Then select
View > Encoding > Right-to-Left Document
on the above pages and the hyphen for U+2011 appears!

Dick Margulis

Sep 19, 2008, 11:09:45 AM
Andreas Prilop wrote:
> It seems to me that Internet Explorer 7 (as opposed to IE6)
> displays the non-breaking-hyphen U+2011 or &#8209;
> as some sort of en-dash rather than a hyphen.
>
> See U+2011 #8209 at
> http://www.alanflavell.org.uk/unicode/unidata20.html#x2010
> http://niwo.mnsys.org/saved/~flavell/unicode/unidata20.html
> http://www.cs.tut.fi/~jkorpela/dashes.html#unidash
>
> Anyone else?
>

They look like hyphens to me. What font do you have as your browser
default? Perhaps that font is missing the glyph and the browser is
substituting one from a different font with a wider quad.

Andreas Prilop

Sep 19, 2008, 11:18:25 AM
On Fri, 19 Sep 2008, Dick Margulis wrote:

>> It seems to me that Internet Explorer 7 (as opposed to IE6)
>> displays the non-breaking-hyphen U+2011 or &#8209;
>> as some sort of en-dash rather than a hyphen.
>

> They look like hyphens to me. What font do you have as your browser
> default?

It happens with all: Arial, Tahoma, Times New Roman, Verdana, etc.

> Perhaps that font is missing the glyph and the browser is
> substituting one from a different font with a wider quad.

The non-breaking hyphen should need no glyph of its own, and neither
should the soft hyphen. The same glyph should be used for all of them:
the "normal" hyphen.

And Firefox does display a hyphen. So it is not a question
of fonts.

I forgot: This is Windows XP.

Andreas Prilop

Sep 24, 2008, 11:51:36 AM
It seems to me that Internet Explorer 7 (as opposed to IE6)
displays the non-breaking-hyphen U+2011 or &#8209;
as some sort of dash rather than a hyphen.

My test page is
http://www.unics.uni-hannover.de/nhtcapri/temp/2000.html
where Internet Explorer 7 displays

dir=ltr &#8209;

as some sort of dash but displays

dir=rtl &#8209;

as a regular hyphen. The typeface is Times New Roman;
the operating system is Windows XP.
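
A page for this kind of comparison can be generated with a short Python
sketch. It is not the markup of the test page above, which is not
reproduced here; the function name build_test_page and the page layout
are assumptions made for illustration.

```python
def build_test_page() -> str:
    """Return a small HTML page showing U+2011 in an LTR and an RTL block.

    Hypothetical helper: the layout is an assumption, not a copy of the
    test page mentioned in the post.
    """
    cases = [
        ("dir=ltr", '<div dir="ltr">&#8209;</div>'),
        ("dir=rtl", '<div dir="rtl">&#8209;</div>'),
    ]
    # One label paragraph followed by the block under test.
    rows = "\n".join(f"<p>{label}</p>\n{markup}" for label, markup in cases)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n<head>\n<meta charset="utf-8">\n'
        "<title>U+2011 test</title>\n"
        '<style>body { font-family: "Times New Roman", serif; }</style>\n'
        "</head>\n<body>\n" + rows + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(build_test_page())
```

Saving the output to a file and opening it in each browser gives the two
cases side by side.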

--
</wallstreet>

Jukka K. Korpela

Sep 24, 2008, 12:33:59 PM
Andreas Prilop wrote:

> It seems to me that Internet Explorer 7 (as opposed to IE6)
> displays the non-breaking-hyphen U+2011 or &#8209;
> as some sort of dash rather than a hyphen.

This is a tricky issue and depends on fonts maybe more than anything else.

> http://www.cs.tut.fi/~jkorpela/dashes.html#unidash

For example, that page of mine suggests Code2000 as the primary font for the
column with the glyphs. So if your system has that font installed, you will
see the glyphs as implemented in that font: U+2011 is clearly longer than
U+2010. That's rather wrong, since U+2011 is a non-breaking variant of
U+2010, so the visual appearance should normally be the same. If your system
lacks Code2000 but has Arial Unicode MS, you should see U+2011 as identical
to U+2010, since that's the way that font has been designed.

That is what I see both on IE 7 and on Firefox 3, on Vista.
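
The relation between the two characters can be checked against the
Unicode character database, for instance with Python's standard
unicodedata module. This is only a sketch; the helper names describe and
fold_non_breaking are made up for the example.

```python
import unicodedata


def describe(cp: int) -> tuple[str, str]:
    """Return (character name, decomposition mapping) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.decomposition(ch)


def fold_non_breaking(text: str) -> str:
    """Apply NFKC, which replaces U+2011 by U+2010 (among other changes)."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # Hyphen-minus, soft hyphen, hyphen, non-breaking hyphen.
    for cp in (0x002D, 0x00AD, 0x2010, 0x2011):
        name, decomp = describe(cp)
        print(f"U+{cp:04X} {name} {decomp!r}")
```

U+2011 carries the compatibility decomposition "<noBreak> 2010", which is
the formal way of saying that it is U+2010 with line breaking suppressed.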

I remember vaguely having sent the author of Code2000 a suggestion to fix
this...

> My test page is
> http://www.unics.uni-hannover.de/nhtcapri/temp/2000.html
> where Internet Explorer 7 displays
>
> dir=ltr &#8209;
>
> as some sort of dash but displays
>
> dir=rtl &#8209;
>
> as a regular hyphen.

That's really weird. It does not happen on Firefox. And if I copy and paste
the content from IE 7 to MS Word, an additional oddity takes place: when I
place the cursor after U+2010 (on the right or left, depending on direction)
and press Alt+X, nothing happens. For any normal character, Alt+X changes
the character to its Unicode number. Thus, I conclude that IE 7's
"non-breaking hyphen" is not a normal character at all but something
special. Oddly enough, with copy & paste from Firefox 3 the same oddity
appears. But if I type 2011 Alt+X (to produce U+2011 directly), then Alt+X
again, I get 2011 as expected.

> The typeface is Times New Roman;
> the operating system is Windows XP.

Since Times New Roman does not contain a glyph for U+2011, I think what you
see is the rendering in your browser's fallback font. Conceivably, a browser
_could_ implement U+2011 by displaying U+2010 or even U+002D (hyphen-minus,
i.e. the Ascii hyphen), just applying different line breaking rules. But I
don't think browsers are that clever. IE 7 might be doing something
non-obvious, though.
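
The idea of reusing the ordinary hyphen glyph while keeping separate line
breaking rules can be sketched in Python. This is a toy model and not a
description of what any browser does; the function names and the very
small rule set are assumptions.

```python
NON_BREAKING_HYPHEN = "\u2011"
HYPHEN = "\u2010"
HYPHEN_MINUS = "\u002d"


def display_form(text: str) -> str:
    """Glyph-level view: draw U+2011 with the glyph of U+2010."""
    return text.replace(NON_BREAKING_HYPHEN, HYPHEN)


def break_positions(text: str) -> list[int]:
    """Return positions p where text may be split into text[:p] and text[p:].

    Toy rule set: a break is allowed after a space, after U+2010 and after
    U+002D, but never after U+2011. (In the Unicode line breaking
    algorithm, UAX #14, U+2011 has class GL "glue" while U+2010 has
    class BA "break after".)
    """
    allowed = {" ", HYPHEN, HYPHEN_MINUS}
    return [i + 1 for i, ch in enumerate(text[:-1]) if ch in allowed]


if __name__ == "__main__":
    sample = "a\u2011b c-d"
    print(display_form(sample))
    print(break_positions(sample))
```

The point of the sketch is that breaking is decided on the original
characters, while the substitution affects only what gets drawn.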

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Andreas Prilop

Sep 24, 2008, 12:54:17 PM
On Wed, 24 Sep 2008, Jukka K. Korpela wrote:

> Since Times New Roman does not contain a glyph for U+2011, I think
> what you see is the rendering in your browser's fallback font.

However, Internet Explorer 6 on Windows XP and even on Windows 2000
displays the non-breaking hyphen very well with Times New Roman.
And IE6 does not fetch missing glyphs from other fonts.

Jukka K. Korpela

Sep 24, 2008, 1:33:43 PM
Andreas Prilop wrote:

> Internet Explorer 7 displays
>
> dir=ltr &#8209;
>
> as some sort of dash but displays
>
> dir=rtl &#8209;
>
> as a regular hyphen.

I tested this with a&#8209;b (i.e., letters on both sides of the
non-breaking hyphen) in text with dir=rtl, and IE 7 displays the
non-breaking hyphen as longish (dash-like). If I remove a or b, the hyphen
looks like a regular hyphen.

This doesn't explain anything... makes the problem even more mysterious.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Andreas Prilop

Sep 25, 2008, 11:07:15 AM
On Wed, 24 Sep 2008, Jukka K. Korpela wrote:

> I tested this with a&#8209;b (i.e., letters on both sides of the
> non-breaking hyphen) in text with dir=rtl, and IE 7 displays the
> non-breaking hyphen as longish (dash-like). If I remove a or b,
> the hyphen looks like a regular hyphen.
>
> This doesn't explain anything... makes the problem even more mysterious.

No, this is no more mysterious. Remember that
<span dir=rtl> a&#8209;b
is still left-to-right text because of the Latin letters "a b".

You need to write
<bdo dir=rtl> a&#8209;b
for right-to-left text.
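
The reason can be seen from the bidirectional classes of the characters
involved. The Python sketch below uses the standard unicodedata module;
the helper names are made up, and wrapping text in U+202E ... U+202C is
shown only as the plain-text counterpart that HTML 4 gives for
<bdo dir=rtl>.

```python
import unicodedata

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Return (code point, bidirectional class) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


def force_rtl(text: str) -> str:
    """Wrap text in an explicit right-to-left override."""
    return f"{RLO}{text}{PDF}"


if __name__ == "__main__":
    # "a" and "b" are strong left-to-right (L); U+2011 is a neutral (ON),
    # so it takes its direction from the letters around it.
    print(bidi_classes("a\u2011b"))
    print(bidi_classes(force_rtl("a\u2011b")))
```

With strong left-to-right letters on both sides, the neutral hyphen stays
in a left-to-right run whatever the base direction is; only an override
changes that.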

Actually, such an example on
http://freenet-homepage.de/prilop/bidirectional-text.html#bdo
led me to this problem.

--
</wallstreet>

Andreas Prilop

Sep 25, 2008, 11:11:48 AM
On Wed, 24 Sep 2008, Jukka K. Korpela wrote:

> And if I copy and paste the content from IE 7 to MS Word,
> an additional oddity takes place: when I place the cursor
> after U+2010 (on the right or left, depending on direction)
> and press Alt+X, nothing happens. For any normal character,
> Alt+X changes the character to its Unicode number.
> Thus, I conclude that IE 7's "non-breaking hyphen" is not a
> normal character at all but something special.

I remember that ancient versions of MS Word took U+001E as
non-breaking hyphen and U+001F as soft hyphen.
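
If text containing those control codes needs to be cleaned up, a mapping
like the following Python sketch would do. The table follows the
recollection above and nothing more; the names are hypothetical, and
which Word versions actually emit these codes is not checked here.

```python
# Assumed mapping, taken from the recollection in this post:
# U+001E -> non-breaking hyphen, U+001F -> soft hyphen.
WORD_CONTROL_TO_UNICODE = {
    0x1E: 0x2011,  # NON-BREAKING HYPHEN
    0x1F: 0x00AD,  # SOFT HYPHEN
}


def word_hyphens_to_unicode(text: str) -> str:
    """Replace the two control codes by their Unicode counterparts."""
    return text.translate(WORD_CONTROL_TO_UNICODE)


if __name__ == "__main__":
    cleaned = word_hyphens_to_unicode("non\x1ebreaking hy\x1fphen")
    print([f"U+{ord(ch):04X}" for ch in cleaned if ord(ch) > 0x7F])
```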

--
</wallstreet>

Jukka K. Korpela

Sep 25, 2008, 3:44:27 PM
Andreas Prilop wrote:

> I remember that ancient versions of MS Word took U+001E as
> non-breaking hyphen and U+001F as soft hyphen.

I think the new versions still do. More precisely, the Word commands (via
Insert/Symbol) for the characters with those names actually insert these
control codes, which MS Word handles in a special way. This may
relate to what IE 6 does to U+2011, whereas IE 7 might actually display
U+2011 from an applicable font.

This might explain the basic observation, but what about the effect of
dir="rtl"?

Well, on my system, IE 7 on Vista with Code2000 font installed, I get
different renderings for

<div dir=rtl style="font-family: Code2000">&#8209;</div>
<div dir=rtl>&#8209;</div>

In the former element, the hyphen is dash-like, apparently from Code2000. So
I would guess that IE 7 primarily tries to render U+2011 by picking up a
glyph from the font suggested by the document author (in CSS or HTML).
Failing that, it will look for other fonts installed on the system. Normally
it might find it e.g. in Arial Unicode MS (where it's like a normal hyphen),
but for right to left text, it might prefer some other font, depending on
installed fonts.

This doesn't sound very logical but it would explain what I have observed on
_my_ system. Now testing... I temporarily removed Code2000, and now the
former <div> displays a normal-looking hyphen. Matches my hypothesis, but
<div dir=rtl align=left>foo&#8209;bar</div>
still displays a dash-like hyphen. It might come from MS PMincho, but I
think I'll stop playing with fonts now...

So my hypothesis is that right to left setting may affect the way IE 7
selects the font to be used to cover characters that don't exist in the
current font.
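
As a sketch of that hypothesis in Python: the browser tries the requested
font first and then walks a fallback list, and the guess is that the
fallback list differs by direction. Everything here is an assumption for
illustration. The coverage table is partial and lists only what has been
said in this thread about U+2011, and pick_font is a made-up helper, not
anything IE 7 is known to do.

```python
from typing import Optional

# Partial coverage table: only the claims made in this thread about U+2011.
COVERAGE = {
    "Times New Roman": set(),      # said to have no glyph for U+2011
    "Code2000": {0x2011},          # has one, drawn dash-like
    "Arial Unicode MS": {0x2011},  # has one, drawn like a normal hyphen
}


def pick_font(
    cp: int,
    requested: str,
    fallback_order: list[str],
    coverage: dict[str, set[int]] = COVERAGE,
) -> Optional[str]:
    """Return the first font, requested one first, that covers cp."""
    for name in [requested, *fallback_order]:
        if cp in coverage.get(name, set()):
            return name
    return None  # no installed font covers the character


if __name__ == "__main__":
    print(pick_font(0x2011, "Code2000", ["Arial Unicode MS"]))
    print(pick_font(0x2011, "Times New Roman", ["Arial Unicode MS"]))
```

Under the hypothesis, fallback_order would be one list for left-to-right
text and another for right-to-left text, which would be enough to produce
two different glyphs for the same character.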

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Hendrik Maryns

Sep 26, 2008, 3:50:57 PM
On 24-09-08 17:51, Andreas Prilop wrote:

Actually, Firefox on Linux seems to do the same.

H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
www.lieverleven.be
http://catb.org/~esr/faqs/smart-questions.html

Andreas Prilop

Sep 29, 2008, 11:05:17 AM
On Fri, 26 Sep 2008, Hendrik Maryns wrote:

>> http://www.unics.uni-hannover.de/nhtcapri/temp/2000.html


>
> Actually, Firefox on Linux seems to do the same.

Perhaps you don't understand what is text and what is graphic
on the page above.

--
</wallstreet>
