24.3.2016, 6:21, Adam H. Kerman wrote:
>> The meaning of ZWSP (as defined in the Unicode Standard) is to indicate
>> a direct line break opportunity. A normal space usually does the same,
>
> Except a space is a word boundary (except in a URL); I suppose this
> character isn't.
Word boundaries are where you define them to be, but the Unicode default
word boundary rules classify ZWSP as being in word boundary class “Any”.
This means they are word boundaries.
http://unicode.org/reports/tr29/
> A URL is a word.
Not in the normal meaning of the word “word”. A URL does not correspond
to a spoken word. For example, in English, a word is written using
letters A–Z, a–z, possibly some accented letters like “é”, and possibly
hyphens. Anything else in writing is not a word but e.g. a punctuation
mark or a special symbol.
The computerese meanings for “word” are something completely different.
This includes the technical concept of a maximal string of
non-whitespace characters, like the thing between spaces in
“foo §½+/y612cf&!0#£ bar”. If in doubt, ask your neighbor whether he
would call that a word.
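
To make the computerese notion concrete, here is a minimal Python sketch (my own illustration, not something from the Unicode Standard) that splits the sample string above into maximal runs of non-whitespace characters:

```python
# The sample string from the paragraph above.
text = "foo §½+/y612cf&!0#£ bar"

# str.split() with no arguments splits on runs of whitespace, so each
# result is a maximal string of non-whitespace characters.
tokens = text.split()
print(tokens)
```

The middle token counts as a "word" in this technical sense only.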
> They're often annoyingly long. That's what I was thinking of
> being desirable to break at a selected location.
URLs should rarely appear in text. In HTML documents, they should appear
as attribute values (e.g. in href=...), not as content. Normally you
should have URLs in content only when your text is *about* URLs, like a
description of URL syntax and some URLs shown as examples. Then you
should probably put each URL on a line of its own. It may still be too
long, and then you need to consider setting allowable line break points.
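
As a rough sketch of setting allowable break points, the following Python function inserts <wbr> after each run of "/", "?", "&" or "=" in a URL. The function name url_with_breaks and the choice of break characters are my assumptions for illustration; other break points may suit your URLs better.

```python
import html
import re

# Hypothetical helper: returns HTML content for a URL with <wbr>
# inserted after each run of the delimiters / ? & =
def url_with_breaks(url: str) -> str:
    # Each piece is either text followed by a run of delimiters,
    # or trailing text with no delimiter after it.
    pieces = re.findall(r"[^/?&=]*[/?&=]+|[^/?&=]+", url)
    # Escape each piece for use as HTML content, then join with <wbr>.
    return "<wbr>".join(html.escape(piece) for piece in pieces)

print(url_with_breaks("http://example.com/a/b?x=1&y=2"))
```

Because the delimiters are treated as runs, the "//" after the scheme is not split in the middle.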
>> except in contexts where a simple break (with no hyphen at the end of
>> the line) is permitted—as it is in many writing systems, but not in
>> languages using Latin letters (except in contexts like bird-<wbr>cage
>> where the hyphen is part of the word). The tag name is thus misleading:
>> it comes from "word break", but is more like "string break".
>
> Right. I wonder if it's desirable to preserve it when copying.
When copying data in HTML format, it is. If the ZWSP character were used
instead and you copied data from an HTML document as plain text, whether
to preserve it is a bit debatable. I would say that ZWSP should normally
be retained, but it depends on the use of the copied text whether it
needs to be removed.
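
For uses where the character does need to go, removing it from copied plain text is simple. A minimal Python sketch, with strip_zwsp as a hypothetical helper name:

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Hypothetical helper: drop every ZWSP from a piece of copied text.
def strip_zwsp(text: str) -> str:
    return text.replace(ZWSP, "")

copied = "bird-" + ZWSP + "cage"
cleaned = strip_zwsp(copied)
print(len(copied), len(cleaned))
```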
The quoted text does not say that it is the Unicode character; it says
it behaves like it.
>> The page incorrectly claims that "This element was first implemented in
>> Internet Explorer 5.5". It was actually Netscape 1.1, much earlier.
>
> No, it reads,
>
> Support for the <wbr> tag was introduced in Internet Explorer 5.5,
> though removed again in version 7.
>
> which is a footnote to the browser compatibility table.
What I quoted appears in the text proper of the page, at the start of
the fourth paragraph.
Support for <wbr> in IE is a messy and frustrating story. Please don’t
get me started. And their documentation of it is even worse, if
possible. E.g. currently
https://msdn.microsoft.com/en-us/library/ms535917(v=vs.85).aspx
claims that <wbr> is deprecated or obsolete, that it was defined in HTML
4.01, that it “inserts a soft line break into a block of nobr text”
(reflecting the absurdity that in some versions of IE, <wbr> works only
inside a <nobr> element), and does not say a word about support in any
IE version.
The good news is that you can make <wbr> work in any reasonably new
version of IE using a CSS one-liner:
<style>
wbr:after { content: "\00200B" }
</style>
> It goes without saying that it was introduced in Netscape. I mean,
> that was the era of Browser Wars.
It was introduced in Netscape 1.1 in 1995, the same year that IE 1.0 was
released, so the browser wars had not started yet (IE 1.0 was too lousy
to be any challenge).
> Netscape repeatedly introduced
> proprietary behavior, hoping to force servers to accommodate every
> feature introduced in the client.
Uh, I don’t see how servers are involved here.
> I agree with that, but that's not the case now that it's standardized in
> HTML 5. Why doesn't it point to the Unicode character?
I think I participated in some HTML 5 development discussions related to
<wbr>, but I don’t remember too well how they went. I think the prevailing
opinion, which I supported, was that <wbr> should be standardized due to
its usefulness and existing support. As mentioned in our discussion,
<wbr> is easier to write and read than any of the alternatives for
representing ZWSP in HTML.
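
For comparison, two of those alternatives are the numeric character references for U+200B. A small Python check (just an illustration, using the standard html module) that both decode to the same character:

```python
import html

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Decimal and hexadecimal character references for ZWSP.
alternatives = ["&#8203;", "&#x200B;"]

decoded = [html.unescape(ref) for ref in alternatives]
all_zwsp = all(ch == ZWSP for ch in decoded)
print(all_zwsp)
```

Neither reference is as easy to recognize in markup as <wbr>.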
--
Yucca,
http://www.cs.tut.fi/~jkorpela/