Zero-width spaces and early browsers ...

tlvp

Mar 21, 2016, 1:27:39 PM

It's just come to my attention that the browser built into Windows Mobile 6
(as incorporated in the Motorola Q9m cellphone) misunderstands the ZWSP
entity … (aka  ), displaying it as a full-width "unknown"
glyph (i.e., unfilled, portrait-oriented rectangle).

What other browsers (if any), not over 10 years old, have similar problems?
(Win Mo 6 dates from 2007 or so.) Thanks. Cheers, -- tlvp
--
Before replying, throw out the trash, please.

Jukka K. Korpela

Mar 21, 2016, 6:19:51 PM

21.3.2016, 19:27, tlvp wrote:

> It's just come to my attention that the browser built into Windows Mobile 6
> (as incorporated in the Motorola Q9m cellphone) misunderstands the ZWSP
> entity … (aka  ), displaying it as a full-width "unknown"
> glyph (i.e., unfilled, portrait-oriented rectangle).

My educated guess is that this does not depend on the representation of
the character in HTML source (e.g. as an entity vs. the character itself
as UTF-8 encoded) but on the character. An old browser that does not
recognize the character as a zero-width character may try to pick up a
glyph for it from some font. A browser may process the character
properly either by using information about it as a zero-width character
(that allows line break) or by using a font that contains a correct
glyph for it, with an advance width of zero.
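The point about representation can be checked directly. In this Python sketch (standard library only), the decimal and hexadecimal character references, the HTML5 named reference `&ZeroWidthSpace;`, and the raw UTF-8 bytes all decode to the same single character, U+200B. This only shows what an HTML parser should produce. It says nothing about how a particular old browser then renders that character.

```python
import html
import unicodedata

ZWSP = "\u200b"

# Ways the same character can be written in HTML source.
representations = {
    "decimal reference": "&#8203;",
    "hex reference": "&#x200B;",
    "HTML5 named reference": "&ZeroWidthSpace;",
}

for label, source in representations.items():
    decoded = html.unescape(source)  # what an HTML parser should produce
    print(f"{label:22} {source:17} -> U+{ord(decoded):04X}")

# The character itself, stored as UTF-8 in the page.
raw_bytes = ZWSP.encode("utf-8")
print("UTF-8 bytes:", raw_bytes.hex(" ").upper())  # E2 80 8B
print("Name:", unicodedata.name(ZWSP))              # ZERO WIDTH SPACE
print("Category:", unicodedata.category(ZWSP))      # Cf (format character)
```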

> What other browsers (if any), not over 10 years old, have similar problems?
> (Win Mo 6 dates from 2007 or so.) Thanks. Cheers, -- tlvp

The only one that I know of is IE 6, which was issued in 2001, with the last
release in May 2008, so it might (or might not) qualify as “not over 10
years old”.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

tlvp

Mar 21, 2016, 9:09:22 PM

Thanks, Jukka. The only other "older" cellphone browser I have experience
with is that with the 2.3.5 version of Android (as embodied in the Motorola
Droid X2 handset); and that understands perfectly well what to do with an
 cropping up in an HTML file it's rendering (what it does *not*
understand is how to render a .GIF graphic file :-) ).

So I think there's not enough reason for me to forswear either the 
creature or the use of .GIF graphics files -- there are far too few
10-year-old-browser users who will be inconvenienced. Cheers, -- tlvp

Philip Herlihy

Mar 22, 2016, 7:07:27 AM

In article <ncprs7$976$1...@dont-email.me>, jkor...@cs.tut.fi says...

>
> 21.3.2016, 19:27, tlvp wrote:
>
> > It's just come to my attention that the browser built into Windows Mobile 6
> > (as incorporated in the Motorola Q9m cellphone) misunderstands the ZWSP
> > entity … (aka  ), displaying it as a full-width "unknown"
> > glyph (i.e., unfilled, portrait-oriented rectangle).
>
> My educated guess is that this does not depend on the representation of
> the character in HTML source (e.g. as an entity vs. the character itself
> as UTF-8 encoded) but on the character. An old browser that does not
> recognize the character as a zero-width character may try to pick up a
> glyph for it from some font. A browser may process the character
> properly either by using information about it as a zero-width character
> (that allows line break) or by using a font that contains a correct
> glyph for it, with an advance width of zero.
>

..

Is that its purpose? To signal to a browser where a line might be
broken?

--

Phil, London

Adam H. Kerman

Mar 22, 2016, 12:31:15 PM

I guess it might be used within a very long word or after punctuation
that must not be followed by a space character, like forward slash or
em dash.

I find a reference to <wbr>, an optional line break, that was widely
supported in browsers but not standardized until HTML 5.

Why are there multiple ways to do the same thing? Because, just because.

Adam H. Kerman

Mar 22, 2016, 12:33:58 PM

Jukka K. Korpela <jkor...@cs.tut.fi> wrote:
>21.3.2016, 19:27, tlvp wrote:

>>It's just come to my attention that the browser built into Windows Mobile 6
>>(as incorporated in the Motorola Q9m cellphone) misunderstands the ZWSP
>>entity … (aka ), displaying it as a full-width "unknown"
>>glyph (i.e., unfilled, portrait-oriented rectangle).

>My educated guess is that this does not depend on the representation of
>the character in HTML source (e.g. as an entity vs. the character itself
>as UTF-8 encoded) but on the character. An old browser that does not
>recognize the character as a zero-width character may try to pick up a
>glyph for it from some font. A browser may process the character
>properly either by using information about it as a zero-width character
>(that allows line break) or by using a font that contains a correct
>glyph for it, with an advance width of zero. . . .

As it's a non-printing character, isn't that a MUST NOT as to whether a font
should include a glyph for it? That seems to be a bad thing.

tlvp

Mar 22, 2016, 8:36:15 PM

On Tue, 22 Mar 2016 16:31:13 +0000 (UTC), Adam H. Kerman wrote:

> ... I find a reference to <wbr>, an optional line break, that was widely
> supported in browsers but not standardized until HTML 5. ...

I'd be overjoyed to be allowed to use the mnemonically easy-to-recall <wbr>
rather than the impossible-to-remember *if* it were readily
supported in HTML 4.01 Transitional (the last version of HTML I have any
mastery of). Advice on that score? Thanks in advance. Cheers, -- tlvp

PS: Or can pages written to be valid as HTML 4.01 Transitional be assured
valid also as HTML 5? If so, I'd at once change all my !DOCTYPE lines :-) .

Adam H. Kerman

Mar 22, 2016, 9:43:18 PM

tlvp <mPiOsUcB...@att.net> wrote:
>On Tue, 22 Mar 2016 16:31:13 +0000 (UTC), Adam H. Kerman wrote:

>>... I find a reference to <wbr>, an optional line break, that was widely
>>supported in browsers but not standardized until HTML 5. ...

>I'd be overjoyed to be allowed to use the mnemonically easy-to-recall <wbr>
>rather than the impossible-to-remember *if* it were readily
>supported in HTML 4.01 Transitional (the last version of HTML I have any
>mastery of). Advice on that score? Thanks in advance. Cheers, -- tlvp

Alas, I wasn't familiar with it before looking it up for this thread.
On further reading, I see it's not standard in 4.01, but older browsers
supported it... with the exception of IE.
http://www.w3schools.com/tags/tag_wbr.asp

>PS: Or can pages written to be valid as HTML 4.01 Transitional be assured
>valid also as HTML 5? If so, I'd at once change all my !DOCTYPE lines :-) .

There's an awful lot of deprecated tags in there.

Richard Owlett

Mar 23, 2016, 7:05:25 AM

On 3/22/2016 7:36 PM, tlvp wrote:[snip]
> [snip]

> PS: Or can pages written to be valid as HTML 4.01 Transitional be assured
> valid also as HTML 5? If so, I'd at once change all my !DOCTYPE lines :-) .
>

In general, likely no. However it may depend on your coding style.
I suggest using http://validator.w3.org/ to check individual pages.
It has an option to override the declared document type.
HTH

Jukka K. Korpela

Mar 23, 2016, 11:05:43 PM

22.3.2016, 18:33, Adam H. Kerman wrote:

> As it's a non-printing character, isn't that a MUST NOT as to whether a font
> should include a glyph for it? That seems to be a bad thing.

I don’t see anything wrong with a glyph for ZWSP, provided that the
glyph is empty and has an advance width of 0. This is useful when the
software used to render text does not recognize the ZWSP character as
having a special meaning.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jukka K. Korpela

Mar 23, 2016, 11:05:44 PM

22.3.2016, 18:31, Adam H. Kerman wrote:

> Philip Herlihy <thiswillb...@you.com> wrote:
[...]
>> Is that its [= ZWSP] purpose? To signal to a browser where a line might be
>> broken?
>
> I guess it might be used within a very long word or after punctuation
> that must not be followed by a space character, like forward slash or
> em dash.

The meaning of ZWSP (as defined in the Unicode Standard) is to indicate
a direct line break opportunity. A normal space usually does the same,
but the point is that ZWSP has no spacing effect. So it could be used
after “/” or “—” to allow a line break there.
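A minimal sketch of that idea in Python, assuming the text is already a Unicode string and that only "/" and the em dash (U+2014) should get a break opportunity. `allow_breaks_after` is a hypothetical helper, not a library function:

```python
ZWSP = "\u200b"

def allow_breaks_after(text: str, chars: str = "/\u2014") -> str:
    """Insert U+200B ZERO WIDTH SPACE after each character found in `chars`.

    The default targets "/" and the em dash (U+2014). The result renders
    the same, but a browser that honours ZWSP may break the line there.
    """
    out = []
    for ch in text:
        out.append(ch)
        if ch in chars:
            out.append(ZWSP)
    return "".join(out)


print(repr(allow_breaks_after("input/output")))  # 'input/\u200boutput'
```

Only line breaking can differ from the unmodified text, and only in browsers that recognize ZWSP.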

However, it should *not* be used within a *word*, except in contexts
where a simple break (with no hyphen at the end of the line) is
permitted—as it is in many writing systems, but not in languages using
Latin letters (except in contexts like bird-cage where the hyphen
is part of the word). The tag name is thus misleading: it comes from
“word break”, but is more like “string break”.

> I find a reference to <wbr>, an optional line break, that was widely
> supported in browsers but not standardized until HTML 5.

<wbr> is now defined in W3C HTML5 REC so that it “represents a line
break opportunity”. There is a better description at MDN:

“The HTML element word break opportunity represents a position
within text where the browser may optionally break a line, though its
line-breaking rules would not otherwise create a break at that location.

On UTF-8 encoded pages, <wbr> behaves like the U+200B ZERO-WIDTH SPACE
code point.”

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/wbr

The page incorrectly claims that “This element was first implemented in
Internet Explorer 5.5”. It was actually Netscape 1.1, much earlier.

> Why are there multiple ways to do the same thing? Because, just because.

When <wbr> was invented, it would have been very unrealistic to expect
browsers to deal with fancy characters like ZWSP. Initially HTML was
defined and implemented so that only the 8-bit Latin 1 character
repertoire was used, and “internationalization” of HTML was at an early
stage when <wbr> was coined.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Adam H. Kerman

Mar 23, 2016, 11:50:10 PM

Jukka K. Korpela <jkor...@cs.tut.fi> wrote:

Quite frankly, I'd want another glyph I can spot to be substituted for it,
else how can I make sure I put it in place?

Adam H. Kerman

Mar 24, 2016, 12:21:04 AM

Jukka K. Korpela <jkor...@cs.tut.fi> wrote:

>22.3.2016, 18:31, Adam H. Kerman wrote:
>>Philip Herlihy <thiswillb...@you.com> wrote:

>[...]
>>>Is that its [= ZWSP] purpose? To signal to a browser where a line might be
>>>broken?

>>I guess it might be used within a very long word or after punctuation
>>that must not be followed by a space character, like forward slash or
>>em dash.

>The meaning of ZWSP (as defined in the Unicode Standard) is to indicate
>a direct line break opportunity. A normal space usually does the same,

Except a space is a word boundary (except in a URL); I suppose this
character isn't.

>but the point is that ZWSP has no spacing effect. So it could be used
>after "/" or "—" to allow a line break there.

>However, it should *not* be used within a *word*,

A URL is a word. They're often annoyingly long. That's what I was thinking of
being desirable to break at a selected location.

>except in contexts where a simple break (with no hyphen at the end of
>the line) is permitted—as it is in many writing systems, but not in
>languages using Latin letters (except in contexts like bird-cage
>where the hyphen is part of the word). The tag name is thus misleading:
>it comes from "word break", but is more like "string break".

Right. I wonder if it's desirable to preserve it when copying.

>>I find a reference to <wbr>, an optional line break, that was widely
>>supported in browsers but not standardized until HTML 5.

><wbr> is now defined in W3C HTML5 REC so that it "represents a line
>break opportunity". There is a better description at MDN:

>"The HTML element word break opportunity represents a position
>within text where the browser may optionally break a line, though its
>line-breaking rules would not otherwise create a break at that location.

>On UTF-8 encoded pages, <wbr> behaves like the U+200B ZERO-WIDTH SPACE
>code point."

> https://developer.mozilla.org/en-US/docs/Web/HTML/Element/wbr

Except that it's NOT the Unicode character, despite both being used for
the same purpose.

>The page incorrectly claims that "This element was first implemented in
>Internet Explorer 5.5". It was actually Netscape 1.1, much earlier.

No, it reads,

Support for the <wbr> tag was introduced in Internet Explorer 5.5,
though removed again in version 7.

which is a footnote to the browser compatibility table.

It goes without saying that it was introduced in Netscape. I mean,
that was the era of Browser Wars. Netscape repeatedly introduced
proprietary behavior, hoping to force servers to accommodate every
feature introduced in the client.

>>Why are there multiple ways to do the same thing? Because, just because.

>When <wbr> was invented, it would have been very unrealistic to expect
>browsers to deal with fancy characters like ZWSP. Initially HTML was
>defined and implemented so that only the 8-bit Latin 1 character
>repertoire was used, and "internationalization" of HTML was at an early
>stage when <wbr> was coined.

I agree with that, but that's not the case now that it's standardized in
HTML 5. Why doesn't it point to the Unicode character?

tlvp

Mar 24, 2016, 1:56:00 AM

On Wed, 23 Mar 2016 01:43:17 +0000 (UTC), Adam H. Kerman wrote:

>>PS: Or can pages written to be valid as HTML 4.01 Transitional be assured
>>valid also as HTML 5? If so, I'd at once change all my !DOCTYPE lines :-) .
>
> There's an awful lot of deprecated tags in there.

Yup. I tried changing the DOCTYPE declaration on a sample valid HTML 4.01
Transitional page of mine to one that identifies it as HTML 5. The validator
was *utterly* devastated by what I had done ... and so, consequently, was I.

I immediately changed it back to what it was and resolved, if I must use
<wbr> in *drafting* a new HTML page, to replace that with  once I'm
done editing, before going live with it.
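The draft-then-replace plan above could be scripted. This Python sketch assumes the replacement is the numeric reference `&#8203;` (any equivalent reference would do) and uses a hypothetical helper name, `wbr_to_zwsp_ref`. It is a plain regular-expression substitution, not an HTML parser, so it suits hand-written pages where `<wbr>` only appears as markup:

```python
import re

# Matches <wbr>, <WBR>, <wbr/> and <wbr /> (case-insensitive).
WBR_TAG = re.compile(r"<wbr\s*/?>", re.IGNORECASE)

def wbr_to_zwsp_ref(html_source: str, ref: str = "&#8203;") -> str:
    """Replace every <wbr> tag with a ZWSP character reference.

    Text substitution only: a literal "<wbr>" inside a comment, a script,
    or an attribute value would be replaced as well.
    """
    return WBR_TAG.sub(ref, html_source)


draft = "<p>input/<wbr>output, either/<WBR />or</p>"
print(wbr_to_zwsp_ref(draft))
# <p>input/&#8203;output, either/&#8203;or</p>
```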

(HTML 5 forces too many new-things-to-learn on my platter :-) .)

Cheers, and thanks for all the pertinent thoughts, ruminations, and advice,

-- tlvp

tlvp

Mar 24, 2016, 2:22:54 AM

On Tue, 22 Mar 2016 16:31:13 +0000 (UTC), Adam H. Kerman wrote, of ZWSP:

> it might be used within a very long word or after punctuation
> that must not be followed by a space character, like forward slash or
> em dash.

Having noticed the utility of this construct, in the forms or 
or their equivalents, for letting old IE 7 do line-flowing for long URLs, I
now see that Firefox needs no such instruction -- before a long URL such as

: http://www.gita-society.com/section3/sivasahasranama.htm ,

for example, bumps into the side border of the viewport (seen when one
drags the browser window narrower), FF reflows the line that it's in, by
breaking it at a suitable "/" or "-", without any further prompting.

Is that common behavior for today's browsers, particularly those on cell
phones? If so, I might not need to worry nearly so much about exploiting
 as I had originally feared.

Jukka K. Korpela

Mar 24, 2016, 3:51:05 AM

24.3.2016, 6:21, Adam H. Kerman wrote:

>> The meaning of ZWSP (as defined in the Unicode Standard) is to indicate
>> a direct line break opportunity. A normal space usually does the same,
>
> Except a space is a word boundary (except in a URL); I suppose this
> character isn't.

Word boundaries are where you define them to be, but the Unicode default
word boundary rules classify ZWSP as being in word boundary class “Any”.
This means they are word boundaries.
http://unicode.org/reports/tr29/

> A URL is a word.

Not in the normal meaning of the word “word”. A URL does not correspond
to a spoken word. For example, in English, a word is written using
letters A–Z, a–z, possibly some accented letters like “é”, and possibly
hyphens. Anything else in writing is not a word but e.g. a punctuation
mark or a special symbol.

The computerese meanings for “word” are something completely different.
This includes the technical concept of a maximal string of
non-whitespace characters, like the thing between spaces in
“foo §½+/y612cf&!0#£ bar”. If in doubt, ask your neighbor whether he
would call that a word.
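The whitespace-delimited "computerese" notion of a word is what Python's `str.split()` with no argument implements, which makes the contrast easy to see. Note that this is not the Unicode default word-boundary algorithm (UAX #29) mentioned above; Python's standard library does not implement that one.

```python
ZWSP = "\u200b"

sample = "foo §½+/y612cf&!0#£ bar"
# With no argument, str.split() returns the maximal runs of non-whitespace
# characters: the "computerese" notion of a word.
print(sample.split())  # ['foo', '§½+/y612cf&!0#£', 'bar']

# ZWSP is a format character (category Cf), not whitespace, so splitting
# on whitespace keeps a string containing it together as one "word".
print(ZWSP.isspace())                       # False
print(len(f"input/{ZWSP}output".split()))   # 1
```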

> They're often annoyingly long. That's what I was thinking of
> being desirable to break at a selected location.

URLs should rarely appear in text. In HTML documents, they should appear
as attribute values (e.g. in href=...), not as content. Normally you
should have URLs in content only when your text is *about* URLs, like a
description of URL syntax and some URLs shown as examples. Then you
should probably put each URL on a line of its own. It may still be too
long, and then you need to consider setting allowable line break points.

>> except in contexts where a simple break (with no hyphen at the end of
>> the line) is permitted—as it is in many writing systems, but not in
>> languages using Latin letters (except in contexts like bird-cage
>> where the hyphen is part of the word). The tag name is thus misleading:
>> it comes from "word break", but is more like "string break".
>
> Right. I wonder if it's desirable to preserve it when copying.

When copying data in HTML format, it is. If the ZWSP character were used
instead and you copied data from an HTML document as plain text, it’s a
bit debatable. I would say that ZWSP should normally be retained, but it
depends on the use of the copied text whether it needs to be removed.

>> On UTF-8 encoded pages, <wbr> behaves like the U+200B ZERO-WIDTH SPACE
>> code point."
>
>> https://developer.mozilla.org/en-US/docs/Web/HTML/Element/wbr
>
> Except that it's NOT the Unicode character, despite both being used for
> the same purpose.

The quoted text does not say that it is the Unicode character; it says
it behaves like it.

>> The page incorrectly claims that "This element was first implemented in
>> Internet Explorer 5.5". It was actually Netscape 1.1, much earlier.
>
> No, it reads,
>
> Support for the <wbr> tag was introduced in Internet Explorer 5.5,
> though removed again in version 7.
>
> which is a footnote to the browser compatibility table.

What I quoted appears in the text proper of the page, at the start of
the fourth paragraph.

Support for <wbr> in IE is a messy and frustrating story. Please don’t
get me started. And their documentation of it is even worse, if
possible. E.g. currently
https://msdn.microsoft.com/en-us/library/ms535917(v=vs.85).aspx
claims that <wbr> is deprecated or obsolete, that it was defined in HTML
4.01, that it “inserts a soft line break into a block of nobr text”
(reflecting the absurdity that in some versions of IE, <wbr> works only
inside a <nobr> element), and does not say a word about support in any
IE version.

The good news is that you can make <wbr> work in any reasonably new
version of IE using a CSS one-liner:

<style>
wbr:after { content: "\00200B" }
</style>

> It goes without saying that it was introduced in Netscape. I mean,
> that was the era of Browser Wars.

It was introduced in Netscape 1.1 in 1995, the same year that IE 1.0 was
published, so the browser wars had not started yet (IE 1.0 was too lousy
to be any challenge).

> Netscape repeatedly introduced
> proprietary behavior, hoping to force servers to accomodate every
> feature introduced in the client.

Uh, I don’t see how servers are involved here.

> I agree with that, but that's not the case now that it's standardized in
> HTML 5. Why doesn't it point to the Unicode character?

I think I participated in some HTML 5 development discussions related to
<wbr> but I don’t remember too well how it went. But I think there was
the opinion, supported by me, that <wbr> should be standardized due to
its usefulness and existing support. As mentioned in our discussion,
<wbr> is easier to write and read than any of the alternatives for
representing ZWSP in HTML.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jukka K. Korpela

Mar 24, 2016, 5:06:39 AM

If you refer to a context where you enter ZWSP as a character in an
editor or in an authoring program, then it’s surely a relevant question
whether it should be displayed visibly. But this depends on the software
used. You don’t need to tweak a font to contain a visible glyph for
ZWSP; instead, you need to make the program recognize ZWSP and display a
symbol instead.

To avoid confusion with real printable characters, the symbol should
probably be colored, so you really need to display it as an image of some
kind rather than as a glyph from a font. For example, Microsoft Word
optionally (i.e. when you select the “¶ mode”) displays ZWSP as a small
gray rectangle with an even smaller rectangle inside it.
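Along those lines, a proofreading aid can substitute a visible marker for each ZWSP without touching the font or the document itself. A minimal Python sketch; the helper names and the `[ZWSP]` marker are arbitrary choices for illustration:

```python
ZWSP = "\u200b"

def reveal_zwsp(text: str, marker: str = "[ZWSP]") -> str:
    """Return a copy of `text` with each ZWSP replaced by a visible marker.

    For proofreading output only; the marker must not be written back
    into the document.
    """
    return text.replace(ZWSP, marker)


def zwsp_positions(text: str) -> list[int]:
    """Return the string indices at which ZWSP characters occur."""
    return [i for i, ch in enumerate(text) if ch == ZWSP]


line = f"input/{ZWSP}output or either/{ZWSP}or"
print(reveal_zwsp(line))     # input/[ZWSP]output or either/[ZWSP]or
print(zwsp_positions(line))  # [6, 24]
```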

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jukka K. Korpela

Mar 24, 2016, 6:27:50 AM

24.3.2016, 8:22, tlvp wrote:

> Having noticed the utility of this construct, in the forms or 
> or their equivalents, for letting old IE 7 do line-flowing for long URLs, I
> now see that Firefox needs no such instruction -- before a long URL such as
>
> : http://www.gita-society.com/section3/sivasahasranama.htm ,

That’s not a URL – and it does not work if you e.g. copy and paste it
into a browser’s address bar. It contains ZWSP characters before the
last two slashes (odd places for it).
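The effect is easy to reproduce. In this Python sketch the pasted string is rebuilt by hand, assuming (from the description above) a ZWSP before each of the last two slashes. It compares unequal to the real URL although both display the same, and removing U+200B restores it:

```python
ZWSP = "\u200b"

real = "http://www.gita-society.com/section3/sivasahasranama.htm"
# Reconstructed (assumed) pasted text: a ZWSP before each of the last two slashes.
pasted = "http://www.gita-society.com" + ZWSP + "/section3" + ZWSP + "/sivasahasranama.htm"

print(pasted == real)              # False, although both look the same on screen
print(repr(pasted.split("/")[2]))  # 'www.gita-society.com\u200b' (host part)
cleaned = pasted.replace(ZWSP, "")
print(cleaned == real)             # True
```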

> for example, bumps into the side border of the viewport (seen when one
> drags the browser window narrower), FF reflows the line that it's in, by
> breaking it at a suitable "/" or "-", without any further prompting.

If you use the real URL
http://www.gita-society.com/section3/sivasahasranama.htm
Firefox does what you describe. The reason is that it applies its own line
breaking rules, which allow a break after “/” and (under some
conditions) after “-”. This is *bad*. Breaking a URL after “-” is never
acceptable, as it makes it impossible to know (without testing) whether e.g.
http://www.gita-
society.com
means http://www.gita-society.com or http://www.gitasociety.com

Breaking after “/” is acceptable *provided that* the user knows that a
URL is displayed and it is clear where it ends, e.g. when the URL is
shown in a distinctive color or it has some wrapper characters, e.g.

Please visit the page <http://www.gita-society.com/
section3/sivasahasranama.htm> to see...
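Following that advice, here is a Python sketch that produces display text for a URL with `<wbr>` after each "/" but never after "-", wrapped in angle brackets as in the example above. The function name is hypothetical, and the slashes in `scheme://` are deliberately left alone. It cannot stop a browser's own rules from breaking after "-"; that would need white-space: nowrap as well, which this sketch does not add.

```python
import html

def breakable_url_html(url: str) -> str:
    """Return HTML display text for `url`, wrapped in <...>, with <wbr> after "/".

    Break opportunities are added only after slashes that follow the
    "scheme://" prefix, and never after "-".
    """
    scheme, sep, rest = url.partition("://")
    if not sep:  # no "scheme://" prefix: treat the whole string as the rest
        scheme, rest = "", url
    prefix = html.escape(scheme + sep)
    # Escape first (&, <, >, quotes), then add markup, so <wbr> stays a tag.
    body = html.escape(rest).replace("/", "/<wbr>")
    return f"&lt;{prefix}{body}&gt;"


url = "http://www.gita-society.com/section3/sivasahasranama.htm"
print(breakable_url_html(url))
# &lt;http://www.gita-society.com/<wbr>section3/<wbr>sivasahasranama.htm&gt;
```

The same URL would still go, unmodified, into the `href` attribute of the link.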

> Is that common behavior for today's browsers, particularly those on cell
> phones?

Browsers have different line breaking rules, varying from very bad to
potentially useful. My old and messy page
http://www.cs.tut.fi/~jkorpela/html/nobr.html
documents some of the problems. I’m afraid things are getting even worse.

For example, if your content (as opposed to markup) contains any slash
(solidus) character “/”, you should take into account that some browsers
will treat the string as breakable after it and some won’t.

So if you have a string that should not be broken that way, e.g. “I/O”,
wrap it inside a <nobr> element or (to comply with “standards”) a 
element for which you set white-space: nowrap in CSS.

And if you have a string that may be broken that way, e.g. “input/output”
(let’s assume you wish to allow breaking there), use <wbr> or ZWSP after
the “/”.
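Both cases can be handled by one small generator function. This Python sketch uses an inline `style` attribute for brevity (a class in a stylesheet would work equally well), and the name `slash_term_html` is hypothetical. The caller decides per term whether a break after "/" is acceptable:

```python
import html

def slash_term_html(term: str, breakable: bool) -> str:
    """Return HTML for a term containing "/".

    breakable=True  -> allow a line break after each "/" via <wbr>
    breakable=False -> keep the whole term on one line (white-space: nowrap)
    """
    escaped = html.escape(term)  # escape &, <, > and quotes before adding markup
    if breakable:
        return escaped.replace("/", "/<wbr>")
    return f'<span style="white-space: nowrap">{escaped}</span>'


print(slash_term_html("I/O", breakable=False))
# <span style="white-space: nowrap">I/O</span>
print(slash_term_html("input/output", breakable=True))
# input/<wbr>output
```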

--
Yucca, http://www.cs.tut.fi/~jkorpela/

tlvp

Mar 27, 2016, 2:14:43 AM

On Mon, 21 Mar 2016 13:27:39 -0400, tlvp wrote regarding:

> ... the ZWSP
> entity … (aka  ) ... .

I now wish to thank Jukka, Philip, Adam, and Richard for their many
offerings of comments, responses, facts, and observations, all of which I
have found helpful and edifying: many, many thanks!

I must also apologize for having erroneously misused the term URL to refer
only to the expression coming after the characters http:// in

> http://www.gita-society.com/section3/sivasahasranama.htm

So let me take this opportunity to improve my terminological mastery:

what term should I be using to designate the initial http portion,
what term should I be using to designate the subsequent :// part,
and what term describes the remainder, from www. through .htm ?

Or should I have decomposed that into three other segments, the initial

http:

(complete with trailing : ), the domain

//www.gita-society.com/

(complete with leading // ), and the final full path-specific filename

/section3/sivasahasranama.htm

? If that's how to break the components apart, how are *they* called?

Thanks heaps! (Yes, I know, you'd think I'd have learned this stuff by now,
but Google doesn't work in this direction, and I *will* have learned it
soon enough, with your kind help :-) .) Cheers, and thanks again, -- tlvp

Stan Brown

Mar 27, 2016, 6:48:33 AM

On Sun, 27 Mar 2016 02:14:29 -0400, tlvp wrote:
> So let me take this opportunity to improve my terminological mastery:
>
> what term should I be using to designate the initial http portion,
> what term should I be using to designate the subsequent :// part,
> and what term describes the remainder, from www. through .htm ?
>
> Or should I have decomposed that into three other segments, the initial
>
> http:
>
> (complete with trailing : ), the domain
>
> //www.gita-society.com/
>
> (complete with leading // ), and the final full path-specific filename
>
> /section3/sivasahasranama.htm

Sort of.

"URLs always start with a protocol (http) and usually contain
information such as the network host name (example.com) and often a
document path (/foo/mypage.html). URLs may have query parameters and
fragment identifiers."

http://stackoverflow.com/questions/176264/what-is-the-difference-between-a-uri-a-url-and-a-urn/1984225#1984225

The : and // are separators, I believe, certainly not part of the
network host name.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://BrownMath.com/
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You: http://preview.tinyurl.com/WhyWont

Adam H. Kerman

Mar 27, 2016, 9:48:37 AM

tlvp <mPiOsUcB...@att.net> wrote:
>On Mon, 21 Mar 2016 13:27:39 -0400, tlvp wrote regarding:

>>... the ZWSP
>>entity … (aka ) ... .

>I now wish to thank Jukka, Philip, Adam, and Richard for their many
>offerings of comments, responses, facts, and observations, all of which I
>have found helpful and edifying: many, many thanks!

This was an interesting discussion and I learned a few things too.

>I must also apologize for having erroneously misused the term URL to refer
>only to the expression coming after the characters http:// in

>>http://www.gita-society.com/section3/sivasahasranama.htm

>So let me take this opportunity to improve my terminological mastery:

>what term should I be using to designate the initial http portion,
>what term should I be using to designate the subsequent :// part,
>and what term describes the remainder, from www. through .htm ?

RFC 3986 https://tools.ietf.org/html/rfc3986

Portions were updated by other RFCs, but not for this purpose.

"http" is the scheme component. ":" is a separator. "//" indicates that
the scheme includes a hierarchical element for a naming authority which
governs the name space to which interpretation is delegated. Further
explanation is in Section 3.2; it's hard to summarize. Basically, it
means that the syntax specific to http is found in another document.

"www.gita-society.com[:port]" is the authority component.

From Section 3.3, "/section3/sivasahasranama.htm" is the path
component, which (if present) must begin with "/" and must not begin
with "//"; each "/" separates the path into a sequence of path segments.
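RFC 3986 itself (Appendix B) gives a regular expression for splitting a URI reference into these components. Below is a Python sketch using that expression, with `urllib.parse.urlsplit` from the standard library for comparison; `uri_components` is a hypothetical helper name:

```python
import re
from urllib.parse import urlsplit

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def uri_components(uri: str) -> dict:
    """Split a URI reference into scheme, authority, path, query, fragment."""
    m = URI_RE.match(uri)  # always matches: every group is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),  # host, plus ":port" if present
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


url = "http://www.gita-society.com/section3/sivasahasranama.htm"
print(uri_components(url))

# The standard library gives the same split ("" rather than None when absent).
parts = urlsplit(url)
print(parts.scheme, parts.netloc, parts.path)
```

With a port, query and fragment, e.g. `http://example.com:8080/x?q=1#top`, the authority is `example.com:8080`, the query `q=1` and the fragment `top`.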

dorayme

Mar 27, 2016, 5:45:30 PM

In article <nd8obj$fgo$1...@news.albasani.net>,

"Adam H. Kerman" <a...@chinet.com> wrote:

> "http" is the scheme component. ":" is a separator.

Was there an important need for ":"? A space would not have sufficed?

--
dorayme

tlvp

Mar 27, 2016, 7:32:25 PM

On Sun, 27 Mar 2016 02:14:29 -0400, tlvp wrote:

> what term should I be using to designate the initial http portion,
> what term should I be using to designate the subsequent :// part,
> and what term describes the remainder, from www. through .htm ?

> ... etc. ...

Stan, Adam, thanks to you both for providing me with the terminology sought
-- I think I've got it now: protocol, hostname or authority, and document
path, with suitable punctuational "separators" ( : , // , / ) as
connectives :-) ; and, perhaps, optional added parameters at the tail end.

Cheers, -- tlvp

PS to dorayme: I'd guess that a space is not approved as separator in place
of the : because the separators here all play the role of
connectives, which a space is congenitally incapable of doing (which is why
it was necessary to develop a wholly separate Non-Breaking SPace). -- tlvp

dorayme

Mar 28, 2016, 4:38:42 AM

In article <rf1usd1rlwm.10...@40tude.net>,

tlvp <mPiOsUcB...@att.net> wrote:

> PS to dorayme: I'd guess that a space is not approved as separator in place
> of the : because the separators here all play the role of
> connectives, which a space is congenitally incapable of doing (which is why
> it was necessary to develop a wholly separate Non-Breaking SPace).

A space in a context might not be congenitally capable of it, but why
could a context not make it a separator? You were not born a king, but
given the right context you could be.

PS: I'd guess that

PS I'd guess that

The former is conventional and more verbose than the latter. The
context of "PS" followed by a space might act as a separator as a
convention?

--
dorayme

Philip Herlihy

Mar 28, 2016, 2:44:21 PM

In article <rf1usd1rlwm.10...@40tude.net>,
mPiOsUcB...@att.net says...

I read that Tim Berners-Lee, who came up with this format, observed that
the second forward-slash before the host component is unnecessary, in
hindsight.

--

Phil, London