
Ellipsis


Stan Brown

Dec 18, 2016, 8:16:02 AM
I've been validating some pages at validator.w3.org. It had a problem
with &#133; for the ellipsis character, and I can understand that
since that's Windows character set, not iso-8859-1. So I changed them
all to &#x2026;.
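Here is a small Python sketch of the validator's complaint. Python and its standard codecs are only an illustrative choice, since nothing in this thread uses them. The sketch assumes the usual reading of the problem. The byte 0x85 is "…" only in the Windows-1252 encoding. The reference &#133;, by contrast, names Unicode code point U+0085, which is a C1 control character.

```python
import unicodedata

# Byte 0x85 decoded as Windows-1252 (Python codec name "cp1252") is the ellipsis.
win1252_char = bytes([0x85]).decode("cp1252")

# Under ISO-8859-1 the same byte maps straight to code point U+0085, a control character.
latin1_char = bytes([0x85]).decode("iso-8859-1")

# Numeric references name code points directly, independent of any encoding.
ref_133 = chr(133)       # what &#133; refers to: U+0085
ref_2026 = chr(0x2026)   # what &#x2026; refers to: U+2026

print(hex(ord(win1252_char)), unicodedata.name(win1252_char))   # 0x2026 HORIZONTAL ELLIPSIS
print(hex(ord(latin1_char)), unicodedata.category(latin1_char))  # 0x85 Cc
print(ref_133 == latin1_char, ref_2026 == win1252_char)          # True True
```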

But I'm wondering about browser coverage. Is a significant number of
users likely to see a garbage character now, instead of ellipsis?
Should I just replace it with three dots instead?

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://BrownMath.com/
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You: http://preview.tinyurl.com/WhyWont

Jukka K. Korpela

Dec 18, 2016, 10:36:00 AM
18.12.2016, 15:16, Stan Brown wrote:

> I've been validating some pages at validator.w3.org. It had a problem
> with &#133; for the ellipsis character, and I can understand that
> since that's Windows character set, not iso-8859-1. So I changed them
> all to &#x2026;.

That was a correct move in principle, though not really needed these
days. Browsers actually interpret &#133; as the ellipsis character (and
this is even documented in HTML5). I don’t think you can find a browser
that doesn’t, except perhaps in a museum of technology.
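You can check this with Python's `html.unescape`, used here only as a stand-in for a browser's parser. It follows the HTML5 character-reference rules, including the error-recovery table that remaps &#128; through &#159; to their Windows-1252 characters. A minimal sketch:

```python
import html

# HTML5 error recovery maps numeric references in the 0x80-0x9F range to
# Windows-1252 characters, so &#133; decodes to U+2026, not the C1 control U+0085.
decoded = html.unescape("&#133;")
print(hex(ord(decoded)))  # 0x2026

# The Unicode references give the same character without relying on that table.
print(html.unescape("&#x2026;") == decoded)  # True
print(html.unescape("&#8230;") == decoded)   # True
```

HTML5 still classifies &#133; as a parse error. The remapping is recovery, not endorsement, which is consistent with the validator flagging it.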

> But I'm wondering about browser coverage. Is a significant number of
> users likely to see a garbage character now, instead of ellipsis?

No. I don’t think any user is.

> Should I just replace it with three dots instead?

A matter of style. The ellipsis character “…” is supposed to have dots
set more apart from each other than a sequence of three period (FULL
STOP) characters, “...”, but this does not always happen. Apparently, in
a monospace font, it is just the opposite, very much so. Even in
proportional fonts, the design is not always what you might expect.
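The difference between the two spellings shows up at the code-point level. The short Python sketch below uses the standard `unicodedata` module, which is our choice of tool. How the dots are actually spaced on screen is decided by the font, and no code here can show that.

```python
import unicodedata

ellipsis = "\u2026"  # one character: the ellipsis
three_dots = "..."   # three FULL STOP characters

print(len(ellipsis), unicodedata.name(ellipsis))          # 1 HORIZONTAL ELLIPSIS
print(len(three_dots), unicodedata.name(three_dots[0]))   # 3 FULL STOP

# U+2026 has a compatibility decomposition to three full stops,
# so NFKC normalization turns the ellipsis into "...".
print(unicodedata.normalize("NFKC", ellipsis) == three_dots)  # True
```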

Using the ellipsis character is fine if you have reasonable expectations
for having it rendered in a manner where the dots are spaced acceptably.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Helmut Richter

Dec 18, 2016, 10:56:44 AM
On 18.12.2016 at 16:36, Jukka K. Korpela wrote:

> 18.12.2016, 15:16, Stan Brown wrote:
>
>> I've been validating some pages at validator.w3.org. It had a problem
>> with &#133; for the ellipsis character, and I can understand that
>> since that's Windows character set, not iso-8859-1. So I changed them
>> all to &#x2026;.
>
> That was a correct move in principle

And demanded by W3C papers such as
https://www.w3.org/TR/WD-html40-970708/charset.html. Long before Unicode
was generally used by many people, it was defined to be the document
character set of HTML, irrespective of which encoding was actually used
in a file with HTML text. In the same paper, a numeric character entity
is defined to refer to the Unicode code point. -- I deliberately chose
a 20-year-old HTML 4.0 paper to demonstrate that this is not a modern
fad but has always (or nearly always) been so.
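Helmut's point is that a numeric character reference names a code point regardless of the file's encoding. Here is a sketch of that in Python, again only an illustrative choice. An ISO-8859-1 file has no byte for the ellipsis, but it can still carry a reference to it.

```python
import html

text = "Wait\u2026"

# ISO-8859-1 has no ellipsis, so fall back to a numeric character reference.
# The reference carries the Unicode code point: 8230 decimal = 0x2026.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(latin1_bytes)  # b'Wait&#8230;'

# Decoding the bytes and resolving the reference restores the character,
# whatever encoding the file itself used.
restored = html.unescape(latin1_bytes.decode("iso-8859-1"))
print(restored == text)  # True
```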

> though not really needed these days. Browsers actually interpret &#133;
> as the ellipsis character (and
> this is even documented in HTML5). I don’t think you can find a browser
> that doesn’t, except perhaps in a museum of technology.

Depends whether one regards conformance to standards as needed, or
whether coincidental coverage by browsers is sufficient.

--
Helmut Richter

Stan Brown

Dec 18, 2016, 3:04:08 PM
On Sun, 18 Dec 2016 08:16:01 -0500, Stan Brown wrote:
>
> I've been validating some pages at validator.w3.org. It had a problem
> with &#133; for the ellipsis character, and I can understand that
> since that's Windows character set, not iso-8859-1. So I changed them
> all to &#x2026;.
>
> But I'm wondering about browser coverage. Is a significant number of
> users likely to see a garbage character now, instead of ellipsis?
> Should I just replace it with three dots instead?

Thanks Jukka and Helmut, for your prompt and clear answers.

I suppose either way, &#133; or &#x2026;, I'm relying on the browser
to do what I want. But since it's important to me to have my pages pass
validation without errors or warnings, I think I'll stick with the
Unicode character.

James Moe

Dec 18, 2016, 6:21:56 PM
On 12/18/2016 06:16 AM, Stan Brown wrote:
> I've been validating some pages at validator.w3.org. It had a problem
> with &#133; for the ellipsis character, and I can understand that
> since that's Windows character set, not iso-8859-1. So I changed them
> all to &#x2026;.
>
> But I'm wondering about browser coverage. Is a significant number of
> users likely to see a garbage character now, instead of ellipsis?
> Should I just replace it with three dots instead?
>
Use "&hellip;" instead. It is properly translated to whatever the
character set is. In general it is safer to use character entities
rather than numeric escape sequences.
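`&hellip;` is simply a name for code point U+2026. Python's `html.entities` module maps the HTML 4 entity names to code points, and it exposes this mapping directly. The sketch below illustrates that; it is not a claim about how any particular browser works. At the character level, the entity and the numeric reference resolve to the same thing.

```python
from html.entities import codepoint2name, name2codepoint

# HTML 4 entity table: entity name -> Unicode code point, and the reverse.
code_point = name2codepoint["hellip"]
print(hex(code_point))                 # 0x2026
print(codepoint2name[0x2026])          # hellip
print(chr(code_point) == "\u2026")     # True
```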

--
James Moe
jmm-list at sohnen-moe dot com
Think.

Dr J R Stockton

Dec 19, 2016, 6:56:01 PM
In comp.infosystems.www.authoring.html message <MPG.32c0561ec73451db98fa
4...@news.individual.net>, Sun, 18 Dec 2016 08:16:01, Stan Brown
<the_sta...@fastmail.fm> posted:

>I've been validating some pages at validator.w3.org. It had a problem
>with &#133; for the ellipsis character, and I can understand that
>since that's Windows character set, not iso-8859-1. So I changed them
>all to &#x2026;.
>
>But I'm wondering about browser coverage. Is a significant number of
>users likely to see a garbage character now, instead of ellipsis?
>Should I just replace it with three dots instead?

The ellipsis characters that I have seen are weak, feeble, and thin.
But so are three dots ... .

Choose whatever is most visible in common fonts and browsers, and passes
your preferred validators.

The full stop itself is also commonly feeble, and that is particularly
bad as it is commonly used as a decimal point in English-speaking
locations.

--
(c) John Stockton, Surrey, UK. ¬@merlyn.demon.co.uk Turnpike v6.05 MIME.
Merlyn Web Site < > - FAQish topics, acronyms, & links.


Jukka K. Korpela

Dec 20, 2016, 1:12:28 PM
19.12.2016, 21:09, Dr J R Stockton wrote:

> The ellipsis characters that I have seen are weak, feeble, and thin.
> But so are three dots ... .

That is sadly true for many widely used fonts. It is, however, an issue
with fonts and typography, with no particular HTML aspect.

(HTML *could* have an element like <ellipsis> or an entity like
&ellipsis; with the definition that it should be rendered as an ellipsis
symbol in a manner that depends on the document language. But it doesn’t.)

> Choose whatever is most visible in common fonts and browsers,

That’s much more complicated than it sounds. We don’t really know what
fonts are common, though we may have some reasonable guesses. In most
cases, we can’t really pay much attention to such criteria when choosing
fonts, since there are so many criteria that are more crucial.

It’s so complicated that people may just decide it’s not worth it and
use the simple solution, “...” (three consecutive FULL STOP characters).

> and passes
> your preferred validators.

I don’t see how validators would be relevant here. Use “…” as such, or
&hellip;, or one of the equivalent numeric character references. The
choice does not matter as regards the typography issue.
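Jukka says the spellings are interchangeable. You can check that with Python's `html.unescape`, our illustrative stand-in for an HTML parser. The sketch also confirms that the HTML 4 entity list has `hellip` but no `ellipsis`.

```python
import html
from html.entities import name2codepoint

# The literal character, the named entity, and decimal and hex numeric references.
spellings = ["\u2026", "&hellip;", "&#8230;", "&#x2026;"]

# All four resolve to the single character U+2026 HORIZONTAL ELLIPSIS.
resolved = {html.unescape(s) for s in spellings}
print(resolved == {"\u2026"})  # True

# The HTML 4 entity list has &hellip; but no &ellipsis;.
print("ellipsis" in name2codepoint, "hellip" in name2codepoint)  # False True
```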

> The full stop itself is also commonly feeble, and that is particularly
> bad as it is commonly used as a decimal point in English-speaking
> locations.

It’s a problem character, as it has been for centuries, long before HTML
was created.



--
Yucca, http://www.cs.tut.fi/~jkorpela/

tlvp

Dec 20, 2016, 6:14:50 PM
On Mon, 19 Dec 2016 19:09:11 +0000, Dr J R Stockton wrote:

> The ellipsis characters that I have seen are weak, feeble, and thin.
> But so are three dots ... .

So fatten them up: bracket them between <B> and </B> tags :-) .

Cheers, -- tlvp
--
Before replying, throw out the trash, please.

Joy Beeson

Dec 20, 2016, 8:19:22 PM
On Tue, 20 Dec 2016 20:12:30 +0200, "Jukka K. Korpela"
<jkor...@cs.tut.fi> wrote:

> It’s a problem character, as it has been for centuries, long before HTML
> was created.

Full stops used to punch holes in Mimeograph stencils.

--
Joy Beeson
joy beeson at comcast dot net
http://wlweather.net/PAGEJOY/


dorayme

Dec 28, 2016, 4:33:26 PM
In article <jhij5cdgd605qagvc...@4ax.com>,
Joy Beeson <jbe...@invalid.net.invalid> wrote:

> Full stops used to punch holes in Mimeograph stencils.

They have a history of being very aggressive; they perpetrated war
crimes in the battles with commas. The semicolons tried to be the
neutral Swiss in those wars, being made up of both tribes - but, alas,
succumbed to civil war and eventually took sides with the main warring
parties.

--
dorayme

Thomas 'PointedEars' Lahn

Jan 14, 2017, 2:47:21 PM
Helmut Richter wrote:

> On 18.12.2016 at 16:36, Jukka K. Korpela wrote:
>> 18.12.2016, 15:16, Stan Brown wrote:
>>> I've been validating some pages at validator.w3.org. It had a problem
>>> with &#133; for the ellipsis character, and I can understand that
>>> since that's Windows character set, not iso-8859-1.
>>>
>>> So I changed them all to &#x2026;.
>> That was a correct move in principle
>
> And demanded by W3C papers such as
> https://www.w3.org/TR/WD-html40-970708/charset.html.

That is not a “W3C paper”, but a W3C Working Draft (WD), and because of the
latter it is irrelevant evidence and a fallacy to cite it as a “demand”:

,-<https://www.w3.org/TR/WD-html40-970708/cover.html>
|
| Status of this document
|
| This is a W3C Working Draft for review by W3C members and other interested
| parties. It is a draft document and may be updated, replaced or obsoleted
| by other documents at any time. It is inappropriate to use W3C Working
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| Drafts as reference material or to cite them as other than "work in
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| progress". This is work in progress and does not imply endorsement by, or
^^^^^^^^^
| the consensus of, either W3C or members of the HTML working group.

As it is and has always been. (IIRC the two of us have discussed this
before. Is there a bug in the Matrix?)

The “other document” that replaced it, is

<https://www.w3.org/TR/1999/REC-html401-19991224/>

The URI of the corresponding section is

<https://www.w3.org/TR/1999/REC-html401-19991224/charset.html>.

And those are W3C _Recommendations_, not “demands”.

> Long before Unicode was generally used by many people, it was defined to
> be the document character set of HTML, irrespective of which encoding was
> actually used in a file with HTML text.

No, the document character set of HTML before version 5 is the _Universal
(Coded) Character Set_ (UCS; ISO/IEC 10646) which is only said in HTML 4.01
(REC) to be “character-by-character equivalent to Unicode”. And that is
only referring to *the Unicode version at the time* (1999 CE). In the case
of HTML 4.01, that was Unicode 3.0. [1]

However, as you can see in the changelog of Unicode 4.0 [2], among other
things it added support for Linear B. By coincidence recently I used the
first character of the Linear B Unicode range to demonstrate a problem with
characters beyond the Basic Multilingual Plane (BMP) of Unicode (in MySQL).
Therefore I know by heart that the code point of that character is U+10000,
just one codepoint beyond the BMP (U+0000 to U+FFFF). So if Unicode 4.0
introduced support for Linear B, this means that previous versions of
Unicode specified a character set that did not extend beyond the BMP.

Therefore HTML 4.01, which refers to Unicode 3.0 as an equivalent to the
Universal Character Set, is only specified to support characters within the
BMP.
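Here is a small Python sketch of the BMP boundary PointedEars describes. It is illustrative only and does not reproduce his MySQL case. U+10000 is the first code point past U+FFFF. It therefore needs a surrogate pair in UTF-16 and four bytes in UTF-8. The ellipsis, U+2026, sits well inside the BMP.

```python
import unicodedata

linear_b = "\U00010000"  # first code point beyond the BMP
ellipsis = "\u2026"      # inside the BMP

print(unicodedata.name(linear_b))                         # LINEAR B SYLLABLE B008 A
print(ord(ellipsis) <= 0xFFFF, ord(linear_b) <= 0xFFFF)   # True False

# Beyond the BMP, UTF-16 needs a surrogate pair and UTF-8 needs four bytes,
# so software that assumes at most three bytes per UTF-8 character cannot store it.
print(linear_b.encode("utf-16-be").hex())                          # d800dc00
print(len(linear_b.encode("utf-8")), len(ellipsis.encode("utf-8")))  # 4 3
```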

This has changed since HTML5 which now uses “the Unicode character set” as
the one “used to represent textual data”, by which the most recent version
of Unicode is meant (as the version number was omitted from the reference):

<https://www.w3.org/TR/2014/REC-html5-20141028/infrastructure.html#dependencies>


PointedEars
___________
[1] <https://www.w3.org/TR/1999/REC-html401-19991224/references.html#ref-UNICODE>
[2] <http://www.unicode.org/versions/Unicode4.0.0/>
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee

Helmut Richter

Jan 14, 2017, 3:44:04 PM
On 14.01.2017 at 20:47, Thomas 'PointedEars' Lahn wrote:

> Helmut Richter wrote:

>> Long before Unicode was generally used by many people, it was defined to
>> be the document character set of HTML, irrespective of which encoding was
>> actually used in a file with HTML text.
>
> No, the document character set of HTML before version 5 is the _Universal
> (Coded) Character Set_ (UCS; ISO/IEC 10646) which is only said in HTML 4.01
> (REC) to be “character-by-character equivalent to Unicode”. And that is
> only referring to *the Unicode version at the time* (1999 CE). In the case
> of HTML 4.01, that was Unicode 3.0. [1]

Thanks for the additional precision by your enhancements.

The main gist of my contribution was to emphasise that *even if* the
HTML document uses something else, e.g. some windows codepage, for its
representation, and *even if* the document itself (or the HTTP server in
a header, which takes precedence) declares that other character set,
Unicode or something equivalent *is all the same* the document character
set. This is what the OP really has to know.

> However, as you can see in the changelog of Unicode 4.0 [2], among other
> things it added support for Linear B. By coincidence recently I used the
> first character of the Linear B Unicode range to demonstrate a problem with
> characters beyond the Basic Multilingual Plane (BMP) of Unicode (in MySQL).

Yes, this is important for people providing Web pages in Linear B.

I do appreciate that you strive for absolute exactitude in matters of
standards, but the -- relatively simple -- information the OP needs can
be buried in detail that is absolutely irrelevant to him. I am sorry I
was unable to provide the same degree of exactitude but I still think I
provided a higher signal to noise ratio for the useful information.

--
Helmut Richter

dorayme

Jan 14, 2017, 4:35:43 PM
In article <o5e2ij$jpn$1...@news.in.tum.de>,
Helmut Richter <hh...@web.de> wrote:

> On 14.01.2017 at 20:47, Thomas 'PointedEars' Lahn wrote:
>
> > Helmut Richter wrote:
>
...
> ... I am sorry I
> was unable to provide the same degree of exactitude but I still think I
> provided a higher signal to noise ratio for the useful information.

Sounds about right!

--
dorayme

Thomas 'PointedEars' Lahn

Jan 14, 2017, 5:01:09 PM
Helmut Richter wrote:

> On 14.01.2017 at 20:47, Thomas 'PointedEars' Lahn wrote:
>> Helmut Richter wrote:
>>> Long before Unicode was generally used by many people, it was defined to
>>> be the document character set of HTML, irrespective of which encoding
>>> was actually used in a file with HTML text.
>>
>> No, the document character set of HTML before version 5 is the _Universal
>> (Coded) Character Set_ (UCS; ISO/IEC 10646) which is only said in HTML
>> 4.01 (REC) to be “character-by-character equivalent to Unicode”. And
>> that is only referring to *the Unicode version at the time* (1999 CE).
>> In the case of HTML 4.01, that was Unicode 3.0. [1]
>
> Thanks for the additional precision by your enhancements.
>
> The main gist of my contribution was to emphasise that *even if* the
> HTML document uses something else, e.g. some windows codepage, for its
> representation, and *even if* the document itself (or the HTTP server in
> a header, which takes precedence) declares that other character set,
> Unicode or something equivalent *is all the same* the document character
> set.

Again, guided by nothing more than a smattering, you are confusing concepts
and terminology. The *representation*, the character *set*, of an HTML
document is either UCS or the Unicode character set (depending on the HTML
version); only the character *encoding* may be virtually anything (since
UCS/the Unicode character set is *designed to be* a proper superset of
virtually all character sets that humans have come up with).

It is important to realize – which you clearly have not yet – that the use
of “charset” e.g. in Content-Type header fields is *historic*; it was
defined when there was no difference between character set and corresponding
encoding. This is different since at least Unicode, for whose characters
there are several character encodings, namely at least UTF-8, UTF-16BE,
UTF-16LE, and UTF-32. What was and is really meant by “charset” there is
the character *encoding*.
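The difference between character set and encoding can be made concrete with a short Python sketch. Python's codec names are only an illustrative choice. One character, U+2026, produces different byte sequences under different encodings. An encoding that lacks the character, such as ISO-8859-1, cannot represent it at all.

```python
ellipsis = "\u2026"

# One character from one character set, several encodings ("charsets" in HTTP terms).
for encoding in ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "cp1252"]:
    print(encoding, ellipsis.encode(encoding).hex())
# utf-8 e280a6
# utf-16-be 2026
# utf-16-le 2620
# utf-32-be 00002026
# cp1252 85

# ISO-8859-1 has no ellipsis, so encoding fails outright.
try:
    ellipsis.encode("iso-8859-1")
except UnicodeEncodeError:
    print("iso-8859-1: not encodable")
```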

> This is what the OP really has to know.

No, the OP and, in general, the dedicated reader also _needs_ to know that
they may not safely use all Unicode characters in HTML 4.01, but they may do
so in HTML5 and beyond (the most recent W3C HTML REC is HTML 5.1), typeface
considerations aside.

>> However, as you can see in the changelog of Unicode 4.0 [2], among other
>> things it added support for Linear B. By coincidence recently I used the
>> first character of the Linear B Unicode range to demonstrate a problem
>> with characters beyond the Basic Multilingual Plane (BMP) of Unicode (in
>> MySQL).
>
> Yes, this is important for people providing Web pages in Linear B.

Archaeologists and linguists may in fact use Linear B syllables and ideograms
to publish their work about them in Web documents. (There are no “Web
pages”. That term, too, is historic.)

That said, if you cared to read more carefully, this was just a fitting
example to show that HTML 4.01 is not specified to support characters beyond
the BMP. There are also other, non-historic characters supported by the
Unicode ranges beyond the BMP.

> [fallacies]

This is a discussion group, not a support forum. Get a life or go away.

--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300...@news.demon.co.uk> (2004)

Adam H. Kerman

Jan 31, 2017, 3:24:20 PM
Jukka K. Korpela <jkor...@cs.tut.fi> wrote:
>19.12.2016, 21:09, Dr J R Stockton wrote:

>>The ellipsis characters that I have seen are weak, feeble, and thin.
>>But so are three dots ... .

>That is sadly true for many widely used fonts. It is, however, an issue
>with fonts and typography, with no particular HTML aspect.

>(HTML *could* have an element like <ellipsis> or an entity like
>&ellipsis; with the definition that it should be rendered as an ellipsis
>symbol in a manner that depends on the document language. But it doesn’t.)

>>Choose whatever is most visible in common fonts and browsers,

>That's much more complicated than it sounds. We don't really know what
>fonts are common, though we may have some reasonable guesses. In most
>cases, we can't really pay much attention to such criteria when choosing
>fonts, since there are so many criteria that are more crucial.

>It's so complicated that people may just decide it's not worth it and
>use the simple solution, ... (three consecutive FULL STOP characters).

If there's no whitespace, those are points of suspension, not points of
ellipsis. Lots and lots of people mix up the two, even though they are
used for very different purposes.

If you don't want to rely upon an acceptable ellipsis glyph being displayed
to the end user, then use . . . and not ...

The rest snipped, so . . .