Jukka K. Korpela wrote:
> David E. Ross wrote:
>> You are trying to use numeric character references. PLEASE DO NOT, even
>> if you use Unicode code point numbers (which you are not).
>
> The only example mentioned was &#65; for “A”. It surely uses a Unicode
> code point number.
^^^^^^
The proper term is “code point _value_”. Otherwise yes, 65 is the *decimal*
code point value for that Unicode character, U+0041 LATIN CAPITAL LETTER A.
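For illustration, a minimal sketch in Python (the choice of language and of
its standard `html` and `unicodedata` modules is mine, not taken from the
quoted postings). It resolves the decimal reference and, as an added
assumption, the equivalent hexadecimal form `&#x41;`:

```python
import html
import unicodedata

# Decimal and hexadecimal numeric character references for one character.
# The hexadecimal form is added here for comparison only.
decimal_ref = "&#65;"
hex_ref = "&#x41;"

char = html.unescape(decimal_ref)

print(char)                        # the resolved character
print(ord(char))                   # code point value, decimal
print(f"U+{ord(char):04X}")        # conventional U+ notation (hexadecimal)
print(unicodedata.name(char))      # character name from the Unicode database
print(html.unescape(hex_ref) == char)
```

Both references should resolve to the same character, since they only differ
in the radix used to write the code point value.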
Perhaps the underlying reason for David E. Ross’s claim is this:
It is a common misconception that Unicode characters are only
"characters that are not contained in 8-bit character sets" such as
ISO(/IEC)-8859-1. In fact, the Unicode character set is,
intentionally, by characters (not necessarily also by code point
value) a superset of all other character sets.
<http://www.unicode.org/faq/basic_q.html>
<http://www.unicode.org/charts/PDF/U0000.pdf>
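The "by characters, not necessarily by code point value" distinction can be
sketched as follows. This is only an illustration; Windows-1252 (Python codec
name `cp1252`) and the two sample bytes are my own choice of example:

```python
# ISO-8859-1: the byte value and the Unicode code point value coincide.
latin1_char = bytes([0xE9]).decode("iso-8859-1")

# Windows-1252 (assumed example): the character encoded as byte 0x80 is
# contained in Unicode, but at a different code point value.
cp1252_char = bytes([0x80]).decode("cp1252")

for byte, ch in ((0xE9, latin1_char), (0x80, cp1252_char)):
    print(f"byte 0x{byte:02X} -> U+{ord(ch):04X} {ch}")
```

In the second case the character is part of Unicode, but its code point value
is not the byte value used by the 8-bit encoding.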
>> Instead, use named character references
>
> They are pseudo-mnemonic
^^^^^^^^^^^^^^^^^^^^^^^^ (meaningless; “mnemonic” is _not_ an adjective)
> in an often misleading way,
^^^^^ ^^^^^^^^^^ (According to whom, for whom?)
(How often?)
<https://en.wikipedia.org/wiki/Weasel_word>
> they do not add any expressive power,
IBTD. Sometimes they do, as not all (monospace) fonts are equally suited to
display certain Unicode characters in a text editor.
Sometimes it is even syntactically necessary to use them, for example
“&amp;” instead of a standalone “&” in an “href” attribute value.
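A short sketch of the “&amp;” case, again in Python with the standard `html`
module. The URL is hypothetical and only serves the example:

```python
import html

# Hypothetical URL with two query parameters separated by "&".
url = "https://example.invalid/search?q=unicode&page=2"

# quote=True additionally escapes quotation marks, which matters
# inside attribute values.
escaped = html.escape(url, quote=True)
anchor = f'<a href="{escaped}">search</a>'
print(anchor)

# Resolving the character reference yields the original URL again.
print(html.unescape(escaped) == url)
```

The markup contains the named character reference, while the attribute value
that results from parsing it is the URL with the plain “&”.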
> they do not exist except for a fraction of the set of Unicode characters,
True. Although the subset in HTML 5.x is larger than in HTML 4.01/XHTML
1.x, one can surmise that in both cases named character references have only
been *predefined* for the most used Unicode characters, including those that
would conflict with HTML and XML syntax rules.
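The difference in size between the two sets of predefined names can be
checked, for example, with the tables that Python ships in `html.entities`.
This assumes that those tables reflect the respective specifications
accurately; I have not compared them entry by entry:

```python
from html.entities import html5, name2codepoint

# name2codepoint maps the HTML 4.01 entity names to code point values.
html4_names = set(name2codepoint)

# html5 maps HTML5 names to characters; its keys include the trailing
# semicolon, and some legacy names are also listed without it, so only
# the keys ending in ";" are counted here.
html5_names = {name[:-1] for name in html5 if name.endswith(";")}

print(len(html4_names), len(html5_names))

# A few names that exist only in the larger set.
print(sorted(html5_names - html4_names)[:5])
```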
> they appear in different sets in different versions of HTML and have had
> varying browser support, and their meanings have even changed from one
> version to another-
Cite evidence.
He was _not_ referring to “HTML 5.1 edition 8”, but to HTML 5.1 _2nd_
Edition, _§ (section)_ 8.
Probably he was not aware of HTML 5.2; it became a W3C Recommendation fairly
recently, on 2017-12-14.
>> While that chart does show Unicode code points, HTML does not support
^^^^^^^^^^^^^^^^^^^^^
>> all code points.
^^^^^^^^^^^^^^^
> Whatever you mean by that, it is most probably wrong.
The last part of their sentence is *definitely* wrong:
<https://www.w3.org/TR/2017/REC-html52-20171214/infrastructure.html#unicode-code-point>
,-<https://www.w3.org/TR/html/references.html#biblio-unicode>
|
| The Unicode Standard. URL: https://www.unicode.org/versions/latest/
What can be said with certainty is that not all *fonts* support all Unicode
code points (regardless of whether they are used for display in a text editor
or in Web browsers), so authors may need to provide the user with suitable
fonts if they use unusual characters, such as Egyptian hieroglyphs (this is
easier to accomplish now with Web fonts). But that has nothing to do with
HTML directly.
The author also appears to be blissfully unaware of character encodings and
typography.
As is obvious from my postings (which I write under GNU+Linux), the issue on
"UNIX platforms" is _not_ the use of “smart quotes”, "smart apostrophes",
and "smart single quotes" written under Micro$~1 Windows – those are just
*typographical* quotation marks and apostrophes, and usually the *correct*
way to write text¹ – but the bug in the software producing them, which fails
to declare the corresponding character encoding; the bug in the software
displaying them, which fails to support declared and IANA-registered
character encodings; or the failure of the user to choose fonts that support
those proper characters.
As always, if the encoding declaration is missing, that leaves the user
agent (software) to guess the character encoding, and even educated guesses
can easily be wrong in this context.
(All of this is equally true for clueless non-Windows users and buggy
non-Windows software.)
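What a wrong guess looks like can be sketched in Python. The sample text and
the three encodings are assumptions chosen for the example, with Windows-1252
(`cp1252`) standing in for the undeclared encoding:

```python
text = "\u201csmart quotes\u201d"   # typographical quotation marks

# Bytes as produced by software that uses Windows-1252 without declaring it.
raw = text.encode("cp1252")
print(raw)

# With the correct encoding the text round-trips.
print(raw.decode("cp1252") == text)

# Wrong guess 1: ISO-8859-1 decodes without error, but yields C1 control
# characters where the quotation marks were.
wrong = raw.decode("iso-8859-1")
print([f"U+{ord(c):04X}" for c in (wrong[0], wrong[-1])])

# Wrong guess 2: UTF-8 rejects the byte sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```

Neither wrong guess produces the quotation marks; the first one does not even
report an error, which is why a missing declaration is hard to notice.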
_______
¹ Note the difference in my postings when I use "scare quotes" (straight
quotation marks) instead of “smart quotes” (typographical quotation
marks).
> The comp.infosystems.www.authoring.html
> newsgroup, one of the few Usenet groups left with
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Of which set of newsgroups?)
> some meaningful traffic,
^^^^ ^^^^^^^^^^^^^^^^^^ (According to whom, for whom?)
(How much?)
> in about HTML authoring for the World Wide Web, where numeric
^^^^^^^^^^^^^^^^^^^^^^ _not_ *only*
> character references have worked well since the 1990s.
^^^^^^^^^^^ (According to whom, for whom?)
Unfortunately, that is only your typical weasel-word argument again.
JFTR: IBTD.
However, as a matter of *fact*, HTML 4.01 has been a Superseded
Recommendation since 2018-03-27:
,-<https://www.w3.org/TR/2018/SPSD-html401-20180327/>
|
| HTML 4.01 Specification
|
| W3C Recommendation 24 December 1999
| superseded 27 March 2018
|
| […]
|
| This specification is a Superseded Recommendation. A newer specification
| exists that is recommended for new adoption in place of this
| specification. New implementations should follow the latest version
| <https://www.w3.org/TR/html/> of the HTML specification.
Thus, HTML 4.01 and previous versions of HTML (e.g., HTML 3.2) that were not
already “obsolete” or “historic” (e.g., HTML 2.0) have been superseded by
HTML 5.2+.
PointedEars
--
Sometimes, what you learn is wrong. If those wrong ideas are close to the
root of the knowledge tree you build on a particular subject, pruning the
bad branches can sometimes cause the whole tree to collapse.
-- Mike Duffy in cljs, <news:Xns9FB6521286...@94.75.214.39>