section and article tags

Stan Brown

Nov 5, 2016, 9:00:05 AM

I see from https://html-differences.whatwg.org/ that HTML5 has new
<section> and <article> tags.

Is there any real benefit to using <section>? I understand that it
lets you style different <h2>s or <p>s differently, but I could do
that with <div>. And in terms of document structure, each header tag
introduces an implicit section anyway, if I'm reading the spec[1]
correctly. What am I missing?

I'm pretty sure that <article> doesn't apply to me, as every page of
mine is a single topic.

[1]https://www.w3.org/TR/html5/sections.html#the-section-element

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://BrownMath.com/
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You: http://preview.tinyurl.com/WhyWont

Jukka K. Korpela

Nov 5, 2016, 4:51:36 PM

5.11.2016, 15:00, Stan Brown wrote:

> I see from https://html-differences.whatwg.org/ that HTML5 has new
> <section> and <article> tags.

They have been in HTML5 for quite some time. There is no demonstrable
benefit from using them; consider them as pseudo-religious markup.

> Is there any real benefit to using <section>?

No.

> I understand that it
> lets you style different <h2>s or <p>s differently, but I could do
> that with <div>.

Indeed, and you can do that with <div> more robustly. Admittedly
<section>-ignorant browsers (the crowd shouts “old versions of IE”) have
become rare.

> And in terms of document structure, each header tag
> introduces an implicit section anyway, if I'm reading the spec[1]
> correctly. What am I missing?

You might be missing the point that the “section” concept is an exercise
in futility. The concept as such is relatively simple; but in the HTML
context, it is illusionary, since relevant software does not care about it.

> I'm pretty sure that <article> doesn't apply to me, as every page of
> mine is a single topic.

Well, it could still be classified as an article, as “a complete, or
self-contained, composition in a document, page, application, or site
and that is, in principle, independently distributable or reusable,
e.g. in syndication”. But this concept is both vague and pointless.
Can anyone point at some software that seriously treats it that way?

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Helmut Richter

Nov 5, 2016, 6:44:00 PM

Am 05.11.2016 um 21:51 schrieb Jukka K. Korpela:

> 5.11.2016, 15:00, Stan Brown wrote:

>> Is there any real benefit to using <section>?

> No.

At least not for Jukka.

To some extent I am using HTML tags for documentation within the
document even if the effect on the optical representation is minimal. I
find it useful to have a tag by which I can mark the hierarchical
structure of a document: its chapters, sections, subsections etc. The
classical HTML way is to do that with <h1>, <h2>, ... but these have
closing tags only for the headline but not for the portion of text of
which it is the headline. Of course one could have replaced <section> by
<div><hx> and </section> by </div> but I find it useful to see what
purpose a <div> serves. The <section id="id1"> tag is also a good place
to put the id that serves as the anchor for an <a href="#id1"> tag. Of
course, one could put it into the <hx> tag but I find it more correct to
say that the id denotes a chapter or section than a headline.
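
As a minimal sketch of that idea (the id "chapter2" and the heading text are only placeholders, not taken from any real page), the id sits on the section and the link points at it:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Section as link target (sketch)</title>
</head>
<body>
  <!-- The fragment "#chapter2" refers to the whole section below -->
  <p><a href="#chapter2">Go to Chapter Two</a></p>

  <!-- The id names the chapter itself, not merely its headline -->
  <section id="chapter2">
    <h2>Chapter Two</h2>
    <p>Text of the chapter ...</p>
  </section>
</body>
</html>
```

The same link would also work with the id on the <h2>; the difference is only in what the id is taken to denote.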

If you will, <section> differs from <div> in the same way as <em>
differs from <i>: it tells the semantic reason why it is there instead
of only providing a place where some CSS can be attached.

Well, the <section> tag also has an effect: it determines the relative
weight of a headline, e.g.

<section id="chapter2"><h1>Chapter Two</h1>
<section id="section2_1"><h1>Section One</h1>
...
</section>
</section>

should produce a bigger first than second headline (even though both are
<h1>) unless the CSS says otherwise. At first I found this feature handy,
as it allows a former chapter to become a section without reformatting
its headlines. Meanwhile I find the drawbacks more important than the
benefits and I will not use it any longer. The drawback is that the
effect of the markup can be specified in conflicting ways in the CSS,
which gives rise to hard-to-find errors.
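
One way to reduce such conflicts, sketched here under the assumption of at most two nesting levels and with arbitrarily chosen sizes, is to state the headline sizes explicitly instead of relying on whatever a browser does by default:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Explicit sizes for nested headlines (sketch)</title>
  <style>
    /* Headline of any section */
    section > h1 { font-size: 2em; }
    /* Headline of a section nested inside another section;
       the longer selector is more specific, so it wins */
    section section > h1 { font-size: 1.5em; }
  </style>
</head>
<body>
  <section id="chapter2"><h1>Chapter Two</h1>
    <section id="section2_1"><h1>Section One</h1>
      <p>...</p>
    </section>
  </section>
</body>
</html>
```

With rules like these the result no longer depends on whether a given browser scales nested <h1> elements on its own.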

>> I understand that it
>> lets you style different <h2>s or <p>s differently, but I could do
>> that with <div>.
>
> Indeed, and you can do that with <div> more robustly. Admittedly
> <section>-ignorant browsers (the crowd shouts “old versions of IE”) have
> become rare.

Even if a browser is <section>-ignorant, it will probably treat
<section> the intended way if you have in your CSS

section, article, aside, footer, header, nav, hgroup { display: block; }

>> And in terms of document structure, each header tag
>> introduces an implicit section anyway, if I'm reading the spec[1]
>> correctly. What am I missing?

Yes. But you do not see how far it goes. It is the same problem as in
very old-style HTML, when people used <p> as a paragraph separator
instead of <p>...</p> as a paragraph bracket. It worked just fine because
of the implicit SGML rule which would insert the missing </p> in the
right places. If you use only <hx>, the rule is that an <hx> implicitly
closes all <hy> ranges with y>=x. I prefer to see the places where a
section is closed instead of having to infer them. That has often helped
me find an inconsistency in the nesting of a longer document. I also use
<section> to generate the table of contents of a multi-HTML-page article,
e.g. http://hhr-m.userweb.mwn.de/sw-fibel/contents.html
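
To illustrate the difference, here is the same small outline written both ways; the headings are invented for the example:

```html
<!-- Headings only: where each part ends has to be inferred.
     "Usage" (h2) implicitly closes both "Options" (h3) and "Setup" (h2). -->
<h1>Guide</h1>
<h2>Setup</h2>
<p>...</p>
<h3>Options</h3>
<p>...</p>
<h2>Usage</h2>
<p>...</p>

<!-- Explicit brackets: every part has a visible end. -->
<h1>Guide</h1>
<section>
  <h2>Setup</h2>
  <p>...</p>
  <section>
    <h3>Options</h3>
    <p>...</p>
  </section>
</section>
<section>
  <h2>Usage</h2>
  <p>...</p>
</section>
```

In the second form a misplaced heading level or a forgotten closing tag tends to show up as a nesting error rather than passing silently.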

> You might be missing the point that the “section” concept is an exercise
> in futility. The concept as such is relatively simple; but in the HTML
> context, it is illusionary, since relevant software does not care about it.

Indeed. Comments in programs are also futile since relevant software
does not care about them.

It is up to everybody's own preferences to use or not to use a feature.
If you find it futile, do not use it. I find <section> useful: it
documents the structure of the whole text and costs nothing.

--
Helmut Richter

Stan Brown

Nov 5, 2016, 9:44:37 PM

On Sat, 5 Nov 2016 09:00:05 -0400, Stan Brown wrote:
>
> Is there any real benefit to using <section>?

Thank you for your detailed answers, Jukka and Helmut. ("Go not to
the Elves for counsel, for they will say both no and yes.")

If <section> actually did something in browsers, I would be more
willing to use it. But the uses for <section> that Helmut mentions
just don't seem all that useful to me. Classic YMMV, I guess.

Thomas 'PointedEars' Lahn

Nov 6, 2016, 1:45:06 PM

Jukka K. Korpela wrote:

> 5.11.2016, 15:00, Stan Brown wrote:
>> I see from https://html-differences.whatwg.org/ that HTML5 has new
>> <section> and <article> tags.
>
> They have been in HTML5 for quite some time. There is no demonstrable
> benefit from using them; consider them as pseudo-religious markup.

Nonsense.

>> Is there any real benefit to using <section>?
>
> No.

Yes, there is. They are very helpful in structuring documents.

>> I understand that it
>> lets you style different <h2>s or <p>s differently, but I could do
>> that with <div>.
>
> Indeed, and you can do that with <div> more robustly.

Nonsense. By contrast, a “div” element has no semantics. It is not
possible to tell, without additional attributes and a second look, whether a
“div” element was put in to put structure into a document, or if it is
merely there for design/formatting purposes.

> Admittedly
> <section>-ignorant browsers (the crowd shouts “old versions of IE”) have
> become rare.

As a Web author, it is not primarily the browsers you should be concerned
about.

>> And in terms of document structure, each header tag
>> introduces an implicit section anyway, if I'm reading the spec[1]
>> correctly. What am I missing?
>
> You might be missing the point that the “section” concept is an exercise
> in futility.

Nonsense.

> The concept as such is relatively simple; but in the HTML context, it is
> illusionary, since relevant software does not care about it.

But people do. “section” elements are useful for authors and readers alike:
For authors, to structure their documents, to find the relevant parts in the
DOM inspector which eases Web development, and to place links to specific
sections; for readers, to have a table of contents which is made possible by
referrable sections, and to refer to sections by URI.
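
A rough sketch of how a table of contents could be derived from such sections follows. The ids, headings and the "toc" list are made up for the illustration, and the script simply takes the first heading it finds in each section:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Table of contents from sections (sketch)</title>
</head>
<body>
  <nav><ul id="toc"></ul></nav>

  <section id="setup"><h2>Setup</h2><p>...</p></section>
  <section id="usage"><h2>Usage</h2><p>...</p></section>

  <script>
    // Build one list entry per <section> that carries an id.
    var toc = document.getElementById('toc');
    var sections = document.querySelectorAll('section[id]');
    for (var i = 0; i < sections.length; i++) {
      var heading = sections[i].querySelector('h1, h2, h3, h4, h5, h6');
      if (!heading) { continue; }          // skip sections without a heading
      var link = document.createElement('a');
      link.href = '#' + sections[i].id;    // fragment link to the section
      link.textContent = heading.textContent;
      var item = document.createElement('li');
      item.appendChild(link);
      toc.appendChild(item);
    }
  </script>
</body>
</html>
```

Nested sections would all end up in one flat list here; a real generator would have to track the nesting depth as well.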

I strongly suggest that you brush up your Web knowledge lest you are
eventually left behind.

PointedEars
--
> If you get a bunch of authors […] that state the same "best practices"
> in any programming language, then you can bet who is wrong or right...
Not with javascript. Nonsense propagates like wildfire in this field.
-- Richard Cornford, comp.lang.javascript, 2011-11-14

Jukka K. Korpela

Nov 6, 2016, 2:02:35 PM

6.11.2016, 0:44, Helmut Richter wrote:

> To some extent I am using HTML tags for documentation within the
> document even if the effect on the optical representation is minimal.

The effect of <section>, for example, is indeed minimal, namely none –
unless you count the effect of the implied display: block, but what you
put inside <section> virtually always starts and ends with line breaks
anyway. Of course you can style <section> as you like, but that applies
to any element, including <div>.

> I find it useful to have a tag by which I can mark the hierarchical
> structure of a document: its chapters, sections, subsections etc. The
> classical HTML way is to do that with <h1>, <h2>, ... but these have
> closing tags only for the headline but not for the portion of text of
> which it is the headline.

If you think of <section> tags as documentation tools, more or less
equivalent to comments marking where a section starts and ends, and you
find that convenient, keep using them. However, when reading HTML
source, these tags have the same problem as <div> and </div>: they don’t
tell the hierarchic level. A </section> or </div> tag does not tell
whether it ends a subsection, section, chapter, or part.

> The <section id="id1"> tag is also a good place
> where the id goes which serves as anchor for <a href="#id1"> tag. Of
> course, one could put it into the <hx> tag but I find it more correct to
> say that the id denotes a chapter or section than a headline.

This is of some practical relevance e.g. when the :target pseudo-class
is used in CSS. If you want to highlight the thing that the user “jumped
to”, it is more useful (mostly) to highlight the entire passage or
section referred to, rather than just its heading. But you can do this
just as well with <div>.
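
A small sketch of the :target idea; the id "notes" and the colour are arbitrary choices for the example:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Highlighting the link target (sketch)</title>
  <style>
    /* Matches the element whose id equals the fragment of the current URL */
    section:target { background-color: #ffc; }
  </style>
</head>
<body>
  <p><a href="#notes">Jump to the notes</a></p>

  <section id="notes">
    <h2>Notes</h2>
    <p>The whole passage is highlighted, not only its heading.</p>
  </section>
</body>
</html>
```

Replacing section with div in both the selector and the markup gives the same rendering, which is the point made above.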

> If you will, <section> differs from <div> in the same way as <em>
> differs from <i>: it tells the semantic reason why it is there instead
> of only providing a place where some CSS can be attached.

The “semantic markup” meme has gone pretty wild; it is typical that the
meaning (i.e., semantics) of “semantic” is wrong. Almost always people
who advocate “semantic markup” mean structural markup. It is a matter of
a document’s structure that some part thereof constitutes a section (a
loosely defined concept, by the way). It does not say a word about the
meaning of that part.

The comparison is flawed, since <i> means italic. This is what it has
always meant in HTML and still means, no matter what scholastic
pseudo-definitions have been written in HTML5. It simply instructs the
browser to render the content in an italic typeface if possible. The
<em> element has nominally been for “emphasis”, whatever that means, but
for almost all practical purposes, <em> is just an alias for <i>. Even
if we take the nominal definition of <em> seriously, it does not say
anything about the meaning of the element or its content, just about its
relative importance. If I write “Do <em>not</em> do that!”, then the
markup does not change the meaning of the word “not”; it just tries to
emphasize the word.

> Well, the <section> tag has also an effect: it determines the relative
> weight of a headline, e.g.
>
> <section id="chapter2"><h1>Chapter Two</h1>
> <section id="section2_1"><h1>Section One</h1>
> ...
> </section>
> </section>
>
> should produce a bigger first than second headline

I think they gave up this idea. It was too confusing.

> Even if a browser is <section>-ignorant, it will probably treat
> <section> the intended way if you have in your CSS
>
> section, article, aside, footer, header, nav, hgroup { display: block; }

Sufficiently old versions of IE won’t. They need a magic JavaScript
incantation document.createElement('section') etc. I hope those versions
are mostly extinct now, but I haven’t really checked the developments.
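
For the record, the incantation might look roughly like the sketch below. It assumes the script runs in the head, before any of the new elements are parsed, and it uses IE's conditional comments so that other browsers skip it entirely:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Old-IE workaround for new elements (sketch)</title>
  <!--[if lt IE 9]>
  <script>
    // Old IE styles an unknown element only after createElement
    // has been called once for that element name.
    var names = ['section', 'article', 'aside', 'footer',
                 'header', 'nav', 'hgroup'];
    for (var i = 0; i < names.length; i++) {
      document.createElement(names[i]);
    }
  </script>
  <![endif]-->
  <style>
    section, article, aside, footer, header, nav, hgroup { display: block; }
  </style>
</head>
<body>
  <section><h2>Heading</h2><p>...</p></section>
</body>
</html>
```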

> The same problem which you had
> in very old-style HTML when people used <p> as a paragraph separator
> instead of <p>...</p> as a paragraph bracket. It worked just fine because
> of the implicit SGML rule which would implicitly insert the missing </p>
> in the right places.

Not really. HTML was never implemented as an SGML application. (I
suppose the resident troll will pop in and make his verbose claim to the
contrary, failing to provide factual evidence, as usual.) Browsers have
tended to imply the end of a <p> element in a manner that accidentally
corresponds to the SGML rule mostly. But they have failed to do so at
times. Sometimes it matters whether you explicitly close your <p>
elements, so it’s safest to use </p>.

> I find <section> useful: it
> documents the structure of the whole text and costs nothing.

In terms of file size, data transfer time, and processing time, it costs
something – browsers do construct nodes for <section> elements. So it is
more accurate to say that the cost is negligible.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Ram Tobolski

Nov 6, 2016, 6:23:59 PM

> If you will, <section> differs from <div> in the same way as <em>
> differs from <i>: it tells the semantic reason why it is there instead
> of only providing a place where some CSS can be attached.

I find this an interesting topic, even if not immediately practical. The idea that HTML code should be "semantic", and leave the visual presentation to CSS, goes back, I think, to the early days of HTML, when HTML pages were often thought of as a new kind of document - the hypertext. The novel aspect of hypertext documents was the links, which connected hypertext documents together. Wikipedia is probably the best surviving example of this older view of how HTML pages are supposed to look and behave.

Thomas 'PointedEars' Lahn

Nov 7, 2016, 6:08:11 AM

Jukka K. Korpela wrote:

> 6.11.2016, 0:44, Helmut Richter wrote:
>> If you will, <section> differs from <div> in the same way as <em>
>> differs from <i>: it tells the semantic reason why it is there instead
>> of only providing a place where some CSS can be attached.
>
> The “semantic markup” meme has gone pretty wild; it is typical that the
> meaning (i.e., semantics) of “semantic” is wrong.

Fortunately, it is not you who defines what is wrong (in the English
language or elsewhere):

<https://en.oxforddictionaries.com/definition/semantics>, 1.1

PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm> (404-comp.)

Thomas 'PointedEars' Lahn

Nov 7, 2016, 7:40:08 AM

Jukka K. Korpela wrote:

> 6.11.2016, 0:44, Helmut Richter wrote:
>> If you will, <section> differs from <div> in the same way as <em>
>> differs from <i>: it tells the semantic reason why it is there instead
>> of only providing a place where some CSS can be attached.
>
> The “semantic markup” meme has gone pretty wild; it is typical that the
> meaning (i.e., semantics) of “semantic” is wrong. Almost always people
> who advocate “semantic markup” mean structural markup. It is a matter of
> a document’s structure that some part thereof constitutes a section (a
> loosely defined concept, by the way). It does not say a word about the
> meaning of that part.

Wishful thinking. When I say “semantic markup” I am referring to element
types that have an intrinsic *meaning*. And I am _not_ alone in that at
all. That they can convey the meaning of structure as well is only a part
of that property.

It is true that a bare “section” element has the problem that you cannot
tell *at a glance* how deeply nested that element is in the document
structure. However, unlike elements such as “div” that have no
additional semantics, even bare “section” elements do have the advantage of
saying *that* they convey structure, while bare non-semantic elements
neither say at a glance whether they are there for structural reasons nor,
if so, how deeply nested they are.

The disadvantage of not being able to tell at a glance the nesting level can
be mitigated by adding a semantic ID or class name, whereas the latter also
makes it possible to refer to specific sections.

[Unfortunately, I find the current HTML5 Specification not to be a
good example of this: although “section” elements are used, what is
called “sections” there is not always marked up using “section” elements.
But it deserves mention anyway because HTML 5.1 has been a W3C
Recommendation for almost a week now:

<https://www.w3.org/TR/2016/REC-html51-20161101/>.]

It can be argued that where a “section” element was introduced, elements for
“chapter”, “subsection” and “subsubsection” should have been introduced as
well, like in LaTeX. However, HTML is a multi-purpose markup language, and
so neither “chapter” nor “(sub)subsection” would have necessarily conveyed
their nesting level either without sacrificing flexibility in how authors
can write documents.

And frankly, this train has left the station, so we should not satisfy
ourselves with arguing about what should have been, but how we can use
available new element types as best as we can. That means that we should
not NOT use “section” elements only because, if bare, they are insufficient
to convey the nesting level, for there *is* intrinsic value in using them.
The same logic applies to other semantic markup, including but *not limited
to* those in

<http://html5doctor.com/downloads/h5d-sectioning-flowchart.png>

PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300...@news.demon.co.uk> (2004)

Thomas 'PointedEars' Lahn

Nov 7, 2016, 7:43:53 AM

Thomas 'PointedEars' Lahn wrote:

> The disadvantage of not being able to tell at a glance the nesting level
> can be mitigated by adding a semantic ID or class name, whereas the latter
> also makes it possible to refer to specific sections.

I meant “the former”, of course; the ID serving as a URI(-reference)
fragment. However, with client-side scripting, class names can make it
possible as well (BTDT).

PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee

Jukka K. Korpela

Nov 7, 2016, 9:42:36 AM

7.11.2016, 1:23, Ram Tobolski wrote:

> the idea that HTML code should be "semantic", and leave the visual
> presentation to CSS, goes back, I think, to the early days of HTML,

There are some statements in that direction in early drafts for HTML
specifications, but that was mostly just some people’s ideas rather than
the way HTML was implemented and used. It much resembles the “semantic
HTML” babble nowadays. But it was impossible to refer to CSS at that
time, in the early 1990s; sometimes “style sheets” as a general and
vague idea were mentioned.

> where HTML pages were often thought of as a new kind of document -
> the hypertext.

That’s a different issue, and HTML never stopped being hypertext.

> The novel aspect about hypertext documents were the
> links, which connected hypertext documents together.

Well, that’s more or less what (hyper)link means, except that a link may
also connect parts of a document and it may connect a hypertext document
with some document or resource in a different form, e.g. a plain text
document or a movie.

> Wikipedia is
> probably the best surviving example of this older view of how HTML
> pages are supposed to look and behave.

Wikipedia was established about ten years after the dawn of HTML.
Wikipedia has its own visual design, which would have been impossible
in the early days of HTML (when there was no CSS and there were just a few
clumsy and limited ways of affecting visual rendering in HTML).

Wikipedia is heavily hypertextual, but this is a consequence of its idea
and design rather than something from the early 1990s. It has a lot
of links mostly for the same reason that real dictionaries had
cross-references (e.g. “see Foobar” or “→Foobar”). But it has
exaggerated linking, too, and this might be seen as a holdover from the
early days. I remember a colleague saying, in those days, that in proper
hypertext, every word should be a link – if there is nothing else to link
to, you can at least link to a description of the word in a general
dictionary. That was rather extreme, and he did not actually do much
authoring, really; he just had ideas about it. But there *is* such a
thing as too much linking.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Helmut Richter

Nov 7, 2016, 11:11:34 AM

Am 07.11.2016 um 15:42 schrieb Jukka K. Korpela:

> 7.11.2016, 1:23, Ram Tobolski wrote:
>
>> the idea that HTML code should be "semantic", and leave the visual
>> presentation to CSS, goes back, I think, to the early days of HTML,
>
> There are some statements in that direction in early drafts for HTML
> specifications, but that was mostly just some people’s ideas rather than
> the way HTML was implemented and used. It much resembles the “semantic
> HTML” babble nowadays. But it was impossible to refer to CSS at that
> time, in the early 1990s; sometimes “style sheets” as a general and
> vague idea were mentioned.

Well, HTML 2.0 (1995;
https://www.w3.org/MarkUp/html-spec/html-spec_toc.html) contains both
"idiomatic" (about what is now sometimes called "semantic") and
"typographic" elements -- and both kinds are still in HTML 5. Excerpt
from HTML 2.0:

Phrase Markup
    Idiomatic Elements
        Citation: CITE
        Code: CODE
        Emphasis: EM
        Keyboard: KBD
        Sample: SAMP
        Strong Emphasis: STRONG
        Variable: VAR
    Typographic Elements
        Bold: B
        Italic: I
        Teletype: TT
    Anchor: A

At that time, you could choose whether you wanted something in italic
for any reason, or if you wanted to emphasize something with an <em>
element. Exactly the same as today where HTML 5
(https://www.w3.org/TR/html5/text-level-semantics.html#the-i-element)
makes the following recommendations about the use of the <i> element:

----- begin quotation -----

Authors can use the class attribute on the <i> element to identify why
the element is being used, so that if the style of a particular use
(e.g. dream sequences as opposed to taxonomic terms) is to be changed at
a later date, the author doesn't have to go through the entire document
(or series of related documents) annotating each use.

Authors are encouraged to consider whether other elements might be more
applicable than the <i> element, for instance the <em> element for
marking up stress emphasis, or the <dfn> element to mark up the defining
instance of a term.

----- end quotation -----

I do not see that much difference in philosophy. Then -- 20 years ago --
as well as today, authors are encouraged to use a semantic element if
there is one that fits the purpose, and to use a purely typographic one
if no other fits. Then as well as today, the set of semantic HTML
elements matches only a small, albeit frequently used, subset of
necessary semantics.

Nevertheless, if there is an element in HTML which fits the semantics I
wanted to specify anyway, I will use it instead of simulating it by a
<div> or <span> element with a class attribute. This is why I do use
<section>.

There are other initiatives trying to design markup for *all* semantics
in a set of documents by encouraging the extension of the markup, most
prominently TEI (tei-c.org). HTML never went in this direction.

HTML could have done a better job by allowing authors private markup
with no semantics at all but with the option of providing CSS for them,
e.g. '<funny>abc def</funny>' (I guess it would even work now in many
browsers but it is not standard-conformant). The standard-conformant
'<span class="funny">abc def</span>' is more cumbersome both to write
and to read. If this is considered a problem with future extensions, one
could have reserved a set of names, e.g. <x-funny>.

--
Helmut Richter

Helmut Richter

Nov 7, 2016, 11:32:45 AM

Am 07.11.2016 um 17:11 schrieb Helmut Richter:

> HTML could have done a better job by allowing authors private markup
> with no semantics at all but with the option of providing CSS for them,
> e.g. '<funny>abc def</funny>' (I guess it would even work now in many
> browsers but it is not standard-conformant). The standard-conformant
> '<span class="funny">abc def</span>' is more cumbersome both to write
> and to read. If this is considered a problem with future extensions, one
> could have reserved a set of names, e.g. <x-funny>.

If you do not understand what I am talking about, please look at
http://hhr-m.userweb.mwn.de/de-decl/noun/ (a description how German
nouns undergo declension) and compare what the HTML source looks like.

--
Helmut Richter

John W Kennedy

Nov 7, 2016, 1:38:47 PM

Hell, the original IBM GML from the mid-70s (implemented as a macro
processor for the mid-60s, wholly typographic SCRIPT engine) was mainly
about creating semantic markup.

--
John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
-- Charles Williams. "Taliessin through Logres: Prelude"

Jukka K. Korpela

Nov 7, 2016, 1:57:33 PM

7.11.2016, 18:11, Helmut Richter wrote:

> Well, HTML 2.0 (1995;
> https://www.w3.org/MarkUp/html-spec/html-spec_toc.html) contains both
> "idiomatic" (about what is now sometimes called "semantic") and
> "typographic" elements -- and both kinds are still in HTML 5.

“Idiomatic” was an interesting choice of word, and so was “typographic”.
They should not be taken too seriously, and they never were.

> Phrase Markup
>     Idiomatic Elements
>         Citation: CITE
>         Code: CODE
>         Emphasis: EM
>         Keyboard: KBD
>         Sample: SAMP
>         Strong Emphasis: STRONG
>         Variable: VAR
>     Typographic Elements
>         Bold: B
>         Italic: I
>         Teletype: TT
>     Anchor: A

Of these, TT does not exist in HTML5, except in the list of obsolete
elements. Other presentational or “typographic” markup elements were
retrofitted into the ideology of “semantic” markup, with
pseudo-scholastic treatises like this:
“The i element represents a span of text in an alternate voice or mood,
or otherwise offset from the normal prose in a manner indicating a
different quality of text, such as a taxonomic designation, a technical
term, an idiomatic phrase from another language, transliteration, a
thought, or a ship name in Western texts.”

The HTML 2.0 definition, “italic”, is a great improvement over that mess.

> At that time, you could choose whether you wanted something in italic
> for any reason, or if you wanted to emphasize something with an <em>
> element. Exactly the same as today

Yes, even today you can italicize using <i>, or you can imagine that
using <cite>, <em>, or <var> is “more semantic”, even though all of
these elements have very obscure definitions in HTML5 (and HTML 2.0
isn’t much better here).

> I do not see that much difference in philosophy. Then -- 20 years ago --
> as well as today, authors are encouraged to use a semantic element

And then, as well as today, a question like “what does it really matter”
is answered with babble that just repeats the “philosophy” more
obscurely. (I’m rather sure the resident troll cannot resist the
temptation to do that, repeating the babble he already wrote in this
thread.)

> Nevertheless, if there is an element in HTML which fits the semantics I
> wanted to specify anyway, I will use it instead of simulating it by a
> <div> or <span> element with a class attribute.

It’s really about structure, not semantics. And why would you use an
element when the only tangible effect is a change in rendering in a way
that you *don’t* want? Surely you can undo that effect, with the usual
CSS caveats, with fairly simple CSS, but why bother creating the
problem? For example, if your document contains address information
about the author of the document, do you use <address>, the “semantic”
element? The only thing you actually achieve is that the content appears
in italic, unless you prevent that with CSS. (I have great difficulty
imagining a context where address information should be italicized. I
think they originally considered just e-mail addresses, for which it
might not be too bad – but pointless anyway.)
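
For illustration, undoing the default italics takes a single rule; the author name and e-mail address below are placeholders:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Undoing the default rendering of address (sketch)</title>
  <style>
    /* Browsers typically render <address> in italics; switch that off */
    address { font-style: normal; }
  </style>
</head>
<body>
  <address>
    Written by <a href="mailto:author@example.com">A. N. Author</a>
  </address>
</body>
</html>
```

The usual CSS caveats apply: if the style sheet is not loaded or not applied, the italics come back.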

> There are other initiatives trying to design markup for *all* semantics
> in a set of documents by encouraging the extension of the markup, most
> prominently TEI (tei-c.org). HTML never went in this direction.

HTML was never really something designed by documentation specialists.
Rather, it was a very simple markup language to be rendered by simple
software. (Simpler, for example, than RTF.)

> HTML could have done a better job by allowing authors private markup
> with no semantics at all but with the option of providing CSS for them,
> e.g. '<funny>abc def</funny>' (I guess it would even work now in many
> browsers but it is not standard-conformant).

It works well in any reasonably modern browser. If you need to worship
“standards” (there is actually a single *standard* on HTML, and
virtually nobody cares about it: the “ISO HTML”, which is just a
slightly complicated reformulation of HTML 4), then you can’t use that
approach. But it’s a risky business. It is unlikely that any future
“standard” defines the <funny> element, but you were not really thinking of
writing <funny>, were you? The more natural your tag name is, the more
probable it is that some people will define it in a future “standard”,
probably in a manner that is more or less incompatible with your ideas,
e.g. causing some funny default rendering you didn’t expect.

> The standard-conformant
> '<span class="funny">abc def</span>' is more cumbersome both to write
> and to read.

It really does not matter that much. HTML markup is pointlessly verbose
anyway (and obscure too, on the other hand); think about using
<blockquote> instead of <indent>. And the <span> element itself is named
poorly; it could equally well have been <a>, if it were not so that this
element name means, for historical reasons, something that <link> (which
in turn means something cryptic) should have been used for.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jukka K. Korpela

Nov 7, 2016, 2:10:59 PM

So you use markup like <span class="ex-affix">-en</span>. I don’t think
that’s too bad, in terms of verbosity.

You could use the more concise <i><b>-en</b></i>. Now, this could be a
problem if you use <i> and <b> for other purposes, too; this is one of
the (usually implied) arguments of the “semantic markup” school. Does it
really matter? You could start worrying about it if it ever becomes a
real problem. And at that point, you could ask yourself: if you use
italic for expressions in object language(s) in a linguistic treatment,
should you really use italic for any other purpose on the same page? If
the answer is ever affirmative, then you could use <i class="..."> for
such content so that you can style that differently from your plain <i>
(if you so want).

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Thomas 'PointedEars' Lahn

Nov 7, 2016, 5:25:37 PM

Helmut Richter wrote:

> HTML could have done a better job by allowing authors private markup
> with no semantics at all but with the option of providing CSS for them,
> e.g. '<funny>abc def</funny>' (I guess it would even work now in many
> browsers but it is not standard-conformant).

(I think the proper term is “standards-compliant” or “standards-
conforming”.)

How did you get that idea?

That element will be rendered into an HTMLUnknownElement node in the
document tree. The markup is just not Valid as a markup validator has no
means, like a DTD, to determine if the unknown element is there
intentionally or is a mistake.

But, IIUC, if you really need that, in theory you can still embed markup in
what I call “XHTML5”, short for the “XHTML syntax” of HTML5(.1), in an XML
document:

<https://www.w3.org/TR/2016/REC-html51-20161101/xhtml.html#xhtml>

> The standard-conformant
> '<span class="funny">abc def</span>' is more cumbersome both to write
> and to read. If this is considered a problem with future extensions, one
> could have reserved a set of names, e.g. <x-funny>.

Your argument reminds me of

<http://html5doctor.com/lets-talk-about-semantics/#what-about-adding-more-elements>

:)
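
As a side note on the <x-funny> idea: names containing a hyphen were in fact set aside for author-defined elements by the custom elements part of the HTML specification. There, a name such as "funny" yields an HTMLUnknownElement, while a valid custom element name yields a plain HTMLElement. The following is only a minimal sketch of a name check, not the specification's full grammar. The function name is made up for illustration, and the check is deliberately ASCII-only, so it may reject some names that the specification allows.

```javascript
// Hyphenated names that are nevertheless reserved, because they already
// exist in SVG and MathML.
const RESERVED_NAMES = new Set([
  "annotation-xml", "color-profile", "font-face", "font-face-src",
  "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
]);

// Conservative, ASCII-only approximation (an assumption of this sketch):
// a lowercase letter first, at least one hyphen, no uppercase letters.
function looksLikeCustomElementName(name) {
  const shape = /^[a-z][a-z0-9._-]*-[a-z0-9._-]*$/;
  return shape.test(name) && !RESERVED_NAMES.has(name);
}
```

Under these assumptions, "x-funny" passes the check and "funny" does not.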

Dr J R Stockton

Nov 8, 2016, 6:44:15 PM

In comp.infosystems.www.authoring.html message <nvq3ru$o0h$1@dont-
email.me>, Mon, 7 Nov 2016 16:42:36, Jukka K. Korpela
<jkor...@cs.tut.fi> posted:

> ...

> I remember a colleague saying, those days, that in proper hypertext,
>every word should be a link – if there is nothing else to link to, you
>can at least link to a description of the word in a general dictionary.
>That was rather extremistic, and he did not actually do much authoring,
>really, he just had ideas about it. But there *is* such a thing as too
>much linking.

It would be a useful browser *option*, to help people who need to read
Web pages in a language for which they have a sufficient understanding
of the grammar but an insufficient vocabulary, to have every word act as
a dictionary link. That may already be available as an add-on!

--
(c) John Stockton, Surrey, UK. ¬@merlyn.demon.co.uk Turnpike v6.05 MIME.
Merlyn Web Site < > - FAQish topics, acronyms, & links.

Jukka K. Korpela

Nov 9, 2016, 8:08:50 AM

9.11.2016, 1:27, Dr J R Stockton wrote:

> In comp.infosystems.www.authoring.html message <nvq3ru$o0h$1@dont-
> email.me>, Mon, 7 Nov 2016 16:42:36, Jukka K. Korpela
> <jkor...@cs.tut.fi> posted:
>
>> ...
>> I remember a colleague saying, those days, that in proper hypertext,
>> every word should be a link – if there is nothing else to link to, you
>> can at least link to a description of the word in a general dictionary.
>> That was rather extremistic, and he did not actually do much authoring,
>> really, he just had ideas about it. But there *is* such a thing as too
>> much linking.
>
> It would be a useful browser *option*, to help people who need to read
> Web pages in a language for which they have a sufficient understanding
> of the grammar but an insufficient vocabulary, to have every word act as
> a dictionary link. That may already be available as an add-on!

Perhaps, and it is a built-in feature in some e-book readers (which
mostly interpret EPUB format, which is a packaged XHTML format), at
least for English. It is very useful, but it’s not linking at all. It
works independently of HTML markup and it uses dictionary lookup, not
dictionary links.

It is possible, of course, to write text data in HTML format so that
each word is in fact a dictionary link, and it is useful in special
cases. Some online reproductions of classical works use such an
approach. But then you have a problem if you want to create more normal
links: how do you distinguish them?

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Thomas 'PointedEars' Lahn

Nov 9, 2016, 1:03:02 PM

Jukka K. Korpela wrote:

> It is possible, of course, to write text data in HTML format so that
> each word is in fact a dictionary link, and it is useful in special
> cases. Some online reproductions of classical works use such an
> approach. But then you have a problem if you want to create more normal
> links: how do you distinguish them?

It is possible, but not wise, to make every (registered) word in the
original markup a hyperlink because, first of all, that would increase the
download size of the document considerably with no corresponding advantage.
So one would do that dynamically, with client-side scripting only, instead.

But, in fact, these days you do not need hyperlinks to display additional
information for content. Words in a dictionary can be wrapped into elements
of other types, the pointer cursor can be modified, and appropriate event
listeners can be added to those elements. (There are some technical Web
sites that already do this for technical terms.)

So for the real hyperlinks (that are not in a navigation), underline them by
default, for example, and do not do that with the other "links".

On the other hand, if it is not every word and if it is used sparingly, then
it can be a real hyperlink, but one that displays a short explanation when
hovered over (or when focused, for keyboard navigation), and a longer
one, maybe a link to Wikipedia, when clicked/activated. This can even be
done without any scripting, with non-empty “title” attribute values.

More consideration has to be given to how to distinguish "hyperlinks" for
purely explanatory purposes from those that lead to other, non-dictionary-ish
Web sites when accessibility is taken into account.

But it is certainly possible, and there are various ways to do it;
including, but not limited to, different pointer cursors, colors, and
decoration, and adjacent icons.
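
A minimal sketch of the sparingly used explanatory hyperlink described above, with the short explanation in the “title” attribute and the longer one behind the href. The function names and the "gloss" class are made up for illustration. Note that browsers typically show “title” tooltips on mouse hover only, so this alone probably does not cover the keyboard-focus case.

```javascript
// Escape the characters that are special in HTML text and in
// double-quoted attribute values.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build markup for one explanatory link. A stylesheet could then render
// a.gloss differently from ordinary links, e.g. with a dotted underline
// and a "help" cursor (both are just one possible choice).
function glossaryLink(term, shortExplanation, url) {
  return `<a class="gloss" href="${escapeHtml(url)}" ` +
         `title="${escapeHtml(shortExplanation)}">${escapeHtml(term)}</a>`;
}
```

For example, glossaryLink("DTD", "Document Type Definition", someUrl) produces an <a> element whose tooltip text is the short explanation and whose target is the longer one.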

Dr J R Stockton

Nov 12, 2016, 6:46:18 PM

In comp.infosystems.www.authoring.html message <nvv743$5l1$1@dont-
email.me>, Wed, 9 Nov 2016 15:08:48, Jukka K. Korpela
<jkor...@cs.tut.fi> posted:

>It is possible, of course, to write text data in HTML format so that
>each word is in fact a dictionary link, and it is useful in special
>cases. Some online reproductions of classical works use such an
>approach. But then you have a problem if you want to create more normal
>links: how do you distinguish them?

Write the page in HTML so that it displays as intended. Then choose
_one_ of (all untested) :-

(1) edit the HTML manually to encase every normal word in a span element
with an 'onclick' attribute that sets location.href.

(2) do that editing with a programmable editor such as MiniTrue.

(3) on body load, edit part of document.body with a magnificent
JavaScript RegExp and put it back.

(4) with body onload JavaScript, do the corresponding editing by walking
the DOM tree.

(5) ?? set body onclick to get the clicked word (if any) and set
location.href accordingly ??.
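
Option (3) might look roughly like the following. This is only a sketch, not the preprocessing script mentioned below: the dictionary URL, the "dict" class and the data-href attribute are placeholders invented for illustration. It works on a string of markup, so it can be tried outside a browser; in a page, one would presumably apply it to the innerHTML of the part of the document to be rewritten.

```javascript
// Hypothetical dictionary URL prefix; substitute a real lookup service.
const DICT_URL = "https://dictionary.example/lookup?word=";

// Wrap every word in the text parts of an HTML string in a <span>,
// leaving tags (and therefore attribute values) untouched.
function wrapWords(html) {
  // Splitting on a capturing group keeps the tags in the result:
  // even indexes are text runs, odd indexes are tags.
  const parts = html.split(/(<[^>]*>)/);
  return parts
    .map((part, i) => {
      if (i % 2 === 1) return part; // a tag: copy verbatim
      // Match character references first so that "&amp;" is not treated
      // as the word "amp"; \p{L} matches any Unicode letter.
      return part.replace(/&[#\w]+;|\p{L}[\p{L}'-]*/gu, (match) => {
        if (match.startsWith("&")) return match;
        const href = DICT_URL + encodeURIComponent(match);
        return `<span class="dict" data-href="${href}">${match}</span>`;
      });
    })
    .join("");
}
```

Known limitations of the sketch: it would also wrap words inside <script> and <style> content and inside existing links, and it assumes that attribute values contain no ">". Walking the DOM tree, as in option (4), avoids those problems.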

Go to
<http://web.archive.org/web/20150614002723/http://www.merlyn.demon.co.uk/essai-3c.htm>,
search for "233" and note the general appearance; then search for the
pseudo-word "UpdateTime", press the adjacent "Typeset" button (which does a
partial body rebuild first), wait a bit, then look again in the 233 region.

My preprocessing script is *shown* at the bottom of that page.

Feel free to republish the page (with acknowledgement) in Finnish; but
check your version against the original French.

--
(c) John Stockton, near London. Mail ?.?.Stoc...@physics.org
Web < > - FAQish topics, acronyms, and links.