Introduction to HTML 3.2 (PostScript)

Marty Hall

Dec 19, 1996, 3:00:00 AM

I teach a WWW/Java development course here in the Hopkins part-time MS
program in Computer Science, and have finally gotten around to
collecting my various handouts into a (hopefully) complete yet concise
chapter on HTML 3.2.

Before I use it next semester, I was hoping to get feedback from some
of the local HTML gurus on technical correctness and from the
non-gurus on readability and helpfulness.

http://www.apl.jhu.edu/~hall/WWW/Thumbnails/HTML32.ps

Happy holidays-
- Marty

Arjun Ray

Dec 21, 1996, 3:00:00 AM

[ posted and emailed ]

In <x5vi9y5...@rsi.jhuapl.edu>,
Marty Hall <ha...@apl.jhu.edu> writes:

| I teach a WWW/Java development course here in the Hopkins part-time
| MS program in Computer Science, and have finally gotten around to
| collecting my various handouts into a (hopefully) complete yet
| concise chapter on HTML 3.2.

| http://www.apl.jhu.edu/~hall/WWW/Thumbnails/HTML32.ps

In a sense, the first sentence says it all:

> WWW pages are created using the HyperText Markup Language (HTML), which
> lets you mix text to be directly displayed with "markup" tags telling
> the WWW browser how the result should look.

Quite frankly, this material falls well short of any minimal standard
for a postgraduate course in Computer Science. I'm guessing the author
has based these notes only on experience with a few browsers and some
casual research on the 'Net. There is no evidence of familiarity with
the readily available archived materials or understanding of the
authoritative specs. Rather, almost all of the substantive commentary
on HTML is poorly worded or plain wrong in one way or another, with
errors in history, concepts, and terminology. Some examples:

1. History.

Dates have a nasty habit of getting in the way of convenient stories:

> Until 1996, most browsers supported HTML 2.0 [...] In order to provide
> additional capabilities, a large number of nonstandard extensions were
> supported by one or more major vendors. In order to standardize these
> features and provide powerful new capabilities that would reduce the
> incentive for vendors to introduce nonstandard extensions, the World Wide
> Web Consortium (W3C) attempted to define HTML 3.0 as the next generation
> HTML standard.

The 3.0 spec was not a response to the proliferation of nonstandard
extensions. It was the new name for the HTML+ spec, the first official
draft of which was published in Nov 1993 [1]. What became HTML 2.0 had
been published as an Internet-Draft in June 1993; there was already a
clear realization that standardizing common features and working on a
"state-of-the-art" spec had to be parallel efforts. The IETF HTML
Working Group was formed in late July 1994 [2] and simply picked up
where the old CERN projects (focused through the www-talk and later
also the www-html mailing lists [3]) had left off, since the essential
personnel were the same.

The point is that as of mid 1994, 3.0 was the spec that implementors
were being invited (if not expected) to support. Far from being a
response, it was an attempt to *lead*, based on consensus among the
experts in the field.

FYI, the first beta of Netscape 0.9 was announced in Oct 1994 [4]. The
HTML 3.0 spec (and the name!) had been well-known for months by then.

> However, this undertaking proved overly ambitious given the state of
> flux in this technology area, there were too many controversial features,
> and the effort was dropped.

The effort was dropped because the "major vendors" who emerged *later*
declined to implement that spec. The "nonstandard extensions" in many
cases were poorly conceived and sloppily implemented alternatives.

> Instead, HTML 3.2, an intermediate specification, was drafted. HTML 3.2
> is intended to reflect consensus among the major vendors on what
> features could be expected to be widely supported as of early 1997.

Both 2.0 and 3.2 are simply descriptions of "current practice" as of
certain dates, approximately mid '94 and mid '96 respectively. From
the Abstract of RFC 1866 [5], the HTML 2.0 spec:

= HTML has been in use by the World Wide Web (WWW) global information
= initiative since 1990. This specification roughly corresponds to the
= capabilities of HTML in common use prior to June 1994.

And from the Wilbur draft spec [6] for HTML 3.2:

= This specification defines HTML version 3.2. HTML 3.2 aims to
= capture recommended practice as of early '96 and as such to be used
= as a replacement for HTML 2.0 (RFC 1866).

Neither spec was ever intended as state-of-the-art, as 3.0 was. The
"history" that places 2.0, 3.0 and 3.2 in a unified sequence is bogus.


2. Concepts and Terminology.

The syntax of HTML is the Reference Concrete Syntax of SGML, which has
been an international standard (ISO 8879) since 1986. Section 3 of RFC
1866 has a very good introduction to the syntax. The full text of ISO
8879 is included in _The SGML Handbook_ by Charles Goldfarb [7]. A BNF
for SGML is also available online [8].

There is no need to concoct explanations in a graduate level CS course
when sources exist with formal exact definitions and authoritative
commentary.

> HTML _elements_ are indicated by markup _tags_ in angle brackets.

> HTML elements can also have _attributes_ [...] For instance, with
> <HR NOSHADE>, HR is the main tag and NOSHADE is the attribute

> Finally, many attributes have _values_ that follow an equals sign.

> Some elements are _containers_, which have a start tag (e.g. <BODY>)
> and a corresponding end tag that starts with "/" (e.g. </BODY>).

The syntax may *appear* that way, but this characterisation is bogus.

SGML defines three abstract delimiters for tags: STAGO, ETAGO, and
TAGC. These delimiters can have different string bindings in SGML
systems; the Reference Concrete Syntax binds them to "<", "</" and ">"
respectively. There is no "/" seemingly independent of angle brackets,
and the angle brackets by themselves are not a notation.
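
As a minimal sketch of the point above (an illustration, not taken from any spec or parser), the abstract delimiters can be held in a table and bound to strings separately. The function name parse_tag and the table name RCS are hypothetical. The sketch handles only a single complete tag and ignores SGML features such as empty tags.

```python
# Reference Concrete Syntax bindings for the abstract tag delimiters.
RCS = {"STAGO": "<", "ETAGO": "</", "TAGC": ">"}


def parse_tag(tag, delims=RCS):
    """Split one complete tag into (kind, element_name, remainder)."""
    stago, etago, tagc = delims["STAGO"], delims["ETAGO"], delims["TAGC"]
    if not tag.endswith(tagc):
        raise ValueError("missing TAGC: %r" % tag)
    # Test ETAGO before STAGO: in the RCS, "</" begins with "<".
    if tag.startswith(etago):
        kind, body = "end-tag", tag[len(etago):-len(tagc)]
    elif tag.startswith(stago):
        kind, body = "start-tag", tag[len(stago):-len(tagc)]
    else:
        raise ValueError("not a tag: %r" % tag)
    parts = body.split(None, 1)
    if not parts:
        raise ValueError("no element name: %r" % tag)
    name = parts[0].upper()  # element names are case-insensitive in HTML
    rest = parts[1] if len(parts) > 1 else ""
    return kind, name, rest
```

With the default bindings, parse_tag("</BODY>") yields ("end-tag", "BODY", ""). Passing a different table, say {"STAGO": "[", "ETAGO": "[/", "TAGC": "]"}, parses "[/BODY]" to the same result, which is the sense in which the delimiters are abstract.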

The _tag_ is the entirety of '<HR NOSHADE>', not 'HR' (or '/BODY'). HR
is the name of the element (called the _generic identifier_). Names
are *common* to start-tags and end-tags (e.g. '<BODY>' and '</BODY>'),
because the tags themselves are actually just grouping devices, like
"named" parentheses.

And, *all* attributes have values. When the values constitute a _name
token group_ (i.e. a fixed enumeration), the full name=value syntax
can be _minimized_ by omitting the *name* of the attribute and the
equal sign. 'NOSHADE', just like 'ISMAP', is the value, not the name!
Syntax like '<H1 CENTER>', omitting the 'ALIGN=', is perfectly legal,
browser limitations notwithstanding.
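
A rough sketch of how a parser could undo this minimization follows. The ENUMERATED table is a hypothetical hand-written excerpt, not a transcription of any DTD, and expand_attributes is an illustrative name. The sketch assumes each enumerated value belongs to exactly one attribute of an element, and it splits on whitespace, so quoted values containing spaces are out of scope.

```python
# Hypothetical excerpt of enumerated attribute declarations, keyed by
# element name: attribute name -> allowed values (its name token group).
ENUMERATED = {
    "HR": {"NOSHADE": {"NOSHADE"}, "ALIGN": {"LEFT", "CENTER", "RIGHT"}},
    "H1": {"ALIGN": {"LEFT", "CENTER", "RIGHT"}},
    "IMG": {"ISMAP": {"ISMAP"}},
}


def expand_attributes(element, spec):
    """Return {name: value}, restoring names omitted by minimization."""
    declared = ENUMERATED.get(element.upper(), {})
    result = {}
    for token in spec.split():
        if "=" in token:
            # Full name=value form: nothing to restore.
            name, value = token.split("=", 1)
            result[name.upper()] = value.strip("\"'")
        else:
            # Bare token: it is a value, so find the attribute that owns it.
            value = token.upper()
            owners = [n for n, vals in declared.items() if value in vals]
            if len(owners) != 1:
                raise ValueError("cannot resolve %r on %s" % (token, element))
            result[owners[0]] = value
    return result
```

Under these assumptions, expand_attributes("H1", "CENTER") gives {"ALIGN": "CENTER"}, and expand_attributes("HR", "NOSHADE") gives {"NOSHADE": "NOSHADE"}: the bare token is looked up as a value, never as a name.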

> HTML documents start with a DOCTYPE element

No such thing. _Declarations_ are an entirely different category of
markup from tags, and have nothing to do with elements per se. The
surface similarity of lexical form is for authorial convenience; the
abstract delimiters in this case are MDO and MDC, bound in the RCS to
"<!" and ">" respectively. That's right: the ">" in <!DOCTYPE...> and
the ">" in <HTML> are syntactically *different* tokens.
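
To make the distinction concrete, here is a small self-contained sketch that labels a piece of markup by its opening delimiter and reports which abstract closing delimiter its final ">" stands for. The function name classify is hypothetical, and matching on string prefixes is a simplification of how SGML recognizes delimiters in context.

```python
# Reference Concrete Syntax bindings for the delimiters named above.
MDO, MDC = "<!", ">"                  # markup declaration open / close
STAGO, ETAGO, TAGC = "<", "</", ">"   # start-tag open, end-tag open, tag close


def classify(markup):
    """Return (category, closing delimiter name) for one piece of markup."""
    # Try the longer opening delimiters first, since all three begin with "<".
    if markup.startswith(MDO) and markup.endswith(MDC):
        return ("declaration", "MDC")
    if markup.startswith(ETAGO) and markup.endswith(TAGC):
        return ("end-tag", "TAGC")
    if markup.startswith(STAGO) and markup.endswith(TAGC):
        return ("start-tag", "TAGC")
    return ("data", None)
```

Both '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">' and '<HTML>' end in the same character, yet the first is reported as closed by MDC and the second by TAGC.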


It's reasonably clear that the author should familiarize himself with
the rudiments of SGML [9] to avoid fundamentally misinforming his
students. But I confess I'm distressed by the possibility that course
notes such as these are not uncommon in academia.


[1] http://www.w3.org/pub/WWW/MarkUp/HTMLPlus/htmlplus_1.html
[2] http://www.acl.lanl.gov/HTML_WG/
[3] http://www.eit.com/www.lists/
[4] http://www.eit.com/www.lists/www-talk.1994q4/0187.html
[5] http://ds.internic.net/rfc/rfc1866.txt
[6] http://www.w3.org/pub/WWW/TR/PR-html32-961105
[7] Oxford Univ Press 1990 ISBN 0-19-853737-9
[8] ftp://ftp.ifi.uio.no/pub/SGML/productions
[9] http://www.w3.org/pub/WWW/MarkUp/SGML/


:ar


Alan J. Flavell

Dec 22, 1996, 3:00:00 AM

On Sat, 21 Dec 1996, Arjun Ray wrote:

> In <x5vi9y5...@rsi.jhuapl.edu>,
> Marty Hall <ha...@apl.jhu.edu> writes:

...


> > WWW pages are created using the HyperText Markup Language (HTML), which
> > lets you mix text to be directly displayed with "markup" tags telling
> > the WWW browser how the result should look.

This is not the HTML language that I am familiar with. Indexing robots,
and speaking machines, and even character cell browsers, have quite
different ideas than the typical graphical browser about "how the result
should look"; they are all valid interpretations of the logical
structures marked up by standard HTML.

> Quite frankly, this material falls well short of any minimal standard
> for a postgraduate course in Computer Science.

On that basis alone, it has to be bogus. It appears to be describing
the degenerate page layout language that the popular browser vendors are
trying to turn HTML into, with the help of masses of newcomers who
haven't developed an understanding of the benefits of a presentation-
independent logical markup.

> The point is that as of mid 1994, 3.0 was the spec that implementors
> were being invited (if not expected) to support. Far from being a
> response, it was an attempt to *lead*, based on consensus among the
> experts in the field.

I remember that...

> FYI, the first beta of Netscape 0.9 was announced in Oct 1994 [4]. The
> HTML 3.0 spec (and the name!) had been well-known for months by then.
>
> > However, this undertaking proved overly ambitious given the state of
> > flux in this technology area,

Not true. Quite apart from the W3C's experimental browser, Arena,
there's a one-man-job browser (UdiWWW) that implements large parts of
HTML3.0. The "overly ambitious" part has _little_ to do with the
"technology area", and _a_lot_ to do with business practices, as far as
I can see.

> The effort was dropped because the "major vendors" who emerged *later*
> declined to implement that spec. The "nonstandard extensions" in many
> cases were poorly conceived and sloppily implemented alternatives.

Right

> > Instead, HTML 3.2, an intermediate specification, was drafted. HTML 3.2
> > is intended to reflect consensus among the major vendors on what
> > features could be expected to be widely supported as of early 1997.
>
> Both 2.0 and 3.2 are simply descriptions of "current practice" as of
> certain dates, approximately mid '94 and mid '96 respectively.

I've been told "Spring '96", and the Wilbur spec itself says "early
'96", doesn't it? So "mid '96" might not be quite accurate. But your
reply is spot-on as to principles, and the numerical value "3.2" is
more inclined to mislead than to inform. "2.3" might have been more
realistic!

> And from the Wilbur draft spec [6] for HTML 3.2:
>
> = This specification defines HTML version 3.2. HTML 3.2 aims to
> = capture recommended practice as of early '96 and as such to be used
> = as a replacement for HTML 2.0 (RFC 1866).

> Neither spec was ever intended as state-of-the-art, as 3.0 was. The
> "history" that places 2.0, 3.0 and 3.2 in a unified sequence is bogus.

Quite so. How sad to see history re-invented within a mere couple
of years.

> There is no need to concoct explanations in a graduate level CS course
> when sources exist with formal exact definitions and authoritative
> commentary.

Well, it _is_ surely necessary to expose the kiddies to reality, i.e.
that the major browser vendor does not believe in SGML and continues
to deliberately invent new constructs that fly in the face of SGML
standards?

Nevertheless, the standard continues to assert that HTML is an
application of SGML, so it behoves any serious teacher to take that
into proper account.

--


Arjun Ray

Dec 23, 1996, 3:00:00 AM

In <Pine.HPP.3.95.96122...@hpplus05.cern.ch>,
"Alan J. Flavell" <fla...@mail.cern.ch> writes:
| On Sat, 21 Dec 1996, Arjun Ray wrote:

|> The "history" that places 2.0, 3.0 and 3.2 in a unified sequence is
|> bogus.

| Quite so. How sad to see history re-invented within a mere couple
| of years.

It's known as Spin Control. Consider the latest agonism over Netscape
and stylesheets. Naturally, The Official Hagiography will need to be
some convenient and comfortable story, perhaps like this:

(From
http://home.netscape.com/comprod/products/communicator/guide.html)

= Style sheets. In the past, Web page designers wishing to affect
= aspects of page design such as colors and text sizes had to write
= HTML tags for each element. With style sheets, designers not only
= get complete control over these elements, they also can build a
= style sheet standard that can be leveraged repeatedly (similar to a
= template).

In the past? HTML tags? Really? Why was that? Who invented <FONT>?

Here's the history that must not come to light:

1. Oct 13 1994 - Marc A. announces the first beta of Netscape 0.9:
http://www.eit.com/www.lists/www-talk.1994q4/0187.html

2. Oct 10 1994 - Hakon Lie announces the first public draft of CSS:
http://www.eit.com/www.lists/www-talk.1994q4/0153.html

That's right, folks, Cascading Style Sheets are at least *that* old.
And surely a draft was the result of prior discussions, no?

3. Oct 14 1994 - Marc A. clarifies some time lines:
http://www.acl.lanl.gov/HTML_WG/html-wg-94q4.messages/0088.html

= we started the company April 4, hired the core staff in April and
= May, started coding June 1, entered alpha/beta around Sep 1, and
= just released to the net -- things have been a bit hectic :-).

Doing what, one wonders, given that stylesheets had been *the* topic
of discussion since 1993 on the www-talk and www-html mailing lists:
http://www.eit.com/www.lists/

4. May 31 1994 - Dave Raggett refers everyone to the proceedings at
WWW'94 (where, BTW, HTML+ became HTML 3.0):
http://www.eit.com/www.lists/www-html.1994q2/0012.html

5. May 25-27 1994 - The First International Conference on the WWW:
http://www.eit.com/www.lists/www-talk.1993q4/0898.html

During that summer, the media blitz over "Jim Clark and the Mosaic
boys" had been relentless. Would such "talent" find a way to support
stylesheets?

Nope. With fanfare, flourish and folderol, <CENTER> and <FONT> came
into the world...

= In the past, Web page designers wishing to affect aspects of page
= design such as colors and text sizes had to write HTML tags for each
= element.

Humbug? Disingenuousness? Of course not! Stylesheets are new! The
Official Hagiographies will all say so!


:ar


Alan J. Flavell

Dec 23, 1996, 3:00:00 AM

On Mon, 23 Dec 1996, Arjun Ray wrote:

(quoting something from Netscape)

> ... With style sheets, designers not only
> = get complete control over these elements, they also can build a
> = style sheet standard that can be leveraged repeatedly (similar to a
> = template).

Half right, eh? They can finally use a common style sheet, just as
was envisaged back in 1993. But "complete control"? I'd like to
see them achieve the display of a given named font, in a given color,
on a system with a monochrome screen and without the named font.

(Arjun reviews the known history, justifiably fearful that it will be
swept away...)

> Nope. With fanfare, flourish and folderol, <CENTER> and <FONT> came
> into the world...

(continuing quote from "history rewritten by the market leader"):

> = In the past, Web page designers wishing to affect aspects of page
> = design such as colors and text sizes had to write HTML tags for each
> = element.

Er, no. There _were_ no HTML tags for colors and text sizes in 1993,
AFAIR. There were the temporary palliatives of <b> and <i> and that was
about it, as far as typography was concerned. If I'm not mistaken, HTML+
introduced underscore and strikeout. All else was meant to be done with
logical structures. <BIG> and <SMALL> came later, with HTML3.0. Slowly
the presentation-based markups were chipping away at the foundations.
Correct me if I'm wrong.

> Humbug? Disingenuousness? Of course not! Stylesheets are new! The
> Official Hagiographies will all say so!

Well, CSS1 hadn't been drafted out in the amount of detail that it has
been now. Instead we had the diversion of effort into what were, in
effect, blind alleys for a portable content-based markup.

Those blind alleys are great for pretending that authors are in control
of some unknown platform though, and for giving authors the opportunity
to blame their readers whenever anything went wrong. Isn't that what
it's all about nowadays?

(Sorry if I seem to lack festive cheer. This is depressing stuff.)

Marty Hall

Dec 30, 1996, 3:00:00 AM

ar...@nmds.com (Arjun Ray) writes:

[My PostScript summary of HTML 3.2 at
http://www.apl.jhu.edu/~hall/WWW/Thumbnails/HTML32.ps]


> Quite frankly, this material falls well short of any minimal standard
> for a postgraduate course in Computer Science. I'm guessing the author
> has based these notes only on experience with a few browsers and some
> casual research on the 'Net. There is no evidence of familiarity with
> the readily available archived materials or understanding of the
> authoritative specs. Rather, almost all of the substantive commentary
> on HTML is poorly worded or plain wrong in one way or another, with
> errors in history, concepts, and terminology. Some examples:

I stand well and truly corrected. I suppose I should have mentioned
that we only spend 1.5 weeks (out of 14) on HTML, with the majority on
Java and another smaller section on CGI programming. Nevertheless, that
is no reason that I should get the HTML part wrong.

I appreciate the input from an expert, and will try again and repost
when the updated version is available.

- Marty

Alan J. Flavell

Dec 30, 1996, 3:00:00 AM

On 30 Dec 1996, Marty Hall wrote:

> I stand well and truly corrected. I suppose I should have mentioned
> that we only spend 1.5 weeks (out of 14) on HTML, with the majority on
> Java and another smaller section on CGI programming. Nevertheless, that
> is no reason that I should get the HTML part wrong.

Kudos and a round of applause for someone who's willing to stand
up in public and be so open. I hope I'll be as honest myself, the
next time an occasion comes up.

Have a successful New Year.

