What are the main differences between HTML 3.2, HTML 4 and XHTML?

Voetleuce

Aug 10, 2002, 2:48:46 PM

"Webpage Workshop" <dy...@webpageworkshop.co.uk> wrote in message
news:aj12vf$1785qa$1...@ID-150186.news.dfncis.de...

> To the OP, I would take every word that Jukka writes as being
> very well informed, and it would pay you well to listen to him. Anything
> that I have said that Jukka has broadened or corrected, I would take his
> version, as he is a much more educated person with much more experience than
> I.

Now of course we're just *waiting* for Jukka's "Learn HTML 4 and CSS 2 by
example[s]". Hint...

Jukka K. Korpela

Aug 11, 2002, 4:13:44 AM

"Webpage Workshop" <dy...@webpageworkshop.co.uk> wrote:

> I didn't realise that <font> was
> first in the 4.0 spec, I thought it was in 3.2, but I am wrong

For some further confusion, <font> was in 3.2 but only with "size" and
"color" attributes; "face" was added in 4.0, simultaneously declaring all of
<font> as deprecated. Convincing, isn't it? :-)

> [XHTML 1.0] does after all
> introduce a more standards-based approach in that before you could type
> <IMG SRC="">, <img src="">, <img SRC=""> or <IMG src=""> whereas now it
> is simply <img src="" />.

But the _standards_ based approach is to allow variation, not to force into a
particular lexical syntax! The SGML approach allows broad variation. It
doesn't even fix things like "<" as the character that begins a tag; this
only needs to be fixed when defining a markup system in SGML. And SGML is a
standard, ISO 8879.
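
As a rough illustration of the case point, here is a minimal Python sketch. It assumes that Python's standard html.parser can stand in for HTML-style handling (it is a lenient HTML parser, not an SGML parser) and that xml.etree.ElementTree can stand in for the XML rules that XHTML follows. The names TagCollector, html_tags and xml_well_formed are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Records (tag name, attribute names) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        self.seen.append((tag, [name for name, _ in attrs]))


def html_tags(markup):
    """Parse markup the HTML way and return the recorded start tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


def xml_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Four spellings that differ only in case (illustrative input).
variants = ['<IMG SRC="">', '<img src="">', '<img SRC="">', '<IMG src="">']
results = [html_tags(v) for v in variants]
```

Under these assumptions, every entry in results is the same, because case differences vanish once the markup has been parsed. The XML parser, by contrast, rejects '<P><img src="" /></p>' because <P> and </p> are different names to it.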

> I believe that this makes reading the HTML
> produced by others a lot easier,

Probably, but how much does it matter? It's a matter of coding conventions
rather than the definition of a markup system. In fact, I think the _formatting_
of HTML markup, such as using indentation and avoiding overly long lines, is
far more important to readability of HTML markup than lower or upper case.

I would compare this to case sensitivity vs. case insensitivity in
programming languages. Good old FORTRAN allowed upper case only, since lower
case letters had not been invented yet. Then came the languages where case
doesn't matter and you could write the reserved words (keywords, somehow
corresponding to tag and attribute names in HTML) in any case or mixture of
cases you like, partly because there were still terminals that were able to
handle upper case only. Problems started with languages like C, Perl, and
JavaScript that made things case sensitive. This has caused quite some
confusion when people haven't realized that, for example, something like
selectedOptions needs to be typed exactly as is, with that single "O" in
uppercase and all the rest in lower case. Admittedly the XHTML approach of making
everything lower case is more sensible than such odd systems. But in a sense,
it means a return to the punched card era, with single case only.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Webpage Workshop

Aug 11, 2002, 7:14:17 AM

Enough of the rubber Spock ears, thought Jukka K. Korpela hacking at
the keyboard hysterically

> "Webpage Workshop" <dy...@webpageworkshop.co.uk> wrote:
>
>> I didn't realise that <font> was
>> first in the 4.0 spec, I thought it was in 3.2, but I am wrong
>
> For some further confusion, <font> was in 3.2 but only with "size" and
> "color" attributes; "face" was added in 4.0, simultaneously declaring
> all of <font> as deprecated. Convincing, isn't it? :-)

Yes, I noticed that when I took a look at your HTML 3.2 w/ examples pages.
You confused me somewhat saying that it was first in 4.0, but I understand
what you mean now - it was first declared as we know it now in 4.0 and was
also deprecated in the same spec. It is indeed convincing - they had no
faith in their new ideas.

>> [XHTML 1.0] does after all
>> introduce a more standards-based approach in that before you could
>> type <IMG SRC="">, <img src="">, <img SRC=""> or <IMG src="">
>> whereas now it is simply <img src="" />.
>
> But the _standards_ based approach is to allow variation, not to
> force into a particular lexical syntax! The SGML approach allows
> broad variation. It doesn't even fix things like "<" as the character
> that begins a tag; this only needs to be fixed when defining a markup
> system in SGML. And SGML is a standard, ISO 8879.

Well, variation in coding (or in this case markup) style - that being a
matter of design - should be catered for by the standards-based approach, but
in my eyes a standards-based approach should also mean sticking to a
particular case!

>> I believe that this makes reading the HTML
>> produced by others a lot easier,
>
> Probably, but how much does it matter? It's a matter of coding
> conventions rather than the definition of a markup system. In fact, I
> think the _formatting_ of HTML markup, such as using indentation and
> avoiding overly long lines, is far more important to readability of
> HTML markup than lower or upper case.

I quite agree. I tend not to let a line run wider than my application's
window (which is generally 800px) so that it is easier for me to edit - it is
not so important in my case for others to follow my source code. I also
ensure that elements are indented and nested in a way that stands out as a
flow - but then I guess that's what happens when you are used to programming
in C/Java etc. The way that you lay out a document in its code does go a
long way to ensuring usability, but I still think that sticking to
lowercase has helped also. As a Computer Systems with Psychology student I
have read studies which have shown that the human brain finds it more
difficult to process UPPERCASE and MiXEdCasE words (see for yourself, they
are much harder to read than if I had used purely lowercase), which is
partially the source of my argument.

> Admittedly the XHTML approach of making
> everything lower case is more sensible than such odd systems. But in
> a sense, it means a return to the punched card era, with single case
> only.

Indeed, but it does mean that when I inherit some pages from someone else I
won't have to go and change all the tags to lowercase (purely because upper
and mixed cases bug me!)

--
Dylan Parry
http://www.webpageworkshop.co.uk

frostie

Aug 11, 2002, 5:29:13 PM

On Sun, 11 Aug 2002 08:13:44 +0000 (UTC), "Jukka K. Korpela"
<jkor...@cs.tut.fi> wrote:

>
>But the _standards_ based approach is to allow variation, not to force into a
>particular lexical syntax! The SGML approach allows broad variation. It
>doesn't even fix things like "<" as the character that begins a tag; this
>only needs to be fixed when defining a markup system in SGML. And SGML is a
>standard, ISO 8879.

Isn't this then a bit of an anomaly? A standard that allows such a
broad variation in syntax isn't much of a standard.

>> I believe that this makes reading the HTML
>> produced by others a lot easier,
>
>Probably, but how much does it matter? It's a matter of coding conventions
>rather than the definition of a markup system. In fact, I think the _formatting_
>of HTML markup, such as using indentation and avoiding overly long lines, is
>far more important to readability of HTML markup than lower or upper case.

Perhaps there should be a new markup language to format the markup?
One thing I did keep from xhtml (before reverting to html) was the use
of lowercase. I found it much easier to read and follow.

--
frostie
http://www.brightonfixedodds.co.uk

Jukka K. Korpela

Aug 11, 2002, 5:57:13 PM

frostie <frost...@ntlworld.com> wrote:

>>But the _standards_ based approach is to allow variation, not to force
>>into a particular lexical syntax! The SGML approach allows broad
>>variation. It doesn't even fix things like "<" as the character that
>>begins a tag; this only needs to be fixed when defining a markup system
>>in SGML. And SGML is a standard, ISO 8879.
>
> Isn't this then a bit of an anomaly? A standard that allows such a
> broad variation in syntax isn't much of a standard.

It might be seen as an anomaly from the viewpoint of some current trends that
favour simplification, which inevitably leads to _new_ complexities somewhere.

But a good standard does not impose any limitations that need not be imposed.
The purpose is interoperability, not similarity.

Arjun Ray

Aug 11, 2002, 6:10:05 PM

In <meldlukksgu3u0cgd...@4ax.com>, frostie <frost...@ntlworld.com> wrote:
| On Sun, 11 Aug 2002 08:13:44 +0000 (UTC), "Jukka K. Korpela"
| <jkor...@cs.tut.fi> wrote:

|> But the _standards_ based approach is to allow variation, not to
|> force into a particular lexical syntax! The SGML approach allows
|> broad variation. It doesn't even fix things like "<" as the
|> character that begins a tag; this only needs to be fixed when
|> defining a markup system in SGML. And SGML is a standard, ISO 8879.

| Isn't this then a bit of an anomaly? A standard that allows such a
| broad variation in syntax isn't much of a standard.

On the contrary, a standard should allow inessential variation. The
SGML approach views syntax as irrelevant to post-parse processing - it
is perfectly admissible for a processing system not even to know that
starttags begin with "<", because it will have been one of the jobs of
an SGML parser to insulate the processing system from such inessential
details.

ISO 8879 actually fixes an _abstract syntax_, and rules to specify how
this is mapped to a _concrete syntax_. Thus, the real SGML delimiter in
Jukka's example is called STAGO ("Start Tag Open"); it is mapped to the
string "<" in the so-called Reference Concrete Syntax, which is a set of
default string mappings.

No one is required to use the RCS, but most systems use it anyway.
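
To make the abstract/concrete split more tangible, here is a minimal Python sketch. It is not an SGML parser: it handles only start tags, end tags and character data, and it assumes that the ETAGO string begins with the STAGO string. The RCS table holds the Reference Concrete Syntax strings for three delimiter roles; the ALT table is an invented concrete syntax used purely for illustration, and the function name parse is likewise made up.

```python
# Delimiter roles (abstract syntax) mapped to strings (a concrete syntax).
RCS = {"STAGO": "<", "ETAGO": "</", "TAGC": ">"}
# ALT is a hypothetical concrete syntax, invented for this example only.
ALT = {"STAGO": "[", "ETAGO": "[/", "TAGC": "]"}


def parse(text, syntax):
    """Return ('start', name), ('end', name) and ('data', text) events."""
    stago, etago, tagc = syntax["STAGO"], syntax["ETAGO"], syntax["TAGC"]
    events = []
    pos = 0
    while pos < len(text):
        # Try ETAGO first: it is the longer string and begins with STAGO.
        if text.startswith(etago, pos):
            kind, start = "end", pos + len(etago)
        elif text.startswith(stago, pos):
            kind, start = "start", pos + len(stago)
        else:
            # Character data runs up to the next tag-opening delimiter.
            nxt = text.find(stago, pos)
            if nxt == -1:
                nxt = len(text)
            events.append(("data", text[pos:nxt]))
            pos = nxt
            continue
        close = text.find(tagc, start)
        if close == -1:
            raise ValueError("tag is never closed")
        events.append((kind, text[start:close]))
        pos = close + len(tagc)
    return events
```

Calling parse("<em>hi</em>", RCS) and parse("[em]hi[/em]", ALT) yields the same list of events, so code that consumes the events never needs to know which string stood for STAGO.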

Unfortunately, one of the downsides of the RCS has been the ease with
which it has been misunderstood by the RTFM-challenged. Consider, for
example, a solecism like "the DOCTYPE tag" or - for those who imagine
they're somehow being more "precise" or "technical" - the seemingly more
accurate "!DOCTYPE tag".

The origin of such mistakes is to take it as given that anything between
a "<" and a ">" is a tag. This leads to mental parsing processes like

<!DOCTYPE ...> == "<" + "!DOCTYPE ..." + ">"
</foo> == "<" + "/foo" + ">"

and so on. However, the *correct* parsing is in terms of the abstract
syntax:

<!DOCTYPE ...> == MDO + "DOCTYPE ..." + MDC
</foo> == ETAGO + "foo" + TAGC

from which it should be clear that the doctype thingy isn't even a tag
at all - it's a _markup declaration_. Note also that the ">" in the two
cases is not even the same syntactic marker ("Markup Declaration Close"
in one case, "Tag Close" in the other): these are two distinct syntactic
functions mapped to the same string.
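
The decomposition above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it looks at one complete piece of markup at a time, knows four opening roles from the Reference Concrete Syntax, and ignores everything else an SGML parser has to deal with. The names OPEN_DELIMS, CLOSE_ROLE and decompose are invented for the example.

```python
# Opening delimiters in the Reference Concrete Syntax, longest strings
# first, so that "<!", "</" and "<?" are tried before the plain "<".
OPEN_DELIMS = [("MDO", "<!"), ("ETAGO", "</"), ("PIO", "<?"), ("STAGO", "<")]

# Which closing role applies depends on the opening role that was
# recognized, even though all of them are the string ">" in the RCS.
CLOSE_ROLE = {"MDO": "MDC", "ETAGO": "TAGC", "PIO": "PIC", "STAGO": "TAGC"}


def decompose(markup):
    """Split one piece of markup into (open role, content, close role)."""
    for role, string in OPEN_DELIMS:
        if markup.startswith(string) and markup.endswith(">"):
            return (role, markup[len(string):-1], CLOSE_ROLE[role])
    raise ValueError("not recognized as markup: %r" % markup)
```

With this sketch, decompose("<!DOCTYPE ...>") reports MDO and MDC around "DOCTYPE ...", while decompose("</foo>") reports ETAGO and TAGC around "foo". The final ">" is the same character in both inputs, yet it is reported under two different roles.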

A similar jump to a wrong conclusion is with entity references and
character references, thanks to ERO ("Entity Reference Open") and CRO
("Character Reference Open") sharing the same first character in the RCS
bindings (respectively, "&" and "&#").
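
A small Python sketch of the difference, under stated assumptions: it handles only decimal character references, it requires the closing ";", and it borrows the HTML entity names from Python's html.entities module as a stand-in for whatever entities a real DTD would declare. The function name resolve is made up for the example.

```python
from html.entities import name2codepoint


def resolve(reference):
    """Classify a reference and return (kind, resulting character)."""
    if not reference.endswith(";"):
        raise ValueError("this sketch expects a closing ';'")
    # Test for CRO ("&#") first: it starts with the ERO string ("&").
    if reference.startswith("&#"):
        # A character reference names a character by its number.
        return ("character reference", chr(int(reference[2:-1])))
    if reference.startswith("&"):
        # An entity reference names a declared entity, looked up by name.
        return ("entity reference", chr(name2codepoint[reference[1:-1]]))
    raise ValueError("not a reference: %r" % reference)
```

Here resolve("&#38;") and resolve("&amp;") both end up producing "&", but by different routes: the first is a number handed straight to chr(), the second a name that has to be looked up in a table of declared entities.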

The *reason* for the RCS having such overlaps is to minimize the number
of ordinary characters that would need to be escaped in some fashion to
prevent their interpretation as the start of *some* markup sequence. It
was a reasonable economization. Only it presumed that users would have
a clue to what they were doing.

Wrong.