
XHTML: The incremental XML-upgrade path


Jorn Barger

Aug 4, 1998
I think the W3C has XML all wrong.

Requiring DTDs puts a straitjacket on the average web-page, which is
freeform text and not fascist-lockstep-database-oriented.

And, contrary to the ivory-tower theory, text layout has virtually
nothing to do with the sorts of semantics XML-folk are interested in:
you won't ever have a FAXNUM-style that you'll use for every fax number
on every page.

But only a tiny portion of the Web is pure database, not nearly enough
to swing a majority of surfers... nor authors, who'll have to start
maintaining two versions of every page if they want true XML support.


But obviously, we need to move in the direction of better semantic
markup, so what's needed is a compromise like what MSIE now offers-- XML
tags mixed in with normal HTML. (I call this XHTML, but that's not an
official designation yet, afaik.)

But I'm also losing confidence that the XML-crowd will come up with a
tagset that the Rest of Us can really use.

I'm picturing this natural upgrade path:

- even before there's any market for it, a major search engine could add
a specialised-search option where you can find a phrase tagged, eg,
'FAXNUM' by querying "+faxnum:312"

- a lot of authors could then start experimenting with a lot of new
tags, and a few of these would catch on in a big way, because they make
certain specialised searches vastly more efficient

- new refinements would then be built evolutionarily on top of these
proven successes

This approach bypasses XML's do-it-our-way-or-else incompatible-upgrade
problem...
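The "+faxnum:312" idea sketches out naturally as a field-qualified inverted index. A minimal illustration in Python (the tag names, query syntax, and sample pages are all hypothetical, not anything a 1998 engine actually shipped):

```python
import re
from collections import defaultdict

def index_tagged_fields(docs):
    """Build a field-qualified inverted index: (tag, word) -> doc ids.

    Tags are matched with a naive regex, in the spirit of ad-hoc
    semantic tags embedded in ordinary HTML."""
    index = defaultdict(set)
    tag_re = re.compile(r"<(\w+)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)
    for doc_id, html in docs.items():
        for tag, text in tag_re.findall(html):
            for word in re.findall(r"\w+", text.lower()):
                index[(tag.lower(), word)].add(doc_id)
    return index

def search(index, query):
    """Resolve a '+tag:word' query against the index."""
    tag, word = query.lstrip("+").split(":")
    return index.get((tag.lower(), word.lower()), set())

docs = {
    "a.html": "Call us! <FAXNUM>312 555 0100</FAXNUM>",
    "b.html": "<FAXNUM>212 555 0199</FAXNUM>",
}
idx = index_tagged_fields(docs)
print(search(idx, "+faxnum:312"))   # only the page whose FAXNUM contains 312
```

Authors could invent tags freely; the engine only has to index whatever tag names it encounters.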


So what might some of these tag-successes be?

I use Excite NewsTracker a lot, and it uses an auto-summarise algorithm
to 'summarise' the text into a short paragraph. Often, these include
useless marginalia like navigation-bar-text or even javascripts. Also,
my search patterns (eg 'LeCarre') will often return every page of a new
zine, because each page includes the same linktext ('Click here to read
our groovy new LeCarre interview'). So there might be a payoff in a
simple <CONTENT> tag that declares what should be indexed.

Also, authors could pick a few vivid sentences and tag them <PULLQUOTE>
so that instead of the autosummary, NewsTracker could show the author's
choice for the most useful extract.
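Both ideas are easy to prototype: an indexer that honours a <CONTENT> tag, and a summariser that prefers an author-supplied <PULLQUOTE>. A sketch (both tag names are the proposals above, not any standard):

```python
import re

def indexable_text(html):
    """Return only the text inside <CONTENT>...</CONTENT>, so navigation
    bars and boilerplate link-text stay out of the index; fall back to
    the whole page when the tag is absent."""
    blocks = re.findall(r"<CONTENT>(.*?)</CONTENT>", html, re.I | re.S)
    return " ".join(blocks) if blocks else html

def summary(html, autosummarise):
    """Prefer the author's <PULLQUOTE> over an automatic summary."""
    m = re.search(r"<PULLQUOTE>(.*?)</PULLQUOTE>", html, re.I | re.S)
    return m.group(1).strip() if m else autosummarise(html)

page = ("<a href=x>Click here to read our groovy new LeCarre interview</a>"
        "<CONTENT>The interview itself. "
        "<PULLQUOTE>LeCarre on spying: it never ends.</PULLQUOTE></CONTENT>")
print(summary(page, lambda h: h[:40]))
```

The navigation link-text never reaches the index, so the every-page-matches problem described above goes away.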

I have grave doubts about the conventional tag-guesses like FAXNUM and
NAME, for this upgrade path. They'd be a lot of effort to maintain, and
for search, most of this sort of keyword belongs in a META header (which
people hardly use anyway).


j
--
I EDIT THE NET: <URL:http://www.mcs.net/~jorn/html/weblogs/weblog.html>
"The cold hard truth is that portals serve no purpose beyond collecting
a set of links to information you may or may not care about." --NetSkink

Tov Are Jacobsen

Aug 4, 1998
The things I like about XML are:

1. Small standard.
2. Easy validation ("fascist-lockstep-database-oriented" :-) ).
3. Loads of nice parsers to make programming easy.
4. I also think that "real" information should be separated from layout.

Jorn, why do you bother with XML when you don't like it?

Tov Are Jacobsen.

Jorn Barger <jo...@mcs.com> wrote in message
1dd8aad.1hb...@jorn.pr.mcs.net...

Jelks H. Cabaniss, Jr.

Aug 4, 1998
Jorn Barger wrote:

>Requiring DTDs puts a straitjacket on the average web-page, which is
>freeform text and not fascist-lockstep-database-oriented.


The average web pages I've seen need a good fascist-lockstep spanking.

/Jelks

Robert Ducharme

Aug 4, 1998

>Requiring DTDs puts a straitjacket on the average web-page

XML doesn't require DTDs. See the "?" at the end of production 22,
http://www.w3.org/TR/REC-xml#NT-prolog.

>But I'm also losing confidence that the XML-crowd will come up with a
>tagset that the Rest of Us can really use.

A key reason for XML's creation was the widespread realization that no
single tagset could serve everybody. (Well, maybe not that
widespread--I guess you never realized it.) Groups within various
industries have developed tagsets to serve those industries. This is
the point of XML: letting people design their own document types,
whether for their personal use or to share with others who have similar
needs.

Bob DuCharme <bob@ ~~~ NEW E-MAIL ADDRESS ~~~
snee.com> Free SGML software and how to use it: "SGML CD"
from Prentice Hall. See http://www.snee.com/bob/sgmlfree.

samsa

Aug 4, 1998

Jorn Barger wrote in message <1dd8aad.1hb...@jorn.pr.mcs.net>...

>I think the W3C has XML all wrong.
>
>Requiring DTDs puts a straitjacket on the average web-page, which is
>freeform text and not fascist-lockstep-database-oriented.


I (almost) completely disagree. Perhaps the average joe or jane who
builds a personal page has little use for a DTD (for which the better
page designers would provide a default style sheet and DTD, or templates
from which to choose), but people who design sites most certainly do
have a use for this methodology. Let's take a hypothetical situation.

I am a web designer and I run a site dedicated to serving, say,
information regarding games. The exact sort of game is immaterial. If I
am giving information about more than five games, then I can use XML in
conjunction with a standard CSS format to manage my work. This is
especially true if I decide to change the look of the pages. One change
to a style sheet and voila, a whole new look.
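That one-stylesheet workflow looks roughly like this in XML-over-CSS terms; the element names and filenames below are invented for illustration:

```xml
<?xml-stylesheet type="text/css" href="games.css"?>
<!-- game.xml: one record per game; all presentation lives in games.css -->
<game>
  <title>Chess</title>
  <review>A classic of perfect information.</review>
</game>
```

Re-skinning every game page then means editing the one stylesheet, e.g. changing a rule like `title { display: block; font-size: 150% }` in games.css.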

>And, contrary to the ivory-tower theory, text layout has virtually
>nothing to do with the sorts of semantics XML-folk are interested in:
>you won't ever have a FAXNUM-style that you'll use for every fax number
>on every page.


No, but you might have a PHONENUM style if you're serving a large number
of phone numbers for any purpose at all. I could have an employee
listing of size 30, for example, and just have a static HTML-style page
which I then edited every time someone new was hired or left.

But this is inconvenient if the nature of the business changes...
expansion, addition of functionality such as customer phone numbers
(served on the intranet), etc. On the other hand, if I used an XML-style
format and we decide to serve the numbers from the native AS/400
database, then half of my work is done for me.

>But only a tiny portion of the Web is pure database, not nearly enough
>to swing a majority of surfers... nor authors, who'll have to start
>maintaining two versions of every page if they want true XML support.


But database functionality is not, IMO, the best reason to use XML. I
support it because it allows for a more object-oriented (or at least
taxonomical) style of site design.

Customized views of web pages are also possible, allowing the browser
(the person) to view material in ways better suited to their screen...
match the information stored in a cookie or an onscreen view selection
with a style sheet, build the page with a CGI interface, and woo the
consumer to your methods.

Then there's the possibility of serving LDAP (Lightweight Directory
Access Protocol) searches in an economical way.

>
>But obviously, we need to move in the direction of better semantic
>markup, so what's needed is a compromise like what MSIE now offers-- XML
>tags mixed in with normal HTML. (I call this XHTML, but that's not an
>official designation yet, afaik.)
>

>But I'm also losing confidence that the XML-crowd will come up with a
>tagset that the Rest of Us can really use.


Ugh. The W3C blew it, that much is true. The original specification,
published in the WWW journal, consisted of two parts: syntax and
linking. Unfortunately the two were separated, and the syntax passed
without the linking capacity.

*But*, the working draft for linking is available from the W3C. MSIE
should have supported that, even if it hasn't become the official
method. If it is accepted as a standard, then MS's costs for adding that
support would have been reduced. Yes, changes will occur, but they will
most likely be modest, and altering existing code to provide this
support would be highly cost-effective.

Actually, if Netscape releases the code for v5, I may add such support
myself (since it would only make calls to existing functions which link
HTML documents).

>I'm picturing this natural upgrade path:
>
>- even before there's any market for it, a major search engine could add
>a specialised-search option where you can find a phrase tagged, eg,
>'FAXNUM' by querying "+faxnum:312"
>
>- a lot of authors could then start experimenting with a lot of new
>tags, and a few of these would catch on in a big way, because they make
>certain specialised searches vastly more efficient
>
>- new refinements would then be built evolutionarily on top of these
>proven successes
>
>This approach bypasses XML's do-it-our-way-or-else incompatible-upgrade
>problem...


What problem? The specification would allow such a search. What section
of the spec doesn't allow it?

Search engines could release to the public a set of tags to which they
provide support. People could add them to their DTDs or use an external
DTD placed on that engine's system. All of this, IIRC, is very much
supported by XML, at least in my reading of the specification.

[tag success stories snipped]

>I have grave doubts about the conventional tag-guesses like FAXNUM and
>NAME, for this upgrade path. They'd be a lot of effort to maintain, and
>for search, most of this sort of keyword belongs in a META header (which
>people hardly use anyway).


Eh? To which portion of the spec are you referring? <B>XML does NOT
provide tags in the traditional markup sense.</B> If I want to have a
tag called NAME, I would simply write:

<NAME classification="historical personage">Thomas Jefferson</NAME>
<NAME>Gregor Samsa</NAME>

I could also throw in:

<VERSION jdk="1.1.6"/>

None of this belongs in a META header; it's too general.


eric filson

Jorn Barger

Aug 4, 1998
samsa <sa...@sparc.isl.net> rehashes the standard arguments for
structural markup:

> I am a web designer and I run a site dedicated to serving, say,
> information regarding games. The exact sort of game is immaterial. If I
> am giving information about more than five games, then I can use XML in
> conjunction with a standard CSS format to manage my work. This is
> especially true if I decide to change the look of the pages. One change to
> a style sheet and wallah, a whole new look.

Let's call this the style-macro argument.

My objections:

1) You can get the same benefit without any hint of DTDs or semantic
tags.

2) Adding the extra constraints of XML makes it *harder* to do.

3) Depending on others' DTDs limits your design choices.

4) Roll-your-own tagsets eliminate the advantage of intelligent
substitution at the user's end.

> ...you might have a PHONENUM style if you're serving a large number
> of phone numbers for any purpose at all. ... if I used an XML style
> format and we decide to serve the numbers from the native AS/400 database,
> then half of my work is done for me.

This has nothing to do with styles, and everything to do with databases.

> But database functionality is not, IMO, the best reason to use XML. I
> support it because it allows for a more object-oriented style (or at least
> taxonomical) of site design.

And you imagine that freeform text can be usefully analysed into a
taxonomy of data-objects. But as soon as you try to give me an example,
you have to return to text databases, with DTDs produced at great
expense, and users who feel straitjacketed by their limits.

> Customized views of web pages is also possible, providing the browser (the
> person) to view material in ways better suited to their screen... match
> the information stored in a cookie or onscreen view selection with a style
> sheet, build the page with a CGI interface and woo the consumer to your
> methods.

Let's call this the platform-portability argument.

My objections:

1) If people roll their own tags, the benefit is nil.

2) The semantic elements that XML-folk want to tag are entirely useless
for page layout.

3) Portability can be achieved just as well with intelligent
substitution for pure layout-tags.

> >This approach bypasses XML's do-it-our-way-or-else incompatible-upgrade
> >problem...
> What problem? The specification would allow such a search. What section
> of the spec doesn't allow it?

You're way out of sync with me, here.

My complaint is that the XML community is dripping with contempt for any
approach that doesn't 'validate', in particular the obvious
embedded-XHTML approach.

As a result, in total opposition to all known principles of human
factors, the XML community has rejected backward compatibility with
existing HTML documents. With a cult-ish zeal that's obviously doomed,
you're insisting that the change to XML be an all-or-nothing quantum
leap... so that you can have your lovely ivory-tower DTDs validate, so
that people's 200+MHz 64-bit CPUs won't be horribly burdened by such
desperately ambiguous syntax as <P>.



> Search engines could release to the public a set of tags to which they
> provide support. People could add them to their DTDs or use an external
> DTD placed on that engine's system. All of this, IIRC, is very much
> supported by XML, at least in my reading of the specification.

Funny, you don't seem to notice that you've thrown out all the
advantages of my incremental approach.

NOBODY IS SMART ENOUGH TO GUESS WHAT THE GOOD TAGS WILL BE.

So for this changeover to work, it has to happen evolutionarily.

> >I have grave doubts about the conventional tag-guesses like FAXNUM and
> >NAME, for this upgrade path. They'd be a lot of effort to maintain, and
> >for search, most of this sort of keyword belongs in a META header (which
> >people hardly use anyway).
> Eh? To which portion of the spec are you referring? <B>XML does NOT
> provide tags in the traditional markup sense.</b> If I want to have a
> tag called NAME, I would simply write: <NAME classification="historical
> personage">Thomas Jefferson</NAME> <NAME>Gregor Samsa</NAME>
> I could also throw in: <VERSION jdk="1.1.6"/>
> None of this belongs in a META header, it's too general.

I strongly suggest you reread my post, because your preconceptions have
made you utterly oblivious to my points.

NAME is an example of the kind of ivory-tower markup that the XML
community seems to imagine will be useful in freeform text.

Clearly, it's not useful for styling, but I'm suggesting it's not even
useful for search. As the TEI people discovered, you can mark up your
text until the cows come home, and end up with nothing more useful than
the raw ascii. The trick is to find the handful of tags that *will* be
most useful, and there's no way to determine this except by trial and
error... so the system has to be implemented in a way that encourages
this.

Something like a name will be much easier to search on if it's put in a
META header, which is why I brought that up.


I despaired long ago of communicating with the True Believers in this
cult, so this post is directed at the bystanders. Please notice how
universal is the unwillingness to question any of the basic tenets of
the cult, to even acknowledge that there's any problem except the
stupid, messy-HTML web designers, who need to be ruthlessly spanked into
submission...

So long as this is the attitude, there will be a complete disconnect
between the XML community, with their text-databases, and the designer
community, who'll have no use for XML at all.

John Lamp

Aug 5, 1998
Jorn Barger wrote:
>
> I think the W3C has XML all wrong.
>
> Requiring DTDs puts a straitjacket on the average web-page, which is
> freeform text and not fascist-lockstep-database-oriented.

Even HTML has a DTD! Not a very good one; it perverts SGML by tagging
presentation rather than structure. You never notice the HTML DTD
because it is hard-coded into the browsers and editors. (Not very well
coded, IMHO, with very few exceptions.)

Cheers
John
--
_--_|\ John Lamp MACS, School of Management Information Systems
/ \ Deakin University, Waurn Ponds, Geelong Victoria 3217
\_.--._/ Room: GE27 Phone:03 5227 2110 mailto:John...@deakin.edu.au
v Fax: 03 5227 2151 http://www.man.deakin.edu.au/jw_lamp/

Alfie Kirkpatrick

Aug 5, 1998
Jorn Barger <jo...@mcs.com> goes over some old ground
<1dd97iu.nif...@jorn.pr.mcs.net>...

Jorn,

> I despaired long ago of communicating with the True Believers in this
> cult, so this post is directed at the bystanders. Please notice how
> universal is the unwillingness to question any of the basic tenets of
> the cult, to even acknowledge that there's any problem except the
> stupid, messy-HTML web designers, who need to be ruthlessly spanked into
> submission...

1) HTML is not going away. Carry on as you were. HTML is fine in a large
number of cases.

2) SGML/XML (structured markup) is useful in a number of key areas:

- Single source publishing
- Information reuse
- Workflow/Content management
- Database publishing

All of these have been big issues where the Web is concerned,
not for the designers but for large "information providers".
This increasingly includes Intranets, where a lot of content
is becoming unmanageable.

As I've said to you before, XML is for a particular application area.
You aren't going to be forced to use it. But just because you don't
need it, don't try and claim the rest of us don't either!

> So long as this is the attitude, there will be a complete disconnect
> between the XML community, with their text-databases, and the designer
> community, who'll have no use for XML at all.

At the moment there is a "complete disconnect" as you say. That is
because the XML community is concentrating on the core technology and
the styling isn't quite there yet. As the technology becomes more
mature, there will be time for the designers to get their say. The
Web started with design, now it's time for some control! (deliberately
provocative comment...)

By the way, blue on purple is very hard to read, but it looks nice.

Alfie.

k98...@hotmail.com

Aug 5, 1998
In article <1dd97iu.nif...@jorn.pr.mcs.net>,
jo...@mcs.com (Jorn Barger) wrote:

Grrrr.

I agree the web needs a typesetting language, but some people need more.

For instance, I have a bunch of spreadsheets I use to track things
like time and expenses. Sometimes for a report I generate a pie chart
from the spreadsheet. But when I want to update that information I don't
open a paint program and try to edit the pie chart.

I don't understand why it is you want somebody to tell you what
you can and cannot do. Rather than waiting for some committee to tell
you, "Here! Here's your 50 tags," XML lets you do whatever you want.

Or is that what threatens you so much?

Josh

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp Create Your Own Free Member Forum

Gordon Joly

Aug 5, 1998
In article <1dd8aad.1hb...@jorn.pr.mcs.net>,
Jorn Barger <jo...@mcs.com> wrote:

>I think the W3C has XML all wrong.
>
>Requiring DTDs puts a straitjacket on the average web-page, which is
>freeform text and not fascist-lockstep-database-oriented.
>
>And, contrary to the ivory-tower theory, text layout has virtually
>nothing to do with the sorts of semantics XML-folk are interested in:
>you won't ever have a FAXNUM-style that you'll use for every fax number
>on every page.
>
>But only a tiny portion of the Web is pure database, not nearly enough
>to swing a majority of surfers... nor authors, who'll have to start
>maintaining two versions of every page if they want true XML support.
>
>[...]

Maybe. But a suitable XML could easily generate HTML 3.2, HTML 4.0,
DHTML, SMIL, CDF, Latex, Tex, nroff, Ceefax, Teletext etc etc.

:-)

Gordo
--
Gordon Joly http://pobox.com/~anorak/
go...@dircon.co.uk gordo...@pobox.com

David Brownell

Aug 6, 1998
Jorn Barger wrote:
>
> With a cult-ish zeal that's obviously doomed,

.... Jorn proceeded apace.

- Dave

Jorn Barger

Aug 28, 1998
[Stupid people shouldn't even bother reading this, because it will just
confuse you. If you're not sure if you're stupid or not, give yourself
this test before replying: summarise my argument first, without blatant
caricature.]

Three weeks back, Robert Ducharme <duch...@squeegee.cs.nyu.edu> wrote,
quoting me:
> >Requiring DTDs puts a straitjacket on the average web-page
> XML doesn't require DTDs. See the "?" at the end of production 22,
> http://www.w3.org/TR/REC-xml#NT-prolog.

I finally got a chance to follow this link, to an extremely _trying_
150k technical document, where I found this quote:

"An XML document is valid if it has an associated document type
declaration and if the document complies with the constraints
expressed in it."

So I consider your "?" quibble to be in bad faith.

The point I'm trying to make, which was totally ignored in all the many
smug follow-ups, is that XML is a human-factors disaster, because it's
asserted in an all-or-nothing manner: either the author straitjackets
her page with a DTD, or else she loses the benefits of semantic-markup
experimentation.

There isn't any reason for this, except the cultish geek-esthetic that
thinks humans should pre-compile their HTML so that mega-MIPS parsers
don't have to.

An example:

I'm seeing a "play.dtd" used in XML demos [1] [2].

It seems to be tuned to Shakespeare's style, eg in that it allows two
speakers for one speech... something that most playwrights would never
dream of.

It also seems to insist that the front-matter precede the dramatis
personae, which may seem a small constraint, but in fact transfers a
burden onto all playwrights, to fit their presentation to the DTD
author's rulebook... which means every playwright has to learn that
rulebook, or learn to write their own DTDs.

And any hacker with any shred of professional pride should be
embarrassed by the wastefulness of tagging every line with
<LINE></LINE>. What are you guys thinking?!?


Among the replies on this thread, the following was quite typical in
tone:

"The average web pages I've seen need a good fascist-lockstep
spanking."

This, from a community that considers www.w3.org/TR/REC-xml exemplary,
when anyone in the 1000+ year history of page design would instantly
declare it unreadable.

Another example [3]:

"The MARQUEE tag made text dance about all over the screen - not
exactly a feature you would expect from a serious language
concerned with structural mark-up such as paragraphs, headings
and lists. "


The W3C cult is ***afraid of esthetic freedom***. (Remember the
hillbillies in "Easy Rider", beating Jack Nicholson to death as he
sleeps? That's YOU.) So they seek to subjugate the esthetic impulse to
semantic rulebooks... offering as rationalisation a claim rooted in
1960s systems design: that parsers need human pre-compilation.


Many of the replies said, "If you don't like it, don't use it."

This is unbelievably narrow-minded.

As I continually express, I want AI-experimentation to proceed at
maximum speed.

But the way to make this possible is to embrace a standard that
*encourages* arbitrary semantic tags in ordinary messy HTML. If I want
to tell a joke with "Sarah Jessica Parker Posey" as the punchline, I
should be allowed-- encouraged-- to tag it:

<NAME1>Sarah Jessica <NAME2>Parker</NAME1> Posey</NAME2>

To reject this for geek-esthetic reasons is just disgraceful, and will
result inevitably in XML remaining limited to the database-export niche.


j
[1] <URL:http://www.csclub.uwaterloo.ca/u/relander/XML/hamlet.xml>
[2] <URL:http://www.webreference.com/dlab/books/html/38-3.html>
[3] <URL:http://www.w3.org/People/Raggett/book4/ch02.html>

Lars Marius Garshol

Aug 28, 1998

* Jorn Barger

|
| Three weeks back, Robert Ducharme <duch...@squeegee.cs.nyu.edu> wrote,
| quoting me:
| > >Requiring DTDs puts a straitjacket on the average web-page
| > XML doesn't require DTDs. See the "?" at the end of production 22,
| > http://www.w3.org/TR/REC-xml#NT-prolog.
|
| I finally got a chance to follow this link, to an extremely _trying_
| 150k technical document, where I found this quote:
|
| "An XML document is valid if it has an associated document type
| declaration and if the document complies with the constraints
| expressed in it."
|
| So I consider your "?" quibble to be in bad faith.

Jorn, you have to be aware that XML documents are not required to be
valid. They can be either well-formed or valid, which is made clear by
the first paragraph in section 2 of REC-XML:

"A data object is an XML document if it is well-formed, as defined
in this specification. A well-formed XML document may in addition be
valid if it meets certain further constraints."

In section 2.1, the requirements for well-formed documents are set
down:

"A textual object is a well-formed XML document if:
1. Taken as a whole, it matches the production labeled document.
2. It meets all the well-formedness constraints given in this
specification.
3. Each of its parsed entities is well-formed."

In other words, DTDs are not required, nor is validity. This is why
there are two kinds of XML parsers: validating ones and non-validating
ones. (The latter do not check for validity, and some even ignore the
DTD completely when it is present, although that is in violation of
REC-XML.) Most XML parsers are in fact non-validating.

| The point I'm trying to make, which was totally ignored in all the
| many smug follow-ups, is that XML is a human-factors disaster,
| because it's asserted in an all-or-nothing manner: either the author
| straitjackets her page with a DTD, or else she loses the benefits of
| semantic-markup experimentation.

Sorry, Jorn, but you're still wrong. This is a well-formed XML
document and entirely legal according to REC-XML:

<document/>
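Any non-validating parser will confirm this; for instance, with a present-day Python standard-library parser (anachronistic for this thread, but the point is unchanged):

```python
from xml.dom.minidom import parseString

# parseString() performs a non-validating parse: there is no DTD and no
# DOCTYPE declaration anywhere, yet this one-element document is
# perfectly well-formed XML.
doc = parseString("<document/>")
print(doc.documentElement.tagName)   # document
```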



| An example:
|
| I'm seeing a "play.dtd" used in XML demos [1] [2].
|
| It seems to be tuned to Shakespeare's style, eg in that it allows two
| speakers for one speech... something that most playwrights would never
| dream of.

If so, it's a bad DTD for general plays, but I'm uncertain if that's
really what it was intended to be.



| It also seems to insist that the front-matter precede the dramatis
| personae, which may seem a small constraint, but in fact transfers a
| burden onto all playwrights, to fit their presentation to the DTD
| author's rulebook... which means every playwright has to learn that
| rulebook, or learn to write their own DTDs.

I could argue that here presentation and representation have been
separated, so that playwrights can still use a stylesheet that
displays the information in any desired order. However, I suspect that
this is not really what you're driving at, so I'll approach this a bit
differently.

Yes, a DTD is a straitjacket, something you are required to fit your
document into. This is so because the straitjacket makes it possible
to write processing software that can do really fancy stuff with the
document. This would be far more difficult if the document were
unpredictable, as it would be without a straitjacket. (Any schema is
a straitjacket, and they all serve that same purpose.)

If you read a book on DTD design (there are several) you'll see that
almost all of them emphasise that it's important to make the
straitjacket comfortable, or people may refuse to use it and they
will require a lot more training. (Both of which may be expensive.)

However, in XML you can throw off the straitjacket if that's what
you want. Personally, I don't think there's much point in the cases
you're thinking of, but then I'm the kind of person who would
typically have to write the processing software. Anyway, you're allowed
to do it if you so wish.

| And any hacker with any shred of professional pride should be
| embarrassed by the wastefulness of tagging every line with
| <LINE></LINE>. What are you guys thinking?!?

A lot of (most of/all of) the people behind the XML recommendation had
previous experience with SGML, XML's big brother. SGML (in which HTML
is defined) does not require this kind of thing, and in fact SGML takes
considerable pains to ensure that authors do not have to write any
more markup than is strictly necessary.

In fact, an SGML DTD can be written so that the following HTML extract

<TABLE>
<TR><TD>a<TD>b<TD>c
<TR><TD>d<TD>e<TD>f
<TR><TD>g<TD>h<TD>i
</TABLE>

can be written as

<TABLE>
/a|b|c
/d|e|f
/g|h|i
</TABLE>

However, the result of all these minimizations is that only full SGML
parsers can parse SGML reliably and that SGML has become so complex
that only 2-3 SGML parsers exist, even though the standard is 12 years
old.

So what people have been thinking, roughly, is that we want an
alternative to SGML, for use on the web. It has to be simple if
browser vendors are to support it, and so all this markup minimization
stuff has to go. Anyone who wants to use it can write SGML and convert
to XML. (There are programs to do this automatically.)
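A converter for minimized markup is only a few lines; this sketch handles just the hypothetical '/a|b|c' row shorthand from the example above, not any real SGML tool's input format:

```python
def expand_rows(minimized):
    """Expand the shorthand row syntax '/a|b|c' into full
    <TR><TD>...</TD></TR> markup; other lines pass through unchanged."""
    out = []
    for line in minimized.strip().splitlines():
        line = line.strip()
        if line.startswith("/"):
            cells = "".join(f"<TD>{c}</TD>" for c in line[1:].split("|"))
            out.append(f"<TR>{cells}</TR>")
        else:
            out.append(line)
    return "\n".join(out)

print(expand_rows("<TABLE>\n/a|b|c\n/d|e|f\n</TABLE>"))
```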

| This, from a community that considers www.w3.org/TR/REC-xml exemplary,
| when anyone in the 1000+ year history of page design would instantly
| declare it unreadable.

REC-XML is not meant to be a tutorial, it is meant to be precise. That
the parser writers who read it agree on what is allowed in XML and
what is not is far more important than readability to non-technical
people. And, BTW, REC-XML is far more readable than certain other
specifications I could point to...



| The W3C cult is ***afraid of esthetic freedom***. (Remember the
| hillbillies in "Easy Rider", beating Jack Nicholson to death as he
| sleeps? That's YOU.) So they seek to subjugate the esthetic impulse to
| semantic rulebooks... offering as rationalisation a claim rooted in
| 1960s systems design: that parsers need human pre-compilation.
|
|
| Many of the replies said, "If you don't like it, don't use it."
|
| This is unbelievably narrow-minded.

I think it's rather pragmatic, in the sense that a standard like XML
can't hope to both be a suitable format for structured data (which was
the main goal) and at the same time be a good format for graphical
presentations.

However, one can define graphical formats in XML (such as PGML and
VML) and use them to achieve esthetic freedom.



| But the way to make this possible is to embrace a standard that
| *encourages* arbitrary semantic tags in ordinary messy HTML. If I
| want to tell a joke with "Sarah Jessica Parker Posey" as the
| punchline, I should be allowed-- encouraged-- to tag it:
|
| <NAME1>Sarah Jessica <NAME2>Parker</NAME1> Posey</NAME2>
|
| To reject this for geek-esthetic reasons is just disgraceful, and
| will result inevitably in XML remaining limited to the
| database-export niche.

There are many good technical reasons why this isn't allowed, but they
seem a little beside the point. XML does limit you (it is a
straitjacket), like you say, and there are lots of things that are
hard to capture with it. However, a specification does limit you, and
that is the entire point of a specification. (Just throw away the
specification and, hey presto, you have no limits.)

So if XML allowed this, it would inevitably have to disallow something
else, and the line would have to be drawn somewhere. In order to
remain compatible with SGML, retain a simple data model, and make XML
feasible to implement and work with, this particular feature was
considered to be on the wrong side of that line, even though it is
sometimes wanted, even by the straitjacket advocates.

The common solution to this problem in the past has been point markup:

<NAME-START-1/>Sarah Jessica <NAME-START-2/>Parker<NAME-END-1/>
Posey<NAME-END-2/>
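
Point markup like this is mechanically decodable. A minimal sketch in Python (the decoder and its NAME-* pattern follow the example above; none of this is a standard tool) that recovers both overlapping spans from the milestone tags:

```python
import re

# Decode point markup: empty <NAME-START-n/> / <NAME-END-n/> milestone
# tags mark span boundaries, so overlapping "spans" survive a strictly
# nested parser. The decoder strips the milestones and records, for
# each numbered span, the plain text between its start and end marks.
def decode_spans(text):
    starts, spans, out, pos = {}, {}, [], 0
    for m in re.finditer(r'<NAME-(START|END)-(\d+)/>', text):
        out.append(text[pos:m.start()])
        pos = m.end()
        kind, n = m.groups()
        if kind == 'START':
            starts[n] = len(''.join(out))      # offset into plain text
        else:
            spans[n] = ''.join(out)[starts[n]:]
    out.append(text[pos:])
    return ''.join(out), spans

plain, spans = decode_spans(
    '<NAME-START-1/>Sarah Jessica <NAME-START-2/>Parker<NAME-END-1/> '
    'Posey<NAME-END-2/>')
# plain is the joke as the reader sees it; spans holds both names.
```

Both "Sarah Jessica Parker" and "Parker Posey" come back whole, which is exactly what the strictly nested <NAME1>/<NAME2> version cannot express.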

--Lars M.

Jorn Barger

Aug 28, 1998

Lars Marius Garshol <lar...@ifi.uio.no> wrote:
> Yes, a DTD is a straightjacket, something you are required to fit your
> document into. This is so because the straightjacket make it possible
> to write processing software that can do really fancy stuff with the
> document.

What fancy stuff?

> | This, from a community that considers www.w3.org/TR/REC-xml exemplary,
> | when anyone in the 1000+ year history of page design would instantly
> | declare it unreadable.
>
> REC-XML is not meant to be a tutorial, it is meant to be precise. That
> the parser writers who read it agree on what is allowed in XML and
> what is not is far more important than readability to non-technical
> people. And, BTW, REC-XML is far more readable than certain other
> specifications I could point to...

I'm talking about the page design; the writing style is a separate
issue. Having the text/bgcolor change in almost every line is as bad or
worse than the classic too-many-fonts complaint.

> | Many of the replies said, "If you don't like it, don't use it."
> | This is unbelievably narrow-minded.
> I think it's rather pragmatic, in the sense that a standard like XML
> can't hope to both be a suitable format for structured data (which was
> the main goal) and at the same time be a good format for graphical
> presentations.

Are you saying that XML-plus-stylesheets has hidden shortcomings, even
compared to NHTML (Netscape HTML)???

My point is that *some* semantic markup is urgently needed, but the XML
community seems determined to tie its implementation to their entirely
unrelated goal of 'spanking sloppy HTML-authors'.

The responsible approach would be to recognise that 300+ million HTML
pages exist on the Web, and encourage these authors to add XHTML tags as
anarchically as they like.

And then encourage the search engines to extend their word-indexes to
allow one to search for any arbitrary word within any arbitrary tag (the
example I gave was +faxnumber:312).
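
The index extension being asked for is small. A toy sketch in Python (the faxnum tag and the +tag:word query syntax are the hypothetical ones from this thread, and the naive regex is a stand-in for a real HTML tokenizer):

```python
import re
from collections import defaultdict

# Scope the word index by tag: entries are keyed (tagname, word), so a
# query like "+faxnum:312" becomes a single lookup restricted to text
# that appeared inside <faxnum>...</faxnum> on some page.
def build_index(pages):
    index = defaultdict(set)
    for url, html in pages.items():
        for tag, body in re.findall(r'<(\w+)>(.*?)</\1>', html, re.S):
            for word in re.findall(r'\w+', body.lower()):
                index[(tag.lower(), word)].add(url)
    return index

def search(index, query):
    tag, word = query.lstrip('+').split(':')   # e.g. "+faxnum:312"
    return index.get((tag.lower(), word.lower()), set())

idx = build_index({'a.html': 'Fax: <faxnum>312 555-1212</faxnum>'})
```

Authors who invent a tag get better retrieval for free; nothing about the browser or the rest of the page has to change.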

This adds zero burden to browsers, nor does it impose at all on whatever
straitjackets you want to demand for *.xml documents.

> | <NAME1>Sarah Jessica <NAME2>Parker</NAME1> Posey</NAME2>

> [...]


> So if XML allowed this, it would inevitably have to disallow something
> else, and the line would have to be drawn somewhere.

What you're still insistently NOT acknowledging is that there isn't any
deep connection between the markup one wants for styles, and the markup
one wants for semantics. So you continue to demand that the two be
handled by the same mechanism, and that it have rigid rules of
containership, which simply don't matter for most of the real world.

You say one has to draw the line somewhere, but I say you've drawn it in
the geekiest possible position, requiring the most extreme constraint on
free design.

And I simply don't believe your hand-waving claims that the parsers will
be liberated by these constraints, to do amazing miracles of... of what?
Layout? Retrieval?? Validation???

The whole ***validation fetish*** is just a neurotic pathology.

j

Lars Marius Garshol

Aug 30, 1998

* Lars Marius Garshol

|
| Yes, a DTD is a straightjacket, something you are required to fit your
| document into. This is so because the straightjacket make it possible
| to write processing software that can do really fancy stuff with the
| document.

* Jorn Barger
|
| What fancy stuff?

What I was trying to say was that without the straightjacket what you
end up with may have semantic information embedded, but it will be
unpredictable and therefore harder to process than information wearing
a straightjacket.

If you skip the DTD you're still in the straightjacket of well-formed
XML, which is what makes writing a parser possible. You can still do
fancy stuff without a DTD, but when processing large amounts of data
your software will be surprised at times.
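
The two straitjackets can be seen directly in any modern non-validating parser; a sketch using Python's stdlib xml.etree (chosen purely for illustration, not something available in 1998):

```python
import xml.etree.ElementTree as ET

# A non-validating parser accepts any well-formed document, DTD or no
# DTD, and application code can still do useful work with the tree.
doc = ET.fromstring('<memo><faxnum>312 555-1212</faxnum></memo>')
assert doc.find('faxnum').text == '312 555-1212'

# Well-formedness itself, however, is not optional: overlapping tags
# like the Parker/Posey example are rejected outright.
def is_well_formed(text):
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

Without a DTD the parser still works; what it cannot promise is that two documents using the same tag names share any structure.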

| [About REC-XML]


|
| I'm talking about the page design; the writing style is a separate
| issue. Having the text/bgcolor change in almost every line is as
| bad or worse than the classic too-many-fonts complaint.

It's functional. Examples, BNF productions and tables stand out, which
has definitely been very useful to me. If you don't like it you can
use the PDF version or turn off document colors in your browser.

* Lars Marius Garshol


|
| I think it's rather pragmatic, in the sense that a standard like XML
| can't hope to both be a suitable format for structured data (which was
| the main goal) and at the same time be a good format for graphical
| presentations.

* Jorn Barger


|
| Are you saying that XML-plus-stylesheets has hidden shortcomings, even
| compared to NHTML (Netscape HTML)???

No. I'm just saying that if esthetic freedom is what you want XML is
not the right place to look, even though you can achieve it there as
well. It's just harder than with tools designed specifically for that.



| My point is that *some* semantic markup is urgently needed, but the
| XML community seems determined to tie its implementation to their
| entirely unrelated goal of 'spanking sloppy HTML-authors'.

Do you mean that you want both esthetic freedom _and_ semantic markup?

| The responsible approach would be to recognise that 300+ million HTML
| pages exist on the Web, and encourage these authors to add XHTML tags as
| anarchically as they like.

The W3C is actually busy discussing this very topic. See

<URL:http://www.w3.org/MarkUp/future/papers.html>

The problem with mixing XML and HTML is that the result is neither
SGML, HTML nor XML, which makes for something of a problem. However,
you can translate HTML to XML (well, mostly) and then do as you wish
inside XML. (With or without a DTD.)

| What you're still insistently NOT acknowledging is that there isn't
| any deep connection between the markup one wants for styles, and the
| markup one wants for semantics.

That varies. In some cases one wants markup for both, in others one
wants just semantics and in yet others style is the only thing of
interest.

| So you continue to demand that the two be handled by the same
| mechanism, and that it have rigid rules of containership, which
| simply don't matter for most of the real world.

These rigid rules have worked very well for a number of Fortune 500
companies in production systems. That is a major part of the reason
why they're still with us. If that isn't "real world" enough for you
I'd like to know what would be.



| You say one has to draw the line somewhere, but I say you've drawn
| it in the geekiest possible position, requiring the most extreme
| constraint on free design.
|
| And I simply don't believe your hand-waving claims that the parsers
| will be liberated by these constraints, to do amazing miracles
| of... of what? Layout? Retrieval?? Validation???

It doesn't matter much to non-validating parsers, but it does matter
to validating ones and it also matters for the entire set of standards
that are built on top of SGML and XML. (I mean HyTime, DSSSL, XLink,
XPointer, CSS, XSL, the DOM etc etc)

Some of these might work without simple containment, others simply
won't. By keeping simple containment the XML effort can benefit from
decades of experience with these kinds of things (it's not by
coincidence that the set of XML standards rather closely mirrors the
SGML set).

Of course one might have thrown all this away, but all experience with
SGML so far indicates that this would be for little benefit. Do you
have any indications to the contrary?

| The whole ***validation fetish*** is just a neurotic pathology.

It's not a "neurotic pathology" when you're trying to maintain the
documentation of an aircraft through all its different models and
configurations, or the documentation of the equipment that makes up an
oil rig. In fact, it's sound business sense.

I know that's not what you're interested in, but like I wrote above,
throwing away containment means throwing away lots of other things as
well. If you really are dead set on this nothing can stop you from
making your own layer on top of XML where all markup is point-markup.

--Lars M.

Jorn Barger

Aug 30, 1998

Lars Marius Garshol <lar...@ifi.uio.no> wrote:
[...]

> What I was trying to say was that without the straightjacket what you
> end up with may have semantic information embedded, but it will be
> unpredictable and therefore harder to process than information wearing
> a straightjacket.

This is what I've been calling "moving the burden of pre-compilation
onto the author".


> | [About REC-XML]
> | I'm talking about the page design; the writing style is a separate
> | issue. Having the text/bgcolor change in almost every line is as
> | bad or worse than the classic too-many-fonts complaint.
>
> It's functional. Examples, BNF productions and tables stand out, which
> has definitely been very useful to me. If you don't like it you can

> use the PDF version or turn off document colors in your browser.

No, this 'solution' is not acceptable (and is unbearably arrogant).

People who aim to define the standards of page-design need to master at
least the basic rudiments of page-design.

> Do you mean that you want both esthetic freedom _and_ semantic markup?

Duh. (I've referred many times in the past to this black-and-white
blindspot among the *ML cultists.)

> The problem with mixing XML and HTML is that the result is neither
> SGML, HTML nor XML, which makes for a something of a problem.

I'm asking for something very simple: arbitrary semantic tags within
existing HTML conventions. As far as I know, this already exists,
because the browser is supposed to ignore tags it doesn't recognise...
right?
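
That assumption about browsers is right, and a tolerant parser can harvest the unknown tags without any DTD; a sketch using Python's stdlib html.parser (the KNOWN set and the faxnum tag are illustrative only):

```python
from html.parser import HTMLParser

# HTMLParser tolerates unknown tags mixed into ordinary HTML, so
# ad-hoc semantic tags can be collected while everything a browser
# would render is left alone.
class TagHarvester(HTMLParser):
    KNOWN = {'html', 'head', 'body', 'p', 'b', 'i', 'a', 'br'}

    def __init__(self):
        super().__init__()
        self.stack, self.found = [], []

    def handle_starttag(self, tag, attrs):
        if tag not in self.KNOWN:
            self.stack.append((tag, []))

    def handle_data(self, data):
        for _, buf in self.stack:
            buf.append(data)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            name, buf = self.stack.pop()
            self.found.append((name, ''.join(buf)))

h = TagHarvester()
h.feed('<p>Fax us at <faxnum>312 555-1212</faxnum> anytime.</p>')
```

A browser ignoring <faxnum> shows the page unchanged; a search engine running something like this gets the semantic payload.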

So I'm asking for zero changes in the browser-code... all I really want
is for the W3C-et-al to *encourage* this XHTML experimentation,
especially the "+faxnumber:312" option in search engines, instead of
acting like the existing 300+ million pages are naughty and need to stay
after school for detention.

> | What you're still insistently NOT acknowledging is that there isn't
> | any deep connection between the markup one wants for styles, and the
> | markup one wants for semantics.
>
> That varies. In some cases one wants markup for both, in others one
> wants just semantics and in yet others style is the only thing of
> interest.

You're not getting it.

WHEN YOU STOP CARING ABOUT STYLES, YOU STOP CARING ABOUT READERS.

And when you demand that authors give each new style a new name, based
on structural analysis, you've stopped caring about authors.

> | So you continue to demand that the two be handled by the same
> | mechanism, and that it have rigid rules of containership, which
> | simply don't matter for most of the real world.
>
> These rigid rules have worked very well for a number of Fortune 500
> companies in production systems. That is a major part of the reason
> why they're still with us. If that isn't "real world" enough for you
> I'd like to know what would be.

<buries face in hands and weeps>

So, can I call this the "McWeb" hypothesis?

Counter-questions:

- do any of these companies use this data on their websites?

- do they use it on their frontpage?

- does anyone use the resulting site?

- how much did they have to spend on DTD-PhDs?

- what do the writers say about the constraints they have to work
within? do they sneak copies of the SGML software home so they can
write their novel with it?

etc...

[...]


> | And I simply don't believe your hand-waving claims that the parsers
> | will be liberated by these constraints, to do amazing miracles
> | of... of what? Layout? Retrieval?? Validation???
>
> It doesn't matter much to non-validating parsers, but it does matter
> to validating ones

You sure do love your jargon!

(What you've just said is "validation is important to parsers that care
about validation'.)

> and it also matters for the entire set of standards
> that are built on top of SGML and XML. (I mean HyTime, DSSSL, XLink,
> XPointer, CSS, XSL, the DOM etc etc)

Does it really?

If "Tidy" can turn bad HTML into good, why don't these apps just run
Tidy themselves, before they parse?

I'm talking about a human-factors implementation disaster, where a task
that software can do perfectly well is imposed (sadistically) on
webmasters, with a distinct, insulting implication of the schoolmaster's
hickory switch.

> Some of these might work without simple containment, others simply
> won't. By keeping simple containment the XML effort can benefit from
> decades of experience with these kinds of things (it's not by
> coincidence that the set of XML standards rather closely mirrors that
> of SGML ones).

One *week* of the Web brings more HTML experience than the entire
history of SGML.

But the W3C website demonstrates vividly that you guys are stuck in
1992, in terms of *experiencing* the lessons of Web design.

> | The whole ***validation fetish*** is just a neurotic pathology.
>
> It's not a "neurotic pathology" when you're trying to maintain the
> documentation of an aircraft through all its different models and
> configurations, or the documentation of the equipment that makes up an
> oil rig. In fact, it's sound business sense.

Yeah, well, better get cracking that whip. We certainly need the
Web-in-general to be a lot more like aircraft-maintenance documentation!

> I know that's not what you're interested in, but like I wrote above,
> throwing away containment means throwing away lots of other things as
> well.

No, it doesn't.

> If you really are dead set on this nothing can stop you from
> making your own layer on top of XML where all markup is point-markup.

We'll call this the "Go make your own Web" argument. The standard reply
is "Back atcha", so what we're really fighting over is the souls of the
browser-authors.

Jorn Barger

Aug 30, 1998

James K. Tauber <jta...@jtauber.com> wrote:
> [...] Most styles in documents are a direct result of a decision on the
> part of the author as to how a particular structural component of a document
> is to be presented. The decision that something is a heading comes long
> before the decision that something is to be 24 point times roman.

This may work for the most common 50% of document structure, but if you
actually try designing attractive pages that communicate well, you'll
quickly discover that it's all about thousands of subtle variations in
emphasis, none of which have structural names yet.

Pick up any published document at random (book, newspaper, etc, not
journal articles), and start at the top corner, and define a structural
tag for each new style. If you really do this, you'll see how
impossible the task is.

> PS Am I the only one that would love to see Jorn Barger and Ted Nelson on a
> panel discussion :-)

I don't know what this means, but I do know that in four years of
arguing, nobody has ever taken up my challenge to paraphrase my point of
view without caricature.

Peter Flynn

Aug 30, 1998

Lars Marius Garshol wrote:
> Jorn, you have to be aware that XML documents are not required to be
> valid. They can be either well-formed or valid,

All XML documents must be well-formed. Well-formedness is not an option.
Those with a DTD must additionally be valid.

Jorn Barger wrote:
> | The point I'm trying to make, which was totally ignored in all the
> | many smug follow-ups, is that XML is a human-factors disaster,
> | because it's asserted in an all-or-nothing manner:

If you look at the rationales behind XML you'll find that this is
exactly part of the design: almost all the optional bits of SGML
(which made it hard to program for) have been removed. The result
is that XML is stricter on syntax than "full" SGML. The benefit is
that it's easier to program for.

[Lars]


> Yes, a DTD is a straightjacket, something you are required to fit your
> document into.

But it can be a very loose straitjacket, like the TEI, which is a DTD
designed for _descriptive_ markup, not _prescriptive_ markup.

[Jorn]


> | And any hacker with any shred of professional pride should be
> | embarrassed by the wastefulness of tagging every line with
> | <LINE></LINE>. What are you guys thinking?!?

So the professional hacker would also let me muck arbitrarily with
the syntax of C++ just because I don't happen to like curly braces?

Think again...it's a LANGUAGE. It has RULES. You have to FOLLOW them.
No-one is asking you to like them, just to obey them if you want the
required (predicted) result.

> | The W3C cult is ***afraid of esthetic freedom***. (Remember the

I'm not sure what cult this is. Maybe you mean W3C members? They have
to pursue a different ethic, as they are in business, presumably to
make money. Aesthetic freedom is possibly not involved here.

> | hillbillies in "Easy Rider", beating Jack Nicholson to death as he
> | sleeps? That's YOU.) So they seek to subjugate the esthetic impulse to
> | semantic rulebooks... offering as rationalisation a claim rooted in
> | 1960s systems design: that parsers need human pre-compilation.

No, you have completely misunderstood the entire issue. SGML (and its
derivatives) allow you to describe certain aspects of your documents in
a way that lends itself to one or more forms of automated processing.
As the designer of the structure of the document, you can choose how
much or how little of this processing does get automated. Personally,
I prefer my system to do the gruntwork for me, and leave me free to
spend more time on aesthetic decisions, but some people prefer to have
the computer do the whole job, and so they make a set of rules
(stylesheet) that says what to do and when. Others prefer to make every
decision themselves, as it arises, so I'm not sure SGML is much use to
them.

> | Many of the replies said, "If you don't like it, don't use it."
> | This is unbelievably narrow-minded.

No, it sounds rather sensible to me. No-one is forcing you to use XML.

> | But the way to make this possible is to embrace a standard that
> | *encourages* arbitrary semantic tags in ordinary messy HTML. If I
> | want to tell a joke with "Sarah Jessica Parker Posey" as the
> | punchline, I should be allowed-- encouraged-- to tag it:
> |
> | <NAME1>Sarah Jessica <NAME2>Parker</NAME1> Posey</NAME2>

You could, but not in SGML this way. There's nothing to stop you
designing your own language to do this, but I fail to see why you are
expecting a language which expressly forbids it to be changed because
of your own whims. And as Lars pointed out, simple point markup
overcomes the problem anyway.

///Peter


--
DTDs are not common knowledge because programming students are not
taught markup. A markup language is not a programming language.

James K. Tauber

Aug 31, 1998

Jorn Barger wrote in message <1dekinr.1iu...@jorn.pr.mcs.net>...

>Lars Marius Garshol <lar...@ifi.uio.no> wrote:
>> It's functional. Examples, BNF productions and tables stand out, which
>> has definitely been very useful to me. If you don't like it you can
>> use the PDF version or turn off document colors in your browser.
>
>No, this 'solution' is not acceptable (and is unbearably arrogant).

Why? The whole point of semantic markup is you can have different renditions
of the one document. One that appeals to Lars and one that appeals to you;
all from the one source document.

>People who aim to define the standards of page-design need to master at
>least the basic rudiments of page-design.

XML doesn't aim to define the standards of page-design at all. The spec is
agnostic as to presentation.

>I'm asking for something very simple: arbitrary semantic tags within
>existing HTML conventions. As far as I know, this already exists,
>because the browser is supposed to ignore tags it doesn't recognise...
>right?

"Arbitrary semantic tags within existing HTML conventions". Sounds perfectly
feasible for XML documents. What's the problem?

>You're not getting it.
>
>WHEN YOU STOP CARING ABOUT STYLES, YOU STOP CARING ABOUT READERS.

If (and note that's an if) the destination of your document is a human
reader, you can use a stylesheet to assign express styles.

>And when you demand that authors give each new style a new name, based
>on structural analysis, you've stopped caring about authors.

Rubbish. Most styles in documents are a direct result of a decision on
the part of the author as to how a particular structural component of a
document is to be presented. The decision that something is a heading
comes long before the decision that something is to be 24 point times
roman.

James

frank.r...@reedtech.com

Aug 31, 1998

In article <1dekro8.1r3...@jorn.pr.mcs.net>,
jo...@mcs.com (Jorn Barger) wrote:

>
> I don't know what this means, but I do know that in four years of
> arguing, nobody has ever taken up my challenge to paraphrase my point of
> view without caricature.
>

OK, I'll take a swing at it: I think you're saying that the primary purpose
of "successor to HTML" should have been to allow web-authors to create the
specific effect that they want to convey, and that all other objectives
should have been distantly secondary.

My response is that it wasn't done that way, and there were reasons for doing
it the way it was done. SGML for instance is primarily used in situations
where there are large (thousands of pages) of information which must be
maintained and disseminated. Wherefore, people make the tradeoff of accepting
less than optimal 'human factors' in exchange for confidence that the reader
has correct and complete information. If I'm gonna ride the airplane I'll
make that tradeoff the same way too.

That kind of information is now moving onto the web. Only after the
overriding concerns of 'correct, current, complete' are dealt with does
aesthetics come in.

Another situation, looking forward, is having machines communicate with each
other without human intervention. Here the choice is "optimize the human
factors, and tie up a human to deal with it." or "simplify it for machines and
let humans do something more worthwhile for 29 days of the month while putting
up with the ugliness on the 30th."

Frank

Jorn Barger

Sep 1, 1998

<frank.r...@reedtech.com> wrote:
> > [...] nobody has ever taken up my challenge to paraphrase my point of
> > view without caricature.
>
> OK, I'll take a swing at it: I think you're saying that the primary purpose
> of "successor to HTML" should have been to allow web-authors to create the
> specific effect that they want to convey, and that all other objectives
> should have been distantly secondary.

I think I agree with the summary, but I don't see why you should paint
it as such a zero-sum challenge (getting the best of both worlds is not
that difficult).

And I don't consider that you've met my challenge, because you haven't
begun to summarise my *arguments*.

> [...] the tradeoff of accepting
> less than optimal 'human factors' in exchange for confidence that the reader
> has correct and complete information. If I'm gonna ride the airplane I'll
> make that tradeoff the same way too.

So it doesn't bother you that the maintenance worker may be numbed into
insensibility, by acres of featureless text?

> That kind of information is now moving onto the web. Only after the
> overriding concerns of 'correct, current, complete' are dealt with does
> aesthetics come in.

Something tells me you don't even read your html.logs every day...
right?

Nobody cares that much about correct or complete. They care about
current, and about *readable*. (My own site is mostly boring/
Lynx-optimised, but I'm converting it is fast as possible, and my
html.logs show definite results each time I do.)

> Another situation, looking forward, is having machines communicate with each
> other without human intervention.

Great. I eagerly await great wonders from this. But given that humans
and computers have almost entirely disjunct requirements from data
files... why demand one syntax for both?

This is geek-think... the esthetics of pocket-protectors and glasses
fixed with maskingtape. It doesn't fly for the other 95% of humanity.

> Here the choice is "optimize the human factors, and tie up a human to deal
> with it." or "simplify it for machines and let humans do something more
> worthwhile for 29 days of the month while putting up with the ugliness on
> the 30th."

Or: do both, using separate syntaxes.

Startingpoint: existing HTML for styles (though I admit much
improvement is needed), with semantic tags co-existing, separate but
equal, totally ignored by the layout engine.


Let me review some of my old arguments from ciwah:

- "<P>&nbsp;<P>" is an abomination by anyone's standards, but in fact
the abominator here was TimBL, not Dave Siegel. Page-designers have
agonised over whitespace for centuries. The elegant, basic-level-HTML
solution should have been (and still can be) to interpret "<P><P>" as
skip-two-lines. Insisting that that's meaningless and should be
displayed as skip-one-line was just insane.

- STRONG and EM are another bad joke. The real way people-who-use-
those-tags decide which to use is to ask themselves "Do I want bold or
italic here?" and then do an extra 'translate' to assign the
'structural' tag. (It is not coincidental that they take ego-pleasure
in managing this extra mental effort. This A-student showoff-motive is
one root of the whole disease.) Emphasis on a page can't be reduced to
the three-speed model: plain-emph-strong. There are millions of other
possible variations *in emphasis* (size, capitalisation, whitespace,
punctuation, color) that all have to be *checked in context* to see if
they communicate the right level of emphasis. Demanding that these be
isolated on a stylesheet with a stylename created for each is HF hell.

- CENTER is a perfectly good tag. (A good general reference-point for
fixing HTML would be WordPerfect's beloved "Reveal Codes" tagset.)

- The argument that so-called structural tags are more portable is just
bogus. Eg, blind-readers can translate styles as easily as structures.

- The really urgent thing for portability is to make all *sizes*
relative rather than absolute. And it's insane not to declare
equivalencies between eg <H1> and <BIG> (or whatever). Page designers
have to know what's going to display bigger or smaller. What's needed
is a 'ruler' centered on comfortable-reading-size as 1 (or 0, or 10, or
whatever), and barely-readable as another definite value, and
the-biggest-tolerable as another. (Cf the temperature scale being
calibrated to water's boiling and freezing points.) Authors then style
their text to this ruler, and readers set their browsers to the absolute
sizes that work for them.
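
The "ruler" from the last point above can be sketched in a few lines; the anchor values and the 12pt reader default are invented for illustration:

```python
# Sizes on a relative ruler: 1.0 is comfortable-reading size, with
# anchor points (hypothetical values) for "barely readable" and
# "biggest tolerable", the way freezing and boiling points calibrate
# a temperature scale.
BARELY_READABLE = 0.6
BIGGEST_TOLERABLE = 3.0

def to_points(ruler_size, reader_comfortable_pt=12.0):
    """Map an author's relative size to absolute points for one reader."""
    clamped = min(max(ruler_size, BARELY_READABLE), BIGGEST_TOLERABLE)
    return clamped * reader_comfortable_pt
```

The author styles to the ruler; a reader who needs 16pt body text changes the one baseline, and every size on the page scales with it.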

Jorn Barger

Sep 1, 1998

Peter Flynn <silm...@m-net.arbornet.org> wrote:
> So the professional hacker would also let me muck arbitrarily with
> the syntax of C++ just because I don't happen to like curly braces?

"We ought to make word processing documents more like computer code" is
grotesque geek-think.

> Think again...it's a LANGUAGE. It has RULES. You have to FOLLOW them.
> No-one is asking you to like them, just to obey them if you want the
> required (predicted) result.

Here is a fact that you can ignore at your peril: You're not going to
get a community of millions of authors to use a new set of rules unless
there's a proven payoff. The payoff of XML for databases is pretty
clear. Its payoff for readable webpages is highly dubious.

The alternate upgrade-path referred to in my subjectline is to encourage
HTML-plus-arbitrary-semantic-tags. This gives enormous potential payoff
with basically-zero new rules.

> > | The W3C cult is ***afraid of esthetic freedom***.

> I'm not sure what cult this is. Maybe you mean W3C members? They have
> to pursue a different ethic, as they are in business, presumably to
> make money. Aesthetic freedom is possibly not involved here.

Very revealing comment!

Who is paying them? What do their customers demand?

I have almost zero info about how the W3C operates (entirely the fault
of their own poor communication skills, I'd say), but my guess is that
there's an ongoing massive wrestling marathon between the W3C's ivory
tower idealism, and the browser companies whose user base wants
convenience and good visuals.

> [...] SGML (and its
> derivatives) allow you to describe certain aspects of your documents in
> a way that lends itself to one or more forms of automated processing.

Please spell out for me the important gains of tidied markup over sloppy
markup, aside from parser speed and complexity?

> > | Many of the replies said, "If you don't like it, don't use it."
> > | This is unbelievably narrow-minded.

> No, it sounds rather sensible to me. No-one is forcing you to use XML.

The W3C has a responsibility to consider everyone's needs. Tying
semantic-markup (which most everyone wants) to tidy-markup (which almost
no-one wants) is extortion.

> [...] There's nothing to stop you designing your own language to do this,

Given the unbelievable ongoing arrogance of this point of view, I
suggest the Web community should start ostracising the W3C as the
conceited, destructive, self-interested clique it obviously is.

> but I fail to see why you are expecting a language which expressly forbids
> it to be changed because of your own whims.

Oooh, spank me, daddy...!


grrr

Jorn Barger

Sep 1, 1998

I wrote:
> My own site is mostly boring/ Lynx-optimised, but I'm converting it as
> fast as possible, and my html.logs show definite results each time I do.

Here's more proof that looks are extremely important: Even regarding one
of the pages I've tarted up the most, today's Village Voice says:

"Jorn Barger's Robot Wisdom WebLog might not be pretty, but it's one of
the best collections of news and musings culled from the Web -- and
updated daily."

<URL:http://www.villagevoice.com/ink/cyber/36bunn.shtml>

Jorn Barger

Sep 1, 1998

Richard Noteboom <Ric...@noteboom.demon.nl> wrote:
> >So it doesn't bother you that the maintenance worker may be numbed into
> >insensibiity, by acres of featureless text?
>
> And what makes you think that the (SGML) airplane manual contains
> nothing but text? Or that the text can only be rendered as plain text?

It has an artificially constrained set of styles. This is the problem.

[...]
> And hire additional people to maintain two separate versions of the
> same data.

Yes. Styling and semanticising are disjunct skill sets. (In fact,
styling is pretty easy, though, as is basic semanticising, so we're not
talking extra costs to average users.)

> But insisting that an empty paragraph be displayed is *really* insane.

Why? It's expressing a greater semantic gap between the paragraphs
before and after it.

> >- The argument that so-called structural tags are more portable is just
> >bogus. Eg, blind-readers can translate styles as easily as structures.
>
> Really? Book titles and emphasis are both (often) printed in italics.
> Should they be pronounced the same?

In each case, the reader will say "italics... [text]... end italics" (or
whatever else the listener programs it to do). That they can use bogus
structural markup to finetune _intonations_ is a ridiculous myth.
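
The translation being claimed here really is just a table lookup; a minimal sketch (the cue phrases are invented for illustration):

```python
# A screen reader's style-to-speech translation as a plain lookup:
# presentational tags map to spoken cues, with no structural markup
# required to decide what gets announced.
SPOKEN_CUES = {
    'i': ('italics... ', '... end italics'),
    'b': ('bold... ', '... end bold'),
}

def speak(tag, text):
    before, after = SPOKEN_CUES.get(tag, ('', ''))
    return before + text + after
```

Whether the listener wants cue words, a pitch change, or nothing at all is their configuration choice, not the author's markup choice.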

"One of the best collections of news and musings culled from the Web --
and updated daily." -- Austin Bunn in the Village Voice, 01 Sept 1998

Richard Noteboom

Sep 2, 1998

On Tue, 1 Sep 1998 08:58:35 -0500, jo...@mcs.com (Jorn Barger) wrote:

><frank.r...@reedtech.com> wrote:
>> [...] the tradeoff of accepting
>> less than optimal 'human factors' in exchange for confidence that the reader
>> has correct and complete information. If I'm gonna ride the airplane I'll
>> make that tradeoff the same way too.
>

>So it doesn't bother you that the maintenance worker may be numbed into
>insensibility, by acres of featureless text?

And what makes you think that the (SGML) airplane manual contains
nothing but text? Or that the text can only be rendered as plain text?

[snip]


>> Here the choice is "optimize the human factors, and tie up a human to deal
>> with it." or "simplify it for machines and let humans do something more
>> worthwhile for 29 days of the month while putting up with the ugliness on
>> the 30th."
>
>Or: do both, using separate syntaxes.

And hire additional people to maintain two separate versions of the
same data.

[snip]


>- "<P>&nbsp;<P>" is an abomination by anyone's standards, but in fact
>the abominator here was TimBL, not Dave Siegel. Page-designers have
>agonised over whitespace for centuries. The elegant, basic-level-HTML
>solution should have been (and still can be) to interpret "<P><P>" as
>skip-two-lines. Insisting that that's meaningless and should be
>displayed as skip-one-line was just insane.

But insisting that an empty paragraph be displayed is *really* insane.

[snip]


>- The argument that so-called structural tags are more portable is just
>bogus. Eg, blind-readers can translate styles as easily as structures.

Really? Book titles and emphasis are both (often) printed in italics.
Should they be pronounced the same?

--
Richard Noteboom
Ric...@noteboom.demon.nl
http://www.noteboom.demon.nl/

'I will use the word "Hypermedia" to indicate that one is not bound to text.'
(Tim Berners-Lee, Original WWW proposal)
<URL:http://www.w3.org/History/1989/proposal.html>

frank.r...@reedtech.com

Sep 2, 1998
In article <1deo177.17g...@jorn.pr.mcs.net>,

jo...@mcs.com (Jorn Barger) wrote:
> <frank.r...@reedtech.com> wrote:
> > [...] nobody has ever taken up my challenge to paraphrase my point of
> > > view without caricature.
> > OK, I'll take a swing at it: I think you're saying that the primary purpose
> > of "successor to HTML" should have been to allow web-authors to create the
> > specific effect that they want to convey, and that all other objectives
> > should have been distantly secondary.
>
> > That kind of information is now moving onto the web. Only after the
> > overriding concerns of 'correct, current, complete' are dealt with does
> > aesthetics come in.
>
> Something tells me you don't even read your html.logs every day...
> right?

>
> Nobody cares that much about correct or complete. They care about
> current, and about *readable*. (My own site is mostly boring/
> Lynx-optimised, but I'm converting it as fast as possible, and my
> html.logs show definite results each time I do.)

I do. I'm a geek. My wife threw out my pocket protectors.
I do text programming for a living. Websites are only one target.
Frankly your arguments sound just like those I used to get from people who did
trifold brochures in Pagemaker about why SGML was not just irrelevant to them
but positively evil because it prevented the kind of tweaks they made their
living by.


>
> > Another situation, looking forward, is having machines communicate with each
> > other without human intervention.
>
> Great. I eagerly await great wonders from this. But given that humans
> and computers have almost entirely disjunct requirements from data
> files... why demand one syntax for both?

Because then you know you're looking at the same data.


>
> This is geek-think... the esthetics of pocket-protectors and glasses
> fixed with maskingtape. It doesn't fly for the other 95% of humanity.

This is the outlook taken by people who build things that go bang, or fall out
of the sky, or cause stock market crashes when they don't work. They'll act
differently in their off hours, but that's different.


>
> > Here the choice is "optimize the human factors, and tie up a human to deal
> > with it." or "simplify it for machines and let humans do something more
> > worthwhile for 29 days of the month while putting up with the ugliness on
> > the 30th."
>
> Or: do both, using separate syntaxes.

Oh goody. Two different things to break, and a chance for inconsistency.

Look. Whether or not you believe it, I get paid for "right" not for
"pretty". "Ugly" might let the competition move in over a few years. "Wrong"
and we're dead tomorrow. Apparently you're in a different situation. Cool.
But if folks in my position make a spec for us to use, it doesn't have to fit
your interests as well.

Richard Noteboom

Sep 2, 1998
On Tue, 1 Sep 1998 22:23:44 -0500, jo...@mcs.com (Jorn Barger) wrote:

>Richard Noteboom <Ric...@noteboom.demon.nl> wrote:
>> >So it doesn't bother you that the maintenance worker may be numbed into
>> >insensibility, by acres of featureless text?
>>
>> And what makes you think that the (SGML) airplane manual contains
>> nothing but text? Or that the text can only be rendered as plain text?
>

>It has an artificially constrained set of styles. This is the problem.

What do you mean by that? SGML can be rendered in just about any form
(and as many forms) as technology allows.

>
>[...]


>> And hire additional people to maintain two separate versions of the
>> same data.
>

>Yes. Styling and semanticising are disjunct skill sets. (In fact,
>styling is pretty easy, though, as is basic semanticising, so we're not
>talking extra costs to average users.)

I'm not talking about the work that has to be done to mark up both
versions, I'm talking about the work that has to be done to ensure
that both versions contain the same actual data. (You know, company
moves to different address, address in human-readable version is
changed, address in computer-readable version is not...)

>
>> But insisting that an empty paragraph be displayed is *really* insane.
>

>Why? It's expressing a greater semantic gap between the paragraphs
>before and after it.

Why is that "semantic gap" there? Could it be that the paragraphs
*before* the gap belong to another section than the paragraphs *after*
the gap? That calls for a new SECTION element.

>
>> >- The argument that so-called structural tags are more portable is just
>> >bogus. Eg, blind-readers can translate styles as easily as structures.
>>
>> Really? Book titles and emphasis are both (often) printed in italics.
>> Should they be pronounced the same?
>

>In each case, the reader will say "italics... [text]... end italics" (or
>whatever else the listener programs it to do). That they can use bogus
>structural markup to finetune _intonations_ is a ridiculous myth.

Voice synthesizers that can't handle intonation? What *is* the world
coming to?

Jorn Barger

Sep 2, 1998
Richard Noteboom <Ric...@noteboom.demon.nl> wrote:
> >It has an artificially constrained set of styles. This is the problem.
>
> What do you mean by that? SGML can be rendered in just about any form
> (and as many forms) as technology allows.

In a word-processor, you select a range of text, and apply a style.

As I understand SGML/XML, there are extra steps required, and limits
imposed.

> [...] I'm talking about the work that has to be done to ensure


> that both versions contain the same actual data.

I don't know how separate-markup-systems became separate-documents.

> Why is that "semantic gap" there? Could it be that the paragraphs
> *before* the gap belong to another section than the paragraphs *after*
> the gap? That calls for a new SECTION element.

We're talking HTML.

But SECTION may be a good example of the geek-think constraints I object
to: When I write, I skip lines intuitively based on my current context.
I don't want to have to look back over the preceding paragraphs to
decide which are sections and subsections.

> Voice synthesizers that can't handle intonation? What *is* the world
> coming to?

Last I checked, they *don't*. My informant was Chaumont Devin, who
complains that a very simple intonation algorithm involving commas and
rising intonation would be useful, but has (had) never been done.

Do you really have any facts about this (or are you just like the rest
of your sub-species)?

Richard Noteboom

Sep 2, 1998
On Tue, 1 Sep 1998 09:01:19 -0500, jo...@mcs.com (Jorn Barger) wrote:

>The alternate upgrade-path referred to in my subjectline is to encourage
>HTML-plus-arbitrary-semantic-tags. This gives enormous potential payoff
>with basically-zero new rules.

And how big is the (potential) payoff for an author who uses tags
no-one else knows?
Is FAXNUM the same as FAXNUMBER, FAX, FAXNR, FAXNO, TELEFAX?

[snip]


>The W3C has a responsibility to consider everyone's needs. Tying
>semantic-markup (which most everyone wants) to tidy-markup (which almost
>no-one wants) is extortion.

And what are the semantics of <FOO>...<BAR>...</FOO>...</BAR> ?

Jorn Barger

Sep 2, 1998
Richard Noteboom <Ric...@noteboom.demon.nl> wrote:
> And how big is the (potential) payoff for an author who uses tags
> no-one else knows?
> Is FAXNUM the same as FAXNUMBER, FAX, FAXNR, FAXNO, TELEFAX?

1) Back atcha. XML is just the same.

2) The route I propose is that:

a) people use whatever tags they like

b) at least one search engine index words according to whatever
arbitrary tags enclose them

c) the *useful* ones get standardised by darwinian processes

> And what are the semantics of <FOO>...<BAR>...</FOO>...</BAR> ?

That isn't really a coherent question, but here's another
counter-example:

<person>Steven <magazine>Brill</person>'s Content</magazine>

My 2b above should have no problem indexing this.
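For what it's worth, the indexing in 2b needs no DTD and can tolerate even this overlapping markup: treat a close tag as removing its name from the set of currently-open tags, wherever it appears, and index each word under every open tag. A rough Python sketch (the function name and tokenisation are my own assumptions, not anything a real search engine did):

```python
import re
from collections import defaultdict

def index_by_tags(doc):
    """Index each word under every tag currently open around it.
    Overlapping (non-nested) tags are tolerated: a close tag just
    removes its name from the open set, wherever it appears."""
    index = defaultdict(set)          # tag name -> set of words
    open_tags = set()
    # Split the document into tags and the runs of text between them.
    for token in re.split(r'(</?\w+>)', doc):
        m = re.match(r'<(/?)(\w+)>', token)
        if m:
            closing, name = m.groups()
            if closing:
                open_tags.discard(name.lower())
            else:
                open_tags.add(name.lower())
        else:
            for word in re.findall(r"\w+", token):
                for tag in open_tags:
                    index[tag].add(word.lower())
    return index

idx = index_by_tags(
    "<person>Steven <magazine>Brill</person>'s Content</magazine>")
# 'Brill' lands under both tags; 'Content' only under magazine.
```

A query like "+magazine:content" then reduces to a lookup in `index['magazine']`, with no schema agreed in advance.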

j

Richard Noteboom

Sep 2, 1998
On Wed, 2 Sep 1998 15:13:31 -0500, jo...@mcs.com (Jorn Barger) wrote:

>Richard Noteboom <Ric...@noteboom.demon.nl> wrote:
>> >It has an artificially constrained set of styles. This is the problem.
>>
>> What do you mean by that? SGML can be rendered in just about any form
>> (and as many forms) as technology allows.
>
>In a word-processor, you select a range of text, and apply a style.
>
>As I understand SGML/XML, there are extra steps required, and limits
>imposed.

Yes. The limits of the actual structure of the text. So?

>
>> [...] I'm talking about the work that has to be done to ensure
>> that both versions contain the same actual data.
>
>I don't know how separate-markup-systems became separate-documents.
>
>> Why is that "semantic gap" there? Could it be that the paragraphs
>> *before* the gap belong to another section than the paragraphs *after*
>> the gap? That calls for a new SECTION element.
>
>We're talking HTML.

DIV, then.

>
>But SECTION may be a good example of the geek-think constraints I object
>to: When I write, I skip lines intuitively based on my current context.
>I don't want to have to look back over the preceding paragraphs to
>decide which are sections and subsections.

But that's what you *do* by skipping lines. Why can't you just add the
appropriate tags instead?

>
>> Voice synthesizers that can't handle intonation? What *is* the world
>> coming to?
>
>Last I checked, they *don't*. My informant was Chaumont Devin, who
>complains that a very simple intonation algorithm involving commas and
>rising intonation would be useful, but has (had) never been done.

That merely means that there is no algorithm (yet) that can transform
natural language into something that causes a voice synthesizer to
change intonation. It doesn't mean that a voice synthesizer is
incapable of doing intonation.

>
>Do you really have any facts about this (or are you just like the rest
>of your sub-species)?

What sub-species would that be? "People who understand structural
markup"? In that case, the people in the Graphic Communications
Association <URL:http://www.gca.org> also belongs to that sub-species.

Jorn Barger

Sep 2, 1998
Richard Noteboom <Ric...@noteboom.demon.nl> wrote:
> >As I understand SGML/XML, there are extra steps required, and limits
> >imposed.
>
> Yes. The limits of the actual structure of the text. So?

Mmm-hmmm, a Platonist.

> >I don't want to have to look back over the preceding paragraphs to
> >decide which are sections and subsections.
>
> But that's what you *do* by skipping lines. Why can't you just add the
> appropriate tags instead?

Why don't you accept a non-Aristotelian paradigm that allows me to
express my intuitive sense of semantic distance without having to *go
back* and re-jigger the ideal Platonic hierarchy?

> That merely means that there is no algorithm (yet) that can transform
> natural language into something that causes a voice synthesizer to
> change intonation. It doesn't mean that a voice synthesizer is
> incapable of doing intonation.

Quibbling about the meaning of 'can' is transparently pathetic.



> >Do you really have any facts about this (or are you just like the rest
> >of your sub-species)?
>
> What sub-species would that be? "People who understand structural
> markup"?

You're really barking up the wrong tree if you think my AI is weak.
(I got a nice letter this morning from a Russian fellow who wants to
translate my AI FAQ [1], which he called "...real new experience, new
point of view. Such wonderful criticism woven into high content!")

The truth is, *ML-cultists labor under a circa-1950 model of AI, in
which machine translation is just a matter of a few more algorithmic
hacks.


j
[1] <URL:http://www.mcs.net/~jorn/html/ai.html>

Richard Noteboom

Sep 2, 1998
On Wed, 2 Sep 1998 16:03:25 -0500, jo...@mcs.com (Jorn Barger) wrote:

>Richard Noteboom <Ric...@noteboom.demon.nl> wrote:
>> >As I understand SGML/XML, there are extra steps required, and limits
>> >imposed.
>>
>> Yes. The limits of the actual structure of the text. So?
>
>Mmm-hmmm, a Platonist.
>
>> >I don't want to have to look back over the preceding paragraphs to
>> >decide which are sections and subsections.
>>
>> But that's what you *do* by skipping lines. Why can't you just add the
>> appropriate tags instead?
>
>Why don't you accept a non-Aristotelian paradigm that allows me to
>express my intuitive sense of semantic distance without having to *go
>back* and re-jigger the ideal Platonic hierarchy?

Why would you have to go back?
Version 1
1. Human thinks "New subdivision"
2. Human types 2 newlines
3. Human starts on new text

Version 2
1. Human thinks "New subdivision"
2. Human types </DIV><DIV>
3. Human starts on new text


>
>> That merely means that there is no algorithm (yet) that can transform
>> natural language into something that causes a voice synthesizer to
>> change intonation. It doesn't mean that a voice synthesizer is
>> incapable of doing intonation.
>
>Quibbling about the meaning of 'can' is transparently pathetic.

How about this:
Computers are (still) incapable of processing natural language (well),
so humans add markup to their texts if they want them to be processed
by computers.

>
>> >Do you really have any facts about this (or are you just like the rest
>> >of your sub-species)?
>>
>> What sub-species would that be? "People who understand structural
>> markup"?
>
>You're really barking up the wrong tree if you think my AI is weak.
>(I got a nice letter this morning from a Russian fellow who wants to
>translate my AI FAQ [1], which he called "...real new experience, new
>point of view. Such wonderful criticism woven into high content!")
>
>The truth is, *ML-cultists labor under a circa-1950 model of AI, in
>which machine translation is just a matter of a few more algorithmic
>hacks.

Who said anything about AI? I don't want a computer to understand the
text (yet), I just want it to process the text according to my idea of
what the structure of the text is.

Richard Noteboom

Sep 2, 1998
On Wed, 2 Sep 1998 15:26:41 -0500, jo...@mcs.com (Jorn Barger) wrote:

>Richard Noteboom <Ric...@noteboom.demon.nl> wrote:
>> And how big is the (potential) payoff for an author who uses tags
>> no-one else knows?
>> Is FAXNUM the same as FAXNUMBER, FAX, FAXNR, FAXNO, TELEFAX?
>
>1) Back atcha. XML is just the same.
>
>2) The route I propose is that:
>
>a) people use whatever tags they like
>
>b) at least one search engine index words according to whatever
>arbitrary tags enclose them
>
>c) the *useful* ones get standardised by darwinian processes

And how do I use these search engines to find information? Asking it
for "123" enclosed in FAXNUM tags is nice, but what if the people with
fax numbers containing "123" use FAXNR tags? How do I know which tags
to search for - there are no publicly available specs to tell me...

>
>> And what are the semantics of <FOO>...<BAR>...</FOO>...</BAR> ?
>
>That isn't really a coherent question, but here's another
>counter-example:
>
><person>Steven <magazine>Brill</person>'s Content</magazine>

And how does a computer know that PERSON doesn't end where MAGAZINE
begins? Or are we using some meta-language rule saying that start tags
*must* be followed by end tags?

Jorn Barger

Sep 2, 1998
[combining two messages for efficiency]

Richard Noteboom <Ric...@noteboom.demon.nl> wrote:
> Why would you have to go back? [...]


> Version 2
> 1. Human thinks "New subdivision"
> 2. Human types </DIV><DIV>
> 3. Human starts on new text

Gimme a break! Do you really not see the fallacy here?

> How about this:
> Computers are (still) uncapable of processing natural language (well),
> so humans add markup to their texts if they want them to be processed
> by computers.

You can't justify SGML by claiming it will be useful to AI someday.



> Who said anything about AI? I don't want a computer to understand the
> text (yet), I just want it to process the text according to my idea of
> what the structure of the text is.

Any analysis of text structures counts as AI, in my book.

> And how do I use these search engines to find information? Asking it
> for "123" enclosed in FAXNUM tags is nice, but what if the people with
> fax numbers containing "123" use FAXNR tags? How do I know which tags
> to search for - there are no publicly available specs to tell me...

Here's how it works:

Many people try many different sorts of tags. There is no
standardisation. One especially popular website uses one tag that turns
out to be useful. Others copy it.

This is called 'natural selection' (more or less).

> ><person>Steven <magazine>Brill</person>'s Content</magazine>
>
> And how does a computer know that PERSON doesn't end where MAGAZINE
> begins? Or are we using some meta-language rule saying that start tags
> *must* be followed by end tags?

Christ, meet me halfway!!!

Obviously I don't want parsers to become *less* forgiving.

The </> end-tag only works if people are meticulous about containership
relations, so for XHTML I'd clearly want to stick with the </explicit>
approach, with the same exceptions that are currently allowed.

(I really suspect insincerity, given the way you're confusing various
distinct strands in that question.)

Richard Noteboom

Sep 2, 1998
On Wed, 2 Sep 1998 18:01:59 -0500, jo...@mcs.com (Jorn Barger) wrote:

>[combining two messages for efficiency]
>
>Richard Noteboom <Ric...@noteboom.demon.nl> wrote:
>> Why would you have to go back? [...]
>> Version 2
>> 1. Human thinks "New subdivision"
>> 2. Human types </DIV><DIV>
>> 3. Human starts on new text
>
>Gimme a break! Do you really not see the fallacy here?

No.

>
>> How about this:
>> Computers are (still) incapable of processing natural language (well),
>> so humans add markup to their texts if they want them to be processed
>> by computers.
>
>You can't justify SGML by claiming it will be useful to AI someday.

What does processing text have to do with AI? Processing *language*,
yes, but *text*, no.

>
>> Who said anything about AI? I don't want a computer to understand the
>> text (yet), I just want it to process the text according to my idea of
>> what the structure of the text is.
>
>Any analysis of text structures counts as AI, in my book.

But the *human* does the analysis. The computer merely gets a text in
which the structures have already been identified.

>
>> And how do I use these search engines to find information? Asking it
>> for "123" enclosed in FAXNUM tags is nice, but what if the people with
>> fax numbers containing "123" use FAXNR tags? How do I know which tags
>> to search for - there are no publicly available specs to tell me...
>
>Here's how it works:
>
>Many people try many different sorts of tags. There is no
>standardisation. One especially popular website uses one tag that turns
>out to be useful. Others copy it.

And how do I know that FAXNUM on website X is more useful than FAXNR
on website Y? How do I even know that website X is more popular than
website Y?

>
>This is called 'natural selection' (more or less).
>
>> ><person>Steven <magazine>Brill</person>'s Content</magazine>
>>
>> And how does a computer know that PERSON doesn't end where MAGAZINE
>> begins? Or are we using some meta-language rule saying that start tags
>> *must* be followed by end tags?
>
>Christ, meet me halfway!!!
>
>Obviously I don't want parsers to become *less* forgiving.
>
>The </> end-tag only works if people are meticulous about containership
>relations, so for XHTML I'd clearly want to stick with the </explicit>
>approach, with the same exceptions that are currently allowed.

Now things get really difficult:
<P>Some text <PERSON>Name</PERSON> More text.<P>Still more text

Where does the first P element end?
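One workable answer is the rule HTML itself uses: P's end tag is optional, and a new start tag of the same kind implies the close of the open element. The sketch below shows that implied-close convention in Python; it is an illustration of how browsers resolve the example, not a claim about what Jorn's XHTML would mandate, and the AUTO_CLOSE table is only a hypothetical fragment:

```python
import re

# Elements whose start tag implicitly closes an open element of the
# same name (HTML's optional-end-tag rule for P, LI, etc.).
AUTO_CLOSE = {'p', 'li', 'tr', 'td'}

def parse(doc):
    """Return (event, tagname) pairs, inserting implied end tags."""
    events, stack = [], []
    for tag in re.findall(r'</?\w+>', doc):
        closing, name = tag.startswith('</'), tag.strip('</>').lower()
        if closing:
            if name in stack:
                # Pop implied ends until we reach the matching start.
                while stack[-1] != name:
                    events.append(('end', stack.pop()))
                events.append(('end', stack.pop()))
        else:
            if name in AUTO_CLOSE and name in stack:
                while stack and stack[-1] != name:
                    events.append(('end', stack.pop()))
                if stack:
                    events.append(('end', stack.pop()))
            events.append(('start', name))
            stack.append(name)
    while stack:                      # close anything left open at EOF
        events.append(('end', stack.pop()))
    return events

ev = parse("<P>Some text <PERSON>Name</PERSON> More text.<P>Still more text")
```

Under this rule the first P ends immediately before the second `<P>` begins, so PERSON nests cleanly inside it.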

k98...@hotmail.com

Sep 3, 1998
In article <1deqhu3.q2e...@jorn.pr.mcs.net>,

jo...@mcs.com (Jorn Barger) wrote:
> Do you really have any facts about this (or are you just like the rest
> of your sub-species)?

While I fully support the right of Jorn Barger (or anyone else) to speak
here, I wish he would choose to do so with a bit more courtesy.

josh

Mark C. Swenson

Sep 3, 1998
Jorn Barger wrote:
>
> Richard Noteboom <Ric...@noteboom.demon.nl> wrote:
> > And how big is the (potential) payoff for an author who uses tags
> > no-one else knows?
> > Is FAXNUM the same as FAXNUMBER, FAX, FAXNR, FAXNO, TELEFAX?
>
> c) the *useful* ones get standardised by darwinian processes

Yes, but in the meantime, the users get penalized, because
they cannot find their damn information, and all authors
start adding every META-tag variant so their info can be found:

<META NAME="FAXNUM" CONTENT="your number">
<META NAME="FAXNUMBER" CONTENT="your number">
<META NAME="FAX" CONTENT="your number">
<META NAME="FAXNR" CONTENT="your number">
<META NAME="FAXNO" CONTENT="your number">
<META NAME="TELEFAX" CONTENT="your number">
...

While I agree that trying to standardize is a tough thing to do,
at least some of the tags could be standardized.
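On the consuming side, a bot can at least paper over the synonym explosion by folding the variants onto one canonical key. A minimal Python sketch (the synonym table and function name are illustrative assumptions, not any published standard):

```python
import re

# Hypothetical synonym table: every variant folds to one canonical key.
CANONICAL = {name: 'fax' for name in
             ('faxnum', 'faxnumber', 'fax', 'faxnr', 'faxno', 'telefax')}

def extract_meta(html):
    """Collect META name/content pairs, normalising known synonyms."""
    fields = {}
    for name, content in re.findall(
            r'<META\s+NAME="([^"]+)"\s+CONTENT="([^"]*)">', html, re.I):
        key = CANONICAL.get(name.lower(), name.lower())
        fields.setdefault(key, content)   # first occurrence wins
    return fields

page = '<META NAME="TELEFAX" CONTENT="+1 312 555 0100">'
# extract_meta(page) -> {'fax': '+1 312 555 0100'}
```

The table still has to come from somewhere, of course, which is exactly the standardization problem under discussion.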

--
Mark C. Swenson
mailto:mark.c....@boeing.com
"Employees are our most valuable asset . . . I say we sell them"

Peter Flynn

Sep 6, 1998
to Jorn Barger
Jorn Barger wrote:
> [...] if you

> actually try designing attractive pages that communicate well, you'll
> quickly discover that it's all about thousands of subtle variations in
> emphasis, none of which have structural names yet.

That's because they are not structural features. Last night I changed
the spacing between all lowercase "av"s in a document because the font
I'm using appears to have a poorly assigned kern for that pair. This
is one of those subtle things the user will never notice unless I left
it undone, in which case lots of people would say "yuck, how uneven
that looks". But I didn't use SGML to make this change, even though
the document originates in an SGML system. I did it in the typesetting
system.

> Pick up any published document at random (book, newspaper, etc, not
> journal articles), and start at the top corner, and define a structural
> tag for each new style. If you really do this, you'll see how
> impossible the task is.

For books this is usually fairly simple, and it is often revealing
about how poorly edited some books are, even from reputable houses
(and I'm not immune: my own books have similar flaws, although I hope
fewer :-)

For newspapers this would be rather pointless, as the size and style
is often dictated by pressure of space or social/political expediency,
rather than typographics. The days when the London _Times_ would make
weighty and considered typographic judgments over the location and
setting of an article are, alas, long gone. Nevertheless, I submit
that most articles have a caption or headline which is typographically
distinct from the body copy, and that recurrent inline features like
quotation, citation, and annotation can readily be detected, and could
thus be assigned some form of markup.

Peter Flynn

Sep 14, 1998
Jorn Barger wrote:
> > But insisting that an empty paragraph be displayed is *really* insane.
>
> Why? It's expressing a greater semantic gap between the paragraphs
> before and after it.

But that's the whole point: it's not. If the designer/author or
whoever wants a bigger gap than normal between two paragraphs,
there must be a good reason for it...so you need markup which
describes that reason (eg there is a change in the argument, or
some similar disjunction in the train of thought). In designing
markup systems, the markup itself should be the slave of the
function, not its master. This is what document analysis is for:
if you have to abuse markup to get the desired result, either
you have done the doc anal wrong, or you're using the wrong doc
type.

Peter Flynn

Sep 14, 1998
Jorn Barger wrote:
> In a word-processor, you select a range of text, and apply a style.

But only for a reason. The markup you select should describe that
reason.

The difference between SGML systems and wordprocessors is that SGML
systems let you describe that reason, whereas wordprocessors only show
how it looks after that reasoning has been applied and elided.

> As I understand SGML/XML, there are extra steps required, and limits
> imposed.

None at all if the markup available describes the features of the
document type correctly.

Jorn Barger

Sep 14, 1998
Peter Flynn <silm...@m-net.arbornet.org> wrote:
> [...] If the designer/author or
> whoever wants a bigger gap than normal between two paragraphs,
> there must be a good reason for it

Why? Or more important: What if the author doesn't know the reason?

>...so you need markup which
> describes that reason (eg there is a change in the argument, or
> some similar disjunction in the train of thought).

Only an academic could make this argument without noticing how much
unnecessary HF-burden it adds for authors.

> In designing
> markup systems, the markup itself should be the slave of the
> function, not its master.

Why?

The usual, bogus reason is that it makes documents more portable, or
allows bots to extract useful info.

But in fact, if you actually understand page design, the choice of
styles has essentially *nothing* to do with semantics, it's primarily
about variations in emphasis, entirely context-dependent, and in the
case of color and font-face *impossible to analyse*.

(Of course this doesn't apply if you use rock-dull academic journals as
your style paradigm.)

> This is what document analysis is for:

And the difficulty of it is why experts get paid so much money... which
is why it's such a bad idea for a general standard.

> if you have to abuse markup to get the desired result, either
> you have done the doc anal wrong, or you're using the wrong doc
> type.

Gosh, I'm so naughty... again! Why didn't I stay in college for another
six years like I oughta???

My new year's resolution: From now on, I'll submit all my documents to
this forum for prior markup-semantics approval...


j
wondering how i got suckered back into this, but inclined to blame a
persistent auto-tag filter


--
I EDIT THE NET: <URL:http://www.mcs.net/~jorn/html/weblogs/weblog.html>
"One of the best collections of news and musings culled from the Web --

and updated daily." -- Austin Bunn in the Village Voice, 8 Sept 1998

Jorn Barger

Sep 14, 1998
I wrote:
> But in fact, if you actually understand page design, the choice of
> styles has essentially *nothing* to do with semantics, it's primarily
> about variations in emphasis, entirely context-dependent, and in the
> case of color and font-face *impossible to analyse*.

On re-reading this, I heard howls of knee-jerk protest that I'd recently
said all whitespace has semantics, so I'm contradicting myself.

The point is, the semantics *that bots want* have nothing to do with
styles. There's no faxnum style, there's no unit-price style.

The styles that designers want have deep semantics, presumably, but it's
absurd to demand that they be formalised, because they're 99% intuition,
once you get beyond the basic journal-article template (etc).

So what's needed is disjunct markup for semantics and styles, which I'm
calling "XHTML".

(I read this morning that Microsoft has apparently rejected the
straight-XML model, and I say good for them. Leave that to the database
jocks.)

Oyarce Guillermo Alfredo

Sep 14, 1998
Jorn,

Actually it is not that you contradicted yourself elsewhere, but within
the same paragraph, because there is meaning in every part of a [good]
design - meaning in the styles and everything else you mention, even
though you say some of them are *impossible to analyse*:

> > But in fact, if you actually understand page design, the choice of
> > styles has essentially *nothing* to do with semantics, it's primarily
> > about variations in emphasis, entirely context-dependent, and in the
> > case of color and font-face *impossible to analyse*.

explainable when you try to qualify your statement:

> The point is, the semantics *that bots want* have nothing to do with
> styles. There's no faxnum style, there's no unit-price style.

You are limiting yourself to only what *current* bots want, without
considering what they will want, given the opportunity. You seem to
be able to understand the need for some specific tags but meaning can
be conveyed through some other *tagless* means - say, different styles.
The adoption of flexible standards and means can only empower authors.
What about non-authors?

Most of the current web standards are author oriented. This is
fine and dandy but the web has users who want to *find* that
information. In fact, most people I know place information in
the web with hopes to share it - thus they want others to find it.
It is a common and understandable goal. Let us have some more
information seekers tools. This does not necessarily mean more
tags or DTDs but intelligent bots and other similarly intelligent
document analysis software.

A flexible structure can still foster creativity while facilitating
document retrieval.

Cheers!

Guillermo Oyarce

David Brownell

Sep 16, 1998
Jorn Barger wrote:
>
> Peter Flynn <silm...@m-net.arbornet.org> wrote:
> > [...] If the designer/author or
> > whoever wants a bigger gap than normal between two paragraphs,
> > there must be a good reason for it
>
> Why? Or more important: What if the author doesn't know the reason?

If there's a reason, SOMEBODY knows it and it should be
made evident, else it's not going to be understood. And
what about trying to communicate that reason on a device
for which whitespace isn't free -- say a color printer
(some are many $/page!) or a pager with a 4 line display.


> >...so you need markup which
> > describes that reason (eg there is a change in the argument, or
> > some similar disjunction in the train of thought).
>
> Only an academic could make this argument without noticing how much
> unnecessary HF-burden it adds for authors.

Bull. If an author wants to communicate, they must take care
to get their message across. There's no substitute for being
explicit. Anyone who writes for a living knows that.


> (I read this morning that Microsoft has apparently rejected the
> straight-XML model, and I say good for them. Leave that to the
> database jocks.)

Or people who care about open systems, perhaps. As a monopolist,
it's in Microsoft's interest to preserve barriers to entry, and
even to create new ones. Why don't you ask them how serious
they are about adopting open standards? Of course when they mix in
all their proprietary stuff (XHTML == XML + HTML, neither one nor
the other) it's no longer "open", and only Microsoft can evolve it.

- Dave

Jorn Barger

Sep 16, 1998, 3:00:00 AM
David Brownell <brow...@ix.netcom.com> wrote:
> > > [...] If the designer/author or
> > > whoever wants a bigger gap than normal between two paragraphs,
> > > there must be a good reason for it
> > Why? Or more important: What if the author doesn't know the reason?
>
> If there's a reason, SOMEBODY knows it and it should be
> made evident, else it's not going to be understood.

This is the basic philosophical dichotomy that I find so frustrating.
Michael Polanyi's "Personal Knowledge" comes to mind.

Again, as a challenge I propose you look at any random copyright page of
a book, and formalise the spacing between each line.

It's not an easy or fun task, and it's only useful in a highly unproven
ivory-tower theory.

> And what about trying to communicate that reason on a device
> for which whitespace isn't free -- say a color printer
> (some are many $/page!) or a pager with a 4 line display.

You just build a 'suppress whitespace' option into your print dialog.

[...]


> Bull. If an author wants to communicate, they must take care
> to get their message across. There's no substitute for being
> explicit. Anyone who writes for a living knows that.

Well, the Perseus Project tried their best to do it with TEI, and here's
what they concluded:

"In most instances, we enter the entire text of a print document, but
it is not always clear how much tagging is worth the effort -- a
determined worker can spend years using TEI-conformant tags to add
structure to a given document. Determining the appropriate level of
tagging is the major decision that any editor must make, and there are
no simplistic answers." [1]

Given that these people actually tried to do it, and learned how
ivory-tower their ideals were... wouldn't the W3C be well-advised to
show a little restraint, in trying to impose this on the 300+ million
pages already on the web as a whole?

> > (I read this morning that Microsoft has apparently rejected the
> > straight-XML model, and I say good for them. Leave that to the
> > database jocks.)
>
> Or people who care about open systems, perhaps. As a monopolist,
> it's in Microsoft's interest to preserve barriers to entry, and
> even to create new ones. Why don't you ask them how serious
> they are about adopting open standards? Of course when they mix in
> all their proprietary stuff (XHTML == XML + HTML, neither one nor
> the other) it's no longer "open", and only Microsoft can evolve it.

Think for two seconds before you leap from not-XML to not-open. The
connection is entirely in your own imagination.


j
[1] <URL:http://www.dlib.org/dlib/january98/01crane.html>

David Brownell

Sep 16, 1998, 3:00:00 AM
Jorn Barger wrote:
>
> Again, as a challenge I propose you look at any random copyright page of
> a book, and formalise the spacing between each line.

That layout improves presentation; art, more than formality. Next?



> It's not an easy or fun task, and it's only useful in a highly unproven
> ivory-tower theory.

Gee, and I thought you were blaming everyone ELSE for being an
ivory tower person. Now that you're coming out of the closet...


> [...]
> > Bull. If an author wants to communicate, they must take care
> > to get their message across. There's no substitute for being
> > explicit. Anyone who writes for a living knows that.
>
> Well, the Perseus Project tried their best to do it with TEI, and here's
> what they concluded:
>
> "In most instances, we enter the entire text of a print document, but
> it is not always clear how much tagging is worth the effort -- a
> determined worker can spend years using TEI-conformant tags to add
> structure to a given document. Determining the appropriate level of
> tagging is the major decision that any editor must make, and there are
> no simplistic answers." [1]
>
> Given that these people actually tried to do it, and learned how
> ivory-tower their ideals were...

That wasn't the conclusion, reread it!

One of the reasons that project LIKED the {X,SG}ML/TEI stuff was the
way it offered many different editors the flexibility they needed.
As they said, "we can already add quite a bit of functionality to a
properly tagged TEI conformant document when it is published in HTML
form on the Web." Their large scale project had different goals than
a scholar's focussed work; TEI supports both.

That is, your reference identifies the VALUE of semantic markup.

> wouldn't the W3C be well-advised to
> show a little restraint, in trying to impose this on the 300+ million
> pages already on the web as a whole?

Absolutely. The Perseus folk said it's a hard problem, with no simple
"one size fits all" answers; you're saying there's one answer, yours.

Wouldn't you be well advised to show a little restraint in trying
to impose your answer on N-billion pages of content to be authored,
generated, revised, enhanced, etc over the next decade?


> > > (I read this morning that Microsoft has apparently rejected the
> > > straight-XML model, and I say good for them. Leave that to the
> > > database jocks.)
> >
> > Or people who care about open systems, perhaps. As a monopolist,
> > it's in Microsoft's interest to preserve barriers to entry, and
> > even to create new ones. Why don't you ask them how serious
> > they are about adopting open standards? Of course when they mix in
> > all their proprietary stuff (XHTML == XML + HTML, neither one nor
> > the other) it's no longer "open", and only Microsoft can evolve it.
>
> Think for two seconds before you leap from not-XML to not-open. The
> connection is entirely in your own imagination.

Think for a moment before you accuse. I didn't say that; I said XML
is open, so it's natural Microsoft would have reasons to avoid it.

- Dave


> [1] http://www.dlib.org/dlib/january98/01crane.html

Jorn Barger

Sep 16, 1998, 3:00:00 AM
David Brownell <brow...@ix.netcom.com> refuses my attempts to raise the
level of discourse, putting an uncalled-for personal attack in the
subject line, and writes:

> > Again, as a challenge I propose you look at any random copyright page of
> > a book, and formalise the spacing between each line.
>
> That layout improves presentation; art, more than formality. Next?

But you want page-authors to have to label the meaning of every
style-choice, right?

> Absolutely. The Perseus folk said it's a hard problem, with no simple
> "one size fits all" answers; you're saying there's one answer, yours.

Considering that I've said over and over that the database jocks are
welcome to do whatever they want with XML, this seems pretty
disingenuous.

The 'one answer' I'm asking for is a compromise standard that adds
freeform semantic-markup experimentation to existing HTML conventions.
And yet you attack me as if I'm violating someone's freedom???



> Wouldn't you be well advised to show a little restraint in trying
> to impose your answer on N-billion pages of content to be authored,
> generated, revised, enhanced, etc over the next decade?

Would someone else please speak up for the principles of free debate,
and against this cultish dishonesty?



>
> > > > (I read this morning that Microsoft has apparently rejected the
> > > > straight-XML model, and I say good for them. Leave that to the
> > > > database jocks.)
> > >
> > > Or people who care about open systems, perhaps. As a monopolist,
> > > it's in Microsoft's interest to preserve barriers to entry, and
> > > even to create new ones. Why don't you ask them how serious
> > > they are about adopting open standards? Of course when they mix in
> > > all their proprietary stuff (XHTML == XML + HTML, neither one nor
> > > the other) it's no longer "open", and only Microsoft can evolve it.
> >
> > Think for two seconds before you leap from not-XML to not-open. The
> > connection is entirely in your own imagination.
>
> Think for a moment before you accuse. I didn't say that; I said XML
> is open, so it's natural Microsoft would have reasons to avoid it.

But you're obviously trying to tar my proposal with their sins, to shut
down the debate without ever acknowledging my valid points.

Steve Hueners

Sep 16, 1998, 3:00:00 AM

[Maybe I shouldn't be adding anything to this well-tended thread cuz
of the patient and thorough job being done by the regulars but what
the hell...if they can be as patient as they've been with your trolls
they can bear my short interjection.]

Anyway....not like anyone's going to convince you of the invalidity of
your argument (can't walk much less fly) but try this.

You don't want rules restricting markup relying instead on chaos to
sort the best out. This has some degree of merit...the internet is the
result of an astonishing balance t'wixt chaos and standards. Its
continued evolution will likely be more of the same.

On the other hand, the web has become a dangerous timesink _because of_
the lack of markup rules. Its counter-balancing claim to creative
contribution (dubious at best) isn't impeded by a strict, formal XML
spec.

Just like roads would be lethal without strict adherence to rules.
There's appropriate places for those who need to express themselves
outside the bounds...just can't intermingle the two. Can't. No matter
how condescending you become towards "those fascist traffic cops".

and...
Perhaps you replied to the earlier question and the inadequate
threading capabilities of Usenet (a good DTD would take care of that,
right?) mangled the navigation but did you ever reply to:

> Please provide _one_ single proof of where whitespace can be used to
> convey informative content.

[vertical space clipped]

>Ok, good example... (quoting your own words here)
>
>"I just lost all respect for you"
>
>Rearranging your own words with a different whitespace inclusion
>now conveys a different meaning to you?

Since so many folks have contributed time attempting to enlighten you,
it seems only fair to reply to direct questions that clearly go to the
meat of your matter.

--steve...

Jorn Barger

Sep 16, 1998, 3:00:00 AM
Steve Hueners <st...@juststeve.com> wrote:
> [Maybe I shouldn't be adding anything to this well-tended thread cuz
> of the patient and thorough job being done by the regulars but what
> the hell...if they can be as patient as they've been with your trolls
> they can bear my short interjection.]

How can anyone let this sort of scornful dismissal pass, and then turn
around and blame me for flaming back?

If any of you make the slightest effort to look into my website, you'll
see that my motivations are deeply sincere and consistent, and
articulated with greater care over a wider range of topics, I'll assert,
than any other site on the Net.

So these personal attacks just go to prove how false your own motives
must be.

> Anyway....not like anyone's going to convince you of the invalidity of
> your argument (can't walk much less fly) but try this.

Nobody can even paraphrase my arguments!

I've shown again and again that I'm not proposing the trivial straw-men
you accuse me of, but that never in the least engages any sort of
reality-testing in your attitudes.

> You don't want rules restricting markup relying instead on chaos to
> sort the best out. This has some degree of merit...the internet is the
> result of an astonishing balance t'wixt chaos and standards. It's
> continued evolution will likely be more of the same.
>
> On the other hand, the web has become a dangerous timesink _because of
> the lack of markup rules.

Example, please?

I can only guess you mean that pages take longer to display because the
parser can't rely on end-tags, etc. But this is just a fiction-- the
extra clock cycles wouldn't amount to a hundredth of a second if the
coding were done with any degree of efficiency.

> Its counter-balancing claim to creative
> contribution (dubious at best)

Are you dismissing the entire web here? That used to be fashionable on
alt.hypertext in 1994, but it just doesn't wash anymore.

> isn't impeded by a strict, formal XML
> spec.

It's much, much worse than this-- the strict, formal spec isn't even
going to be seriously considered by 99% of web authors, because it's
burdensome beyond imagining.

> Just like roads would be lethal without strict adherence to rules.

"Careful with that unclosed <li> element... you'll poke your eye out!"

> There's appropriate places for those who need to express themselves
> outside the bounds...just can't intermingle the two. Can't.

"The two" being semantic and stylistic markup? Are you nuts?

> and...
> Perhaps you replied to the earlier question and the inadequate
> threading capabilities of Usenet (a good DTD would take care of that,
> right?)

If you have godlike powers to enforce HF-disasters on the mass of
humanity, then you can achieve all sorts of hypothetical wonders, yes.

> mangled the navigation but did you ever reply to:

It made my jaw drop with its shallowness, so I ignored it.

> > Please provide _one_ single proof of where whitespace can be used to
> > convey informative content.
>
> [vertical space clipped]
>
> >Ok, good example... (quoting your own words here)
> >
> >"I just lost all respect for you"
> >
> >Rearranging your own words with a different whitespace inclusion
> >now conveys a different meaning to you?

Your argument reduces to "emphasis has no semantic content". It's just
unbelievably stupid.

> Since so many folks have contributed time attempting to enlighten you,
> it seems only fair to reply to direct questions that clearly go to the
> meat of your matter.

Nothing is too good for the likes of you...

Chris Maden

Sep 16, 1998, 3:00:00 AM
jo...@mcs.com (Jorn Barger) writes:

> Again, as a challenge I propose you look at any random copyright
> page of a book, and formalise the spacing between each line.

O'Reilly's copyright pages are generally prepared in Frame, according
to a template.

The paragraph types used are Title, Author, Copyright, Address,
Editor, ProdEditor, PrintHistory, History, SmallPrint, and Recycle.

The Recycle paragraph (with accompanying graphic above) is placed
flush with the bottom of the printed area.

The Title starts at the top of the printed area; each paragraph has 10
pt of space above, except the History ones (the individual entries in
the printing history list), which have 6 pt.

The SmallPrint has 8 pt above each paragraph, with 26 pt before the
first one (after the printing history).

Spacing isn't always formalized as space above and space below;
sometimes a block is vertically balanced between two other objects.
Frame is not sophisticated enough for a user to be able to codify
these rules in a template, but that doesn't mean that the formal rules
don't exist. Other software, such as TeX, can accept specifications
phrased in that manner.
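[Most of the fixed-spacing rules Chris lists do map almost directly onto
a stylesheet. A rough CSS transcription, with invented class names; the
"vertically balanced between two objects" cases are exactly the part
that cannot be expressed this way:]

```css
/* Hypothetical CSS rendering of the copyright-page spacing rules
   described above; class names are invented for illustration. */
p             { margin-top: 10pt; }   /* the default gap above each paragraph */
p.title       { margin-top: 0; }      /* Title starts at top of printed area */
p.history     { margin-top: 6pt; }    /* individual printing-history entries */
p.smallprint  { margin-top: 8pt; }
p.history + p.smallprint { margin-top: 26pt; }  /* 26pt after the history */
p.recycle     { position: absolute; bottom: 0; } /* flush with bottom */
```

[The adjacent-sibling rule handles "26 pt before the first SmallPrint"
without needing to mark that paragraph specially.]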

> It's not an easy or fun task, and it's only useful in a highly
> unproven ivory-tower theory.

It's one of the things we pay our designers to do.

> You just build a 'suppress whitespace' option into your print
> dialog.

But *which* whitespace gets suppressed? You don't want to lose all of
it, only some of it.

> Given that these people actually tried to do it, and learned how
> ivory-tower their ideals were... wouldn't the W3C be well-advised to
> show a little restraint, in trying to impose this on the 300+
> million pages already on the web as a whole?

Most of your posts show a misconception that HTML or XML is
obligatory. HTML was proposed, people found it useful, and it became
widespread. If you want to distribute visual-fidelity documents, use
PDF; it exists, and is also widespread. If you're satisfied with
neither, make up your own markup language and use it. People may not
be able to read documents in it initially, but you could provide
alternative forms. If your proposal is compelling, browser makers
will implement it.

-Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

Jorn Barger

Sep 16, 1998, 3:00:00 AM
Chris Maden <cr...@oreilly.com> wrote:
> O'Reilly's copyright pages are generally prepared in Frame, according
> to a template.
> The paragraph types used are Title, Author, Copyright, Address,
> Editor, ProdEditor, PrintHistory, History, SmallPrint, and Recycle.

And how much training was required for the person who established this?

And how often do exceptions arise?

And how flexible is the ordering?

And what happens when one item runs unusually long?



> The Recycle paragraph (with accompanying graphic above) is placed
> flush with the bottom of the printed area.
>
> The Title starts at the top of the printed area; each paragraph has 10
> pt of space above, except the History ones (the individual entries in
> the printing history list), which have 6 pt.
>
> The SmallPrint has 8 pt above each paragraph, with 26 pt before the
> first one (after the printing history).
>
> Spacing isn't always formalized as space above and space below;
> sometimes a block is vertically balanced between two other objects.

And this tweaking is arbitrarily associated with either the item above
or the item below.

> Frame is not sophisticated enough for a user to be able to codify
> these rules in a template, but that doesn't mean that the formal rules
> don't exist. Other software, such as TeX, can accept specifications
> phrased in that manner.

Requiring not just a semanticist and a stylist, but also a geometer...

> > It's not an easy or fun task, and it's only useful in a highly
> > unproven ivory-tower theory.
> It's one of the things we pay our designers to do.

So, would you say it's a good idea to tie semantic markup on the Web to
strict XML, given that it requires this level of expertise?



> > You just build a 'suppress whitespace' option into your print
> > dialog.
> But *which* whitespace gets suppressed? You don't want to lose all of
> it, only some of it.

And your solution accomplishes this how, exactly?



> > Given that these people actually tried to do it, and learned how
> > ivory-tower their ideals were... wouldn't the W3C be well-advised to
> > show a little restraint, in trying to impose this on the 300+
> > million pages already on the web as a whole?
>
> Most of your posts show a misconception that HTML or XML is
> obligatory.

On the contrary, I've said several times that all I'm asking is that the
XML community stop taking the view that MS's 'XHTML' compromise is
beneath contempt, and recognise that something along these lines is
absolutely required.

> HTML was proposed, people found it useful, and it became
> widespread. If you want to distribute visual-fidelity documents, use
> PDF; it exists, and is also widespread.

This is one of the standard ciwah-cult responses, and anyone with a
grain of sense knows it's entirely bogus. No serious web publisher uses
this ridiculously awkward solution.

> If you're satisfied with
> neither, make up your own markup language and use it. People may not
> be able to read documents in it initially, but you could provide
> alternative forms. If your proposal is compelling, browser makers
> will implement it.

This is another bogus (and contemptuous) cult reply.

I've made a very simple and appropriate suggestion. It's the XML
community's responses that are profoundly *nuts*.

Christopher B. Browne

Sep 17, 1998, 3:00:00 AM
On Wed, 16 Sep 1998 11:33:26 -0500, Jorn Barger <jo...@mcs.com> posted:

>David Brownell <brow...@ix.netcom.com> refuses my attempts to raise the
>level of discourse, putting an uncalled-for personal attack in the
>subjectline, and writes:
>
>> > Again, as a challenge I propose you look at any random copyright page of
>> > a book, and formalise the spacing between each line.
>>
>> That layout improves presentation; art, more than formality. Next?
>
>But you want page-authors to have to label the meaning of every
>style-choice, right?

Certainly they'll have to label every style choice that they wish to be
significant/distinct. That's pretty much obvious.

At some point the value of the benefits will not exceed the cost; at
that point it surely makes sense to "stop tagging." It is not always
clear where that point will be.

If someone is tagging material from an ancient manuscript, it may, or
may not, make sense to indicate such things as:

- Where the line "broke" in the original. (Obviously important with
poetry; less so with running text.)

- What original page/leaf/bit of parchment/chunk of pottery the text was
on.

What people should tag obviously depends on their intent.
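[TEI does in fact provide elements for both of those cases. A small
sketch, with the text and attribute values invented; <lb/> records a
line break in the source, <pb/> a page or leaf break:]

```xml
<!-- TEI-style transcription markup; content here is invented. -->
<p>first line of the original
   <lb/>second line of the original
   <pb n="2"/>text continuing on the next leaf...</p>
```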

>> Absolutely. The Perseus folk said it's a hard problem, with no simple
>> "one size fits all" answers; you're saying there's one answer, yours.
>
>Considering that I've said over and over that the database jocks are
>welcome to do whatever they want with XML, this seems pretty
>disingenuous.
>
>The 'one answer' I'm asking for is a compromise standard that adds
>freeform semantic-markup experimentation to existing HTML conventions.
>And yet you attack me as if I'm violating someone's freedom???

The problem is that HTML isn't a great markup system for anything other
than web pages. It has been taken a whole lot past what it was intended
to do. It is encouraging that the Web has survived thus far; it makes
sense, when building documents that might not be web pages, to use a
form of tagging more closely directed to the particular sort of document
in question. That's not going to be "one size fits all."

It makes little sense to try to hammer everything into a hole that looks
like HTML regardless of how well suited HTML is to receive such hammering.

>> Wouldn't you be well advised to show a little restraint in trying
>> to impose your answer on N-billion pages of content to be authored,
>> generated, revised, enhanced, etc over the next decade?
>
>Would someone else please speak up for the principles of free debate,
>and against this cultish dishonesty?

Free debate involves the possibility of people disagreeing with you.
I'm afraid I don't see what's either cultish or dishonest here.

--
Those who do not understand Unix are condemned to reinvent it, poorly.
-- Henry Spencer <http://www.hex.net/~cbbrowne/lsf.html>
cbbr...@hex.net - "What have you contributed to Linux today?..."

Ken Fox

Sep 17, 1998, 3:00:00 AM
jo...@mcs.com (Jorn Barger) writes:

> David Brownell <brow...@ix.netcom.com> writes:
> >
> > That layout improves presentation; art, more than formality.
>
> But you want page-authors to have to label the meaning of every
> style-choice, right?

What does this have to do with XML vs. HTML? Unless I *totally*
mis-read the XML spec, it describes what an XML DTD is and how an
XML document is permitted to be structured. In other words, XML
is a regular syntax for structuring information. An XML document
is a token stream, which (optionally) conforms to some rules.

I can use XML to develop a PostScript-like-DTD. I can also develop
an HTML-like-DTD and a LaTeX-like-DTD. Both of these types can have
XSL mappings to the PostScript-like-DTD. (Maybe XSL isn't powerful
enough for the job. Is it? It has to be.)

The only thing we need is the ability to unambiguously combine
multiple DTDs together. For instance, I might want to normally
use my LaTeX-like-DTD, but occasionally stick in a PostScript-like
element.

Most web browsers today are able to understand some variant of some
version of HTML. All we need to do is formalize HTML in XML and get
the browsers to move towards that formalization. Then any DTD that
has an XSL mapping to HTML can be read by any browser. This makes
browsers *simpler*. It makes authoring *easier*. It doesn't require
standards bodies to tell me what "semantic" tags I can use.

What am I missing?

The thing I hate about the "just add 'semantic' markup to HTML"
argument is that it means putting in redundant markup. That's too
much work. Computers should be smart enough to translate things for
me. If it screws up too badly, I might want to tweak it in the
right direction.
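[The tag-mapping Ken describes can be sketched even without XSL. A
rough Python illustration of the idea; the `faxnum` element and the
class name are invented, standing in for any "semantic" tag an author
might coin:]

```python
# Sketch: rewrite an invented semantic tag (<faxnum>) embedded in
# otherwise ordinary markup into plain HTML any browser can render.
# This stands in for the XSL mapping described above; the element
# and class names are hypothetical.
import xml.etree.ElementTree as ET

def to_html(source):
    root = ET.fromstring(source)
    for parent in root.iter():
        # Snapshot the children so we can replace them in place.
        for i, child in enumerate(list(parent)):
            if child.tag == "faxnum":
                span = ET.Element("span", {"class": "faxnum"})
                span.text = child.text
                span.tail = child.tail   # keep the text after the tag
                parent[i] = span
    return ET.tostring(root, encoding="unicode")

doc = '<p>Fax us at <faxnum>312-555-0199</faxnum> any time.</p>'
print(to_html(doc))
# -> <p>Fax us at <span class="faxnum">312-555-0199</span> any time.</p>
```

[A real XSL engine generalizes this to whole rule sets, but the
browser-side work is no deeper than this kind of tree rewrite.]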

- Ken

--
Ken Fox (kf...@ford.com) | My opinions or statements do
| not represent those of, nor are
Ford Motor Company, Powertrain | endorsed by, Ford Motor Company.
Analytical Powertrain Methods Department |
Software Development Section | "Is this some sort of trick
| question or what?" -- Calvin

Jorn Barger

Sep 17, 1998, 3:00:00 AM
Ken Fox <kf...@pt0204.pto.ford.com> wrote:
> > But you want page-authors to have to label the meaning of every
> > style-choice, right?
> What does this have to do with XML vs. HTML? [...]

> Most web browsers today are able to understand some variant of some
> version of HTML. All we need to do is formalize HTML in XML and get
> the browsers to move towards that formalization.

Meaning, refuse to display pages that don't conform?

> Then any DTD that
> has an XSL mapping to HTML can be read by any browser. This makes
> browsers *simpler*. It makes authoring *easier*. It doesn't require
> standards bodies to tell me what "semantic" tags I can use.
>
> What am I missing?

I'm tired of restating it. Is there *anyone* reading this who thinks
they can step in here?

> The thing I hate about the "just add 'semantic' markup to HTML"
> argument is that it means putting in redundant markup.

???

My ideal is a simple, resizable styles-oriented tagset, plus
any-old-thing to add semantics where they're wanted.

What I reject is a single rigid syntax for both, that places styles in
the subordinate position.

David Brownell

Sep 17, 1998, 3:00:00 AM
Jorn Barger wrote:
>
> > Wouldn't you be well advised to show a little restraint in trying
> > to impose your answer on N-billion pages of content to be authored,
> > generated, revised, enhanced, etc over the next decade?
>
> Would someone else please speak up for the principles of free debate,
> and against this cultish dishonesty?

Certainly. Many have. It'd really help if you would actually
follow those principles when you espouse them. I'm clearly not
the only one who's noticed that you are not doing so.

You've referred regularly to "cultish" behaviour any time someone
disagrees with you. Ditto dishonesty. I struck a nerve, it seems,
to get accused of both in one sentence. As they say, "point one
finger and three of them are pointing back at you" ... it's rather
dishonest to misquote people as you've regularly done. Cults deny
the validity of other viewpoints, merely because they're different.
Debates relish factual counter-arguments, rather than adopting that
"don't confuse me with facts, I've made up my mind" attitude.

- Dave

Jorn Barger

Sep 17, 1998, 3:00:00 AM
David Brownell <brow...@ix.netcom.com> wrote:
> it's rather
> dishonest to misquote people as you've regularly done.

Example?

> Cults deny
> the validity of other viewpoints, merely because they're different.
> Debates relish factual counter-arguments, rather than adopting that
> "don't confuse me with facts, I've made up my mind" attitude.

And yet instead of demonstrating that you understand my arguments, you
eliminate them all with zero argument, as being vacuous, while I've
repeatedly proved I understand yours by restating them.

So what's the way out of this deadlock?

Jorn Barger

Sep 17, 1998, 3:00:00 AM
Christopher B. Browne <cbbr...@news.brownes.org> wrote:
> [...] it makes
> sense, when building documents that might not be web pages to use a
> form of tagging more closely directed to the particular sort of document
> in question. That's not going to be "one size fits all."

Fine. This is what I refer to as 'database jocks', and it's entirely
irrelevant to my request for simple semantic tags that work with
current HTML, without the enormous cognitive overhead of DTDs.

> It makes little sense to try to hammer everything into a hole that looks
> like HTML regardless of how well suited HTML is to receive such hammering.

This argument is just bizarre, given that I'm asking for a system that's
compatible with the 300+ million pages already out there, and leaving
the database jocks free to do whatever they like, while you're asking
that all 300M be re-tidied to fit your spec.

Ken Fox

Sep 17, 1998, 3:00:00 AM
jo...@mcs.com (Jorn Barger) writes:
> Ken Fox <kf...@pt0204.pto.ford.com> wrote:
> > > But you want page-authors to have to label the meaning of every
> > > style-choice, right?
> > What does this have to do with XML vs. HTML? [...]
> > Most web browsers today are able to understand some variant of some
> > version of HTML. All we need to do is formalize HTML in XML and get
> > the browsers to move towards that formalization.
>
> Meaning, refuse to display pages that don't conform?

No, of course not. That would break the policy of "lenient reading,
strict writing" that has ruled the Internet for decades. Any
implementation of XML is fundamentally broken if it can't gracefully
handle non-conformant documents.

> I'm tired of restating it. Is there *anyone* reading this who thinks
> they can step in here?

1. You want to preserve the existing body of HTML documents.

2. You want to maintain (or improve) your ability to control
page layout.

3. You want a painless way of adding "semantics" to existing
documents. (You haven't said much at all about how you want to
write new documents.)

Am I missing anything?

Versioning gives you the first requirement. Introducing something new
won't bust any of the existing documents. That's a browser detail.

XML gives you the last two -- assuming you use a DTD that allows it. If
you want a bondage-and-discipline DTD that doesn't enable page layout
tweaks then go for it. If you want something more flexible, just write
the DTD that does what you want. XSL can transform back to the browser's
DTD. (If you can't write a DTD then argue for a DTD that does what you
want instead of arguing against XML.)

The key thing to argue for is a *good* implementation of XSL in
every browser. That's what I meant by getting browsers to move towards
the XML standard. XML in a browser without any way of transforming
it is nearly useless.
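[A DTD along the lines sketched above really can be small. A
hypothetical fragment, with all element names invented, allowing one
freeform semantic tag inside ordinary paragraph content:]

```xml
<!-- Hypothetical DTD fragment: an HTML-like content model extended
     with one invented semantic element (faxnum). -->
<!ELEMENT doc    (p+)>
<!ELEMENT p      (#PCDATA | em | a | faxnum)*>
<!ELEMENT em     (#PCDATA)>
<!ELEMENT a      (#PCDATA)>
<!ATTLIST a      href CDATA #REQUIRED>
<!ELEMENT faxnum (#PCDATA)>
```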

> My ideal is a simple, resizable styles-oriented tagset, plus
> any-old-thing to add semantics where they're wanted.
>
> What I reject is a single rigid syntax for both, that places styles in
> the subordinate position.

I didn't read about that limitation in the XML spec.

In fact, the XML spec allows you to do *exactly* what you want. We'll
probably end up with a near-identical HTML DTD that browsers can render.
You can come up with a DTD that extends that with your any-old-thing
tags. XSL can be used to map the any-old-thing tags into HTML.

What makes you think HTML in XML is going to be any more rigid than
the HTML we have now? I *hope* that the next HTML will be *more*
flexible on page layout than the current HTML. After all, if people
want "semantics", they don't have to use HTML. On the other hand,
if people want page layout, they're restricted to what HTML (and the
browser) is able to do.

Ken Fox

Sep 17, 1998, 3:00:00 AM
jo...@mcs.com (Jorn Barger) writes:
> ... my request for simple semantic tags that work with
> current HTML, without the enormous cognitive overhead of DTDs.

Hmm. I guess we know why you don't like XML now. It does exactly
what you need, but you can't figure out how it works. Is it easier
to argue against it than to learn it?

I expect very few people will need to learn how to create a DTD or an
XSL transformation. Word processors will allow people to create new
DTDs easily -- they're called templates and wizards.

Chris Maden

Sep 17, 1998, 3:00:00 AM
jo...@mcs.com (Jorn Barger) writes:

> Fine. This is what I refer to as 'database jocks', and it's
> entirely irrelevant to my request for simple semantic tags that work
> with current HTML, without the enormous cognitive overhead of DTDs.

I missed the degeneration of this thread, but this is an entirely
reasonable suggestion, and one that's been in mind of the designers of
XML from the beginning. This is one reason why XML does not require a
DTD.

There is a potential problem using XML syntax for HTML pages, but most
browsers seem to be able to deal with it if you're careful, as
described in the XML FAQ (<URL:http://www.ucc.ie/xml/>).
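A quick illustration of DTD-less processing (the faxnum tag is invented; any well-formed tag would do):

```python
# No DTD anywhere: the parser accepts the undeclared faxnum tag simply
# because the document is well-formed.
import xml.etree.ElementTree as ET

page = '<html><body><p>Fax: <faxnum>312-555-0199</faxnum></p></body></html>'
root = ET.fromstring(page)  # parses fine with no DTD in sight
fax = root.find('.//faxnum').text
print(fax)  # -> 312-555-0199
```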

Jorn Barger

Sep 17, 1998, 3:00:00 AM
Chris Maden <cr...@oreilly.com> wrote:
> I missed the degeneration of this thread,

It cain't DEgenerate when it ain't never generated, yet...

> but this is an entirely reasonable suggestion,

Thanks. So far, you're unique...

> and one that's been in mind of the designers of
> XML from the beginning. This is one reason why XML does not require a
> DTD.

I understand that attempts were made in this direction. I'd like to see
them go a fractional step farther, so that no 'tidying' is required, and
freeform semantic tagging is encouraged.

> There is a potential problem using XML syntax for HTML pages, but most
> browsers seem to be able to deal with it if you're careful, as
> described in the XML FAQ (<URL:http://www.ucc.ie/xml/>).

This is fine, but it's the flip side of my request. I want
old-fashioned HTML documents with arbitrary semantic tags allowed. As I
understand it, MSIE does/will support this, but the XML community
considers this a bastardisation of the pure ideal, and refuses to
acknowledge that it's good HF-sense.

Jorn Barger

Sep 17, 1998, 3:00:00 AM
Christopher B. Browne <cbbr...@news.brownes.org> wrote:
> I never used the word database, so I think it's rather inappropriate for
> you to assume anything about databases.

My basic point is that XML is a database language, and that our
understanding of texts is still decades away from accurately reducing
text to database. So markup should be predominately style-oriented,
with the option of freeform semantic experimentation.

> My point would be twofold:
> a) If what we want to do doesn't fit with the specific document structure
> implied by HTML, then it is appropriate to look at a *DIFFERENT* document
> structure, and therefore a different DTD.

I have no idea what you're talking about here. How does HTML imply a
specific document structure?

> b) If we take your approach, adding additional "semantic tags" to HTML,
> this gets us quickly to having an enormous cognitive overhead much as
> you suggest is the case of custom DTDs.

How do you figure?

There may be an infinite number of possible semantic tags, but people
only have to learn the two or three that matter to their current
document.

> We can see this in HTML *today* in that people put complex table
> structures into HTML documents in order to get particular presentation
> results.

So, your previous statement can be reduced from:

additional "semantic tags" lead to enormous cognitive overhead

to

no additional tags also result in enormous cognitive overhead

?

> The authors might be better served by using DTDs tuned to their particular
> requirements; there is a certain amount of dilemma here. Choose:
> a) Having wacky looking HTML structures that have to be maintained, or
> b) Having a custom DTD with that "cognitive overhead".
> You can't have complete simplicity; you must choose one or the other.

With TABLEs being the wacky-looking structures?

And with XML somehow making the same tables look less wacky?

> I never said that existing web pages would have to be reauthored or
> otherwise "re-tidied."

Great, we're almost home, then. Can I add semantic tags to my existing
HTML, without tidying?

> There is certainly material out there that *does* fit well with HTML as
> it exists now.

???

> It may prove worthwhile to "tidy" things into HTML 4.0,
> which provides some tools to separate structure and style, which makes
> traditional web pages a little more easily maintained.

Oops. Style-macros and structures keep getting conflated, by your team.

Macros make maintenance slightly easier. This has nothing to do with
structures.

> Be that as it
> may, it would be *incorrect* to suggest that I'm trying to imply that
> "everything should be rewritten using XML."

Well, almost all the members of your 'team' do seem to be implying that.

> It's not the "average web page" that would benefit from XML. It is
> the:
> - Store catalog

Database.

> - Web search engine

??? You mean pages-indexed-by-search-engine? That's all of them.

> - Personal finance application

Database.

> - General purpose document processing system

Uhn-uhn.

> - ...
> that would benefit from the use of XML.
>
> The things that are proposed to be *replaced* by the use of XML are
> those applications that have been working poorly when implemented using
> HTML.

Well, page-layout in HTML is certainly a disaster.

Adding structures is a perfect NON-solution, though.

> These are things that presently work only moderately well, require
> specific attention (e.g. - authors *need* to go through that "cognitive
> overhead"), and keep getting reimplemented because they're really not good
> enough now.

The goal should be what I've called the 'liquid page', meaning that a
page-designer lays things out in terms of relative proportions, and the
rendering-engine knows how to scale them to any size or shape of window.

Structures really don't come into it.

Jorn Barger

Sep 17, 1998, 3:00:00 AM
Ken Fox <kf...@pt0204.pto.ford.com> wrote:
> > Meaning, refuse to display pages that don't conform?
> No, of course not. That would break the policy of "lenient reading,
> strict writing" that has ruled the Internet for decades. Any
> implementation of XML is fundamentally broken if it can't gracefully
> handle non-conformant documents.

Good. Please step in and re-assert this every time one of my opponents
contradicts it.

> > I'm tired of restating it. Is there *anyone* reading this who thinks
> > they can step in here?
> 1. You want to preserve the existing body of HTML documents.
> 2. You want to maintain (or improve) your ability to control
> page layout.
> 3. You want a painless way of adding "semantics" to existing
> documents. (You haven't said much at all about how you want to
> write new documents.)
> Am I missing anything?

In terms of goals, maybe not. In terms of arguments, there's still the
whole anti-layout mindset that needs to be laid to rest.

> Versioning gives you the first requirement. Introducing something new
> won't bust any of the existing documents. That's a browser detail.

Ok-a-y...

> XML gives you the last two -- assuming you use a DTD that allows it. If
> you want a bondage-and-discipline DTD that doesn't enable page layout
> tweaks then go for it. If you want something more flexible, just write
> the DTD that does what you want.

What happened to 'painless'?

> XSL can transform back to the browser's DTD.

I don't understand this.

> (If you can't write a DTD then argue for a DTD that does what you
> want instead of arguing against XML.)

This either.

How can this not involve pre-analysis of what tags will be used, in what
order? How can it not involve re-writing the DTD each time new needs
arise?

> The key thing to argue for is a *good* implementation of XSL in
> every browser. That's what I meant by getting browsers to move towards
> the XML standard. XML in a browser without any way of transforming
> it is nearly useless.

By this point, you seem to be waving your hand and saying, "XSL is the
magic bullet".

Here's another challenge: if you take the styles on the stylesheet, and
'promote' them back onto the main text page, in place of the
'structures', is this an improvement on HTML? How?

[...]


> In fact, the XML spec allows you to do *exactly* what you want.

No, this can't be accurate.

> We'll
> probably end up with a near-identical HTML DTD that browsers can render.

'Near' meaning 'with a little tidying'?

> You can come up with a DTD that extends that with your any-old-thing
> tags.

I don't call this painless.

> XSL can be used to map the any-old-thing tags into HTML.

I've been violently rejecting the idea that semantics need styles at
all.

> What makes you think HTML in XML is going to be any more rigid than
> the HTML we have now?

Every source says end-tags, etc need tidying.

> I *hope* that the next HTML will be *more*
> flexible on page layout than the current HTML.

I don't doubt that it will, but if it's based on a confusion of
structures with style-macros, then it's going to be unnecessarily rigid
in the authoring.

> After all, if people
> want "semantics", they don't have to use HTML.

??? What have I been saying? We have 300+ million pages of HTML.
Adding some simple semantic tags to these ought to allow improved
search-engine functions. So, yes, this would have to use HTML, to be
realistically implemented in any foreseeable future.

> On the other hand,
> if people want page layout, they're restricted to what HTML (and the
> browser) is able to do.

???

David Brownell

Sep 17, 1998, 3:00:00 AM
Ken Fox wrote:
>
> Any
> implementation of XML is fundamentally broken if it can't gracefully
> handle non-conformant documents.

If by "non-conformant" you mean "doesn't match some DTD", then the
only real issue is what you mean by "gracefully handle". (Perhaps
just treat it like the relevant tags or attributes aren't there.)
Technically, that's the business of the application, not of the XML
implementation it uses.


But if what you mean by "non-conformant" is "not well formed" ...
then this statement is false. The XML spec is quite explicit about
how processors (e.g. in browsers) must handle documents that aren't
"well formed": Stop processing, it's a "fatal error".

That error handling was requested by major browser vendors (I'm told)
to get out of the "my error handling is different/better/..." race,
which causes lots of broken HTML to exist. All browser vendors must
waste a lot of effort tracking other vendors' "graceful" strategies
for processing broken HTML; we'd be better off with CSS 1 support!

- Dave

Jorn Barger

Sep 17, 1998, 3:00:00 AM
David Brownell <brow...@ix.netcom.com> wrote:
> ... The XML spec is quite explicit about
> how processors (e.g. in browsers) must handle documents that aren't
> "well formed": Stop processing, it's a "fatal error".
>
> That error handling was requested by major browser vendors (I'm told)
> to get out of the "my error handling is different/better/..." race,
> which causes lots of broken HTML to exist. All browser vendors must
> waste a lot of effort tracking other vendors' "graceful" strategies
> for processing broken HTML; we'd be better off with CSS 1 support!

This sort of background info is incredibly important for understanding
what's going on ...and the W3C is scandalously remiss in communicating
it, imho.

So the idea is that after some transition period, there will be a
browser-release that refuses to display a fairly large number of pages,
mostly older ones or stupider ones.

And users will tolerate that because they want the other advantages,
possibly including faster rendering...?

So what I've been calling an HF-disaster is not being stumbled toward,
blindly, but in fact is being imposed, quietly, by fiat-- you have until
Day X to fix all the pages on your site, or they stop working for a
growing percentage of surfers...?

It seems like there are a lot of other options, that aren't quite that
radical, like ignoring all the markup, and just displaying the text...

Or just ignoring the 'broken' markup, and doing your best with the
rest...


But what exactly are these markup-complications? Surely it's not just
the end-tags, etc, that XML wants tidied???

Christopher B. Browne

Sep 18, 1998, 3:00:00 AM
On Thu, 17 Sep 1998 11:51:18 -0500, Jorn Barger <jo...@mcs.com> posted:

>Christopher B. Browne <cbbr...@news.brownes.org> wrote:
>> [...] it makes
>> sense, when building documents that might not be web pages to use a
>> form of tagging more closely directed to the particular sort of document
>> in question. That's not going to be "one size fits all."
>
>Fine. This is what I refer to as 'database jocks', and it's entirely
>irrelevant to my request for simple semantic tags that work with
>current HTML, without the enormous cognitive overhead of DTDs.

I never used the word database, so I think it's rather inappropriate for
you to assume anything about databases.

My point would be twofold:

a) If what we want to do doesn't fit with the specific document structure
implied by HTML, then it is appropriate to look at a *DIFFERENT* document
structure, and therefore a different DTD.

b) If we take your approach, adding additional "semantic tags" to HTML,
this gets us quickly to having an enormous cognitive overhead much as
you suggest is the case of custom DTDs.

We can see this in HTML *today* in that people put complex table
structures into HTML documents in order to get particular presentation
results.

The authors might be better served by using DTDs tuned to their particular
requirements; there is a certain amount of dilemma here. Choose:
a) Having wacky looking HTML structures that have to be maintained, or
b) Having a custom DTD with that "cognitive overhead".
You can't have complete simplicity; you must choose one or the other.

>> It makes little sense to try to hammer everything into a hole that looks
>> like HTML regardless of how well suited HTML is to receive such hammering.
>
>This argument is just bizarre, given that I'm asking for a system that's
>compatible with the 300+ million pages already out there, and leaving
>the database jocks free to do whatever they like, while you're asking
>that all 300M be re-tidied to fit your spec.

I never said that existing web pages would have to be reauthored or
otherwise "re-tidied."

There is certainly material out there that *does* fit well with HTML as
it exists now. It may prove worthwhile to "tidy" things into HTML 4.0,
which provides some tools to separate structure and style, which makes
traditional web pages a little more easily maintained. Be that as it
may, it would be *incorrect* to suggest that I'm trying to imply that
"everything should be rewritten using XML."

It's not the "average web page" that would benefit from XML. It is
the:
- Store catalog
- Web search engine
- Personal finance application
- General purpose document processing system
- ...
that would benefit from the use of XML.

The things that are proposed to be *replaced* by the use of XML are
those applications that have been working poorly when implemented using
HTML. These are things that presently work only moderately well, require
specific attention (e.g. - authors *need* to go through that "cognitive
overhead"), and keep getting reimplemented because they're really not
good enough now.

Toby Speight

Sep 18, 1998, 3:00:00 AM
Jorn> Jorn Barger <URL:mailto:jo...@mcs.com>

[not sure whether this is relevant to c.i.s anymore, but keeping it]


0> In <URL:news:1dfix26.zf...@jorn.pr.mcs.net>, Jorn wrote:

Jorn> David Brownell <brow...@ix.netcom.com> wrote:

>> ... The XML spec is quite explicit about how processors (e.g. in
>> browsers) must handle documents that aren't "well formed": Stop
>> processing, it's a "fatal error".
>>
>> That error handling was requested by major browser vendors (I'm told)
>> to get out of the "my error handling is different/better/..." race,
>> which causes lots of broken HTML to exist. All browser vendors must
>> waste a lot of effort tracking other vendors' "graceful" strategies
>> for processing broken HTML; we'd be better off with CSS 1 support!

Jorn> So the idea is that after some transition period, there will be
Jorn> a browser-release that refuses to display a fairly large number
Jorn> of pages, mostly older ones or stupider ones.

No - the idea is that all XML is well-formed from the start, so there
need not be any "transition period" (because there's no transition).

HTML, of course, won't have a transition period either - I don't see
browser vendors dropping support for HTML (or their own pseudo-HTML
variants) for a long time, if ever. There are too many legacy documents
out there. I'm sure HTML will continue to be written, too - by those
who don't need the cleverer processing you can do with XML, or by those
using pHTML spewers like MS Office.

I do see a shift among authors to those XML DTDs that match their
domain requirements, whether alongside HTML or instead of it (most of
my XML documents are at some point rendered to HTML, in addition to
PostScript and formatted text). And an HTML-like XML document-type is
a distinct possibility, if DocBook etc. doesn't appeal. But that
doesn't mean that HTML itself will go away. HTML is not XML.


Steve Hueners

Sep 18, 1998, 3:00:00 AM

>> On the other hand, the web has become a dangerous timesink _because of
>> the lack of markup rules.
>
>Example, please?

Using font size attributes instead of heading tags. Gets in the way of
intelligently representing pages for search engines resulting in lost
time at lots of quarters.

>
>Are you dismissing the entire web here? That used to be fashionable on
>alt.hypertext in 1994, but it just doesn't wash anymore.

Good lord, No! But remember how in '94 we had all those lofty ideas
about hypertexted novels? Seen any? that are any good? The creativity
we see on the net is, for the most part, _not coming from the artists.
Likely the vocabulary hasn't been conventionalized but you need to
look to nerds 'n geeks to find true creativity these days.

Which seems to be the crux of the bee in your bonnet....the language
being defined by geeks 'n nerds. But their efforts to date stand them
in good stead. They originally defined HTML to be much of what XML
intends to become: structure of content not tied to display of
content. The artsies and the suits bastardized it beyond recognition
and so we are saddled with a half assed solution.

This is basic stuff....why would we want to allow the same to happen
again? Sure, XML will be harder to code. Wonderful! Ease of use does
not promote quality. Natch you'll cry elitist....<shrug>. Are
newspapers worried about how "easy" it is for reporters to get stories
into print?


>
>> isn't impeded by a strict, formal XML
>> spec.
>
>It's much, much worse than this-- the strict, formal spec isn't even
>going to be seriously considered by 99% of web authors, because it's
>burdensome beyond imagining.
>

Fine. If the 99% want to produce HTML let 'em. But I think we both
know that tools will arrive to make things far easier than you imagine
them to be.


>> > Please provide _one_ single proof of where whitespace can be used to
>> > convey informative content.
>>
>> [vertical space clipped]
>>
>> >Ok, good example... (quoting your own words here)
>> >
>> >"I just lost all respect for you"
>> >
>> >Rearranging your own words with a different whitespace inclusion
>> >now conveys a different meaning to you?
>
>Your argument reduces to "emphasis has no semantic content". It's just
>unbelievably stupid.
>

1) Not _my argument -though it's a good one...and is at the heart of
your matter....you'd think you'd jump at the chance to make yourself
clear; 2) Glad to see you are doing your part to elevate this
"debate".

Jorn Barger

Sep 18, 1998, 3:00:00 AM
Steve Hueners <st...@juststeve.com> wrote:
> >> On the other hand, the web has become a dangerous timesink _because of
> >> the lack of markup rules.
> >Example, please?
>
> Using font size attributes instead of heading tags. Gets in the way of
> intelligently representing pages for search engines resulting in lost
> time at lots of quarters.

Sorry, this is folklore. Search engines can use fontsize just as
effectively.

> >Are you dismissing the entire web here? That used to be fashionable on
> >alt.hypertext in 1994, but it just doesn't wash anymore.
>
> Good lord, No! But remember how in '94 we had all those lofty ideas
> about hypertexted novels? Seen any? that are any good? The creativity
> we see on the net is, for the most part, _not coming from the artists.
> Likely the vocabulary hasn't been conventionalized but you need to
> look to nerds 'n geeks to find true creativity these days.

My counter-example: unreadable sites like W3C.

Even Nielsen's UseIt could use a dose of art.

> Which seems to be the crux of the bee in your bonnet....the language
> being defined by geeks 'n nerds.

More specifically, the human factors being constrained by HF-aphasiacs
(eg the anti-whitespace insanity, eg <P><P> reducing to <P>)

> But their efforts to date stand them
> in good stead.

Heh.

> They originally defined HTML to be much of what XML
> intends to become: structure of content not tied to display of
> content. The artsies and the suits bastardized it beyond recognition
> and so we are saddled with a half assed solution.

Since you've effectively returned this argument to ground zero, and I'm
exceedingly tired of restating my arguments for an audience that's not
really interested in listening, I think I'll create a webpage that I can
refer to in future. It will be up in a few hours at:

<URL:http://www.mcs.net/~jorn/html/net/structure.html>

> This is basic stuff....why would we want to allow the same to happen
> again?

How is a slight modification to the existing rules 'again'? It's
'still'.

> Sure, XML will be harder to code. Wonderful! Ease of use does
> not promote quality. Natch you'll cry elitist....<shrug>. Are
> newspapers worried about how "easy" it is for reporters to get stories
> into print?

This is what I call HF-aphasia.

[...]


> Fine. If the 99% want to produce HTML let 'em. But I think we both
> know that tools will arrive to make things far easier than you imagine
> them to be.

This is another HF-miscue. Most HTML is maintained in text editors, at
least part of the time.

[...]


> >Your argument reduces to "emphasis has no semantic content". It's just
> >unbelievably stupid.
>
> 1) Not _my argument

It appears under your From: line

> -though it's a good one...

So you truly assert emphasis is semantically irrelevant?

> and is at the heart of
> your matter....you'd think you'd jump at the chance to make yourself
> clear;

???

> 2) Glad to see you are doing your part to elevate this
> "debate".

When people are thickheaded, it's fair play to scream at them to try and
get their attention. What distinguishes upward-tending screaming from
downward-tending is whether it's accompanied by sincere attempts to
paraphrase one's understanding of the other's views, and to clarify
one's own views.

By these measures, I'm ~infinitely ahead in this race.

David Brownell

Sep 18, 1998, 3:00:00 AM
Jorn Barger wrote:
>
> David Brownell <brow...@ix.netcom.com> wrote:
> > ... The XML spec is quite explicit about
> > how processors (e.g. in browsers) must handle documents that aren't
> > "well formed": Stop processing, it's a "fatal error".
> >
> > That error handling was requested by major browser vendors (I'm told)
> > to get out of the "my error handling is different/better/..." race,
> > which causes lots of broken HTML to exist. All browser vendors must
> > waste a lot of effort tracking other vendors' "graceful" strategies
> > for processing broken HTML; we'd be better off with CSS 1 support!
>
> So the idea is that after some transition period, there will be a
> browser-release that refuses to display a fairly large number of pages,
> mostly older ones or stupider ones.

No! Because XML != HTML (as has been pointed out before) no transition
will be required.

The existing pages are HTML; they will continue to "work" as they do
today, lots of browser differences, etc. XML is defined for new pages;
see the spec, it's defined as a format for full documents.

Lots of people are working with XML pages that happen to use the HTML
tags and attributes. It's a good place to start, but currently demands
postprocessing to turn stuff like "<br/>" into "<br />". Developer
tools like DOM make it _really_ easy to generate well formed XML.
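That postprocessing pass is trivial; a sketch (the regex and page fragment are illustrative only):

```python
# One-pass fixup so old HTML browsers tolerate XML empty-element tags:
# insert a space before the closing slash.
import re

xml_page = 'line one<br/>line two<hr/>'
html_safe = re.sub(r'<(\w+)/>', r'<\1 />', xml_page)
print(html_safe)  # -> line one<br />line two<hr />
```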


> It seems like there are a lot of other options, that aren't quite that
> radical, like ignoring all the markup, and just displaying the text...

You're talking about rendering, such as with XSL, not parsing.
The default rule in XSL strips out all unknown markup ... but you
can't get input _into_ a conformant XSL processor unless it's at
least "well-formed" to start with.

That shows another penalty of the HTML error handling "race", since
it affects a lot more than the browsers -- it affects every tool that
works with HTML. It's contagious, increasing costs all over.


> But what exactly are these markup-complications? Surely it's not just
> the end-tags, etc, that XML wants tidied???

XML isn't there to change HTML; it doesn't demand anything be "tidied".

But if you want examples of flakey markup in HTML: you've seen examples
such as "<EM> <B> Bold </EM> yet confused</B>". Extrapolate that to
nested tables and lists. And then mix all three, several layers deep.
There's all sorts of other stuff too; look at the Mozilla source and
compare to the code that just handles legal HTML (which is far simpler).
Then remember that XML is even simpler to parse than HTML ...

There's also "invalid" HTML of course -- "tag soup" using nonstandard
tags, attributes, or entities; and stuff that violates content models
for their elements -- but at least XML formalizes that notion.

- Dave

David Brownell

Sep 18, 1998, 3:00:00 AM
Jorn Barger wrote:

>
> Steve Hueners <st...@juststeve.com> wrote:
> > Which seems to be the crux of the bee in your bonnet....the language
> > being defined by geeks 'n nerds.
>
> More specifically, the human factors being constrained by HF-aphasiacs
> (eg the anti-whitespace insanity, eg <P><P> reducing to <P>)

More like, HF (== "Human Factors", we must assume) is addressing
different issues than the mapping of markup ("<P>") to rendering.
(You needn't descend to saying people are aphasic because they
have different priorities than you!!)

Keep in mind that HTML by definition doesn't address rendering,
so any complaints about treatment of whitespace relate to some
browser(s) not to HTML.

Also: in the big picture, lots (!) more web pages are generated by
programs (often running off databases ... ;-) than by authors with
text editors. It can be awkward to make those programs never
generate empty paragraphs, so much of the motivation for rendering
"<P><P>" as beginning one paragraph (not two) is what most folk
see as the primary HF issue: making the rendered document look
good so it can be more easily comprehended.

- Dave

Jorn Barger

Sep 18, 1998, 3:00:00 AM
David Brownell <brow...@ix.netcom.com> wrote:
[...]

> (You needn't descend to saying people are aphasic because they
> have different priorities than you!!)

It's not nearly that innocent.

I've taken endless grief for my 'priorities' from a gang who consider my
priorities a sign of spiritual and intellectual inferiority.

And the 'arguments' they use continually betray an absolutely
disgraceful blindness to absolutely fundamental principles of
human-factors design, with their contempt for whitespace being an
obvious example.

> Keep in mind that HTML by definition doesn't address rendering,
> so any complaints about treatment of whitespace relate to some
> browser(s) not to HTML.

It's this agnosticism that's the problem, but I'm sure the <P><P> -> <P>
fiasco goes back directly to TimBL's theoretical model.

> Also: in the big picture, lots (!) more web pages are generated by
> programs (often running off databases ... ;-) than by authors with
> text editors. It can be awkward to make those programs never
> generate empty paragraphs, so much of the motivation for rendering
> "<P><P>" as beginning one paragraph (not two) is what most folk
> see as the primary HF issue: making the rendered document look
> good so it can be more easily comprehended.

No, sorry, this is *extremely* bogus.

PP is collapsed because there's a theory that an empty paragraph is
meaningless, and because the powers-that-be/were think/thought that
whitespace was wasteful.

A code-generator can collapse PP if it wants to, much more easily and
effectively than the renderer.
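That generator-side collapse is a one-liner (a sketch; the markup is illustrative):

```python
# Generator-side cleanup: collapse runs of <P> before the page ever
# reaches a renderer, instead of making every browser do it.
import re

generated = 'first<P><P>second<P>third'
cleaned = re.sub(r'(?:<P>)+', '<P>', generated)
print(cleaned)  # -> first<P>second<P>third
```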

Ken Fox

Sep 18, 1998, 3:00:00 AM
jo...@mcs.com (Jorn Barger) writes:
> Ken Fox <kf...@pt0204.pto.ford.com> wrote:
> > XML gives you the last two -- assuming you use a DTD that allows it.
> > If you want a bondage-and-discipline DTD that doesn't enable page
> > layout tweaks then go for it. If you want something more flexible,
> > just write the DTD that does what you want.
>
> What happened to 'painless'?

"It takes a tough man to make a tender chicken." ;)

Seriously, DTDs must become painless to create if XML is ever to become
useful for additional "semantic" content. How will the browsers ever
know what to do with "semantic" content if they can't refer to a DTD?

> > XSL can transform back to the browser's DTD.
>
> I don't understand this.

What part don't you understand? XSL includes a transformation system.
This means browsers will be able to structurally transform document A
written in DTD B into HTML. The browser only has to worry about
rendering HTML. Authors can create documents using any DTD that has
a transformation to HTML.

I don't know your background so I don't know how much to explain.

> > (If you can't write a DTD then argue for a DTD that does what you
> > want instead of arguing against XML.)
>
> This either.

All you need to do to achieve your goals is write an HTML-like XML
DTD with an XSL transformation that deletes the non-renderable tags.

For those of us who would like to have the browser use those
non-renderable tags, we'll provide transformations that produce a
new stream of renderable tags.
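A rough sketch of that delete-the-non-renderable-tags transformation (Python standing in for XSL; the RENDERABLE set and the faxnum tag are invented for illustration):

```python
# Unwrap any tag outside a small renderable set, keeping its text, much
# as XSL's default rule drops unknown markup.
import re

RENDERABLE = {'p', 'b', 'i', 'em'}

def strip_unknown(page):
    def unwrap(match):
        tag = match.group(1).lower()
        return match.group(0) if tag in RENDERABLE else ''
    return re.sub(r'</?(\w+)[^>]*>', unwrap, page)

stripped = strip_unknown('<p>Fax: <faxnum>312-555-0199</faxnum></p>')
print(stripped)  # -> <p>Fax: 312-555-0199</p>
```

A browser following this rule can display any document with arbitrary semantic tags; it just ignores what it can't render.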

> > The key thing to argue for is a *good* implementation of XSL in
> > every browser. That's what I meant by getting browsers to move towards
> > the XML standard. XML in a browser without any way of transforming
> > it is nearly useless.
>
> By this point, you seem to be waving your hand and saying, "XSL is the
> magic bullet".

I suppose it does look that way to somebody who doesn't know what a
transformation means.

I don't know how *good* XSL is. All I know is that the concept of XML +
XSL gives you *exactly* what you want. It also gives *me* exactly what
I want. XSL might be weak and/or cumbersome. I would have preferred a
term-rewriting approach rather than a procedural one. It might work out
ok anyway.

> Here's another challenge: if you take the styles on the stylesheet, and
> 'promote' them back onto the main text page, in place of the
> 'structures', is this an improvement on HTML? How?

Conventional "styles" are useful, but *extremely* limited because they
don't allow structural transformations of a document. In most cases,
"semantic" content is going to require structural transformation before
it can be rendered by a browser. Even in your "just ignore them"
proposal, the act of dropping the tags is a structural transformation.
(A trivial one I admit.) XSL is the piece of the puzzle that does this.

Now the browser is one form of consumer -- a very special form. Its
main concern is just getting the content on the screen. It will just
run the transformations to convert a document to HTML.

Other kinds of consumers may use the document as-is. For example, a
search engine may be able to use the non-renderable content to build a
better database. A "table of contents generator" might be able to
differentiate between chapters and sections. There are an infinite
variety of uses.

> > In fact, the XML spec allows you to do *exactly* what you want.
>
> No, this can't be accurate.

Why not?

> > We'll probably end up with a near-identical HTML DTD that browsers
> > can render.
>
> 'Near' meaning 'with a little tidying'?

HTML has a huge amount of inertia. I just can't see HTML in XML as
being totally different. I wish it were. All of those stupid
"semantic" tags could be dropped. Things like "STRONG" and "H1" are
stupid to put in a language designed to be *rendered* in a browser. Now
that we have XML + XSL we don't have to make those compromises anymore.

HTML could become just a dynamic page layout language with hyper
references. All the "semantic" content can be put into DTDs that have
XSL transformations into HTML. I doubt that this will happen -- but it
could.

The beauty of this system is you can create documents with "semantics"
appropriate for the task. If you're an advertising company, maybe you
won't use these higher-level DTDs at all. Maybe you're totally consumed
with page layout and the higher-level DTDs aren't useful. On the other
hand, if you're a military procurement agency then you'll ignore the
page layout and *only* use a higher-level DTD designed for the job.

Doesn't matter to the browsers. They use XSL to transform everything
into the only DTD (HTML) they know how to render.

> > You can come up with a DTD that extends that with your any-old-thing
> > tags.
>
> I don't call this painless.

I admit this is painful right now. That has to change. Most people are
already creating DTDs -- we call them templates and wizards. It doesn't
seem like a big leap to have the word processors generate a DTD that can
be read by a browser.

> > XSL can be used to map the any-old-thing tags into HTML.
>
> I've been violently rejecting the idea that semantics need styles at
> all.

I know. ;) That's fine for you. For me, I want the "semantics" to
transform into something renderable. Have you read the XSL spec?
I think you might be thinking "fonts and colors" when you hear XSL.
It's much more than that.

> > What makes you think HTML in XML is going to be any more rigid than
> > the HTML we have now?
>
> Every source says end-tags, etc need tidying.

Sure. How many of those people are implementors? Standards say crazy
things. For example, it is common for a standard to say the result of
an error is undefined. Fine. An implementation has license to reformat
your hard drive in response to an error. Nobody would build such a
thing though because nobody would use it.

Standards are not supposed to be user friendly. They are supposed to be
useful, precise and unambiguous. An *implementation* of a standard is
*hopefully* user friendly.

Another way of looking at this is good-cop/bad-cop. Standards creators
are bad-cops. They say scary things. They threaten. They cajole.
They make dire predictions of what happens if you don't behave.

The implementors are good-cops. They let you get away with little
stuff that's not important. They offer gifts. They sympathize. The
reason they can is because you're so terrified of the bad-cop that you
don't ask for much.

> > After all, if people want "semantics", they don't have to use HTML.
>
> ??? What have I been saying? We have 300+ million pages of HTML.
> Adding some simple semantic tags to these ought to allow improved
> search-engine functions. ...

Sure. That has nothing to do with XML. That's a legacy problem with
simple "solutions" right now. (Browsers ignore tags they don't
understand so just stick in tags that search engines tell you to. This
won't work in practice of course because people will abuse it.)

What about the future? In a world with XML + XSL, people who want
"semantics" shouldn't start with HTML. That would be like starting with
PostScript and then complaining that it's hard to get your section
headings to auto-number. Authors should start with something that has
the "semantic" tags they need. Hopefully the word processors they use
can create new DTDs by incrementally expanding the HTML DTD.

> > On the other hand, if people want page layout, they're restricted
> > to what HTML (and the browser) is able to do.
>
> ???

Browsers render HTML. If you want the browser to do something, you have
to speak its language. No amount of transformation is going to change
that so transformations can not be used to change rendering.

Jorn Barger

Sep 18, 1998
Ken Fox <kf...@pt0204.pto.ford.com> wrote:
> [...] How will the browsers ever
> know what to do with "semantic" content if they can't refer to a DTD?

I don't *want* the browser to do anything with it. I want the search
engine to do something with it. I want the browser to ignore it.
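To make the split concrete, here is a sketch, in Python, of the two consumers. The regex-based indexer is only a stand-in for whatever a real search engine does; faxnum is the hypothetical tag from earlier in the thread.

```python
import re

def field_index(page, tag):
    """What a search engine could index for a query like "+faxnum:312":
    the contents of every <tag>...</tag> in the page."""
    return re.findall(r"<%s>(.*?)</%s>" % (tag, tag), page, re.S)

page = "Fax us at <faxnum>312-555-0000</faxnum> today."
print(field_index(page, "faxnum"))      # -> ['312-555-0000']

# A browser that ignores unknown tags displays only the text:
print(re.sub(r"</?faxnum>", "", page))  # -> Fax us at 312-555-0000 today.
```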

> > > XSL can transform back to the browser's DTD.
> > I don't understand this.
>
> What part don't you understand? XSL includes a transformation system.
> This means browsers will be able to structurally transform document A
> written in DTD B into HTML. The browser only has to worry about
> rendering HTML. Authors can create documents using any DTD that has
> a transformation to HTML.

So when you say "the browser's DTD" you really mean "ordinary HTML
viewed thru the convoluted perspective of a pseudo-DTD"?

> All you need to do to achieve your goals is write an HTML-like XML
> DTD with an XSL transformation that deletes the non-renderable tags.

PLUS tidy all current markup... yes?



> > By this point, you seem to be waving your hand and saying, "XSL is the
> > magic bullet".
> I suppose it does look that way to somebody who doesn't know what a
> transformation means.

"transformation" is jargon. The concept is trivial.



> I don't know how *good* XSL is. All I know is that the concept of XML +
> XSL gives you *exactly* what you want.

Now I *know* you're waving your hand, because you clearly don't know
what I want.

> Conventional "styles" are useful, but *extremely* limited because they
> don't allow structural transformations of a document.

Why can't you transform styles?

I've just been writing a webpage on this fallacy, and related ones:

<URL:http://www.mcs.net/~jorn/html/net/structure.html>

[...]


> > I've been violently rejecting the idea that semantics need styles at
> > all.
> I know. ;) That's fine for you. For me, I want the "semantics" to
> transform into something renderable. Have you read the XSL spec?
> I think you might be thinking "fonts and colors" when you hear XSL.
> It's much more than that.

Examples?

My problem is that there *is* a need for fonts-n-colors markup, and the
XML-crowd takes it as axiomatic that this should be crammed into the
procrustean bed of structural markup.

> > ??? What have I been saying? We have 300+ million pages of HTML.
> > Adding some simple semantic tags to these ought to allow improved
> > search-engine functions. ...
>
> Sure. That has nothing to do with XML. That's a legacy problem with
> simple "solutions" right now. (Browsers ignore tags they don't
> understand so just stick in tags that search engines tell you to. This
> won't work in practice of course because people will abuse it.)

I _believe_ you've just approved my proposal. I seriously doubt you
realise that that was what I was asking, though, or you wouldn't make
such a fuss about all the rest of it.

> What about the future? In a world with XML + XSL, people who want

> "semantics" shouldn't start with HTML. [...]

Real simple: people who have lots of HTML now, should still be
encouraged to add semantic tags, without requiring other sorts of
tidying/conversion.

It's just human factors.

Ken Fox

Sep 18, 1998
David Brownell <brow...@ix.netcom.com> writes:
> Ken Fox wrote:
> > Any implementation of XML is fundamentally broken if it can't
> > gracefully handle non-conformant documents.
>
> If by "non-conformant" you mean "doesn't match some DTD", then ...
> that's the business of the application, not of the XML implementation
> it uses.

It's the business of the XML implementation too. A validating XML
implementation might do something unfriendly when confronted with
an invalid document.

> But if what you mean by "non conformant" is "not well formed"
> then this statement is false. The XML spec is quite explicit ...
> Stop processing, it's a "fatal error".

Right. Yeah. Whatever. Any implementation that did this would be
nearly useless. This would require that a syntax checker abort after
one structural error. Stupid! It would mean that my 10,000 page
document couldn't be rendered because a 1-byte disk error occurred in
a footnote somewhere. Stupid!

I understand the intent, but you've got to realize that the XML
standard doesn't describe the way most XML implementations will work.
It describes the *minimal* way that XML implementations will work.

> That error handling was requested by major browser vendors (I'm told)
> to get out of the "my error handling is different/better/..." race ...

This isn't the way competition works. A standard can not change the
socio-economic climate that a browser vendor competes in.

> which causes lots of broken HTML to exist. All browser vendors must
> waste a lot of effort tracking other vendors' "graceful" strategies
> for processing broken HTML; we'd be better off with CSS 1 support!

Broken HTML is a problem, but not really a big problem. The big
problem happens because authors expect broken HTML to keep on working
no matter what. If a vendor includes an option to allow the browser
to (1) reject broken HTML, (2) warn about broken HTML, or (3) stay
silent about broken HTML, I think it would be a nice solution. Make
option (2) the default. There would be enough negative advertising
pressure put on authors that *really* broken HTML would gradually
get fixed.

These types of problems will remain with us even when using XML.

Iain Lowe

Sep 18, 1998
Mr. Barger:

I have been reading these threads for a few days now trying to understand
exactly where all of this got started, and exactly what your beef is.

Among the many things that I'm getting (over and over and over) from your
posts is that you are very concerned about some kind of conspiracy of
techno/academic structural fascists rewriting the Web using XML. You
consistently refer to the 300+ million web pages that are out there being
somehow rendered obsolete by these dark forces.

From what I've seen of the Web, there aren't many of those millions of pages
that aren't rewritten, redesigned, or at least tweaked at least every couple
of months. The creators of those pages are sometimes adding content, but I
suspect that most times they're trying out the neat new stuff that's
constantly being developed and pushing the browsers to do more interesting
things. I hardly think these HTMLers would be so concerned about adopting
slightly different methods of tagging that could, quite possibly, give them
even more flexibility in the display of the information.

I am not an XML guru, but I have been working within a structured text/SGML
environment for a few years now as a typical end-user, and, quite frankly, I
don't understand where most of your railing against academics, 'database
jocks', etc. comes from. If you're so concerned about layout and whitespace
why are you interested in HTML at all? If the structural part of structured
text systems is so irritating, why aren't you just producing your
documents/manifestos/etc. in traditional paper-based software and
distributing it as PDFs?

This has nothing to do with accusations of 'spiritual and intellectual
inferiority', just my absolute mystification about what the point of all
this traffic is.

Jorn Barger wrote (among many other things):

> --
> I EDIT THE NET: <URL:http://www.mcs.net/~jorn/html/weblogs/weblog.html>
> "One of the best collections of news and musings culled from the Web --
> and updated daily." -- Austin Bunn in the Village Voice, 8 Sept 1998

--
---------------------
Iain Lowe
Technical Writer

Infrastructures for Information
http://www.i4i.com

416-504-0141 ext 283

Chris Maden

Sep 18, 1998
jo...@mcs.com (Jorn Barger) writes:

> Chris Maden <cr...@oreilly.com> wrote:
> > and one that's been in mind of the designers of XML from the
> > beginning. This is one reason why XML does not require a DTD.
>
> I understand that attempts were made in this direction. I'd like to
> see them go a fractional step farther, so that no 'tidying' is
> required, and freeform semantic tagging is encouraged.

You can have *either* schemas (e.g., DTDs) *or* freeform tagging. You
can't have both. The XML WG chose to require strict tagging to make
schemas unnecessary.

If you know from your schema that, for instance, <foo>s don't nest,
then you can have

<foo>foo1
<foo>foo2

and each <foo> signals the end of the previous one. But if you have
no schema, then you have no idea what <foo> is, and

<foo>foo1
<foo>foo2

might mean two <foo>s side-by-side or one <foo> inside another.

HTML has a known structure, so tags can be omitted or implied. HTML
need not change. But the very point of XML is user-chosen tags, and
unless you want to require schemas (like DTDs) for every page, then
you must require rigorous tagging. You must have one or the other.
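The point can be made mechanically with any schema-less parser. The sketch below uses Python's standard-library ElementTree as a stand-in, but any well-formedness checker behaves the same way:

```python
import xml.etree.ElementTree as ET

# With explicit end tags there is no ambiguity:
siblings = ET.fromstring("<doc><foo>foo1</foo><foo>foo2</foo></doc>")
nested   = ET.fromstring("<doc><foo>foo1<foo>foo2</foo></foo></doc>")
print(len(siblings.findall("foo")))   # -> 2 (side by side)
print(len(nested.findall("foo")))     # -> 1 (one inside the other)

# With end tags omitted, a parser that has no schema to consult
# can only give up -- XML's "fatal error":
try:
    ET.fromstring("<doc><foo>foo1<foo>foo2</doc>")
except ET.ParseError as err:
    print("fatal error:", err)
```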

On the bugward-compatibility issue, an example of the ad-hoc parsing
that Netscape implies is interdigitated tags like <i><b></i></b>.
Netscape appears to maintain a font stack. Any element that changes
the font pushes its formatting on the font stack; an end tag for any
element that changes the font pops the font stack. So you might
expect italic to stop at </i>, but what actually happens is that bold
stops, and the stretch between </i> and </b> is italic. I'm not a big
fan of what was called "draconian error handling", but it was
requested, independently, by both Netscape and Microsoft, so that they
wouldn't have to waste time keeping up with the other's error
handling. Most Web authors validate (and respond to critiques) with
"it works in my browser".

Jorn Barger

Sep 18, 1998
Iain Lowe <il...@i4i.com> wrote:
[smarmy insults trimmed]

> From what I've seen of the Web, there aren't many of those millions of
> pages that aren't rewritten, redesigned, or at least tweaked at least
> every couple of months.

Wow, your connection must be *real* fast.

> I am not an XML guru,

I guessed that from the way you quoted my entire message at the end of
your post.

> but I have been working within a structured
> text/SGML environment for a few years now as a typical end-user, and,
> quite frankly, I don't understand where most of your railing against
> academics, 'database jocks', etc. comes from.

Then do us all a favor and *watch*.

> If you're so concerned about
> layout and whitespace why are you interested in HTML at all? If the
> structural part of structured text systems is so irritating, why aren't
> you just producing your documents/manifestos/etc. in traditional
> paper-based software and distributing it as PDFs?

Oh, cool idea. Will do.

Don Johnson

Sep 20, 1998
David Brownell <brow...@ix.netcom.com> writes:

> Jorn Barger wrote:
> >
> > Peter Flynn <silm...@m-net.arbornet.org> wrote:
> > > [...] If the designer/author or
> > > whoever wants a bigger gap than normal between two paragraphs,
> > > there must be a good reason for it
> >
> > Why? Or more important: What if the author doesn't know the reason?
>
> If there's a reason, SOMEBODY knows it and it should be
> made evident, else it's not going to be understood. And
> what about trying to communicate that reason on a device
> for which whitespace isn't free -- say a color printer
> (some are many $/page!) or a pager with a 4 line display.

This is just a friendly suggestion... When I build a DTD, I very
carefully distinguish between three types of elements:

1. Elements describing CONTENT (e.g. faxnum, author, date)
2. Elements describing STRUCTURE (e.g. para, title, section)
3. Elements describing FORMAT (e.g. em, bold, it, tt, br, hr)

In general, elements describing structure (like para) should not be
used to control format (like putting extra space between other
paras). Why not? Because it is desirable to separate the
information about content, structure, and format.

Some application might number the paragraphs, for example. If you've
used empty paragraphs to achieve a formatting objective your paragraph
numbering might get messed up.

Some other application might want to express the "bigger gap" between
paragraphs with a horizontal rule or some other typesetting trickery.

Achieving a desired "format" outcome using "structure" tags will limit
your ability to write these applications.
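For instance, here is a sketch (in Python, with invented para/sec element names) of the paragraph-numbering application, and what happens to it when an empty structural element is used to fake vertical space:

```python
import xml.etree.ElementTree as ET

def number_paras(doc):
    """Number the paragraphs of a section, as an application might."""
    return [(i + 1, (p.text or "").strip())
            for i, p in enumerate(doc.findall("para"))]

clean  = ET.fromstring("<sec><para>One</para><para>Two</para></sec>")
padded = ET.fromstring("<sec><para>One</para><para/><para>Two</para></sec>")

print(number_paras(clean))   # -> [(1, 'One'), (2, 'Two')]
print(number_paras(padded))  # -> [(1, 'One'), (2, ''), (3, 'Two')]
# "Two" is now paragraph 3, because a FORMAT effect was faked
# with a STRUCTURE element.
```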

--
+---------------------------------+-------------------------+
| Don Johnson | got Linux? |
| drj826 at acm dot org | |
+---------------------------------+-------------------------+

Ken Fox

Sep 21, 1998
jo...@mcs.com (Jorn Barger) writes:
> Ken Fox <kf...@pt0204.pto.ford.com> wrote:
> > [...] How will the browsers ever know what to do with "semantic"
> > content if they can't refer to a DTD?
>
> I don't *want* the browser to do anything with it. I want the search
> engine to do something with it. I want the browser to ignore it.

I know. Other people, including myself, want to use that information
to help control page layout. XML and XSL are flexible enough to do
both things.

> > > > XSL can transform back to the browser's DTD.
> > > I don't understand this.
> >
> > What part don't you understand? XSL includes a transformation system.
> > This means browsers will be able to structurally transform document A
> > written in DTD B into HTML. The browser only has to worry about
> > rendering HTML. Authors can create documents using any DTD that has
> > a transformation to HTML.
>

> So when you say "the browser's DTD" you really mean "ordinary HTML
> viewed thru the convoluted perspective of a pseudo-DTD"?

No. Browsers of the future might not use HTML as their fundamental
display DTD. For example, the flow objects in XSL are much more
general than HTML and would be a good choice to use. I don't have
any idea what you mean with "pseudo-DTD".

> > All you need to do to achieve your goals is write an HTML-like XML
> > DTD with an XSL transformation that deletes the non-renderable tags.
>

> PLUS tidy all current markup... yes?

That's an implementation detail for the browsers, not an XML thing.
The XML standard doesn't require XML processors to recover from fatal
errors (like structural problems). *However*, a browser is perfectly
allowed to handle errors gracefully. I argued earlier that authors
will prefer browsers that handle errors gracefully over browsers that
crash-and-burn. If you really want error recovery in XML, tell your
vendor.

> > I don't know how *good* XSL is. All I know is that the concept of XML +
> > XSL gives you *exactly* what you want.
>

> Now I *know* you're waving your hand, because you clearly don't know
> what I want.

I explicitly stated what I thought you wanted and you agreed:

| > 1. You want to preserve the existing body of HTML documents.
| > 2. You want to maintain (or improve) your ability to control
| > page layout.
| > 3. You want a painless way of adding "semantics" to existing
| > documents. (You haven't said much at all about how you want to
| > write new documents.)
| > Am I missing anything?
|
| In terms of goals, maybe not. In terms of arguments, there's still the
| whole anti-layout mindset that needs to be laid to rest.

Have you changed your mind?

I'm not hand waving when I say I don't know how good XSL is. I've
only read the XSL draft and I haven't used an XSL processor yet. The
scope and intent of XML + XSL covers what you want though.

> > Conventional "styles" are useful, but *extremely* limited because they
> > don't allow structural transformations of a document.
>

> Why can't you transform styles?

If you mean "deduce information from the context in which styles are
used," then sure, it's possible to transform styles. It's just really,
really, really hard. For instance, italics might indicate a book
title, but a transformation system would have to understand the
context in which the italics were used to be sure.

In short, a "style" transformation is a one-to-many mapping. An XSL
"semantic" transformation is a one-to-one mapping. I put these things
in quotes because, as far as the computer is concerned, the issue is
neither about style nor semantics. Those "semantic" elements that I
apply XSL to may actually be "style" directives. For example, the
XML+XSL definition of HTML will certainly use XSL to transform the
B and I elements into flow objects with font faces. I don't think
anybody is ever going to argue that B and I are semantic elements.

> > Have you read the XSL spec? I think you might be thinking "fonts
> > and colors" when you hear XSL. It's much more than that.
>

> Examples?

You can re-arrange nested child elements. You can manufacture a
new element stream using the attributes in an element as input to an
arbitrary procedure. You can select from a group of alternatives.

Here's a trivial example:

  <xsl:template match="author">
    <center>
      <xsl:process-children/>
    </center>
  </xsl:template>

This transforms to HTML rather than the XSL flow objects. It allows
me to use "author" elements with *implicit* centering rather than your
proposal of adding "author" elements to *explicit* centering.

I can write my draft documents using these higher level elements.
Once I'm happy with content, then I can insert lower level formatting
directives that augment/replace the implicit default formatting of
the high level elements. There's no loss of control here.

Using the template from above, I might start out:

  <author>
    Ken Fox
  </author>

and then later change to

  <author no-style>    (pun intended... ;)
    <right>
      Ken Fox
    </right>
  </author>

The no-style attribute can be implemented as a high priority XSL
template which drops the element, but processes its children:

  <xsl:template match="*[attribute(no-style)]" priority="100">
    <xsl:process-children/>
  </xsl:template>

> My problem is that there *is* a need for fonts-n-colors markup, and the
> XML-crowd takes it as axiomatic that this should be crammed into the
> procrustean bed of structural markup.

I don't think so. You can define an XML tag called font and then
use XSL to transform that pretty literally into flow objects. Now
you've got a font element that you can use anywhere -- along with
all the problems that font elements have already caused in HTML.

In fact, if your job is advertising or something that is only
concerned with page layout, you should probably use a DTD designed
for page layout. The "semantic" elements in that DTD would be page
layout directives.

If you don't like XSL, I believe that CSS can be used with XML.

> > Sure. That has nothing to do with XML. That's a legacy problem with
> > simple "solutions" right now. (Browsers ignore tags they don't
> > understand so just stick in tags that search engines tell you to. This
> > won't work in practice of course because people will abuse it.)
>

> I _believe_ you've just approved my proposal. I seriously doubt you
> realise that that was what I was asking, though, or you wouldn't make
> such a fuss about all the rest of it.

Approved? No, I've said that you don't need to argue with anybody over
whether the technology will let you do it. You have to argue with the
search engines to index some special tags. I doubt they'll do it
though because the potential for abuse is too great when they are
indexing content that is not displayed. People will look at a document
the search engine found and think the search engine is broken because
what they're looking at wasn't the part that matched the query.

My guess is that search engines are waiting for catalogs of well-known
DTDs to describe semantic content. If a document uses a well-known
DTD, it won't be able to spoof the search engine by asking it to index
content ignored by the browser.

The "fuss" I'm making is over you encouraging people to adopt your
"solution" instead of XML + XSL. I don't believe you've really tried
to understand how XML + XSL can help solve your problems. I don't
believe you've tried to think about the weaknesses of your ad hoc
approach.

> Real simple: people who have lots of HTML now, should still be
> encouraged to add semantic tags, without requiring other sorts of
> tidying/conversion.

How much tidying/conversion is required is an implementation issue
that the browser vendors have to decide on. I think there will be a
competitive advantage to browsers that can apply XSL transformations
to invalid/malformed XML documents (perhaps by automatically tidying
the document).

It is also easy to imagine an automatic conversion tool published
by the browser vendors that translate invalid/malformed HTML into
the valid HTML-in-XML that produces exactly the same screen display.
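As a proof of concept, here is a toy version of such a converter, sketched in Python with the standard-library HTMLParser (which tolerates malformed input the way a browser does). It handles exactly one of HTML's implication rules -- a new <p> closes any open <p> -- and closes whatever is left open at the end; a real tidying tool would need the full set of rules:

```python
from html.parser import HTMLParser

class Tidy(HTMLParser):
    """Re-emit HTML with every open tag explicitly closed."""
    def __init__(self):
        super().__init__()
        self.out, self.stack = [], []
    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.stack and self.stack[-1] == "p":
            self.out.append("</p>")           # imply the missing </p>
            self.stack.pop()
        self.out.append("<%s>" % tag)
        self.stack.append(tag)
    def handle_endtag(self, tag):
        while self.stack:                     # close down to the match
            top = self.stack.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break
    def handle_data(self, data):
        self.out.append(data)
    def result(self):
        while self.stack:                     # close anything left open
            self.out.append("</%s>" % self.stack.pop())
        return "".join(self.out)

t = Tidy()
t.feed("<p>one<p>two")
print(t.result())   # -> <p>one</p><p>two</p>
```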

Jorn Barger

Sep 22, 1998
Ken Fox <kf...@pt0204.pto.ford.com> wrote:
[re semantic-tagged content]

> > I don't *want* the browser to do anything with it. I want the search
> > engine to do something with it. I want the browser to ignore it.
> I know. Other people, including myself, want to use that information
> to help control page layout. XML and XSL are flexible enough to do
> both things.

My contention is that there's a near-perfect disjunction between the
semantic elements that search-engines will care about, and the stylistic
elements that designers care about. There's no such thing as a faxnum
style.



> > > I don't know how *good* XSL is. All I know is that the concept of XML +
> > > XSL gives you *exactly* what you want.
> > Now I *know* you're waving your hand, because you clearly don't know
> > what I want.

> I explicitly stated what I thought you wanted and you agreed: [...]


> Have you changed your mind?

You agreed with *me* that the painlessness condition wasn't being met.

> > > Conventional "styles" are useful, but *extremely* limited because they
> > > don't allow structural transformations of a document.
> > Why can't you transform styles?
> If you mean "deduce information from the context in which styles are
> used," then sure, it's possible to transform styles. It's just really,
> really, really hard. For instance, italics might indicate a book
> title, but a transformation system would have to understand that
> context in which the italics were used to be sure.

Strangely enough, though, if you don't worry about whether it's a title,
you can transform the italics themselves in a perfectly comprehensible
way, for most applications.

And CITE is impossible to translate into a style in any sane manner,
because its style depends entirely on the context.

> In short, a "style" transformation is a one-to-many mapping. An XSL
> "semantic" transformation is a one-to-one mapping.

I'd say you have that backwards.

> I put these things
> in quotes because, as far as the computer is concerned, the issue is
> neither about style nor semantics. Those "semantic" elements that I
> apply XSL to may actually be "style" directives. For example, the
> XML+XSL definition of HTML will certainly use XSL to transform the
> B and I elements into flow objects with font faces. I don't think
> anybody is ever going to argue that B and I are semantic elements.

How do you decide between EM and STRONG except by secretly translating
them to their real values, I and B?

Anyone who lays out a text with EM and STRONG, without considering the
unique subtle messages carried by bold and italic, will end up with a
*ridiculous-looking* text.

> > > Have you read the XSL spec? I think you might be thinking "fonts
> > > and colors" when you hear XSL. It's much more than that.
> > Examples?
>
> You can re-arrange nested child elements.

Fine. I place this firmly on the non-styles side of the line.

> You can manufacture a
> new element stream using the attributes in an element as input to an
> arbitrary procedure. You can select from a group of alternatives.

Ditto.

> Here's a trivial example:
>
> <xsl:template match="author">
>   <center>
>     <xsl:process-children/>
>   </center>
> </xsl:template>
>
> This transforms to HTML rather than the XSL flow objects. It allows
> me to use "author" elements with *implicit* centering rather than your
> proposal of adding "author" elements to *explicit* centering.

And when you want several different styles for authors, on the same
page?

> I can write my draft documents using these higher level elements.
> Once I'm happy with content, then I can insert lower level formatting
> directives that augment/replace the implicit default formatting of
> the high level elements. There's no loss of control here.

But there's an added burden of semantic-analysis, acknowledged by all to
require extensive training to do correctly, or else you reduce your
semantic/structural tags to the status of named styling-macros.

> [...] You can define an XML tag called font and then
> use XSL to transform that pretty literally into flow objects. Now
> you've got a font element that you can use anywhere -- along with
> all the problems that font elements have already caused in HTML.

Fine. But if an author doesn't want to go thru this, they shouldn't be
denied the freedom to experiment with semantic tags, anyway.

> In fact, if your job is advertising or something that is only
> concerned with page layout, you should probably use a DTD designed
> for page layout. The "semantic" elements in that DTD would be page
> layout directives.

You talk like the world divides neatly into layout-no-semantics and
semantics-no-layout. In fact, anyone who's seriously involved with web
publishing knows you have to be expert at both. ***All web pages are
ads, in this sense.***

> [...] You have to argue with the
> search engines to index some special tags. I doubt they'll do it
> though because the potential for abuse is too great when they are
> indexing content that is not displayed. People will look at a document
> the search engine found and think the search engine is broken because
> what they're looking at wasn't the part that matched the query.

By this argument, there should be no search engines at all.

> My guess is that search engines are waiting for catalogs of well-known
> DTDs to describe semantic content.

Of course, since you're committed to that path, you imagine that the
search-engine-folk are, too.

> If a document uses a well-known
> DTD, it won't be able to spoof the search engine by asking it to index
> content ignored by the browser.

I'll believe that when I see it. Greed is infinitely creative.

(But if it does work, then this will quickly win the Darwinian race.)

> The "fuss" I'm making is over you encouraging people to adopt your
> "solution" instead of XML + XSL. I don't believe you've really tried
> to understand how XML + XSL can help solve your problems. I don't
> believe you've tried to think about the weaknesses of your ad hoc
> approach.

Back atcha. Read my AI FAQ: <URL:http://www.mcs.net/~jorn/html/ai.html>

Ken Fox

Sep 22, 1998
jo...@mcs.com (Jorn Barger) writes:
> Strangely enough, though, if you don't worry about whether it's a title,
> you can transform the italics themselves in a perfectly comprehensible
> way, for most applications.

Boggle. Transform italics to what?

Everything else in your article has been beat to death. Why don't
you just hack up your HTML with bogo-tags and try to convince search
engines to treat them specially? Maybe it will work; probably it
won't. At any rate, you'll provide an interesting baseline for us
to compare XML + XSL to.

Jorn Barger

Sep 22, 1998
Ken Fox <kf...@pt0204.pto.ford.com> wrote:
> > Strangely enough, though, if you don't worry about whether it's a title,
> > you can transform the italics themselves in a perfectly comprehensible
> > way, for most applications.
>
> Boggle. Transform italics to what?

Whatever you like. Whatever you changed "EM" into.

Your boggling betrays you.
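Concretely, "whatever you changed EM into" amounts to a uniform, semantics-blind tag rewrite. A toy sketch in Python — the mapping table is purely illustrative, not anything proposed in the thread:

```python
import re

# Illustrative mapping: treat presentational italics the same way
# you would treat EM, whatever EM happens to become downstream.
TAG_MAP = {"i": "em", "b": "strong"}

def transform(html):
    """Rewrite simple open/close tags according to TAG_MAP,
    leaving unknown tags untouched."""
    def swap(match):
        slash, name = match.group(1), match.group(2).lower()
        return "<%s%s>" % (slash, TAG_MAP.get(name, name))
    return re.sub(r'<(/?)(\w+)>', swap, html)

transform("He cited <i>Ulysses</i> twice.")
# -> 'He cited <em>Ulysses</em> twice.'
```

The point being argued: the rewrite is "perfectly comprehensible" without ever deciding whether the italics marked a title.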

Alan J. Flavell

Sep 23, 1998
On Tue, 22 Sep 1998, Jorn Barger wrote:

> > Boggle. Transform italics to what?
>
> Whatever you like. Whatever you changed "EM" into.

But conscientious authors know that an "italics" markup means that
italics are needed for a purpose _other_ than EM. After all, if they'd
wanted emphasis they'd have used that.

> Your boggling betrays you.

If you say so. Maybe you should found a cult for it.


gkho...@canadamail.com

Sep 27, 1998
Subject: Re: XHTML: The incremental XML-upgrade path
From: kf...@pt0204.pto.ford.com (Ken Fox)
>The XML standard doesn't require XML processors to
>recover from fatal errors (like structural problems). 

Correct!

>*However*, a browser is perfectly
>allowed to handle errors gracefully. 

Gracefully, yes, but in the same manner as normal processing?
Not at all!  That is strictly prohibited.

>I argued earlier that authors
>will prefer browsers that handle errors gracefully
>over browsers that crash-and-burn. 

Handle gracefully instead of crash and burn? Certainly! But
purposely be non-conformant? I don't think so.

>If you really want error recovery in XML, tell your
>vendor.

If I have understood your counsel, I think it is misleading
and totally improper. 

[excerpt from http://www.w3.org/TR/1998/REC-xml-19980210]
fatal error:
An error which a conforming XML processor must detect and
report to the application. After encountering a fatal error,
the processor may continue processing the data to search for
further errors and may report such errors to the application.
In order to support correction of errors, the processor may
make unprocessed data from the document (with intermingled
character data and markup) available to the application.
Once a fatal error is detected, however, the processor must
not continue normal processing (i.e., it must not continue
to pass character data and information about the document's
logical structure to the application in the normal way).


Note what the specification requires of the processor: it
*must not* continue to pass character data and information
about the document's logical structure.

Vendors *MUST* be counselled to conform to the standard ...
one big mess of the web is the junk markup processed through
graceful error recovery.  XML is specifically designed to
avoid this problem.

*!*!*!*
PLEASE keep this in mind when talking to vendors and choosing
your XML processor ... it is *vitally* important that we, as
paying customers, not allow vendors to "help us" in the short
run by accepting invalid information (ours or others') and
continuing normal processing ... it *doesn't* help in the
long run.
*!*!*!*

Thanks!

.......... Ken
Chairman, XML Conformance Subcommittee
OASIS Technical Committee
The Organization for the Advancement of
   Structured Information Standards
      http://www.oasis-open.org

--
G. Ken Holman               mailto:gkho...@CanadaMail.com
Crane Softwrights Ltd.  http://www.CraneSoftwrights.com/x/
Box 266,                                V: +1(613)489-0999
Kars, Ontario CANADA K0A-2E0            F: +1(613)489-0995
Training:   http://www.CraneSoftwrights.com/x/schedule.htm
Resources: http://www.CraneSoftwrights.com/x/resources.htm
Shareware: http://www.CraneSoftwrights.com/x/shareware.htm


Ken Fox

Sep 30, 1998
G. Ken Holman (gkho...@canadamail.com) writes:

> Ken Fox (kf...@ford.com) writes:
> > The XML standard doesn't require XML processors to
> > recover from fatal errors (like structural problems).
>
> Correct!
>
> > *However*, a browser is perfectly allowed to handle errors
> > gracefully.
>
> Gracefully, yes, but in the same manner as normal processing?
> Not at all! That is strictly prohibited.

I observed earlier that "normal" is not defined by the standard. Is
this "normal" within the context of the XML parser or "normal" to the
browser? Is this "normal" with respect to the reader's viewing
experience or "normal" to the flow of control within the browser
itself?

A very useful interpretation of "fatal" error handling is "A compliant
XML processor must inform the controlling application when encountering
a fatal error. The application will decide the course of action to
take."
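That interpretation maps directly onto how SAX-style processors actually work: the parser's only obligation is to report the fatal error; the controlling application decides what happens next. A minimal Python sketch (the handler class names are mine, not from the thread):

```python
import xml.sax
from io import StringIO

class Collector(xml.sax.handler.ContentHandler):
    """The 'application': receives structure from the processor."""
    def __init__(self):
        self.seen = []
    def startElement(self, name, attrs):
        self.seen.append(name)

class Reporter(xml.sax.handler.ErrorHandler):
    """The processor reports the fatal error here; the application
    decides the course of action (record it and halt, per the spec)."""
    def __init__(self):
        self.fatal = None
    def fatalError(self, exception):
        self.fatal = exception
        raise exception   # stop normal processing

def parse(document):
    content, errors = Collector(), Reporter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(content)
    parser.setErrorHandler(errors)
    try:
        parser.parse(StringIO(document))
    except xml.sax.SAXParseException:
        pass   # the application, not the parser, chooses what to do now
    return content.seen, errors.fatal

seen, fatal = parse("<a><b></a>")   # mismatched tags: a fatal error
# everything parsed before the error is still in `seen`; `fatal`
# tells the application exactly where normal processing stopped
```

Note that nothing here is non-conformant: the processor detected and reported the fatal error and stopped passing data in the normal way; what the application does afterwards is its own business.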

> > I argued earlier that authors will prefer browsers that handle
> > errors gracefully over browsers that crash-and-burn.
>
> Handle gracefully instead of crash and burn? Certainly! But
> purposely be non-conformant? I don't think so.

Yes. A *browser* can be purposefully non-conformant to the XML spec.
(Even a conformant XML parser can have significantly better error
handling than you want.) It's just a conjecture I have that they
*probably* will do more than you think because users (authors and
readers alike) will ask for this feature.

Who would use this feature? Anybody that:

* has documents that have *accidentally* become non-conformant due
to bit-rot. A browser might be able to skip and/or fix the rotted
sections. This is vital for anybody with legal requirements for
long-term document storage.

* is authoring new documents. The browser might be able to skip
over errors and continue to check the rest of the document.
(Don't tell me that this is explicitly permitted. Any "semantic"
errors require the XML parser to pass processed character data and
structure to the browser. Structural errors should not trump
semantic ones.) Bad XML could easily be formatted differently by
the browser to identify sections of the document that must be
fixed.

* is thinking of upgrading a legacy document collection and wants to
evaluate the scope of the work vs. the benefit gained from the
conversion to XML. Turning on tag name case-folding would be a
safe and useful option I suspect.

> > If you really want error recovery in XML, tell your vendor.
>
> If I have understood your counsel, I think it is misleading
> and totally improper.

Good thing expressing opinions is still ok on net news. ;)

> [excerpt from http://www.w3.org/TR/1998/REC-xml-19980210]
> fatal error:
> An error which a conforming XML processor must detect and
> report to the application. After encountering a fatal error,
> the processor *may continue* processing the data to search for
> further errors and *may* report such errors to the application.
> In order to support correction of errors, the processor may
> make unprocessed data from the document (with intermingled
> character data and markup) *available to the application*.
> Once a fatal error is detected, however, the processor must
> not continue normal processing (i.e., it must not continue
> to pass character data and information about the document's
> logical structure to the application *in the normal way*).

> Note what the specification requires of the processor: it
> *must not* continue to pass character data and information
> about the document's logical structure.

No. You are reading quite a bit into the spec. I posted a note
asking Tim Bray to clarify what browsers are supposed to do with a
fatal error because right now it is open to interpretation. The
browser could "tidy" the source and re-submit it to the processor.
An XML processor could also continue to pass character data and
structure in an *abnormal* way.
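The "tidy the source and re-submit it" route, combined with the tag-name case-folding option suggested earlier in the post, might look like this. The repair pass is hypothetical and deliberately naive; the XML processor itself stays fully strict:

```python
import re
import xml.etree.ElementTree as ET

def fold_tag_case(source):
    # Hypothetical repair pass: fold element names to lower case --
    # the "safe and useful option" for legacy HTML-ish documents.
    return re.sub(r'</?\s*([A-Za-z][\w.-]*)',
                  lambda m: m.group(0).lower(), source)

def parse_with_resubmit(source):
    """Try strict parsing first; on a fatal error, tidy the source
    and re-submit it to the (still conformant) XML processor."""
    try:
        return ET.fromstring(source), False
    except ET.ParseError:
        return ET.fromstring(fold_tag_case(source)), True

tree, repaired = parse_with_resubmit("<Note>check the archive</note>")
# the first pass hits a fatal error (mismatched-case tags); the
# case-folded re-submission parses cleanly
```

The strictness lives entirely in the parser; the recovery lives entirely in the browser-side wrapper, which is the division of labor being argued for here.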

> Vendors *MUST* be counselled to conform to the standard ...
> one big mess of the web is the junk markup processed through
> graceful error recovery. XML is specifically designed to
> avoid this problem.

Yes, I understand the intent. Just be aware that your intent is *not*
conveyed by the wording of the standard. At any rate, standards can
not change the desires of the market. Good standards *reflect* the
desires of the market.

> *!*!*!*
> PLEASE keep this in mind when talking to vendors and choosing
> your XML processor ... it is *vitally* important that we, as
> paying customers, not allow vendors to "help us" in the short
> run by accepting invalid information (ours or others') and
> continuing normal processing ... it *doesn't* help in the
> long run.
> *!*!*!*

That's your opinion. I think XML will be too popular to allow
crippled implementations like this. It's like advocating that every
time a C compiler gets a syntax error it has to stop parsing. In
the real world, compilers try to do as much as possible with the input
before giving up. There are even *options* to turn on different
levels of strictness.

Why should XML be any different?
