
Short article on definition lists, comments please


Ben M

Aug 29, 2002, 6:30:20 PM

I have just finished writing an article describing definition lists, how to
use them and why they are useful etc.

Before I start linking to it from elsewhere on my site I would appreciate
any comments, i.e. does it make sense, are the examples any good etc. Thanks
in advance for any help.

The URL is http://www.benmeadowcroft.com/webdev/articles/definition-lists.shtml
P.S. The finished article will also have this URL.
--
BenM
http://www.benmeadowcroft.com


Neredbojias

Aug 29, 2002, 7:28:36 PM

Without even guffawing, Ben M wrote:

> I have just finished writing an article describing definition lists, how to
> use them and why they are useful etc.
>
> Before I start linking to it from elsewhere on my site I would appreciate
> any comments, i.e. does it make sense, are the examples any good etc. Thanks
> in advance for any help.

In the 2nd example are 2 opening <dl> tags and one closing </dl> tag. Is
that right?

--
Neredbojias

Contrary to popular belief, wooden nickels do not come from sawed bucks.

rf

Aug 29, 2002, 9:25:41 PM

"Ben M" <cee....@virgin.net> wrote in message
news:agxb9.3359$CG1.1...@newsfep1-win.server.ntli.net...

You give lots of examples of the HTML but no indication of what will be
produced. Follow each example with the result of that example.

Cheers
Richard.


Ben M

Aug 30, 2002, 1:05:39 PM

"Neredbojias" <donjuan...@coprophagous.com> wrote in message
news:umtbiec...@corp.supernews.com...

> Without even guffawing, Ben M wrote:
>
> > I have just finished writing an article describing definition lists,
> > how to use them and why they are useful etc.
> >
> > Before I start linking to it from elsewhere on my site I would
> > appreciate any comments, i.e. does it make sense, are the examples any
> > good etc. Thanks in advance for any help.
>
> In the 2nd example are 2 opening <dl> tags and one closing </dl> tag. Is
> that right?

No,
Thanks for the spot.
--
BenM
http://www.benmeadowcroft.com


Ben M

Aug 30, 2002, 1:06:52 PM

"rf" <making...@the.time> wrote in message
news:pQzb9.24800$MC2....@news-server.bigpond.net.au...

I'll think about that; it's definitely a good idea. I'm just worried about
it looking a bit unwieldy.

Thanks a lot for the feedback.

--
BenM
http://www.benmeadowcroft.com


Jukka K. Korpela

Aug 30, 2002, 4:41:01 PM

"Ben M" <cee....@virgin.net> wrote:

>> You give lots of examples of the HTML but no indication of what will be
>> produced. Follow each example with the result of that example.
>
> I'll think about that, it's definitely a good idea, I'm just worried about
> it looking a bit unwieldy.

It's not always a good idea to include examples of "what will be produced".
People so easily assume that what they see is _the_ presentation.

Maybe a sound file of an aural presentation would avoid this problem.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jukka K. Korpela

Aug 30, 2002, 4:50:19 PM

"Ben M" <cee....@virgin.net> wrote:

> I have just finished writing an article describing definition lists, how
> to use them and why they are useful etc.

- -
> http://www.benmeadowcroft.com/webdev/articles/definition-lists.shtml

I'm afraid you wrote it too late. The <dl> markup has already been spoiled by
abuse. The benefits of semantic markup become questionable if we cannot
presume that <dl> _really_ means a definition list. I'm afraid most <dl>
elements on Web pages are not definition lists, even under a liberal
interpretation, just as most <blockquote> elements probably aren't
quotations at all. A program that processes documents according to the
semantics of their elements - for example, a search engine with a feature to
search for definitions, looking at <dl> and <dfn> markup - would produce odd
results rather often.

Even your document uses <dl> for things other than definitions, too. Is e.g.
the following really a definition?
<dt>Content Management</dt>
<dd>A good reason to charge lots of money.</dd>
Does it specify what the term "content management" refers to? No, I would say
that it _postulates_ some definition for it and says something _about_ it, an
opinion.

I'd like to refer to my treatise "Definition - a definition and an analysis",
http://www.cs.tut.fi/~jkorpela/def.html
which contains some comments on HTML markup for definitions.

Neredbojias

Aug 30, 2002, 6:55:43 PM

Without even guffawing, Ben M wrote:

> No,
> Thanks for the spot.

I noticed the improvements. 'Tis lookin' pretty good.

Ben M

Aug 31, 2002, 4:21:48 AM

"Jukka K. Korpela" <jkor...@cs.tut.fi> wrote in message
news:Xns927AF277236F...@193.229.0.31...

> "Ben M" <cee....@virgin.net> wrote:
>
> > I have just finished writing an article describing definition lists, how
> > to use them and why they are useful etc.
> - -
> > http://www.benmeadowcroft.com/webdev/articles/definition-lists.shtml
>
> I'm afraid you wrote it too late. The <dl> markup has already been
> spoiled by abuse. The benefits of semantic markup become questionable if
> we cannot presume that <dl> _really_ means a definition list. I'm afraid
> most <dl> elements on Web pages are not definition lists, even under a
> liberal interpretation, just as most <blockquote> elements probably
> aren't quotations at all. A program that processes documents according to
> the semantics of their elements - for example, a search engine with a
> feature to search for definitions, looking at <dl> and <dfn> markup -
> would produce odd results rather often.

Witness the notorious "I Love You" Google glossary search.

I see your point; it would have problems being implemented on the web
(although, despite a few hiccups, the Google glossary is looking quite
good). I think leveraging semantic information is still worthwhile, if only
on a site-wide basis. My reasoning (and it applies to more than definition
lists) is that the semantics of an element are a kind of metadata that can
be manipulated as easily as any other kind. By using semantic markup I am
able to query my site for all the definition lists, pull them out of the
body of a document and create a glossary from them.

As you say, in the context of the entire WWW it is fighting a losing battle
(or at least a battle where the "good guys" are outnumbered 99 to 1, source:
made up statistics inc.); however, we can make our part of the network
meaningful, and useful to us and others. Basically, doing the right thing is
good even if it seems no one else does it (and it builds character too).

My main motivation in doing this, I think, was to counter (at least in my
own mind) some of the common CMS roll-your-own XML stuff that is occurring
at the moment. It seems people are quite happy to invent their own XML
languages with tags such as <PARA> which strikes me as somewhat redundant.
By creating these articles I hope to at least persuade the few people who
read them that there is a lot more to HTML than layout tables. I'm currently
reading the Content Management Bible by Bob Boiko (overall a useful book
despite some errata); in it he states (p17) "Very few binary formats really
enable you to separate formatting from content. XML and its parent SGML are
among the few that do offer this type of separation. The manager faces the
difficult choice of either converting all of her content into a format such
as XML or storing it in a binary format where rendering format and content
are inseparable (such as HTML)."

After nearly collapsing in a fit of laughter I then realised that a lot of
other people were reading this stuff and taking that statement as true. It
is a shame that HTML has been so closely tied to "presentation" that it is
popularly thought unable to fulfill a role in content management systems.
One of my hopes is that web authors will "discover" the semantic markup that
makes up HTML and begin using it rather than XML schemas quickly thought out
in a few meetings in an office. In my opinion one of the benefits of valid
XHTML is its capability to be more easily processed by machines (I realise
HTML can be parsed too, but parsing XML is simpler), and thus to be
processed by CMS systems. The other great advantage I see is its
extensibility: if I am writing a CMS, rather than creating an entirely new
markup language (for use in processing the content) I can merely define
extensions that adequately cover any further functionality I require.

> Even your document uses <dl> for things other than definitions, too. Is
> e.g. the following really a definition?
> <dt>Content Management</dt>
> <dd>A good reason to charge lots of money.</dd>
> Does it specify what the term "content management" refers to? No, I would
> say that it _postulates_ some definition for it and says something
> _about_ it, an opinion.

Good point, I'll amend that.

> I'd like to refer to my treatise "Definition - a definition and an
> analysis",
> http://www.cs.tut.fi/~jkorpela/def.html
> which contains some comments on HTML markup for definitions.

I'll take a look after I've posted this.

--
BenM
http://www.benmeadowcroft.com


Isofarro

Aug 31, 2002, 6:21:06 AM

Ben M (cee....@virgin.net) on Saturday 31 August 2002 08:21 in
comp.infosystems.www.authoring.html wrote:

> "Jukka K. Korpela" <jkor...@cs.tut.fi> wrote in message
> news:Xns927AF277236F...@193.229.0.31...
>

>> I'm afraid you wrote it too late. The <dl> markup has already been
>> spoiled by abuse.

As have many of the other HTML elements.

>> The benefits of semantic markup become questionable if we
>> cannot presume that <dl> _really_ means a definition list.

True. But because people are using the elements incorrectly, should
that prevent the few from using it? Sure, there's no value in
extracting semantic meaning from the vast horde of "web sites" out
there, so they would need to be ignored until they start adopting
a semantic structure.

> My reasoning is (and it applies to more the
> definition lists) the semantics of an element are a kind of meta data
> that can be manipulated as easily as any other kind. By using semantic
> markup I am able to query my site for all the definition lists, pull
> them out of the body of a document and create a glossary from them.

I've been playing with a very similar idea, but with a restricted set
of elements like abbr, acronym, and dfn. It hadn't occurred to me to
treat definition lists the same way to extract a meaning.

I'm trying to write an article about the value of structured html,
where the two key examples would be a document outliner (like the
outline structure from w3 validators), and an abbreviation list with
meanings, all extracted from the very same html on the fly.

> As you say, in the context of the entire WWW it is fighting a losing
> battle

I tend to think of it as not a battle we are losing, but an opportunity
we are just starting -- reclaiming our HTML elements. The key arguments
to getting our HTML elements back are the misuse of tables for layout,
and "presentational" abused HTML. Once CSS takes a strong hold,
authors can then start re-adopting the elements to their initial
meaning and usefulness.

> we can make our part of the network meaningful,
> and useful to us and others.

Yes, we can be more than just flies.

[XML based CMS]


> It seems people are quite happy to invent
> their own XML languages with tags such as <PARA> which strikes me as
> somewhat redundant,

Isn't that from DocBook? But your point still stands, why recreate what
already exists. Or "Why reinvent the square wheel", since the
reinvention isn't as good or as mature as what exists.

> by creating these articles I hope to at least
> persuade the few people who read them that there is a lot more to
> HTML than layout tables.

We see eye-to-eye here, although I get the niggling feeling that XHTML
is a better tool for semantic structure than HTML, purely because it
will work with the standard XML parsers. HTML would require some sort
of exception handling with elements that don't require closure.

As a bit of an aside, the advantage of semantically structured HTML is
that you can then have an "intelligent" CMS that automatically creates
the title attribute of abbr and acronym - by remembering that
abbreviation has been used previously. So adding value to the author's
writing process. (Although with restraint, like on the first occurrence
of an abbreviation in a document).

One of the problems of knowledgebases like http://www.everything2.com and
http://www.allmyfaqs.com/faq.pl is that human effort is required to
link a new article with existing material -- this leads to an avoidance
of linking related resources together, or duplication, since it's easier
to create a reference than find a reference that may or may not exist.

With semantic structure within a CMS, this can be automated by the CMS,
so the author's overhead is merely selecting from a CMS-suggested list
those pages that the author feels will add value to his piece.


> It is a shame that HTML has been so closely tied to
> "presentation" that it is popularly thought unable to fulfill a role
> in content management systems.

I would have to say that DocBook offers a better selection of elements
than HTML. HTML has its limits in conveying meaning.

> if I am writing a CMS system rather than creating an
> entirely new markup language (for use in processing the content) I can
> merely define extensions that adequately cover any further
> functionality I require.

Ahh, so you will add new elements which would then be transformed down
into compliant XHTML? So instead of creating a new XML syntax, you take
XHTML as a foundation, and extend that?

That makes sense. Though, again, I'm not yet convinced that XHTML is a
better fit than DocBook; only people used to authoring HTML will find the
adjustment cost much lower.


--
Iso.
FAQs: http://html-faq.com http://alt-html.org http://allmyfaqs.com/
Recommended Hosting: http://www.affordablehost.com/
AnyBrowser Campaign: http://www.anybrowser.org/campaign/

Ben M

Aug 31, 2002, 7:17:17 AM

"Isofarro" wrote in message news:ia5qka...@sidious.isolani.co.uk...

> Ben M (cee....@virgin.net) on Saturday 31 August 2002 08:21 in
> comp.infosystems.www.authoring.html wrote:
>
> > "Jukka K. Korpela" <jkor...@cs.tut.fi> wrote in message
> > news:Xns927AF277236F...@193.229.0.31...
> >
> >> I'm afraid you wrote it too late. The <dl> markup has already been
> >> spoiled by abuse.
>
> As have many of the other HTML elements.
>
> >> The benefits of semantic markup become questionable if we
> >> cannot presume that <dl> _really_ means a definition list.
>
> True. But because people are using the elements incorrectly, should
> that prevent the few from using it? Sure, there's no value in
> extracting semantic meaning from the vast horde of "web sites" out
> there, so they would need to be ignored until they start adopting
> a semantic structure.
>
> > My reasoning (and it applies to more than
> > definition lists) is that the semantics of an element are a kind of
> > metadata that can be manipulated as easily as any other kind. By using
> > semantic markup I am able to query my site for all the definition
> > lists, pull them out of the body of a document and create a glossary
> > from them.
>
> I've been playing with a very similar idea, but with a restricted set
> of elements like abbr, acronym, and dfn. It hadn't occurred to me to
> treat definition lists the same way to extract a meaning.

Since I started authoring in XHTML Strict (and reading the specs) I've
discovered that there are a lot of potentially interesting things you can do
with semantic XHTML. As you mention later, it is much easier to perform
these actions with valid XHTML than with HTML; getting at the
individual elements isn't difficult with the XML DOM.
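
As a rough sketch of the glossary idea (in Python for brevity; the file
name is hypothetical and it assumes well-formed XHTML):

import xml.dom.minidom

def text_of(node):
    # Concatenate all character data beneath a node.
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_of(child))
    return "".join(parts).strip()

doc = xml.dom.minidom.parse("definition-lists.xhtml")

# Pair each <dt> with the <dd> elements that follow it, collecting
# (term, definition) tuples from every <dl> in the document.
glossary = []
for dl in doc.getElementsByTagName("dl"):
    term = None
    for child in dl.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue
        if child.tagName == "dt":
            term = text_of(child)
        elif child.tagName == "dd" and term is not None:
            glossary.append((term, text_of(child)))

for term, definition in sorted(glossary):
    print("%s: %s" % (term, definition))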

> I'm trying to write an article about the value of structured html,
> where the two key examples would be a document outliner (like the
> outline structure from w3 validators), and an abbreviation list with
> meanings, all extracted from the very same html on the fly.
>
> > As you say, in the context of the entire WWW it is fighting a losing
> > battle
>
> I tend to think of it as not a battle we are losing, but an opportunity
> we are just starting -- reclaiming our HTML elements. The key arguments
> to getting our HTML elements back are the misuse of tables for layout,
> and "presentational" abused HTML. Once CSS takes a strong hold,
> authors can then start re-adopting the elements to their initial
> meaning and usefulness.

An optimist? Good, I would like to agree with you. Of course I can't assert
that my predictions are accurate (until I invent that time machine), so I
don't think I'll make any (at least not now that Google stores these things
for future reference). Suffice to say that I hope people involved in
developing advanced CMSs don't look too far beyond what's already available
and invent their own proprietary schemas.

> > we can make our part of the network meaningful,
> > and useful to us and others.
>
> Yes, we can be more than just flies.
>
> [XML based CMS]
> > It seems people are quite happy to invent
> > their own XML languages with tags such as <PARA> which strikes me as
> > somewhat redundant,
>
> Isn't that from DocBook? But your point still stands, why recreate what
> already exists. Or "Why reinvent the square wheel", since the
> reinvention isn't as good or as mature as what exists.

I've never really looked at DocBook; I got the example from a content
management book I am currently reading.

> > by creating these articles I hope to at least
> > persuade the few people who read them that there is a lot more to
> > HTML than layout tables.
>
> We see eye-to-eye here, although I get the niggling feeling that XHTML
> is a better tool for semantic structure than HTML, purely because it
> will work with the standard XML parsers. HTML would require some sort
> of exception handling with elements that don't require closure.
>
> As a bit of an aside, the advantage of semantically structured HTML is
> that you can then have an "intelligent" CMS that automatically creates
> the title attribute of abbr and acronym - by remembering that
> abbreviation has been used previously. So adding value to the author's
> writing process. (Although with restraint, like on the first occurrence
> of an abbreviation in a document).
>
> One of the problems of knowledgebases like http://www.everything2.com and
> http://www.allmyfaqs.com/faq.pl is that human effort is required to
> link a new article with existing material -- this leads to an avoidance
> of linking related resources together, or duplication, since it's easier
> to create a reference than find a reference that may or may not exist.
>
> With semantic structure within a CMS, this can be automated by the CMS,
> so the author's overhead is merely selecting from a CMS-suggested list
> those pages that the author feels will add value to his piece.

Here's to the semantic web, hoping it comes soon!

I really like the ideas you brought up. I would love to not have to
bother defining the first instance of an abbreviation on a page
manually (like I do currently). What I would also like is to write an
article on definition lists and have a list of related articles presented
to me so I can point to them as further reading (not necessarily from my
site either). Such a thing could be done using RSS querying technologies
and related links with autodiscovery.

For example http://holovaty.com/content/customfeeds/ lets you send a keyword
or phrase query and returns a customised RSS feed of the last five articles
that match your criteria. Couple this with a list of websites that implement
similar interfaces (http://www.syndic8.com/ might already let you do this, I
haven't had time to look) and your CMS can query several trusted sites for
related articles. You could then search for sites related to the other sites
in your list (using the Google API, for example) and perform RSS
autodiscovery to get an even bigger list of related articles to choose from.
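
The querying side would be simple enough; a quick sketch (Python, and the
keyword-query URL is invented for illustration - each site's real interface
will differ):

import urllib.request
import xml.dom.minidom

def related_articles(feed_url):
    # Fetch an RSS feed and return (title, link) pairs for its items.
    with urllib.request.urlopen(feed_url) as response:
        doc = xml.dom.minidom.parseString(response.read())
    results = []
    for item in doc.getElementsByTagName("item"):
        fields = {}
        for tag in ("title", "link"):
            nodes = item.getElementsByTagName(tag)
            fields[tag] = (nodes[0].firstChild.data
                           if nodes and nodes[0].firstChild else "")
        results.append((fields["title"], fields["link"]))
    return results

# Hypothetical feed URL; substitute a site's real query interface.
for title, link in related_articles(
        "http://example.com/customfeeds/?q=definition+lists"):
    print(title, "-", link)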

> > It is a shame that HTML has been so closely tied to
> > "presentation" that it is popularly thought unable to fulfill a role
> > in content management systems.
>
> I would have to say that DocBook offers a better selection of elements
> than HTML. HTML has its limits in conveying meaning.
>
> > if I am writing a CMS system rather than creating an
> > entirely new markup language (for use in processing the content) I can
> > merely define extensions that adequately cover any further
> > functionality I require.
>
> Ahh, so you will add new elements which would then be transformed down
> into compliant XHTML? So instead of creating a new XML syntax, you take
> XHTML as a foundation, and extend that?
>
> That makes sense. Though, again, I'm not yet convinced that XHTML is a
> better fit than DocBook; only people used to authoring HTML will find
> the adjustment cost much lower.

I haven't really looked at DocBook before to be honest with you. I may take
a look later this afternoon.

Thanks a lot for your interesting comments.

--
BenM
http://www.benmeadowcroft.com


Isofarro

Aug 31, 2002, 10:27:23 AM

Ben M (cee....@virgin.net) on Saturday 31 August 2002 11:17 in
comp.infosystems.www.authoring.html wrote:

> "Isofarro" wrote in message
> news:ia5qka...@sidious.isolani.co.uk...

>> Once CSS takes a
>> strong hold, authors can then start re-adopting the elements to
>> their initial meaning and usefulness.
>
> An optimist? Good, I would like to agree with you.

Heh, yes it's a bit optimistic, but from a business sense, if you can
make a case for structured markup that's to the company's benefit, they
may just listen. I'm angling for the accessibility route at the moment,
since with XHTML and a document outline, a page can be comfortably
accessible on a mobile phone or PDA (as an outline - which is just a
nested list of title and header elements).

I keep looking at ways to break up current HTML documents into chunks,
and the outline method makes the most sense. The nice thing about WML
is the "card" layout, so each chunk of content (one chunk is a block of
content associated with one of the headers in the outline) is on a
separate card (sort of like intra-page links --
<a href="#header1">...</a>), so a big document can be displayed usably and
accessibly on a small-screen device.
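
Something like this, for instance (a hand-waved sketch, one card per
outline chunk, with card ids taken from the document's headers):

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
  "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
  <card id="header1" title="Introduction">
    <p>First chunk of content...</p>
    <p><a href="#header2">Next: Usage</a></p>
  </card>
  <card id="header2" title="Usage">
    <p>Second chunk of content...</p>
  </card>
</wml>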

Selling the ability to grab the eyeballs of small-screen mobile device
users - therein lies the seed for pushing structured and semantic
markup (IMO). Plus a mobile device user is more likely to want "quick
facts" than lengthy prose -- so that leaves the essential need to
mark up abbreviations and definition lists, and why require a separate
website for that when well-structured markup will deliver it for free
(in addition to installing an XML parser)?

I'd like a way to identify any element (or node) in a document, so
maybe XPath will help there - then I can start breaking down documents
into smaller and more mobile elements.

> Here's to the semantic web, hoping it comes soon!

IMO, the Semantic Web needs lots of structured content before it can
achieve a level of usefulness. I feel it is our job as content
authors and markup specialists (and CMS creators) to help lay that
foundation.

It's ironic that the strawman argument against us (HTML hand-coders) is
that we're stuck in the past, when it's obvious enough (to me anyway) that
the next great evolution of the Web lies in the structured and clean
markup we evangelise.

> I would love to not have to
> bother defining the first instance of an abbreviation on a page
> manually (like I do currently).

There is the immediate benefit -- you can concentrate on what you
enjoy, writing articles. And then the CMS does the donkey work of
offering suggestions of relevant other reading material.

> Such a thing could be done using
> RSS querying technologies and related links with autodiscovery.

So a CMS should provide an automated way of producing an RSS feed for a
website. Then other CMSs can find the information and leverage it.

> For example http://holovaty.com/content/customfeeds/ lets you send a
> keyword or phrase query and returns a customised RSS feed of the last
> five articles that match your criteria. Couple this with a list of
> websites that implement similar interfaces (http://www.syndic8.com/
> might already let you do this, I haven't had time to look) and your CMS
> can query several trusted sites for related articles.

Nice! That's what I love about the Web: the ability to refer to someone
else's content, and then add your own knowledge to it. It's
collaboration far beyond the typical level of an enterprise CMS.

> I haven't really looked at DocBook before to be honest with you. I may
> take a look later this afternoon.

The place to start would be
http://www.oasis-open.org/committees/docbook/

And the online book http://www.docbook.org/

The foundations are very much in the SGML world, but the XML
implementation is good enough.

> Thanks a lot for your interesting comments.

Ditto! You've given me some extra food for thought (the RSS ideas and
links).

George Lund

Aug 31, 2002, 2:32:08 PM

In message <ia5qka...@sidious.isolani.co.uk>, Isofarro
<spam...@spamdetector.co.uk> writes

>>> The benefits of semantic markup become questionable if we
>>> cannot presume that <dl> _really_ means a definition list.
>
>True. But because people are using the elements incorrectly, should
>that prevent the few from using it?

A variety of uses of the DL list type, not necessarily related to
definitions at all, appear to me to be in keeping with the HTML
specification. That the element name happens to be short for one of the
possible uses (definition list) is, to all intents and purposes, just a
confusing shorthand.

A particular example given in the current specification is "for marking
up dialogues, with each DT naming a speaker, and each DD containing his
or her words." [ http://www.w3.org/TR/html4/struct/lists.html#h-10.3 ]
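
That is, something along these lines (my own invented dialogue, not an
example from the spec):

<dl>
  <dt>Customer</dt>
  <dd>This definition list contains no definitions at all.</dd>
  <dt>Shopkeeper</dt>
  <dd>And yet it validates, sir.</dd>
</dl>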

To be honest, HTML *needs* a way to mark up a generic list where there is
important labelling content that precedes each list item (other than
just a sequence of numbers). I think authors have had no choice but to
use DL for this... and the specifications have, as ever, done nothing to
clarify the situation.

I disagree with Jukka Korpela that such a general list-style is "purely
presentational". Generic mark-up like this is hard to identify as
having a very specific semantics, but that doesn't make it wrong. And
HTML is such a general language that it probably *shouldn't* try to
include separate element structures for each application (catalogue
numbering, dialogue mark-up, definition lists, etc.).

Just some thoughts anyway.
--
George

Nick Kew

Aug 31, 2002, 3:53:06 PM

In article <ia5qka...@sidious.isolani.co.uk>, one of infinite monkeys
at the keyboard of Isofarro <spam...@spamdetector.co.uk> wrote:

> (as usual, much that I agree with)

> I'm trying to write an article about the value of structured html,
> where the two key examples would be a document outliner (like the
> outline structure from w3 validators), and an abbreviation list with
> meanings, all extracted from the very same html on the fly.

Yo! Do you have the code to do that? If not, would you be interested
in some samples of how easy it is?

> We see eye-to-eye here, although I get the niggling feeling that XHTML
> is a better tool for semantic structure than HTML, purely because it
> will work with the standard XML parsers.

HTML works with (Open)SP (as used by the validation services), with the
parsers used by browsers, and with many of the XML parsers (for example,
Accessibility Valet uses the HTMLParser from libxml2, and Neko extends
Xerces to work with HTML). The choice between HTML and XHTML makes
very little difference to the parsers.

> As a bit of an aside, the advantage of semantically structured HTML is
> that you can then have an "intelligent" CMS that automatically creates
> the title attribute of abbr and acronym - by remembering that
> abbreviation has been used previously.

Yep. And this kind of thing can sometimes also be usefully post-processed
in a proxy, to improve existing pages on-the-fly.

> One of the problems of knowledgebases like http://www.everything2.com and
> http://www.allmyfaqs.com/faq.pl is that human effort is required to
> link a new article with existing material -- this leads to an avoidance
> of linking related resources together, or duplication, since it's easier
> to create a reference than find a reference that may or may not exist.
>
> With semantic structure within a CMS, this can be automated by the CMS,
> so the author's overhead is merely selecting from a CMS-suggested list
> those pages that the author feels will add value to his piece.

You seem to be somewhere close to reinventing the original WebThing,
which was designed to solve precisely that problem. You also seem to be
doing a much more thorough job than I did with it back in '95. I'd be
interested to learn more about your project, if it's available for
public viewing.

--
Nick Kew

Available for contract work - Programming, Unix, Networking, Markup, etc.

Jukka K. Korpela

Aug 31, 2002, 5:12:43 PM

"Ben M" <cee....@virgin.net> wrote:

> Witness the notorious "I Love You" google glossary search.

Google glossary search? What's that? Oh, we have Google don't we...
http://labs.google.com/glossary
How have I missed that? Anyway, based on a quick look at the pages that it
finds, I don't think it uses <dl> or <dfn> much. In fact, most glossaries on
the Web probably have neither of those markup elements and, on the other
hand, <dl> is used for many other things than glossaries. So to the extent
that the search produces odd results, the causes are not in <dl>.

> My reasoning (and it applies to more than definition
> lists) is that the semantics of an element are a kind of metadata that
> can be manipulated as easily as any other kind.

I completely agree. But to make that work on the Web, we would need
1) sufficiently rich and semantically well-defined markup elements that
can be used
2) author awareness and willingness to use them, and use them properly
3) search engines and other software that make use of them (thereby
reinforcing 2)).
This is just a dream at present, but I am optimistic enough to keep the
dream.

Jukka K. Korpela

Aug 31, 2002, 5:19:23 PM

George Lund <geo...@lundbooks.co.uk> wrote:

> To be honest, HTML *needs* a way to mark up a generic list where there
> is important labelling content that precedes each list item

In what sense would that be logically different from a table with two
columns?

I used to have some <dl> elements in contexts like annotated link lists. I
even defended myself (to myself) saying that the link name is a sort-of term
and my annotation is a sort-of definition. But it was foolish of me. Actually
I switched to using heading elements for the link names and <p> markup for
the annotations.
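
That is, roughly (an illustrative fragment, not taken from my actual
pages):

<h3><a href="http://www.example.com/foo/">Foo tutorial</a></h3>
<p>A gentle introduction to Foo, covering the basics well but saying
little about the more advanced features.</p>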



> I disagree with Jukka Korpela that such a general list-style is "purely
> presentational".

Have I said so? I think I have said that <dl> has been used for purely
presentational reasons when it has been used to achieve the common
presentation style of <dl> elements, despite the fact that the structure of
the data is not a definition list.

Ben M

Aug 31, 2002, 7:38:27 PM

"Nick Kew" <ni...@fenris.webthing.com> wrote in message
news:2r6rka...@jarl.webthing.com...

> In article <ia5qka...@sidious.isolani.co.uk>, one of infinite monkeys
> at the keyboard of Isofarro <spam...@spamdetector.co.uk> wrote:
>
> > We see eye-to-eye here, although I get the niggling feeling that XHTML
> > is a better tool for semantic structure than HTML, purely because it
> > will work with the standard XML parsers.
>
> HTML works with (Open)SP (as used by the validation services), with the
> parsers used by browsers, and with many of the XML parsers (for example,
> Accessibility Valet uses the HTMLParser from libxml2, and Neko extends
> Xerces to work with HTML). The choice between HTML and XHTML makes
> very little difference to the parsers.

I've not used HTML parsers (just XML parsers); can you get them to represent
the HTML in such a manner that XSL transformations can be applied to them?

> > As a bit of an aside, the advantage of semantically structured HTML is
> > that you can then have an "intelligent" CMS that automatically creates
> > the title attribute of abbr and acronym - by remembering that
> > abbreviation has been used previously.
>
> Yep. And this kind of thing can sometimes also be usefully post-processed
> in a proxy, to improve existing pages on-the-fly.
>

> > With semantic structure within a CMS, [linking related resources] can be
> > automated by the CMS,
> > so the author's overhead is merely selecting from a CMS-suggested list
> > those pages that the author feels will add value to his piece.
>
> You seem to be somewhere close to reinventing the original WebThing,
> which was designed to solve precisely that problem. You also seem to be
> doing a much more thorough job than I did with it back in '95. I'd be
> interested to learn more about your project, if it's available for
> public viewing.

I tried to look at http://www.webthing.com/selforg/ but I was getting an
internal server error. Is this site related to your '95 project? Do you have
any more info on it?

--
BenM
http://www.benmeadowcroft.com


Ben M

Aug 31, 2002, 8:05:51 PM

"Jukka K. Korpela" <jkor...@cs.tut.fi> wrote in message
news:Xns927C21CF7E6C...@193.229.0.31...

> "Ben M" <cee....@virgin.net> wrote:
>
> > My reasoning (and it applies to more than definition
> > lists) is that the semantics of an element are a kind of metadata that
> > can be manipulated as easily as any other kind.
>
> I completely agree. But to make that work on the Web, we would need
> 1) sufficiently rich and semantically well-defined markup elements that
> can be used
> 2) author awareness and willingness to use them, and use them properly
> 3) search engines and other software that make use of them (thereby
> reinforcing 2)).
> This is just a dream at present, but I am optimistic enough to keep the
> dream.

I'll skip point 1, as you seem to cover it well in
http://www.cs.tut.fi/~jkorpela/def.html

To answer points 2 and 3: one thing more discriminating search engines have
given us is keyword relevancy scores for an HTML page. It is a known search
engine optimization technique to actually use h1, h2 elements etc. with
descriptive heading text in a page to boost page relevancy. This has the
by-product of authors producing at least semi-structured HTML. As search
engine companies are still investing in leveraging the vast amount of
information they have to draw on, I am reasonably confident that advances
will be made that will prompt authors to use even more meaningful markup in
an attempt to make their pages rank higher.

see http://www.mcu.org.uk/articles/cssaccessproblems.html

As you say, 3 reinforces 2. I am grateful to see that there is some
investigation into areas where semantic markup would provide a benefit. Even
with the recent improvements in authoring tools like Dreamweaver and GoLive
there is still a long way to go, but at least (from my vantage point) things
are beginning to move in the right direction again.

[disclaimer] I am by nature slightly optimistic.

--
BenM
http://www.benmeadowcroft.com


Nick Kew

Sep 1, 2002, 7:12:47 AM

In article <hScc9.6011$hA4.1...@newsfep1-win.server.ntli.net>, one of
infinite monkeys at the keyboard of "Ben M" <cee....@virgin.net> wrote:

> I've not used HTML parsers (just XML parsers); can you get them to represent
> the HTML in such a manner that XSL transformations can be applied to them?

Yes. Take a look at <URL:http://valet.webthing.com/access/>
for an example in which I do that. With certain provisos, both
SAX and DOM work: SAX by generating events where tags are implied,
DOM using the HTML DOM.
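
For a concrete sketch of the libxml2 route (shown here through a Python
binding; both file names are hypothetical):

from lxml import etree

# Parse tag-soup HTML with the forgiving HTML parser; the result is an
# ordinary element tree to which an XSL transformation can be applied.
doc = etree.parse("page.html", etree.HTMLParser())
transform = etree.XSLT(etree.parse("extract-dls.xsl"))
print(str(transform(doc)))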

> I tried to look at http://www.webthing.com/selforg/ but I was getting an
> internal server error. Is this site related to your '95 project? Do you have
> any more info on it?

Ohbugger, comes of updating to Apache2.0 in a big hurry when the security
alert happened. Yes, selforg was the (1996) second edition of the
original, but was dropped when I didn't find a real-life project to
use it in. Google for "Holistic Hypertext" for how it relates to
what Isofarro describes, and how his ideas have scope for improving
on mine:-)

The kind of application I saw as its primary field was a bibliographic
archive, where it would generate automatic links so that if I type
[Kew, 1995] in an article such as this, it becomes a link if and only
if one or more articles matching the reference are available in the archive.

George Lund

Sep 1, 2002, 7:50:49 AM

In message <Xns927C33E66F24...@193.229.0.31>, Jukka K.
Korpela <jkor...@cs.tut.fi> writes

>George Lund <geo...@lundbooks.co.uk> wrote:
>
>> To be honest, HTML *needs* a way to mark up a generic list where there
>> is important labelling content that precedes each list item
>
>In what sense would that be logically different from a table with two
>columns?

I suppose one could also ask if there is a difference between a UL and a
single-column list. I think there is a difference but I would be
hard pushed to explain it, beyond mumbling something about 'typical
presentation' :-)

>Actually
>I switched to using heading elements for the link names and <p> markup for
>the annotations.

I think that is probably the best and most logical alternative in HTML if DL
is restricted to only meaning definition list.

>I think I have said that <dl> has been used for purely
>presentational reasons when it has been used to achieve the common
>presentation style of <dl> elements, despite the fact that the structure of
>the data is not a definition list.

Sorry if I misunderstood.

regards
--
George Lund

Isofarro

Sep 1, 2002, 7:35:09 AM

Nick Kew (ni...@fenris.webthing.com) on Saturday 31 August 2002 19:53 in
comp.infosystems.www.authoring.html wrote:

> In article <ia5qka...@sidious.isolani.co.uk>, one of infinite
> monkeys at the keyboard of Isofarro <spam...@spamdetector.co.uk>
> wrote:
>
>> (as usual, much that I agree with)
>
>> I'm trying to write an article about the value of structured html,
>> where the two key examples would be a document outliner (like the
>> outline structure from w3 validators), and an abbreviation list with
>> meanings, all extracted from the very same html on the fly.
>
> Yo! Do you have the code to do that? If not, would you be interested
> in some samples of how easy it is?

I've written my own outliner using PHP and SAX; to extract
abbreviations etc. would now just be a case of changing the "list of
elements" hashtable.

Thanks for the offer (your reputation for parsing HTML does precede
you). The only reason I'm writing an article is so that I have a good
excuse to play around with XML parsers on HTML source - so I'm well
aware that I'd be reinventing the square wheel.

>> We see eye-to-eye here, although I get the niggling feeling that
>> XHTML is a better tool for semantic structure than HTML, purely
>> because it will work with the standard XML parsers.
>

> (for example, Accessibility Valet uses the HTMLParser from libxml2,
> and Neko extends Xerces to work with HTML). The choice between HTML
> and XHTML makes very little difference to the parsers.

Ah excellent, I don't have to give up on HTML4 just yet then.

>> As a bit of an aside, the advantage of semantically structured HTML
>> is that you can then have an "intelligent" CMS that automatically
>> creates the title attribute of abbr and acronym - by remembering
>> that abbreviation has been used previously.
>
> Yep. And this kind of thing can sometimes also be usefully
> post-processed in a proxy, to improve existing pages on-the-fly.

You are heading in the same direction I want to go. I want to take the
next step (which someone like Jim Ley has probably already done) of
boiling these features down to javascript bookmarklets (especially the
basic "extract from the current DOM" functions). I'll probably end up
with something like TopText :-)

My long-term interest is in intelligent agents - configured by the
browser user to "travel" the web with a semblance of understanding and
using structured HTML to build their knowledge. Your idea of a proxy
doing this processing fits in rather nicely, much better than something
added "on top" of browsers.

>> With semantic structure within a CMS, this can be automated by the
>> CMS, so the author's overhead is merely selecting from a
>> CMS-suggested list those pages that the author feels will add value
>> to his piece.
>
> You seem to be somewhere close to reinventing the original WebThing,
> which was designed to solve precisely that problem.

Ahh, thanks for that info, I'll be doing a little reading then.

> You also seem to be
> doing a much more thorough job than I did with it back in '95.

I warn you - it's all just ideas of what I want to do; how much of it
gets implemented only time, interest and enthusiasm will tell. I have a
bad habit of creating collections of just-started and half-finished
projects. I like the "solving problems" bit, but suffer in the "actual
implementation" stages, since by then the main problems and challenges
are solved.

> I'd be
> interested to learn more about your project, if it's available for
> public viewing.

If it actually amounts to anything, it will be publicly available
(under the GPL, if any implementation is remotely useful. It's about time
I started giving back to the Open Source world.)

Nick Kew

Sep 1, 2002, 9:20:46 AM

In article <e1uska...@sidious.isolani.co.uk>, one of infinite monkeys
at the keyboard of Isofarro <spam...@spamdetector.co.uk> wrote:

[ this would be email if I'd found a suitable-looking address in your
post or at http://www.isolani.co.uk/contact/ ]

> I've written my own outliner using PHP and SAX; to extract
> abbreviations etc. would now just be a case of changing the "list of
> elements" hashtable.

Sounds like your work might fit in well with what we (including Jim
and myself among others) are doing in the W3C fora. You might be
interested in joining what I describe as a virtual coffee machine -
where we exchange ideas, discuss problems/issues/etc - on IRC.

> Thanks for the offer (your reputation for parsing HTML does precede
> you). The only reason I'm writing an article is so that I have a good
> excuse to play around with XML parsers on HTML source - so I'm well
> aware that I'd be reinventing the square wheel.

It's a worthwhile exercise: it's a problem that's still open enough
to offer potential for your efforts to improve on what's available.

> Ah excellent, I don't have to give up on HTML4 just yet then.

:-)

>> Yep. And this kind of thing can sometimes also be usefully
>> post-processed in a proxy, to improve existing pages on-the-fly.
>
> You are heading in the same direction I want to go.

I wrote a quick-demo accessibility proxy some months ago; building on
it towards things like smart inference remains TBD.

> I want to take the
> next step (which someone like Jim Ley has probably already done) of
> boiling these features down to javascript bookmarklets (especially the
> basic "extract from the current DOM" functions). I'll probably end up
> with something like TopText :-)

That's another useful feature, but doesn't quite do the same as the
proxy proposal because it requires more of the end-user.

> My long term interest is in intelligent agents - configured by the
> browser user to "travel" the web with a semblance of understanding and
> using structured HTML to build its knowledge.

Like "sailor", for instance?
(or, dammit, like google et al :-)

> Your idea of a proxy
> doing this processing fits in rather nicely, much better that something
> added "on-top" of browsers.

Well, they can complement each other, just as some of Jim's smart client
tools use my server-based tools.



>> You also seem to be
>> doing a much more thorough job than I did with it back in '95.
>
> I warn you - it's all just ideas of what I want to do; how much of it
> gets implemented only time, interest and enthusiasm will tell. I have a
> bad habit of creating collections of just-started and half-finished
> projects. I like the "solving problems" bit, but suffer in the "actual
> implementation" stages, since by then the main problems and challenges
> are solved.

Classic hack:-) Nice if you have the luxury!

--
Nick Kew

Looking for work - http://www.webthing.com/~nick/cv.html

Jukka K. Korpela

Sep 1, 2002, 9:43:12 AM

George Lund <geo...@lundbooks.co.uk> wrote:

> I suppose one could also ask if there is a difference between a UL and a
> single-column list.

I presume you mean single-column _table_.

> I think there is a difference but I would be
> hard pushed to explain it, beyond mumbling something about 'typical
> presentation' :-)

There are some _potential_ structural differences: a table may have one cell
(or several cells) indicated as header cells, it may contain a <caption>
element and a summary attribute, etc.

In addition to the default presentation being probably different (depending
on the browser of course), there are different ways to suggest something
about the presentation in HTML. For <ul>, you can basically suggest just the
shape of the bullets; for <table>, you can suggest borders, spacing, etc.

Isofarro

Sep 1, 2002, 10:41:55 AM

[This post is the output of an on-the-fly learn and explore exercise,
sorry if it seems a bit jumbled ]

Nick Kew (ni...@fenris.webthing.com) on Sunday 01 September 2002 11:12
in comp.infosystems.www.authoring.html wrote:

> In article <hScc9.6011$hA4.1...@newsfep1-win.server.ntli.net>, one
> of infinite monkeys at the keyboard of "Ben M" <cee....@virgin.net>
> wrote:
>>
>> I tried to look at http://www.webthing.com/selforg/ but I was getting
>> an internal server error. Is this site related to your '95 project?
>> Do you have any more info on it?
>

> Google for "Holistic Hypertext" for how it relates to
> what Isofarro describes, and how his ideas have scope for improving
> on mine:-)

From: http://wdvl.internet.com/Vlib/Software/Conferencing.html

WebThing: WebThing is the self-organising website. Supports
collaborative document authoring (with full revision control system)
and conferencing via threaded discussions. Workgroup support includes
access control at three levels, and mutual mailing lists. Databases
(table-of-contents and index) are automatically maintained. Holistic
HyperText ensures links are always up to date and relieves webmasters
of the chore of updating them.

A search for "Holistic Hypertext" returns:
http://www.dcs.gla.ac.uk/idom/irlist/new/1996/96-xiii-9-296/Holistic_HyperText_and_the_Self-Organising_Website.html

Is there a document with more detail available? On first impression it
sounds good: abstracting away the URLs and letting them be dynamically
generated at request time.

From:
http://groups.google.com/groups?q=Holistic+HyperText&hl=en&lr=&ie=UTF-8&selm=jkc226.ib.ln%40jarl.webthing.com&rnum=3

Quote: "In a web system implementing Holistic Hypertext, such as the
'95 and '96 WebThing systems, you would enter the word "melanoma" as a
keyword for your new document. When you subsequently read _any_ other
document that contains the word, it will be rendered as an HTML link."

Ahh, so the content author does not even have to say "I'd like to link
this word to something". WebThing would scan the content and add in
links it decides are appropriate - or would the keyword need to be
[bracketed]? I suppose that choice is a matter of taste.
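
Either way, the core mechanism seems simple enough. A naive sketch (the
keyword table is toy data, and a real implementation would walk the parsed
document rather than pattern-match raw text):

import re

keywords = {"melanoma": "/docs/melanoma.html"}  # keyword -> target URL

def autolink(text):
    # Wrap every occurrence of a known keyword in a link, decided at
    # reading time rather than at authoring time.
    def link(match):
        word = match.group(0)
        url = keywords.get(word.lower())
        return '<a href="%s">%s</a>' % (url, word) if url else word
    return re.sub(r"[A-Za-z]+", link, text)

print(autolink("New research on melanoma was published today."))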

*expletive* *expletive*
http://groups.google.com/groups?q=Holistic+HyperText&hl=en&lr=&ie=UTF-8&selm=33ppn5.t9.ln%40jarl.webthing.com&rnum=4

Quote: "How WebThing's hypertext works: if one or more papers by
Bernstein is available in a collection when I read your posting, then
"Bernstein" is rendered as an HTML link. ... You, as author of the
posting, don't even have to know whether the paper is in the
collection,"

Wow - that just fried my little brain cell. You are not limiting this
to web-server-delivered browser content - you actually envision every
application you use (like a newsreader or mail client, or conceivably a
word processor). So the links become pervasive on any text output, as
natural as application-dependent context-sensitive help.

Now, allied with an intelligent agent that already knows your
preferences and interests, some of those links will be a gold mine of
information. Totally independent of the content author, and
customisable by the reader.


> The kind of application I saw as its primary field was a bibliographic
> archive, where it would generate automatic links so that if I type
> [Kew, 1995] in an article such as this, it becomes a link if and only
> if one or more article matching the reference is available in the
> archive.

Ahh, the foundations of a wiki, but with added intelligence (link
searching). So it works by returning a list of documents that match
both keywords "Kew" and "1995"?

I'm in agreement with you about anchor links being the HTML equivalent
of GOTO programming. Your Holistic Hypertext idea goes a long way in
providing a powerful alternative.

Your Holistic Hypertext implementation, along with your Groupware
implementation, makes a powerful and compelling experience. Is there a
publicly available download that I can have a look at?

I'm surprised a killer application hasn't been found for it - off the
top of my head I can name a few applications:
* Intelligent Agents (biased of course!)
* Intranet documentation of things like company guidelines and policies
* Developer documentation (heck Java classes, constants, and method
names are prime candidates)

I'm also surprised Wiki implementations haven't picked this idea up,
but most likely it's an advanced idea not applicable to a small Perl
file/text-file storage system because of the overhead of generating it
on the fly. I suppose a link-caching mechanism is a virtual must.

Thanks, you've just opened up a huge new world of ideas to play with!

Andy Dingley

Sep 2, 2002, 8:52:48 AM

On Sat, 31 Aug 2002 09:21:48 +0100, "Ben M" <cee....@virgin.net>
wrote:


Quoting the content management bible by Bob Boiko :

>The manager faces the
>difficult choice of either converting all of her content into a format such
>as XML or storing it in a binary format where rendering format and content
>are inseparable (such as HTML)."

What's so wrong with this view ? Now we all (apart from Idiot Boy and
his "tutorial") know that HTML was supposed to be presentation-free,
but the sad thing is that it _isn't_. We live in a world of Netscape
bolt-ons, inadequate CSS support and an inability to abandon, even
now, the bodges we had to use from necessity a few years ago.

>After nearly collapsing in a fit of laughter I then realised that a lot of
>other people were reading this stuff and taking that statement as true. It
>is a shame that HTML has been so closely tied to "presentation" that it is
>popularly thought unable to fulfill a role in content management systems.

This would indeed be a shame.

Namespaces ! I know Arjun hates them, but they're an easy way to
build CMS around HTML.
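
Something like this (a sketch only; the cms: vocabulary here is invented
for illustration):

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cms="http://example.com/cms">
  <body>
    <cms:article id="42" status="draft">
      <h1>Definition lists</h1>
      <p>Ordinary XHTML content, with CMS metadata layered on top.</p>
    </cms:article>
  </body>
</html>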

> One
>of my hopes is that web authors will "discover" the semantic markup that
>makes up HTML and begin using it rather than XML schemas quickly thought out
>in a few meetings in an office.

Writing an XML schema is never (or hardly ever) a good idea. Do it in
HTML, do it in DocBook, but don't go re-inventing wheels.


Ben M

Sep 2, 2002, 1:31:16 PM

"Andy Dingley" <din...@codesmiths.com> wrote in message
news:i8i6nu4l777uce3pa...@4ax.com...

> On Sat, 31 Aug 2002 09:21:48 +0100, "Ben M" <cee....@virgin.net>
> wrote:
>
>
> Quoting the content management bible by Bob Boiko :
>
> >The manager faces the
> >difficult choice of either converting all of her content into a format
> >such as XML or storing it in a binary format where rendering format and
> >content are inseparable (such as HTML)."
>
> What's so wrong with this view ? Now we all (apart from Idiot Boy and
> his "tutorial") know that HTML was supposed to be presentation-free,
> but the sad thing is that it _isn't_. We live in a world of Netscape
> bolt-ons, inadequate CSS support and an inability to abandon, even
> now, the bodges we had to use from necessity a few years ago.

HTML as it is written by the majority is a mix of presentation and content.
The point I was making is that it is not in fact "inseparable"; separating
presentation and content in HTML can be done. Whether it looks nice or not
is another matter :) (as it should be when separating content and
presentation). Looking at HTML as it is popularly written, however, it is
easy to miss this point of view. The use of HTML without presentational
tricks is in fact entirely possible, and it is a relatively easy way to
store data in a CMS, at least compared to developing proprietary schemas
etc...

> >After nearly collapsing in a fit of laughter I then realised that a lot
> >of other people were reading this stuff and taking that statement as
> >true. It is a shame that HTML has been so closely tied to "presentation"
> >that it is popularly thought unable to fulfill a role in content
> >management systems.
>
> This would indeed be a shame.
>
> Namespaces ! I know Arjun hates them, but they're an easy way to
> build CMS around HTML.
>
> > One
> >of my hopes is that web authors will "discover" the semantic markup that
> >makes up HTML and begin using it rather than XML schemas quickly thought
> >out in a few meetings in an office.
>
> Writing an XML schema is never (or hardly ever) a good idea. Do it in
> HTML, do it in DocBook, but don't go re-inventing wheels.

Agreed.

--
BenM
http://www.benmeadowcroft.com


Jim Dabell

Sep 4, 2002, 12:08:35 PM

Andy Dingley wrote:

> On Sat, 31 Aug 2002 09:21:48 +0100, "Ben M" <cee....@virgin.net>
> wrote:
>
>
> Quoting the content management bible by Bob Boiko :
>
>>The manager faces the
>>difficult choice of either converting all of her content into a format
>>such as XML or storing it in a binary format where rendering format and
>>content are inseparable (such as HTML)."
>
> What's so wrong with this view ? Now we all (apart from Idiot Boy and
> his "tutorial") know that HTML was supposed to be presentation-free,
> but the sad thing is that it _isn't_. We live in a world of Netscape
> bolt-ons, inadequate CSS support and an inability to abandon, even
> now, the bodges we had to use from necessity a few years ago.

It's my opinion that in most cases, a separation is completely possible, as
long as you are willing to throw out styling for nn4/ie4 and below.


>>After nearly collapsing in a fit of laughter I then realised that a lot of
>>other people were reading this stuff and taking that statement as true. It
>>is a shame that HTML has been so closely tied to "presentation" that it is
>>popularly thought unable to fulfill a role in content management systems.
>
> This would indeed be a shame.
>
> Namespaces ! I know Arjun hates them, but they're an easy way to
> build CMS around HTML.
>
>> One
>>of my hopes is that web authors will "discover" the semantic markup that
>>makes up HTML and begin using it rather than XML schemas quickly thought
>>out in a few meetings in an office.
>
> Writing an XML schema is never (or hardly ever) a good idea. Do it in
> HTML, do it in DocBook, but don't go re-inventing wheels.

I disagree. Obviously, yes, re-use is good, but often using something that
is only slightly suitable leads to confusion (e.g. somebody could use
<book> instead of <website>, but then it wouldn't make sense to read the
markup). Even if something has a complete one-to-one mapping to your
domain, if it uses elements that are named counter-intuitively, it's better
to invent something that makes sense when authoring.

--
Jim Dabell

Andy Dingley

Sep 4, 2002, 8:04:07 PM

On Wed, 04 Sep 2002 17:08:35 +0100, Jim Dabell
<jim-u...@jimdabell.com> wrote:

>> Writing an XML schema is never (or hardly ever) a good idea. Do it in
>> HTML, do it in DocBook, but don't go re-inventing wheels.
>
>I disagree. Obviously, yes, re-use is good, but often, using something that
>is only slightly suitable leads to confusion

It also leads to synergy. Given the number of people out there coding
like crazy and making useful stuff I can use for free, it's a
compelling argument.

If it lets me borrow RSS processing tools, I might even listen to Dave
Winer occasionally, even though his whole approach is completely wrong
(0.92 vs. 1.0)

Isofarro

unread,
Sep 5, 2002, 2:37:15 AM9/5/02
to
Andy Dingley (din...@codesmiths.com) on Thursday 05 September 2002
00:04 in comp.infosystems.www.authoring.html wrote:

> If it lets me borrow RSS processing tools, I might even listen to Dave
> Winer occasionally, even though his whole approach is completely wrong
> (0.92 vs. 1.0)

Can you elucidate, or point me to what his approach (approach on using
RSS I assume?) is and what's wrong with it?

Thanks.

Ben M

unread,
Sep 5, 2002, 7:35:16 PM9/5/02
to
"Isofarro" <spam...@spamdetector.co.uk> wrote in message
news:r2u6la...@sidious.isolani.co.uk...

> Andy Dingley (din...@codesmiths.com) on Thursday 05 September 2002
> 00:04 in comp.infosystems.www.authoring.html wrote:
>
> > If it lets me borrow RSS processing tools, I might even listen to Dave
> > Winer occasionally, even though his whole approach is completely wrong
> > (0.92 vs. 1.0)
>
> Can you elucidate, or point me to what his approach (approach on using
> RSS I assume?) is and what's wrong with it?

An introductory tutorial on RSS which explains some of the differences
between the two versions can be found at
http://www.mnot.net/rss/tutorial/#Versions. You may be able to find a DTD
out there that describes what a 0.92-or-later RSS feed looks like, but I
haven't found one. RSS 0.9x seems to be extended by issuing new versions,
while RSS 1.0 is extended by modularization. RSS 1.0 is also an application
of RDF (a W3C standard, http://www.w3.org/RDF/). Note that the two
abbreviations appear to mean different things (Really Simple Syndication
vs. RDF Site Summary).
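
To make that concrete (a cut-down sketch based on my reading so far, using
an invented example.com feed; neither fragment is a complete document), the
same item looks like this in the two flavours. In 0.92, plain XML inside
<rss> and <channel>:

  <item>
    <title>An example item</title>
    <link>http://example.com/item</link>
    <description>Plain XML, no namespaces.</description>
  </item>

And in 1.0, RDF-based, as a child of <rdf:RDF>:

  <item rdf:about="http://example.com/item"
        xmlns="http://purl.org/rss/1.0/"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <title>An example item</title>
    <link>http://example.com/item</link>
  </item>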

Disclaimer: I've only been looking into RSS for a short while, so bear that
in mind. However, my confusion over the matter (as a newcomer to the field)
is a product of the scattered documentation relating to the two
specifications.

--
BenM
http://www.benmeadowcroft.com


Andy Dingley

unread,
Sep 6, 2002, 7:54:53 AM9/6/02
to
On Thu, 05 Sep 2002 06:37:15 +0000, Isofarro
<spam...@spamdetector.co.uk> wrote:

>Can you elucidate, or point me to what his approach (approach on using
>RSS I assume?) is and what's wrong with it?

Ben has sensibly started a new thread on this "State of RSS"

Isofarro

unread,
Sep 6, 2002, 4:18:29 PM9/6/02
to
Ben M (cee....@virgin.net) on Thursday 05 September 2002 23:35 in
comp.infosystems.www.authoring.html wrote:

[RSS 0.92 and RSS 1.0]


> Note
> the two abbreviations appear to mean different things (Really Simple
> Syndication vs. RDF Site Summary).

Ahh, that answers the question neatly. I suppose Dave Winer
advocates RSS 0.92 (Rich Site Summary).

Andy Dingley

unread,
Sep 6, 2002, 4:38:17 PM9/6/02
to
>> Can you elucidate, or point me to what his approach (approach on using
>> RSS I assume?) is and what's wrong with it?

RDF started out with Netscape, who did it with RDF (not perfectly, and
from an immature standard). Dave Winer took it up, and produced the
0.9* standards purely in XML, as he doesn't understand RDF.
http://www.scripting.com/netscapeDocs/RSS%200_91%20Spec,%20revision%203.html
http://backend.userland.com/stories/rss091
http://backend.userland.com/rss092

RSS 1.0 went back to RDF
http://groups.yahoo.com/group/rss-dev/files/specification.html#s5.3.4
(and other locations)

The whole RSS 0.9* / 1.0 thing is extremely acrimonious 8-(

Ben Hammersley is working on the O'Reilly book
http://rss.benhammersley.com/

And some other resources
http://blogspace.com/rss/resources

http://www.redland.opensource.ac.uk/rss


As it stands now, RSS 1.0 is by far the better standard and is finally
getting the tools support (a big thank-you to all concerned,
particularly The Ubiquitous Aaron). 0.92 feeds are far more common, as
it's easier to make them. Don't use anything other than 0.92 or 1.0.

Most feeds aren't valid RSS 8-( Many aren't even reliably well-formed
XML (entity encoding is the usual problem).
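
The classic goof (an invented example) is a bare ampersand:

  <title>Fish & Chips</title>        <!-- not well-formed XML -->
  <title>Fish &amp; Chips</title>    <!-- what the feed should say -->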

Transcoding from 1.0 down to 0.92 is easy (simple XSLT, examples all over
the place, including my own stab at it). Transcoding from 0.92 to 1.0 is
equally easy, but suffers from the same limits as 0.92.

Rendering RSS to HTML is easy. There's a jumbo-sized bucket of XSLT
around to do it for you.
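
Something along these lines does the job (a bare-bones sketch against the
1.0 namespace; a real stylesheet would do rather more):

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:rss="http://purl.org/rss/1.0/">
    <xsl:template match="/">
      <!-- one linked list entry per feed item -->
      <ul>
        <xsl:for-each select="//rss:item">
          <li><a href="{rss:link}"><xsl:value-of select="rss:title"/></a></li>
        </xsl:for-each>
      </ul>
    </xsl:template>
  </xsl:stylesheet>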


So what's the big difference between 0.92 and 1.0 ?

Simply put, it's extensibility.

RDF is (in some ways) a solution to those things that XML can't do.
One of these is the ability to support an extensible data model,
without needing to change XML schemas every few minutes.

RSS 0.92 is a very limited format. Everyone agrees on this; they just
differ in how to fix it. It can't carry embedded content without
problems, it can't be extended beyond the "website summary" model
without trouble, and it has negligible metadata. Userland's solution
to this is to generate increasingly wild'n'wacky element types and
slap them into the DTD whenever they feel like it - RSS 0.93 is an
abomination of kludged-on hacks that is starting to resemble DocBook
for ugliness.

In RSS 1.0, the combination of RDF and namespacing allows easy
extensibility in a controlled manner.

Namespaces give modularisation. This is already a strong feature of
RSS 1.0. Many of the features that are being added in 0.92 -> 0.93 are
already in 1.0, by the use of optional modules. The great thing about
the modular approach is that you don't need to break the old schema to
get them, and they're easily ignored if your app doesn't support them.
A similar 0.92 app is likely to reject the whole document as being
invalid 0.92.
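
For instance (a cut-down sketch; a real channel needs more than this), the
syndication module rides along in its own namespace, and an app that
doesn't know it simply skips those elements:

  <channel rdf:about="http://example.com/feed"
           xmlns="http://purl.org/rss/1.0/"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
    <title>Example feed</title>
    <link>http://example.com/</link>
    <!-- optional module: safely ignored by apps that don't support sy: -->
    <sy:updatePeriod>hourly</sy:updatePeriod>
  </channel>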

RDF gives easy extensibility within a module. Need sophisticated
metadata ? Slap on the appropriate module and stretch it as far as you
want with your own schema or ontology. A typical RSS 1.0 application will
then be able to process this quite capably (the metadata, content and
syndication modules are a commonly supported core set) and the extra
detail from the custom ontology will degrade gracefully. A 0.92 app will
flag it as "Unrecognisable random garbage" and may either accept or
reject the document, but it certainly won't process it.
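
Concretely (the tech: namespace here is invented for illustration):

  <item rdf:about="http://example.com/widget-7"
        xmlns="http://purl.org/rss/1.0/"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:tech="http://example.com/ns/widgets#">
    <title>Widget 7 released</title>
    <link>http://example.com/widget-7</link>
    <!-- custom detail: generic 1.0 tools render title and link, skip tech: -->
    <tech:tolerance>0.05mm</tech:tolerance>
  </item>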

For the last two years, I've had an RSS intranet system with extremely
specialised additional content markup. This is largely meaningless to
anyone other than the two parties concerned - but it's all valid RSS
1.0, so I can build parts of it from off-the-shelf 1.0 software
(couldn't do that with 0.92). Secondly, anyone can view this with 1.0
tools. They won't see the obscure technical details, but they will get
the "headers and descriptions" level view of it (couldn't have done
that with 0.92 either).

Additionally, the 0.92 specs are dire. They talk about embedding ASCII
in content elements rather than CDATA, and make many similar goofs. In a
protocol that's so widely used, this sort of vagueness will bite you
in the end.

Summary Advice:

Build all new apps with 1.0

Transcode for publication to 0.92 as well

Accept either format for aggregation

Treat 0.92 either minimally, or generously (with much effort over
error handling, but don't expect much)

Expect glaring errors from other people's feeds.

Isofarro

unread,
Sep 7, 2002, 4:41:11 AM9/7/02
to
Andy Dingley (din...@codesmiths.com) on Friday 06 September 2002 20:38
in comp.infosystems.www.authoring.html wrote:

> Dave Winer took it up, and produced the
> 0.9* standards purely in XML, as he doesn't understand RDF.

[snip]

> RDF is (in some ways) a solution to those things that XML can't do.
> One of these is the ability to support an extensible data model,
> without needing to change XML schemas every few minutes.

I'm too unqualified and uninformed about RDF to raise even a decent
conversation at this point. Thanks for the info, I need to seriously
start reading up on RDF (I've always assumed it was just a flavour of
XML, and I'm more convinced now that I'm missing something).

Arjun Ray

unread,
Sep 7, 2002, 5:32:08 AM9/7/02
to
In <73ecla...@sidious.isolani.co.uk>, Isofarro
<spam...@spamdetector.co.uk> wrote:

| I need to seriously start reading up on RDF

Tough slogging ahead. The goal posts have moved about.

| (I've always assumed it was just a flavour of XML, and I'm more
| convinced now that I'm missing something).

Well, it started as how to add a bunch of tags that Netploder wouldn't
barf on. Then came the "we gotta put everything in pointy brackets"
phase, one notable result of which was the Namespaces disaster. RDF
moved on, though (much as even the clouds of Chernobyl dissipated
eventually). By the time it's fully baked, it will probably have
reinvented KIF and Prolog.

This was a jaundiced view. I'm still not impressed.

Andy Dingley

unread,
Sep 7, 2002, 7:11:07 AM9/7/02
to
On 7 Sep 2002 04:32:08 -0500, Arjun Ray <ar...@nmds.com.invalid> wrote:

>In <73ecla...@sidious.isolani.co.uk>, Isofarro
><spam...@spamdetector.co.uk> wrote:
>
>| I need to seriously start reading up on RDF
>
>Tough slogging ahead. The goal posts have moved about.

Not since Tuesday !

But you are _absolutely_ right. I couldn't track the RDF spec changes
when the guy writing them was only ten feet away.


>Well, it started as how to add a bunch of tags that Netploder wouldn't
>barf on. Then came the "we gotta put everything in pointy brackets"
>phase,

RDF suffered very badly from unreadable and confusing documentation in
the early stages. The current generation is a big improvement,
particularly Dan's striped syntax <http://www.w3.org/2001/10/stripes/>
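
Roughly, the nesting alternates node / property / node as you go down,
something like this hand-rolled sketch (the ex: vocabulary is invented):

  <rdf:Description rdf:about="http://example.com/doc"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:ex="http://example.com/ns#">
    <!-- property stripe -->
    <ex:author>
      <!-- node stripe -->
      <rdf:Description>
        <!-- property stripe -->
        <ex:name>A. N. Author</ex:name>
      </rdf:Description>
    </ex:author>
  </rdf:Description>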

> one notable result of which was the Namespaces disaster.

We agree to differ over namespaces. You see their brokenness, I see
their usefulness.

> By the time it's fully baked, it will probably have
>reinvented KIF and Prolog.

No, that's going to be OWL's job

Some people think Topic Maps have a place in there too.

Andy Dingley

unread,
Sep 7, 2002, 7:11:14 AM9/7/02
to
On Fri, 06 Sep 2002 21:38:17 +0100, Andy Dingley
<din...@codesmiths.com> wrote:

>RDF started out with Netscape, who did it with RDF (not perfectly, and
>from an immature standard). Dave Winer took it up, and produced the
>0.9* standards purely in XML, as he doesn't understand RDF.

Idiot !

_RSS_ started out with Netscape, who did it with RDF (not perfectly,

Arjun Ray

unread,
Sep 7, 2002, 10:30:08 AM9/7/02
to
In <q0mjnugubs72omqfb...@4ax.com>, Andy Dingley

<din...@codesmiths.com> wrote:
| On 7 Sep 2002 04:32:08 -0500, Arjun Ray <ar...@nmds.com.invalid> wrote:

| RDF suffered very badly from unreadable and confusing documentation in
| the early stages.

Did it ever! The formal theory wasn't any such thing, and the so-called
XML taggery was brain damage run amok. Mercifully, the pictures were
intelligible.

| The current generation is a big improvement, particularly Dan's striped
| <http://www.w3.org/2001/10/stripes/>

With due respect to their erudition and experience, I really wish people
would look more deeply into how generalized markup works, in particular
the importance of instrumental names.

Yes, striping is an improvement of sorts. We are now even regaled with
"RDF interpretations" of random XML documents (provided no elements have
mixed content) along these lines. But I'm still not impressed.

If substantial mathematical properties of graphs were being brought to
bear, I might take the Graph Nature of RDF more seriously, but if it's
just drawing pictures of blobs and lines, then serializations thereof in
XML syntax are straightforward. Here is one example of the correct
approach:

http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.4.5.html

One persistent fundamental shortcoming of the XML serializations of RDF
that we've seen to date is the href disease: a fixation on URIs - okay,
URI references or whatever the term du jour is - occurring as values of
some (random) attribute (among many). This is all the more devastating
with RDF than other schemes (such as HTML) because in RDF URIs are first
class objects in the terms of discourse.

With any clue to how markup works, you simply do not put "ground terms"
in attributes - unless the attribute is on an element with a locative
semantic (e.g. a local proxy to serve as a referential target for some
external entity, object or even concept.) The reluctance to use the
ID/IDREF mechanism that not only comes for free in the formalism but was
actually designed, ferchrissake, for representational needs like this,
is, to my mind, downright irrational.

http://lists.xml.org/archives/xml-dev/200002/msg00458.html
http://lists.xml.org/archives/xml-dev/200208/msg01614.html
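
Crudely, the contrast is something like this (element and attribute names
invented for illustration):

  <!-- href disease: the URI is a ground term buried in an attribute -->
  <author href="http://example.com/people/jane"/>

  <!-- locative proxy: declare the resource once, refer to it by IDREF -->
  <resource id="jane" locator="http://example.com/people/jane"/>
  <author ref="jane"/>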

Maybe striping meets the Sufficient Ingenuity criterion of how to be
modern and high-tech. Sigh.

| We agree to differ over namespaces. You see their brokeness, I see
| their usefulness.

About as useful as keels or wings on cars.

|> By the time it's fully baked, it will probably have reinvented KIF
|> and Prolog.
|
| No, that's going to be OWL's job

Ah, more TLAs. What fun.

--
For every complex problem, there is a solution that is simple, neat,
and wrong. -- H. L. Mencken

Jim Dabell

unread,
Sep 7, 2002, 12:04:36 PM9/7/02
to
Andy Dingley wrote:
[snip]

> As it stands now, RSS 1.0 is by far the better standard and is finally
> getting the tools support (a big thankyou to all concerned,
> particularly The Ubiquitous Aaron). 0.92 feeds are far more common, as
> it's easier to make them. Don't use anything other than 0.92 or 1.0
[snip]

Any comments on RSS 3.0? ;)

<URL:http://www.aaronsw.com/weblog/000574>


--
Jim Dabell

Andy Dingley

unread,
Sep 7, 2002, 3:26:42 PM9/7/02
to
On Sat, 07 Sep 2002 16:04:36 +0000, Jim Dabell
<jim-u...@jimdabell.com> wrote:

>Any comments on RSS 3.0? ;)

I hoped no-one had noticed those !

I told you the 0.92 / 1.0 thing got pretty acrimonious. 2.0, 3.0 et al
were just people getting a bit silly. Read back in the rss-dev list
for the whole sorry tale.


RSS aleph is generally agreed to be the ultimate version 8-)

Andy Dingley

unread,
Sep 7, 2002, 3:40:51 PM9/7/02
to
On Sat, 07 Sep 2002 16:04:36 +0000, Jim Dabell
<jim-u...@jimdabell.com> wrote:

>Any comments on RSS 3.0? ;)
>
><URL:http://www.aaronsw.com/weblog/000574>


I've just checked the date stamps. 8-(

I thought you were referring to last year's light-hearted RSS 2.0, 3.0
proposals -- but it seems like the stupid oaf is _serious_ with this
2.0 business.

Dave Winer is an _idiot_ 8-(
