The Structured Web

Sean B. Palmer

Nov 1, 2010, 8:36:32 AM
to Gallimaufry of Whits
In place of a Semantic Web, I imagine a structured web. The word
"Semantic" attracted a lot of philosophers, description logic
engineers, relational table proselytes, and many other folk from the
1970s. These became my friends, and I love 'em all, despite not being
up to the calibre of their training. In contrast to the Semantic set,
the latte-drinking, BlackBerry-flicking Web 2.0 set went off and
created that fluffy little nest of vipers known as microformats.

The main idea of the Semantic Web was to extend the web to be a
substratum of nodes which are not just web pages, and arcs which are
not just hyperlinks, and then build new applications on this graph
topology instead of inside specific nodes. But as you'd expect from
trying to apply a host of 1970s techniques to a bustling new
environment where new emergent techniques and models are spewing forth
practically every day, things didn't go so well in many respects.

One of the biggest problems was getting people to accept the
extensions which were most incompatible with the colloquial view of
the existing system. And one of the biggest of this particular species
of problem was identity. People are used to the web being a bunch of
web pages. When you make this rather large conceptual change to the
topology, you start to stoke some very large question marks in
people's heads.

Tim had designed the HTTP bit of the web, the hypersphere, to contain
a bunch of web pages. So he said that we have to use either non-HTTP
links, or HTTP links with hashes in them, to link to people and places
and the like. But this created the mother of all arguments, one which
rolled on for over five years, where people argued that the
hypersphere could contain things which aren't web pages. Why not?
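
To make the hash option concrete, here is a minimal sketch. The
example.org address and the split_identifier name are made up for
illustration. A client only ever asks the server for the part before
the hash, so the page stays a page and the full link, hash included,
is left free to stand for the person.

```python
from urllib.parse import urldefrag

def split_identifier(link):
    """Return (document_url, fragment) for a link.

    The document_url is what a client actually requests over HTTP;
    the fragment never reaches the server.
    """
    document_url, fragment = urldefrag(link)
    return document_url, fragment

# Hypothetical link naming a person rather than a page.
page, name = split_identifier("http://example.org/sean#me")
```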

Tim argued that the web wasn't designed that way. He made the damn
thing, he should know how it works. But the others said that okay, you
made it this way, but now we're changing it. These were heady days,
with a lot of argument by assertion on either side. The stalemate was
broken by a strange architectural compromise. When a link returns a
303 See Other code, then it can be regarded as something which isn't a
web page.
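
A rough sketch of what the 303 compromise asks of a server, using
Python's standard http.server. The /id/ and /doc/ path layout is my
own assumption for the example and not part of any agreed design: a
link under /id/ names a thing and answers 303 See Other, pointing at
the page under /doc/ that describes it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def respond(path):
    """Return (status, headers) for a request path.

    Hypothetical layout: /id/<name> names a thing (a person, a place),
    /doc/<name> is the ordinary web page about that thing.
    """
    if path.startswith("/id/"):
        name = path[len("/id/"):]
        # Not a web page: redirect to the page that describes it.
        return 303, {"Location": "/doc/" + name}
    if path.startswith("/doc/"):
        return 200, {"Content-Type": "text/html; charset=utf-8"}
    return 404, {}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = respond(self.path)
        self.send_response(status)
        for key, value in headers.items():
            self.send_header(key, value)
        self.end_headers()
        if status == 200:
            self.wfile.write(b"<p>A page about the thing.</p>")

def serve(port=8000):
    """Run the sketch locally; call serve() to try it in a browser."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

With the server running, a browser given /id/sean ends up showing
/doc/sean, which is exactly why the distinction is invisible to most
people.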

This was an interesting piece of invention which was intended to be
within the constraints of the existing web system. It was meant to be
backwards compatible, but a usable enough extension to shut people up.
I suggested that we could even use an HTTP header rather than a
response code, but Tim pointed out that this may be a step too far.

Now, let's set the scene. You explain the Semantic Web to someone, and
at some point you have to tell them how to host a load of people on
their web server. All they have to do is change their response codes
to 303 See Other. Simples!

Oh the joy, the rambunctious joy when you see the moment their
comprehension simply implodes in on itself at the very notion.
Needless to say, though the 303 compromise was a success amongst
technical circles, it missed the point that as far as people are
concerned, a link is something that you put into a browser and it
gives you this page. There are billions of uses of links in this way.
That's a heck of a wide implementation base. On the other hand, are
there any other kinds of uses of links at all? Links used in such a
way that the fact they bring up a web page when you put them in your
browser is irrelevant?

As it turns out, this had been the case in XML circles for a while, in
the guise of XML namespaces. This was the first time when someone
decided that links could be used in a strange way. The original
archives of how they decided this, the xml-sig archives, are secret
but they make for hilarious reading. The kinds of arguments that were
carried into the Semantic Web world were played over and over in this
small sandbox environment between specialists with an unusual kind of
vehemence.

As usual, what was decided by committee should have been decided by
rough consensus and running code. When there was an HTML design
decision, the decision was taken by browser manufacturers. This is
still the case: the WTF WG said that they won't allow @href on every
element, as had been proposed, because browser manufacturers just
won't implement it. They're powerless, ultimately, against the wants,
and sometimes the wonts, of the browser manufacturers. They're just
lucky that for the most part, the wants of the architects and the
wants of the implementers overlap. If you're an optimist, at least.

Compare this to the Semantic Web. There were no proper Semantic Web
implementations, along the lines of the original design, until
Tabulator. In fact, if you disregard the much earlier Sailor by Seth
Russell, and my own Arcs a little later, then Tabulator is still the
only proper Semantic implementation out there. If that had been around
from the start, and had been popular, then the mess that we got out of
all the development wouldn't have happened.

But the premature optimisation screwed it up, and sent it sprawling
into a different direction: a direction of ontologies, of huge
distributed database systems, of wikidb, of tools which take five
hours to compute what times your friends are available to play pool.
Some of this is kind of cool, but it's nothing compared to how cool
Tim's original vision was, the dregs of which can still be seen in
Tabulator.

To change all these things now seems infeasible, and it may be so. But
build it and they will come. Useful code will be used. If we take Tim's
original vision as still essentially sound, then we would only have to
start out again and avoid the problems and mistakes of the first
system. Prepare to throw one away, as they say!

The two biggest design problems, i.e. ignoring the architecture
wankery and subsequent premature optimisation, were identity and
topology. The identity has been covered above. The topology problem is
mercifully simple. You can't throw out the entire topology idea, but
you can ditch much of it.

Without designing a new system, the guidelines for a new system would
be simplicity, flexibility, a low cost to high use ratio, basically
all the things that the RDF model is not, and that the JSON model is.
To be clear, the JSON model is not a web model. The JSON model
wouldn't be the structured web, it would be the unstructured web at
best. It's just a particular kind of canvas. You might need to make a
preliminary sketch on it in pencil to make it useful, or you might
need to use a different basis altogether.
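
As an illustration only, here is one way to picture the canvas and
the pencil sketch. The record, its keys, and the to_triples helper are
all invented for the example. The JSON is just nested data with no
notion of links or identity; flattening it into subject, key, value
rows against a chosen link is one possible sketch drawn on top of it.

```python
import json

# Hypothetical record; the names and URL are invented for illustration.
record = json.loads("""
{
  "name": "Alice",
  "homepage": "http://example.org/alice",
  "knows": ["Bob", "Carol"]
}
""")

def to_triples(subject, data):
    """Flatten a one-level dict into (subject, key, value) rows."""
    triples = []
    for key, value in data.items():
        # Treat a list as several values for the same key.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, key, item))
    return triples

triples = to_triples("http://example.org/alice#me", record)
```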

What is most important is that such a system is cool. There should be
mashups and excitement, people using what they create every day. It
should be what I imagine Tabulator could have been if it had come
before CWM and not after. This should be a world of Ambient Information, as
William Loughborough called it, one of copious convenience and ample
accessibility. Perhaps it could be sketched out on a wiki, a bit like
the old Atom wiki, if anyone were so interested.
