[pedantic-web] fragment URIs

Nathan

May 20, 2010, 2:14:13 PM

to pedant...@googlegroups.com

Can somebody give me a solid reason why I (or we) shouldn't use fragment
URIs for essentially everything that can't be saved on a filesystem, or
point to an issue that would be caused by using them.

ps: really don't want to get in to slash vs hash, just want to know if
there's a reason *not* to use fragments for what's termed 'non
information resources'.

Best,

Nathan

Ian Davis

May 20, 2010, 3:06:52 PM

to pedant...@googlegroups.com

On Thursday, May 20, 2010, Nathan <nat...@webr3.org> wrote:
> Can somebody give me a solid reason why I (or we) shouldn't use fragment URIs for essentially everything that can't be saved on a filesystem, or point to an issue that would be caused by using them.
>

Fragment identifier interpretation belongs to the client and is
dependent on the media type of the response. See
http://blog.iandavis.com/2007/11/fragmentation for more of my thoughts
on that.
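
To make that point concrete, here is a minimal Python sketch (standard library only) of what a client does with a fragment URI before any request goes out. The URI http://example.org/jim#me and the function name split_for_request are illustrative assumptions, not taken from any particular client.

```python
from urllib.parse import urldefrag


def split_for_request(uri: str) -> tuple[str, str]:
    """Split a URI into the part a client requests and the part it keeps.

    The fragment is never sent to the server; the client holds on to it
    and interprets it according to the media type of the response.
    """
    request_target, fragment = urldefrag(uri)
    return request_target, fragment


request_target, fragment = split_for_request("http://example.org/jim#me")
print(request_target)  # http://example.org/jim
print(fragment)        # me
```

The server only ever sees the first value, so whatever "me" means has to be worked out on the client side once the representation arrives.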

Also, in the world of web applications the fragment is now being used
for storage of client side state, to be read by javascript on the
client (e.g. Gmail, Google Code, etc.). Its original use as a simple
identifier of a part of an HTML document is being superseded.

> ps: really don't want to get in to slash vs hash, just want to know if there's a reason *not* to use fragments for what's termed 'non information resources'.

Why not use them for information resources too? Maybe that is nonsense, but
restricting fragments to non-information resources forces you to know which
kind of resource you have before creating an identifier. A slash URI
lets you defer getting that knowledge until you start serving
representations.

>
> Best,
>
> Nathan
>

ian

Nathan

May 21, 2010, 6:39:17 AM

to pedant...@googlegroups.com

Ian Davis wrote:
> On Thursday, May 20, 2010, Nathan <nat...@webr3.org> wrote:
>> Can somebody give me a solid reason why I (or we) shouldn't use fragment URIs for essentially everything that can't be saved on a filesystem, or point to an issue that would be caused by using them.
>>
>
> Fragment identifier interpretation belongs to the client and is
> dependent on the media type of the response. See
> http://blog.iandavis.com/2007/11/fragmentation for more of my thoughts
> on that.
>
> Also, in the world of web applications the fragment is now being used
> for storage of client side state, to be read by javascript on the
> client (e.g. Gmail, Google Code, etc.). Its original use as a simple
> identifier of a part of an HTML document is being superseded.

As if to justify the thoughts on your blog post, your final comment on
the matter was "I wonder what URI we ought to use to refer to issue 81
on the RFC2616bis work?
http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i81 is the URI
of the HTML fragment describing it. According to the webarch I can't use
that URI to also denote the issue itself."

Before wading in to this one, I thought I'd seek the advice of those who
are more experienced in the colliding worlds of HTML Documents and RDF
Concepts under the banner of RDFa - one of the most clarifying responses
I received was as follows:

<quote>
Sometimes people ask: "Can you have an @id and @about have the same
value?" and the answer to that is yes, you can do this:

<div id="me" about="#me">...</div>

So, when that URL is dereferenced, the HTML document displays what is,
hopefully, also machine readable. However, they do NOT point to the same
/thing/. One is a resource (@about), and one is an anchor in the
representation of the resource (@id)... it just so happens that they
have the same URL.

The meaning of that URL is different based on whether you ask what the
URL identifies in the context of the semantic web, or you ask what that
URL identifies in the context of the document-based web.

URLs are ambiguous without a solid context. The requested and returned
MIMEType gives context, HTTP Headers give context, the configuration of
the User Agent gives context, time gives context... there are many
things that give context to a URL.
</end-quote>
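
As a rough illustration of the quoted point, the following Python sketch (standard library only) reads the one-line example above and resolves both attributes against a document URI. The base URI http://example.org/jim and the class name IdAboutCollector are assumptions made for the example, and the handling of @about is deliberately simplified: real RDFa processing also deals with CURIEs, which this ignores.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IdAboutCollector(HTMLParser):
    """Collects the URIs implied by @id (anchors) and @about (subjects)."""

    def __init__(self, base: str):
        super().__init__()
        self.base = base
        self.anchors = set()   # parts of the HTML representation
        self.subjects = set()  # things the RDFa statements are about

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "id" in attributes:
            # An @id names an anchor: base URI plus "#" plus the id value.
            self.anchors.add(urljoin(self.base, "#" + attributes["id"]))
        if "about" in attributes:
            # Simplification: treat @about as a plain relative reference.
            self.subjects.add(urljoin(self.base, attributes["about"]))


collector = IdAboutCollector("http://example.org/jim")
collector.feed('<div id="me" about="#me">...</div>')
print(collector.anchors)   # {'http://example.org/jim#me'}
print(collector.subjects)  # {'http://example.org/jim#me'}
```

Both sets hold the same string, yet the code that fills them plays two different roles, which is the distinction the quote draws between the anchor and the resource.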

For me this brought clarity to the issue. With HTML, the code we write
and see is parsed into a DOM and rendered as visual boxes of content,
which may or may not have identifiers. Just as an @id (or previously
@name) allowed a client to navigate between parts of the rendered DOM,
so the JavaScript uses of @id today provide a reference to parts of the
DOM which we can use to store state and suchlike, as you note. But this
is *very* context specific.

To illustrate, I can easily write a simple JavaScript script that goes
through every element in the DOM, gives it a random @id on the fly, and
changes them every 2 seconds. Is that hundreds of new URIs being minted
randomly? To me it is clear that the answer is no: it's simply a
script working on a temporary, short-lived DOM within a client on a
computer somewhere, here for a moment and gone the next, equivalent to the
pointer to an object in memory while a program is running.

In the RDF world URIs clearly point to Concepts, regardless of whether a
description of that conceptual mapping can be dereferenced, and as we
know the concept pointed to, in all cases, is not serialization
specific. We can describe the same thing, with the same URI, in n3,
rdf/xml or many other serializations. RDFa is just another
serialization, although this time the serialization specific syntax of
the RDF is merged in with the serialization specific syntax of an HTML DOM.

To illustrate, let's say http://example.org/jim#me identifies a Person
"jim", and dereferences to a .n3 serialization of RDF describing "jim".
If I dereference that URI with an RDF Parser like ARC, then does
http://example.org/jim#me suddenly identify the temporary in memory
'triple'? In one context it does. If I then run a little script which
swaps the value in memory to http://example.org/frank, has it just minted
another URI, re-identified jim, turned jim into frank, or changed the
conceptual mapping? Again, it is clear to me the answer is no.
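
A toy Python sketch of that thought experiment, using plain tuples rather than a real RDF parser such as ARC. The names PUBLISHED and parse are hypothetical, and the single foaf:name triple is an assumed stand-in for whatever the .n3 document would actually contain.

```python
JIM = "http://example.org/jim#me"

# Assumed stand-in for the description the publisher serves.
PUBLISHED = ((JIM, "http://xmlns.com/foaf/0.1/name", "jim"),)


def parse(published):
    """Copy the published triples into a mutable, process-local list."""
    return [list(triple) for triple in published]


local = parse(PUBLISHED)

# Swap the subject in memory only, as the "little script" would.
local[0][0] = "http://example.org/frank"

print(local[0][0])      # http://example.org/frank
print(PUBLISHED[0][0])  # http://example.org/jim#me
```

The in-memory copy changes; the published description, and what the URI identifies, does not.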

Returning to your original question on that post:

"I wonder what URI we ought to use to refer to issue 81 on the
RFC2616bis work?
http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i81 is the URI
of the HTML fragment describing it. According to the webarch I can't use
that URI to also denote the issue itself."

Your question conflates and confuses many things, removes context and
does nobody any good.

As far as we know, a URI for that Issue 81 has not been minted.

A URI has, however, been minted for an HTML document, which you'll find if
you dereference http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/

If you dereference the URI you question
http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i81 with a
modern browser, then it'll GET the minted URI for the HTML document
http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/ - hopefully the
browser will create a DOM in memory, render that DOM on screen, give a
specific element a node @id of 'i81' if it's contained by the HTML
syntax in the document, and then adjust the positioning of the
rendered DOM to position that node on screen where you can see it - but
only in this very specific context, on a client which implements the
aforementioned functionality.
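
Roughly, the browser-side steps just described can be sketched in Python (standard library only). This is a simplification under stated assumptions: SAMPLE_HTML is made-up markup rather than the real content of the issues page, no network request is made, and FragmentFinder and locate are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class FragmentFinder(HTMLParser):
    """Records the first start tag whose @id equals the wanted fragment."""

    def __init__(self, fragment: str):
        super().__init__()
        self.fragment = fragment
        self.match = None

    def handle_starttag(self, tag, attrs):
        if self.match is None and dict(attrs).get("id") == self.fragment:
            self.match = tag


def locate(uri: str, html: str):
    """Return (URI that would be fetched, tag name the fragment lands on)."""
    document_uri, fragment = urldefrag(uri)  # only document_uri is requested
    finder = FragmentFinder(fragment)
    finder.feed(html)                        # stands in for building the DOM
    return document_uri, finder.match


# Made-up stand-in for the fetched representation.
SAMPLE_HTML = (
    '<html><body><h2 id="i80">Issue 80</h2>'
    '<h2 id="i81">Issue 81</h2></body></html>'
)

document_uri, element = locate(
    "http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i81", SAMPLE_HTML
)
print(document_uri)  # http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/
print(element)       # h2
```

The lookup only means something because this particular client chose to parse the response as HTML and search it for a matching @id.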

If you dereference
http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i81 with curl,
then to what does the URI refer?

http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i81 may be the
identifier of some Concept, maybe even that 'issue 81', but without a
description we simply do not know, and thus as far as we are concerned
it identifies nothing, or at the least 'something unknown to us'.

By removing the context you create a grey area, something which leads to
massive amounts of time being wasted, causes confusion and delay, risks
creating splits in a fragile and very important community, and far more
importantly it slows the path to adoption and puts off many would-be
adopters - shameful.

Regards,

Nathan

Nathan

May 21, 2010, 7:26:56 AM

to pedant...@googlegroups.com, Ian Davis

Nathan wrote:
[snip]

> By removing the context you create a grey area, something which leads to
> massive amounts of time being wasted, causes confusion and delay, risks
> creating splits in a fragile and very important community, and far more
> importantly it slows the path to adoption and puts off many would-be
> adopters - shameful.

Ian,

Please do accept my apologies, that last word on there was uncalled for
and I shouldn't have said it.

Please disregard the term shameful and take the rest of the content for
what it's worth, and continue the debate if you will.

It is a very important matter and regardless of previous approaches, it
would be ideal moving forwards to clear up these fragment and
non-fragment uri issues for soon to be published data.

The web of linked data is already huge, but that's but a drop in the
ocean compared to what's coming.

Once again apologies, I had no right to speak out of turn and FWIW I
don't think you are shameful, and I am all too aware that opinions change
and that post is over 2 years old.

Best,

Nathan

Ian Davis

May 21, 2010, 8:03:22 AM

to nat...@webr3.org, pedant...@googlegroups.com

On Fri, May 21, 2010 at 12:26 PM, Nathan <nat...@webr3.org> wrote:
> Ian,
>
> Please do accept my apologies, that last word on there was uncalled for and
> I shouldn't have said it.
>

Thanks Nathan. I think it is healthy for a community to support
differing opinions and to question itself often, without being
disruptive or off-putting to newcomers.

Ian

Nathan

May 21, 2010, 9:35:38 AM

to pedant...@googlegroups.com

Ian Davis wrote:
> On Thursday, May 20, 2010, Nathan <nat...@webr3.org> wrote:
>> Can somebody give me a solid reason why I (or we) shouldn't use fragment URIs for essentially everything that can't be saved on a filesystem, or point to an issue that would be caused by using them.
>>
>
> Fragment identifier interpretation belongs to the client and is
> dependent on the media type of the response. See
> http://blog.iandavis.com/2007/11/fragmentation for more of my thoughts
> on that.

I'd very much like to fork the discussion here, because the other reply
deals with RDFa related issues specifically.

I think we all agree that it's too far in to change the deployment of
what's there now with non-fragment URIs (rss1, foaf, dc, dbpedia etc),
and I do think http-range-14 addresses it well.

However, I also (rather strongly) feel that it needs to be re-addressed
from a standpoint of deploying new data - where http-range-14 would stay as
a backwards compatibility resolution for existing datasets and ontologies.

From where I'm standing there still appear to be open issues, and
people at odds over what's best. The existing documentation doesn't seem
to cover current understanding / opinion, and frankly I'm keen for others
not to have to go through the levels of confusion and learning which
I've had to in order to adopt Linked Data / RDF. Not many would be so
patient or try so hard to get the understanding, and I'm very aware
that, often unbeknown to us, there are people getting put off by the mixed
messages and rather tricky path to entry.

In an ideal world what I'd love to see happen is:

Set aside discussion of existing ontologies and datasets (unless from an
analysis standpoint with regard to performance, conceptual clarity,
teachability, implementation cost, ease of deployment, maintenance costs
etc.)

1 - take a vote or somehow gather general opinion on whether it's worth
trying to come to a new (and ~final) community agreement as to best
practice for new datasets and ontologies

Then, dependent on the outcome of 1:

2 - put together a concise list of the benefit(s) and drawback(s) of
each, and figure out whether only using one of the two options would be
viable and cover all use-cases.

Then, if a single option is viable (for new datasets and ontologies):

3 - Draft up a note stating what's been concluded and the reasons why,
and probably take it to TAG for clearance / approval / feedback

4 - Dependent on outcome of all the above, redo the existing best
practice and publishing linked data guides to reflect what's been agreed.

In the interests of trying to do this as quickly and painlessly as
possible it may be an idea to limit step 1 to pedantic web with a quick
+1, then if in agreement do the ground work for 2 in a couple of days on
the list, then take it to public-lod/semweb to cover anything missed.

I'll say again though, it would have to be done rather quickly in order
to gather what's needed without bike-shedding and bickering, and also
because Linked Data is moving so quickly that it is better to address this
now than at the 500 billion triple mark (which, given the exponential
growth of linked data publishing & adoption, could come much sooner than
expected). Also, the numerous related specs, recs and working groups that
are in progress would surely benefit greatly from a speedy decision.

I'd also like to make it clear that I do feel very out of scope / rank
saying the above - but kind of counting on my relative 'newbie-ness' to
get away with it and discount the historic reasons others may consider
when thinking about bringing this up :-)

Best & I'll leave it with you all,

Nathan

Stéphane Corlosquet

May 21, 2010, 10:52:47 AM

to pedant...@googlegroups.com

Nathan,

On Fri, May 21, 2010 at 6:39 AM, Nathan <nat...@webr3.org> wrote:

Before wading in to this one, I thought I'd seek the advice of those who are more experienced in the colliding worlds of HTML Documents and RDF Concepts under the banner of RDFa - one of the most clarifying responses I received was as follows:

<quote>
Sometimes people ask: "Can you have an @id and @about have the same
value?" and the answer to that is yes, you can do this:

<div id="me" about="#me">...</div>

You might find earlier threads from the RDFa mailing list interesting, such as [1], as well as an older response from Richard [2].

Note that the purpose of the pedantic-web mailing list is to fix concrete issues with published RDF data on the web. More conceptual and high-level discussions should rather happen on more general mailing lists such as publi...@w3.org or semant...@w3.org, which have a much larger pool of members to discuss these matters (and most of the pedants are also subscribed to these). A good rule of thumb for posting on pedantic-web is to have a URL to a broken dataset.

all the best,

Steph.

[1] http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2010Feb/0002.html

[2] http://lists.w3.org/Archives/Public/semantic-web/2007Dec/0157.html

Richard Cyganiak

May 25, 2010, 4:40:01 AM

to pedant...@googlegroups.com

Hi Nathan,

If only things were that easy!

On 21 May 2010, at 14:35, Nathan wrote:
> In an ideal world what I'd love to see happen is:
>

> 1 - take a vote or somehow gather general opinion on whether it's
> worth trying to come to a new (and ~final) community agreement as to
> best practice for new datasets and ontologies
>
> Then, dependent on the outcome of 1:
>
> 2 - put together a concise list of the benefit(s) and drawback(s) of
> each, and figure out whether only using one of the two options would
> be viable and cover all use-cases.
>
> Then, if a single option is viable (for new datasets and ontologies):
>
> 3 - Draft up a note stating what's been concluded and the reasons
> why, and probably take it to TAG for clearance / approval / feedback
>
> 4 - Dependent on outcome of all the above, redo the existing best
> practice and publishing linked data guides to reflect what's been
> agreed.
>
> In the interests of trying to do this as quickly and painlessly as
> possible it may be an idea to limit step 1 to pedantic web with a
> quick +1, then if in agreement do the ground work for 2 in a couple
> of days on the list, then take it to public-lod/semweb to cover
> anything missed.

In theory that's a great plan, but in practice...

First, you assume that talking to the community achieves anything. In
general it doesn't. You might convince one or two people that your new
standard approach of publishing datasets is the Right Thing after a
few lengthy and time-consuming permathreads, but most just won't care,
and the discussion approach simply doesn't scale. The approach that
works is making sure that people who face a serious problem that
prevents them from getting stuff done come across your proposal when
they look for a solution.

Second, you assume that the community could take concerted action to
update all the different documents and guides to reflect a single
view. To make that happen would be a full-time job and you'd probably
need bribes or a team of armed goons to motivate people into going
back to update those old documents they wrote. Never underestimate the
inertia of a large group of people.

Third, you assume that the community is actually eager to forge a new
consensus around a best method of publishing datasets. But positions
have been entrenched for years, people are generally sick of the
debate, pretty much every argument that can be made has been made, and
continued discussion is unlikely to reveal anything new or move the
scales. Also, in practice, the status quo may be a confusing mess, but
that confusing mess is still good enough to allow us to deploy stuff
and move on to the many remaining tough nuts, including dataset
synchronization, read-write, access control, etc etc. Realistically,
the only way to get rid of the mess is by making the mess obsolete.

Fourth, you assume that pedantic-web is a good place for organizing
this whole affair. But pedantic-web is a group dedicated to fixing
deployed data. It's not a group dedicated to web architecture. There
is some overlap in interest of course. But I'd rather see more
pedantic-web traffic about judging data against the specs and
recommendations, rather than traffic that judges the specs and
recommendations, because the latter is all opinion with little
grounding.

So, if quickly forging a consensus in the community is unlikely to
succeed, then what else can be done?

1. Lead by example. Publish your data the right way and talk about it.

2. Find a scalable way of teaching people who are new to this stuff.
If it's better than the material currently out there, then it will
make a difference.

3. In the rare case where it may be worth the effort, convince
individual people to do the Right Thing.

4. If you want to pick a more challenging task, then identify a
particular document, dataset, tutorial or whatever, and work with its
publishers to fix it. Expect this to be hard. You will hear a lot of:
"Well it seems to be working, so why change it?"

I'm happy to discuss this further with you or anyone else who cares
about this stuff, but let's take it off-list.

All the best,
Richard
