> Zef - welcome to the group.
Hi Zef!
While any ideas on improving DataPortability are obviously welcome,
I'd better let you know my position on FSs and the Web:
* HTTP (following the architecture as described in Fielding's thesis
[1] and WebArch [2]) provides a means of using the Web for file
storage.
* Because the Web is essentially a graph, a graph-shaped structure is
appropriate for Web data [3]. (Hierarchies are possible through
selective views of graph data).
* The success of the Web is due in no small part to the ability for
developers to work independently yet have interop (mostly) guaranteed
by the base spec - critical here is the use of URIs as identifiers.
* There are specifications that respect these points and allow
arbitrary data to be stored *in* the Web - notably RDF. It's a simple
evolution of Web linking [4, 5]. A bonus is there's a query language
and lots of tools.
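
A minimal sketch of the points above, in plain Python rather than a real
RDF library: data is held as (subject, property, value) triples keyed by
URIs, and a folder-style hierarchy is just one selective view over that
graph. The example.org URIs and the "contains" and "title" properties are
made up for illustration; they are not from any spec mentioned here.

```python
# Hypothetical example data: every URI below is invented for illustration.
EX = "http://example.org/"
CONTAINS = EX + "vocab#contains"
TITLE = EX + "vocab#title"

# The "graph" is simply a set of (subject, property, value) triples.
triples = {
    (EX + "home", CONTAINS, EX + "home/photos"),
    (EX + "home", CONTAINS, EX + "home/notes.txt"),
    (EX + "home/photos", CONTAINS, EX + "home/photos/cat.jpg"),
    (EX + "home/photos/cat.jpg", TITLE, "My cat"),
}


def children(graph, parent):
    """Return the sorted values of 'contains' triples whose subject is parent."""
    return sorted(o for (s, p, o) in graph if s == parent and p == CONTAINS)


def tree(graph, root, depth=0):
    """Follow only the 'contains' edges below root: a hierarchical view of the graph."""
    lines = ["  " * depth + root]
    for child in children(graph, root):
        lines.extend(tree(graph, child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(tree(triples, EX + "home")))
```

Other properties (such as the title above) stay in the same graph and are
simply ignored by the hierarchical view.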
I personally believe that the use of Semantic Web technologies is
currently not only the best approach to data portability on the
Web, but also the easiest one we have, because of the existing specs
and tools.
Cheers,
Danny.
[1] http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
[2] http://www.w3.org/TR/webarch/
[3] http://dig.csail.mit.edu/breadcrumbs/node/215
[4] http://en.wikipedia.org/wiki/Linked_Data
[5] http://dsonline.computer.org/portal/pages/dsonline/2007/06/w3web.html
Thanks for the hint - didn't take long to find a related post:
http://www.zefhemel.com/archives/2007/05/21/webfs-the-case-for-rdf
In which I found a key point:
[[
Every item on WebFS will have its own, unique URI.
]]
Yay!
> I also agree, as when I nicknamed my draft WRFS, the
> R stood for Relational, which needs database-like capabilities. It
> also needs to be able to relate things via inferencing, and I think
> that's where RDF fits in, but under the hood. Most programmers don't
> ever need to see relational algebra, but they should be able to enjoy
> the benefits of inferencing.
"Under the hood" is good :-)
Incidentally, while certain inferency things are nice to have
available (like inverse functional properties, which can provide
identification when a URI isn't available), a good proportion of folks
seem to treat RDF more like a (seriously normalised) relational DB -
properties are relations - with SPARQL filling the role of SQL in
traditional RDBMS.
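
To make that comparison concrete, here is a rough sketch in plain Python
(not a real SPARQL engine). It treats a list of triples like one heavily
normalised table, joins patterns on shared variables much as a SPARQL
basic graph pattern would, and uses foaf:mbox, which FOAF declares as an
inverse functional property, to spot two nodes that describe the same
person. The sample data and the helper names match, query and same_by_ifp
are assumptions made for illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical sample data; "_:a" style names stand in for blank nodes.
graph = [
    ("_:a", FOAF + "name", "Alice"),
    ("_:a", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:b", FOAF + "name", "Bob"),
    ("_:a", FOAF + "knows", "_:b"),
    ("_:c", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:c", FOAF + "nick", "al"),
]


def match(graph, pattern, binding=None):
    """Yield bindings for one (s, p, o) pattern; terms starting with '?' are variables."""
    binding = binding or {}
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if isinstance(term, str) and term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def query(graph, patterns):
    """Join several patterns on their shared variables."""
    results = [{}]
    for pattern in patterns:
        results = [b2 for b in results for b2 in match(graph, pattern, b)]
    return results


def same_by_ifp(graph, ifp):
    """Pairs of distinct nodes that share a value for an inverse functional property."""
    rows = query(graph, [("?p", ifp, "?v"), ("?q", ifp, "?v")])
    return sorted({(r["?p"], r["?q"]) for r in rows if r["?p"] < r["?q"]})


if __name__ == "__main__":
    # "Names of the people Alice knows", written as three joined patterns.
    print(query(graph, [
        ("?x", FOAF + "name", "Alice"),
        ("?x", FOAF + "knows", "?y"),
        ("?y", FOAF + "name", "?name"),
    ]))
    print(same_by_ifp(graph, FOAF + "mbox"))
```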
> Regardless, welcome to the group Zef, it's good to have you.
Indeed!
Cheers,
Danny.
--
As has been pointed out, I am well aware of RDF and SPARQL; I used
them extensively in my M.Sc. project (I built an extension of SPARQL
that allows services to be queried semantically). I also completely
agree with your position on FSs and the Web, so I think it will be a
good fit. :) As a side note, in an early version of WebFS I
had the containers/folders/collections/directories implemented as RDF
documents. However, now that there is Atom, and in particular AtomPub,
which provides almost everything we need, I think Atom would be a
better fit, maybe extended with embedded RDF, although to be honest I'm
not sure how much would be gained from that. I'd be happy to hear your
opinion on that.
I do see that reasoning with semantic data is a good idea, but I think
RDF does not have to be the encoding of choice. Atom, in essence, is a
subset of RDF (you can decompose it into [URI, property, value]
triples), but it is a more widely known and accepted format in the "web 2.0"
world we live in. It wouldn't be that hard to use reasoners
and SPARQL on Atom feeds once you've decomposed them into triples.
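
As a rough illustration of that decomposition, the sketch below uses only
the Python standard library to turn each Atom entry into [URI, property,
value] triples, with the entry's atom:id as the subject. The way property
URIs are built here (namespace + "#" + element name, and "#link-" + rel
for links) is an assumption made for this example, not an official Atom to
RDF mapping, and the sample feed is invented.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical feed describing one "folder" with a single entry.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example folder</title>
  <id>http://example.org/folder/</id>
  <updated>2008-01-01T00:00:00Z</updated>
  <entry>
    <title>notes.txt</title>
    <id>http://example.org/folder/notes.txt</id>
    <updated>2008-01-01T00:00:00Z</updated>
    <link rel="edit" href="http://example.org/folder/notes.txt"/>
  </entry>
</feed>"""


def atom_to_triples(xml_text):
    """Decompose every atom:entry into (entry id, property URI, value) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for entry in root.iter("{%s}entry" % ATOM):
        subject = (entry.findtext("{%s}id" % ATOM) or "").strip()
        if not subject:
            continue  # without an id there is no URI to hang the triples on
        for child in entry:
            if not child.tag.startswith("{"):
                continue  # skip elements that carry no namespace
            # ElementTree tags look like "{namespace}local".
            ns, _, local = child.tag[1:].partition("}")
            if local == "id":
                continue  # already used as the subject
            if local == "link":
                # Assumed mapping: one property per link relation.
                prop = "%s#link-%s" % (ns, child.get("rel", "alternate"))
                value = child.get("href")
            else:
                prop = "%s#%s" % (ns, local)
                value = (child.text or "").strip()
            if value:
                triples.append((subject, prop, value))
    return triples


if __name__ == "__main__":
    for triple in atom_to_triples(SAMPLE):
        print(triple)
```

Nested elements such as atom:author would need their own handling; this
sketch only covers flat text elements and links.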
Thanks for the warm welcome, I'm looking forward to working with you all.
Best,
Zef
--
Zef Hemel
E-Mail: z...@zefhemel.com
Phone: (+31) (0)6 156 19 280
Web: http://www.zefhemel.com
For the reasons you give, Atom/AtomPub seems a good approach to WebFS.
I do believe it would be a (Semantic) Web-friendly approach (I argued
this over here:
http://dannyayers.com/docs/ieee/w2 ).
I'll ask around to see how people have been dealing with the limited
vocabulary available in Atom for metadata. There are Simple & Complex
extensions, and you can always use RDF as payload, but I'm not sure
which approach (if any) is emerging as a preferred convention.
Atom does seem fairly optimal for Web documents; its limitation is
really that it treats the documents as opaque blobs (with metadata). In the
context of HTML docs, it'd be really desirable for the system to see
what other documents they link to. If e.g. your image file contains
EXIF metadata, you really want it available. This isn't such an issue
for a Web file system, though would be for a granular Web database in
which you'd want to transparently access any data within the docs (the
Web of Data/Giant Global Graph). If that kind of access is available
at relatively low cost, it'd be a shame to miss out.
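
As one small example of looking inside a document instead of treating it
as a blob, the sketch below pulls the outgoing links from an HTML page
with Python's standard html.parser and reports them as triples. The
linksTo property URI and the sample URIs are hypothetical, chosen only for
illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical property URI for "this document links to that one".
LINKS_TO = "http://example.org/vocab#linksTo"


class LinkCollector(HTMLParser):
    """Collect the absolute targets of <a href="..."> elements."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the document's own URI.
                self.links.append(urljoin(self.base_uri, href))


def link_triples(doc_uri, html_text):
    """Return (document, linksTo, target) triples for each link in the page."""
    parser = LinkCollector(doc_uri)
    parser.feed(html_text)
    parser.close()
    return [(doc_uri, LINKS_TO, target) for target in parser.links]


if __name__ == "__main__":
    page = '<p><a href="b.html">b</a> <a href="http://example.com/">x</a></p>'
    for triple in link_triples("http://example.org/docs/a.html", page):
        print(triple)
```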
'Course there are ways around this: using AtomPub as a facade in
front of a triplestore would be one (something I've been planning for
the Talis Platform, AtomPub is good to have), with GRDDL [1] used to
snag any data embedded in the docs. Simply incorporating the
appropriate GRDDL linkage into exposed docs would allow an external
GRDDL-aware agent to get the full picture. (Coincidentally the GRDDL
WG is looking at the Atom mapping right now - we should have a
transformation available in the next few weeks - any input on that,
requirements/use cases or whatever would be appreciated).
With the tooling around now, SPARQL effectively comes for free with
RDF. Text-based search tends to be relatively straightforward thanks
to kit like Lucene (there's even LARQ [2], Lucene+SPARQL).
But whatever the icing on the cake, or whatever the implementation
looks like, I'd strongly recommend any Web system takes advantage of
the network effect by following linked data principles [3].
Atom/AtomPub can help a lot there.
Cheers,
Danny.
[1] http://www.w3.org/TR/grddl-primer/
[2] http://jena.sourceforge.net/ARQ/lucene-arq.html
[3] http://en.wikipedia.org/wiki/Linked_Data
--