Scoping (was Re: what's missing?)


Danny Ayers

Nov 30, 2007, 5:10:47 AM
to datapor...@googlegroups.com
On 30/11/2007, Chris Saad <chris...@gmail.com> wrote:
>
> Ok I have added XMPP to the page - does anyone know key players in
> that community to include in this DP group?

(+1 to stpeter)

I've absolutely no intention of kicking up a fuss on this, but I feel
compelled to make an observation about the technologies that are
included in the 'stack'.

Ok, "Data Portability". There's plenty here on the Portability side -
various formats that can be passed around using (mostly) doc-oriented
protocols. To be able to convey data across barriers between systems,
authentication and identity are needed.

But what of Data? Take a look at the traditional systems for managing
data, databases. They tend to have two key features: a data model and
a query language. How are these represented in the stack?

Ok, there are data models implicit in the formats - hCard, xfn, apml,
rss, opml basically allow descriptions (in their own peculiar fashion)
of people and their relationships and bloglike content. Great! Until
someone wants to describe their cat, their Stratocaster, the projects
they're working on, their annual income and the population of
Luxembourg...

...and even if the information desired *can* be represented in these
formats, how does one go about using it in a consistent fashion?
Given, say, the hCard, xfn, apml, rss, and opml files of everyone
working in an organisation, how would one go about getting information
out of it - say I want to see last month's blog posts from all my
friends in the organisation who have an interest in seafood?

Ok, ok. I realise a lot of the motivation behind dataportability is
that existing systems - typically the social network systems - have
this kind of data and don't handle it in a portable fashion. There is
a problem here which needs to be addressed by initiatives like
dataportability. The choice of specific (comparatively) standard, open
formats etc with a little buzz in the community seems reasonable in
this context.

But I can't help thinking that there's a blurring of scope here - in
one direction, fixing the non-portability in these existing systems,
in the other direction solving the general problem of data
portability.

The latter is obviously a legitimate concern, and big-picture-wise
encompasses the immediate issues. But personally, to solve the general
problem I'd start by looking at how to solve the problem of modelling,
integrating (and querying) data in general using the Web to best
advantage. Well, I don't have to, because those aspects are being
covered by Semantic Web technologies (primarily RDF & SPARQL). What's
more the data conveyed in hCard, xfn, apml, rss, opml etc can be
integrated (and queried) this way, and the exact same tools can be
used to talk about Stratocasters and the population of Luxembourg.
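
To make the integration point concrete, here is a minimal sketch in plain Python. It assumes the hCard, XFN, APML and RSS data have already been reduced to (subject, predicate, object) statements; the predicate names and sample data are invented for illustration, and a real setup would more likely use an RDF store with SPARQL than Python sets.

```python
from datetime import date

# Statements gathered from different formats, all reduced to
# (subject, predicate, object) triples. Every name here is hypothetical.
triples = {
    # from XFN / hCard
    ("me", "knows", "alice"),
    ("me", "knows", "bob"),
    ("alice", "memberOf", "acme"),
    ("bob", "memberOf", "acme"),
    # from APML
    ("alice", "interest", "seafood"),
    ("bob", "interest", "guitars"),
    # from RSS
    ("post1", "author", "alice"),
    ("post1", "published", date(2007, 10, 12)),
    ("post2", "author", "alice"),
    ("post2", "published", date(2007, 8, 1)),
    ("post3", "author", "bob"),
    ("post3", "published", date(2007, 10, 20)),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the data."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def seafood_posts(person, org, year, month):
    """Posts from a given month by friends in org who like seafood."""
    friends = (
        objects(person, "knows")
        & subjects("memberOf", org)
        & subjects("interest", "seafood")
    )
    posts = set()
    for friend in friends:
        for post in subjects("author", friend):
            dates = objects(post, "published")
            if any(d.year == year and d.month == month for d in dates):
                posts.add(post)
    return sorted(posts)


print(seafood_posts("me", "acme", 2007, 10))
```

The same two helper functions would work unchanged for statements about cats, Stratocasters or Luxembourg, since nothing in them is specific to people or blog posts.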

For sure a lot of work is needed around authentication and integration
of data through the generic protocols (HTTP, XMPP...), but I'm
continually surprised to find how many of the /other/ problems around
data portability have simple solutions when you bring Semantic Web
technologies into your stack.

The reason I've absolutely no intention of kicking up a fuss on this
is that although I'm not convinced approaching data through a specific
set of arbitrary domains/formats is the best route, it's still
pointing towards a more general Web of Data (and it's generally
straightforward to work with formats like these using Semantic Web
tools).

Cheers,
Danny.

--

http://dannyayers.com

Josh Patterson

Nov 30, 2007, 2:28:43 PM
to DataPortability
Danny,
I agree completely. Standards are great, but just like with
traditional networks, databases and filesystems, it's all about the
model and the data, and that was the reason we stepped back and said "we
need to model this", and wrote up the WRFS sketch. You fit technology
into functions in the model, and I think RDF has a big role in
defining the data and its relations.

Josh


Josh Patterson

Nov 30, 2007, 2:46:35 PM
to DataPortability
Danny,
On a further thought, I'm betting we end up with a query language that
treats the internet as a logical database / filesystem. Something
where you would write:

Select [url], [permissions] FROM [namespace].[Images] WHERE [openID] =
day...@rdf.org AND [Created On] > '12/1/2006';

This would then be parsed into a query tree and executed via a
"technology agnostic" stack model (think TCP/IP stack, slide in
layers), which would gather the records from flickr, zooomr, etc, and
then return a { XML, RDF, Hashtable, ? } recordset to the application
layer to be dealt with. Then you could use The Graph as a logical whole,
just like a database. I'd imagine it would function well as a universal
drive for mobile devices, kiosks, web apps, desktop apps, etc. I know
SPARQL and a few others currently exist in the RDF arena, but I'd say
we'll end up with a new one that has its "tables" and "fields" backed up
by an RDF dialect, with say [namespace].[Images] resolving to some URI.
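
A rough sketch of the "fan out, gather, return a recordset" step might look like the following. Everything here is an assumption made for illustration: the adapter functions, the record fields and the example URLs are invented, and real adapters would call each service's API instead of returning canned rows.

```python
from datetime import date


# Hypothetical per-service adapters; real ones would call each service's API.
def flickr_images(owner):
    return [{"url": "http://flickr.example/1", "permissions": "public",
             "owner": owner, "created": date(2007, 3, 4)}]


def zooomr_images(owner):
    return [{"url": "http://zooomr.example/9", "permissions": "friends",
             "owner": owner, "created": date(2006, 5, 1)}]


# The "slide in layers" part: adding a service means adding an adapter.
ADAPTERS = [flickr_images, zooomr_images]


def select_images(owner, created_after, fields=("url", "permissions")):
    """Fan the query out to every adapter and merge the matching rows."""
    rows = []
    for adapter in ADAPTERS:
        for record in adapter(owner):
            if record["created"] > created_after:
                rows.append({field: record[field] for field in fields})
    return rows


print(select_images("user@example.org", date(2006, 12, 1)))
```

Here the recordset is a plain list of dicts; swapping that for XML or RDF would only change the last step of select_images.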

Josh

Danny Ayers

Nov 30, 2007, 3:05:50 PM
to datapor...@googlegroups.com
On 30/11/2007, Josh Patterson <jpatt...@floe.tv> wrote:
>
> Danny,
> On a further thought, I'm betting we end up with a query language that
> treats the internet as a logical database / filesystem. Something
> where you would write:
>
> Select [url], [permissions] FROM [namespace].[Images] WHERE [openID] =
> day...@rdf.org AND [Created On] > '12/1/2006';

Have you seen SPARQL?

http://www.w3.org/TR/rdf-sparql-query/

With GRDDL in place (and/or appropriate heuristics) most web pages can
be interpreted consistently.

http://www.w3.org/TR/grddl-primer/

There are tools around for exposing other data sources as virtual RDF
graphs, e.g.

http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2rq/

It's already possible to query the whole web this way (if you're patient :-)

http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/semwebclient/

Josh Patterson

Nov 30, 2007, 3:32:07 PM
to DataPortability
Danny,
Yeah, I've got a few semweb books, and I've had some exposure to some of
it. I think the major thing is creating+refining the model, and then
working these parts into a "turnkey" stack and demonstrating to the
average startup "this IS possible, this is how to do it, plug this
package into your code, and you are On The Graph". Is SPARQL perfect
for this? Maybe, but at the same time, we all seem to have pieces of
the puzzle, but we don't quite know if/how they fit together to make
the complete "open web graph" that we want. Me and Josh Lewis (floe.tv
guys) got together with Paul, Ashley, and Chris (farraday guys) and we
said "let's throw together a toy model, make it work in a sandbox, and
then start plugging in various tech (rdf, sioc, sparql, xfn, foaf) and
see what works well for a turnkey solution" (something that can be
immediately adopted by startups at minimal time/effort cost). We've
written up an outline on the wiki
( http://dataportability.pbwiki.com/WRFS-Prototype-Workspace ),
with some beginnings of boilerplate code. If you'd like to jump in
there with us, we'd love to see "sparql/best tech for the job" running
in the stack.

Josh


Danny Ayers

Dec 1, 2007, 2:49:30 PM
to datapor...@googlegroups.com
On 30/11/2007, Josh Patterson <jpatt...@floe.tv> wrote:
>
> Danny,
> Yeah, I've got a few semweb books, I've had some exposure to some of
> it.

It really has accelerated in the last couple of years, you might be
surprised. A while ago I tried linkblogging developments, then I had
to give up because it was too time consuming hunting for stuff. I
recently restarted ("This Week's Semantic Web" [1]), and the problem
now is there's simply way too much new material each week to keep
track of.

> I think the major thing is creating+refining the model,

Creating, I'd say, was mostly done in the 2004 RDF specs (on top of HTTP,
and augmented with OWL)*. Refining, well, that'll take longer. SPARQL
and GRDDL are fairly new, and many best practices still aren't really
pinned down, not likely to be for a few years yet.

(* the core is remarkably elegant - just give things and relationships
between things URIs, pretty much everything else follows from there)

> and then
> working these parts into a "turnkey" stack and demonstrating to the
> average startup "this IS possible, this is how to do it, plug this
> package into your code, and you are On The Graph".

Right. Although the core tools & APIs have been built (e.g. [2]), only
now do we seem to be getting to the turnkey solution situation, and
broader outreach. Aside from a handful of notable exceptions, the
startup curiously seems to be the last demographic to be picking up on
the tech, although that isn't likely to be the case for long: folks
like OpenLink offer a full-stack package, and the company I work for
(Talis) covers similar ground, but with a platform offered as a
service.

One thing to bear in mind is that most of what's needed for a Web of
Data is already provided by the Web; check Linked Data [3].

Certain domains (e.g. the life sciences) have been rushing to Semantic
Web tech, as the best hope for solving some of their hard problems.
Some of the big enterprise players like Oracle have RDF support, I
expect there'll be more where they came from. Then of course an awful
lot of people are using the stuff internally - Yahoo! springs to mind.

> Is SPARQL perfect
> for this?

No, probably far from it, but it does offer incredibly useful
facilities, and when it comes to broad use of the Web of Data, we
still don't really know what would be perfect. But RDF+SPARQL used
with other Web tech does work. As an example, the Linking Open Data
project now has about 25 or so sizeable datasets hooked together; last
count was 2 billion RDF statements with 3 million crosslinks. Same
basic principle as the humble FOAF profile (which may be derived from
XFN...)

> Maybe, but at the same time, we all seem to have pieces of
> the puzzle, but we don't quite know if/how they fit together to make
> the complete "open web graph" that we want. Me and Josh Lewis (floe.tv
> guys) got together with Paul, Ashley, and Chris (farraday guys) and we
> said "let's throw together a toy model, make it work in a sandbox, and
> then start plugging in various tech (rdf, sioc, sparql, xfn, foaf) and
> see what works well for a turnkey solution" (something that can be
> immediately adopted by startups at minimal time/effort cost). We've
> written up an outline on the wiki
> ( http://dataportability.pbwiki.com/WRFS-Prototype-Workspace ),
> with some beginnings of boilerplate code. If you'd like to jump in
> there with us, we'd love to see "sparql/best tech for the job" running
> in the stack.

Sounds interesting, but I'm afraid I couldn't find a password for the Wiki.

Cheers,
Danny.

[1] http://blogs.talis.com/nodalities/this_weeks_semantic_web/
[2] http://sites.wiwiss.fu-berlin.de/suhl/bizer/toolkits/
[3] http://en.wikipedia.org/wiki/Linked_Data

--

http://dannyayers.com

Josh Patterson

Dec 2, 2007, 12:36:00 AM
to DataPortability
Danny,
I read your posts on "This Week in the Semantic Web" and have read
your blog for quite a while. You make some very valid points on how
far a lot of those technologies have matured. I think the major thing
that will drive them toward widespread adoption is "how does this make
my app better? how can I use this to get an advantage? what's the
killer app for this?". A lot of people have a lot of different goals
with these technologies, and I'll share my intentions very honestly
here: "make our media mixer stand out from the crowd".

To do that, we've taken input on the beta, and one thing that people
really want is to not only be able to use their data however they
want, but for it to just "show up" in other applications without them
having to explicitly re-upload it to our site. And to take that to the
next logical step, if the data is "on the web", and it's your data,
then it should be available in any application that you want to use it
in (I present some arguments for this viewpoint in my writeup).

What are ways to aggregate that data? Well, there are lots of them,
but RDF is very good with its URIs linking things together, and its
capacity for relational logic. However, I don't necessarily look at it
as a complete "model" for this exact task (and also, what is our main
goal? do we all have the same goal?), but more so as a technology to
implement it in. I do think the semantic web's rise will be linked to
killer apps like floe.tv that implement a model like WRFS that use RDF
under the hood, and create a lot of exposed RDF "linked data" ( OIL,
OWL, whatever ) for a lot of people to use in a variety of ways. However,
it's gotta be more than just being "exposed" --- I think it's gotta be
treated like a single unified database / filesystem hybrid (record
level permission systems, highly distributed, aggregation via a "web
inode" at runtime).

I've read a little about linked data, but that's only part of the
equation --- yeah, it's "linked", but like Messina has brought up, we
gotta control how it is linked, and who can link it. The user will be
more in control; the user HAS to become more in control or the
consumer is getting a bad deal overall. Linked Data facilitated by RDF
is part of the process, under the hood --- but what if we provide more
abstractions, like a SPARQL-inspired "linked data querying language",
and give them an Ajax library that allows them to execute those
statements? I think these ideas and questions are not new [1], as
generally there is nothing new under the sun. However, as cycles of
the web go on, the old ideas get reviewed and might fit in this
epoch / market.

I'm sure you see a lot of places where I or this group are missing
something with our sketches, and I don't think anyone has it all
figured out ('cause if they did, they'd drop a roadmap in our laps!).
Take a look at [2]; the password to the wiki is in there, and our
sketch is in there. I agree that everything that is needed to make the
Web of Data happen exists, but putting it all together "in tune and on
time" is the trick.

Let's get a toy model working that works with the data as an open
protocol that uses these sets of "standards". Let's make it highly
decentralized. Then let's get a bigger model working. Let's solve some
small problems like "set up a library that will allow a programmer to
query 'The Graph' for the locations of all the images owned by the
user 'day...@open.id' and return that in a linked data RDF structure".
Let's make it turnkey, and trust me, we've got people in this group
who will gladly plug into this coming Web of Data.

Josh

[1] http://www.w3.org/DesignIssues/
[2] http://groups.google.com/group/dataportability/web/overview-of-activity-so-far


Danny Ayers

Dec 2, 2007, 6:37:04 AM
to datapor...@googlegroups.com
On 02/12/2007, Josh Patterson <jpatt...@floe.tv> wrote:
>
> Danny,
> I read your posts on "This Week in the Semantic Web" and have read
> your blog for quite a while. You make some very valid points on how
> far a lot of those technologies have matured. I think the major thing
> that will drive them toward widespread adoption is "how does this make
> my app better? how can I use this to get an advantage? what's the
> killer app for this?". A lot of people have a lot of different goals
> with these technologies, and I'll share my intentions very honestly
> here: "make our media mixer stand out from the crowd".
>
> To do that, we've taken input on the beta, and one thing that people
> really want is to not only be able to use their data however they
> want, but for it to just "show up" in other applications without them
> having to explicitly re-upload it to our site. And to take that to the
> next logical step, if the data is "on the web", and it's your data,
> then it should be available in any application that you want to use it
> in (I present some arguments for this viewpoint in my writeup).

Absolutely!

> What are ways to aggregate that data? Well, there are lots of them,
> but RDF is very good with its URIs linking things together, and its
> capacity for relational logic. However, I don't necessarily look at it
> as a complete "model" for this exact task (and also, what is our main
> goal? do we all have the same goal?), but more so as a technology to
> implement it in.

Yep, that seems reasonable. If you have an application that's, say,
oriented towards music you'd want some app-specific vocabulary (e.g.
the Music Ontology), some more general vocabularies (e.g. FOAF for the
social side, Dublin Core for docs). You'd need the appropriate
subsystems to get data into the application and to manage and present
it as required.

The demands on the implementor aren't that different from someone
building with say PHP+MySQL - in fact using a toolkit like ARC
http://arc.semsol.org/ you probably would be using PHP+MySQL, only
working with the data at a web-oriented level of abstraction. The big
difference is that the data is inherently interoperable with other
data over the Web by default, not as an afterthought (like e.g.
pushing out RSS feeds or whatever).

> I do think the semantic web's rise will be linked to
> killer apps like floe.tv that implement a model like WRFS

// me only just started reading WRFS docs

> that use RDF
> under the hood, and create a lot of exposed RDF "linked data" ( OIL,
> OWL, whatever ) for a lot of people to use in a variety of ways. However,
> it's gotta be more than just being "exposed" --- I think it's gotta be
> treated like a single unified database / filesystem hybrid (record
> level permission systems, highly distributed, aggregation via a "web
> inode" at runtime).

That isn't far from the way people are using Semantic Web tech - it's
very much a single unified database - the Web of Data. Approaches to
access control vary, some folks use named graphs, some folks use
reification, but usually per-statement permission is possible, where
required. (Where a statement is a single binary relation, finer
grained than a typical record). It is highly distributed - it's the
web. Quite a lot of systems take a resource-oriented approach that is
very inode-like, the URI of the primary resource corresponding to the
inode number (in the global database), with a bunch of associated
properties. SPARQL's DESCRIBE is handy for this. It can operate at
runtime - data is accessed with HTTP GET, aggregated by simple
addition of the statements to a queryable RDF store/cache.
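
A minimal sketch of that "aggregate by addition, then look up one resource" idea, in plain Python. The two sample graphs, the short predicate names and the URIs are all made up; describe() here is only loosely in the spirit of SPARQL's DESCRIBE, whose exact result is left to the implementation.

```python
# Two small graphs of (subject, predicate, object) statements, as might be
# fetched from two different sites with HTTP GET. All URIs are made up.
site_a = {
    ("http://example.org/img/1", "type", "Image"),
    ("http://example.org/img/1", "owner", "http://example.org/people/josh"),
}
site_b = {
    ("http://example.org/img/1", "title", "Sunset"),
    ("http://example.org/img/2", "type", "Image"),
}


def aggregate(*graphs):
    """Merge graphs by simple addition (set union) of their statements."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, uri):
    """Collect the properties of one resource, keyed by predicate."""
    properties = {}
    for subject, predicate, obj in graph:
        if subject == uri:
            properties.setdefault(predicate, set()).add(obj)
    return properties


store = aggregate(site_a, site_b)
print(describe(store, "http://example.org/img/1"))
```

Because both sites use the same URI for the image, the merge needs no mapping step: statements about the same resource simply end up side by side.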

> I've read a little about linked data, but that's only part of the
> equation --- yeah, it's "linked", but like Messina has brought up, we
> gotta control how it is linked, and who can link it. The user will be
> more in control; the user HAS to become more in control or the
> consumer is getting a bad deal overall.

For sure. A lot of the work around linked data to date has been on
open data, so access control isn't an issue. But the Resource
Description Framework is well suited for describing permissions; a
nice example in the wild is the use of FOAF+OpenID for blog comment
whitelisting. http://esw.w3.org/topic/FoafOpenid
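
As a hedged illustration of the whitelisting idea (not the code behind the page linked above): assume the blog owner's FOAF data has been loaded as triples, with foaf:knows linking the owner to friends and foaf:openid giving each friend's OpenID. The people and URLs below are placeholders.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# A made-up FOAF graph for a blog owner; all URIs are placeholders.
graph = {
    ("http://example.org/owner#me", FOAF + "knows", "http://example.org/alice#me"),
    ("http://example.org/owner#me", FOAF + "knows", "http://example.org/bob#me"),
    ("http://example.org/alice#me", FOAF + "openid", "http://alice.example.org/"),
    ("http://example.org/carol#me", FOAF + "openid", "http://carol.example.org/"),
}


def whitelist(graph, owner):
    """OpenIDs of everyone the owner says they know."""
    friends = {o for s, p, o in graph if s == owner and p == FOAF + "knows"}
    return {o for s, p, o in graph if s in friends and p == FOAF + "openid"}


def may_comment(graph, owner, openid):
    """True if the (already authenticated) OpenID is on the whitelist."""
    return openid in whitelist(graph, owner)


print(may_comment(graph, "http://example.org/owner#me", "http://alice.example.org/"))
```

This sketch only decides who is allowed; proving that a commenter really controls the OpenID is the job of the OpenID login itself.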

> Linked Data facilitated by RDF
> is part of the process, under the hood --- but what if we provide more
> abstractions, like a SPARQL-inspired "linked data querying language",
> and give them an Ajax library that allows them to execute those
> statements?

Hmm, I don't see any need for "SPARQL-inspired" when you can just use
SPARQL. But re. Ajax libraries, yup, we definitely need more on
Ajax-powered user-friendly front ends. (I think OpenLink have probably
done the most work in the area - http://oat.openlinksw.com/ )

> I think these ideas and questions are not new [1], as
> generally there is nothing new under the sun. However, as cycles of
> the web go on, the old ideas get reviewed and might fit in this
> epoch / market.
>
> I'm sure you see a lot of places where I or this group are missing
> something with our sketches, and I don't think anyone has it all
> figured out ('cause if they did, they'd drop a roadmap in our laps!).
> Take a look at [2]; the password to the wiki is in there, and our
> sketch is in there.

Thanks.

> I agree that everything that is needed to make the
> Web of Data happen exists, but putting it all together "in tune and on
> time" is the trick.
>
> Let's get a toy model working that works with the data as an open
> protocol that uses these sets of "standards". Let's make it highly
> decentralized. Then let's get a bigger model working. Let's solve some
> small problems like "set up a library that will allow a programmer to
> query 'The Graph' for the locations of all the images owned by the
> user 'day...@open.id' and return that in a linked data RDF structure".

In SPARQL that would look something like:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX own: <http://example.org/owners/>

SELECT ?image
WHERE {
  ?image a foaf:Image ;
         own:owner "day...@open.id" .
}

Suitable libraries can be found at:
http://esw.w3.org/topic/SparqlImplementations
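
For anyone who wants to see what that graph pattern does without installing a SPARQL engine, here is a small plain-Python sketch that matches the same two conditions over in-memory triples. It is not a SPARQL implementation; the sample data and the owner id "user@example.org" are placeholders, the own:owner property is the hypothetical one from the query, and foaf:Image is the FOAF class for images.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_IMAGE = "http://xmlns.com/foaf/0.1/Image"
OWNER = "http://example.org/owners/owner"  # hypothetical property, as in the query

# Made-up sample data; a real library would load this from the Web.
graph = {
    ("http://example.org/img/1", RDF_TYPE, FOAF_IMAGE),
    ("http://example.org/img/1", OWNER, "user@example.org"),
    ("http://example.org/img/2", RDF_TYPE, FOAF_IMAGE),
    ("http://example.org/img/2", OWNER, "someone.else@example.org"),
    ("http://example.org/doc/3", OWNER, "user@example.org"),
}


def images_owned_by(graph, owner_id):
    """Subjects that are typed foaf:Image AND have the given owner."""
    typed = {s for s, p, o in graph if p == RDF_TYPE and o == FOAF_IMAGE}
    owned = {s for s, p, o in graph if p == OWNER and o == owner_id}
    return sorted(typed & owned)


print(images_owned_by(graph, "user@example.org"))
```

The intersection of the two sets plays the role of the shared ?image variable: a subject only counts if it satisfies both triple patterns.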

> Let's make it turnkey, and trust me, we've got people in this group
> who will gladly plug into this coming Web of Data.

Worthy aims, but I would note on your list you have:

"Defining which current technologies fit the properties of the current
abstraction so that we dont reinvent the wheel."

HTTP+RDF+SPARQL cover about 90% of the wheel you're describing ;-)
