Semantic Web + FluidDB

orlin

Sep 2, 2009, 1:33:55 PM
to FluidDB Users
Hi,

The "linking objects" thread was great, especially its going into
semantic web territory. Exciting to know there is substantial
interest in that. I would like to approach it afresh and from a
slightly different angle.

I've also thought about how triples could map onto fluiddb. There are
several ways to do it. Almost got sucked into it. At some point
though I stepped back asking "why". Now I ask you. Except for its
being interesting, fluiddb as a triple store, among all the other ways
to look at it... Why? What value does it bring? How would you
compete against stores that exist solely for the purpose of storing
massive amounts of triples and are optimized to query them? There are
lots of production-ready, highly competitive open-source projects.
Isn't this taking away from the focus of what fluiddb is really good
at, something that semantic technologies can't (easily) do? To me,
the social aspect and the query language are what set fluiddb apart,
and simplicity is what will make it succeed like no other. Yes,
sparql is great, but I wonder how many would use it, even if it were
made available too.
I sense that the fluiddb query language will be good enough for most
searches and thus preferred by most. Later, I plan to write more
about searching and how in some ways the query language is better than
sparql (to the "fluiddb discuss" group)...

So object:tag and triples can peacefully coexist, yet they seem like
competing representations, each with their specific software / tools.
There is another way.

I'm planning to use semantic technologies (if your data is public, and
perhaps even if not - I'd recommend http://talis.com/platform)
together with fluiddb. A hybrid approach. Not completely semantic
web, but more of a fusion. Some or all of the data can be
auto-imported into fluiddb in the object->tag->value format that is
most natural to fluiddb users. This is again for the purpose of
further social tagging, sharing with other apps, convenient querying,
and everything that makes fluiddb great. Any data that has been
imported from the semantic web side can be re-imported. This solves
the temporary
issue of fluiddb being alpha and most of the headache that comes with
loss of data. Webhooks could solve the problem of updating the triple
store with fluiddb data, so the flow can go both ways.
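
A minimal sketch of the import step, assuming the triples are already
parsed into (subject, predicate, value) tuples. The `orlin/...` tag
names, the `TAG_FOR` mapping and the example.org URIs are made up for
illustration, and nothing here talks to fluiddb itself.

```python
from collections import defaultdict


def triples_to_objects(triples, tag_for):
    """Group (subject, predicate, value) triples into object -> tag -> value.

    Each subject becomes one object, keyed by its "about" string, and each
    mapped predicate becomes a tag path. If a predicate repeats for the same
    subject, the last value wins (a simplification of this sketch).
    """
    objects = defaultdict(dict)
    for subject, predicate, value in triples:
        tag = tag_for.get(predicate)
        if tag is None:
            continue  # unmapped predicate: stays only in the triple store
        objects[subject][tag] = value
    return dict(objects)


# Hypothetical mapping from predicates to fluiddb-style tag paths.
TAG_FOR = {
    "http://purl.org/dc/elements/1.1/title": "orlin/title",
    "http://purl.org/dc/elements/1.1/creator": "orlin/creator",
}

triples = [
    ("http://example.org/book/1", "http://purl.org/dc/elements/1.1/title", "Dune"),
    ("http://example.org/book/1", "http://purl.org/dc/elements/1.1/creator", "Frank Herbert"),
    ("http://example.org/book/1", "http://example.org/vocab/shelf", "B3"),
]

objects = triples_to_objects(triples, TAG_FOR)
```

Because the triple store stays the source, running the same function
again after a data loss is all a re-import would need.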

Even after fluiddb is fully stable, there is "peace of mind" that
comes from having your data backed up. Having it also in rdf is a
huge plus. Automatically available to all non-fluiddb users (i.e. the
rest of the www). Fluiddb and the triple stores would probably
interlink anyway. Being addressable, fluiddb can be linked to, as
well as link through tag values.

So we have some triples. How do we get them into fluiddb? We could
have a data model to validate against -- a definition language in the
form of fluiddl.owl (ontology), fluiddl.rdfs (schema), fluiddl.spin
(for constraints checking), or whatever. Does it make sense to
describe the crud for objects & tags entirely in rdf? I picture an
intermediary service that converts the rdf to api calls. I guess not
exactly crud. Just post (create), put (update except for about),
delete (but not objects), etc. Basically whatever fluiddb allows. If
the fluiddb api changes, this adapter gets fixed and all semantic apps
continue working as happily as before.
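
Here is a rough sketch of what such an adapter could compute, assuming
the tags of an object are available as plain dicts. The request shapes
(POST /objects, PUT and DELETE on /objects/<id>/<tag>) are assumptions
loosely modelled on the fluiddb http api and should be checked against
the real docs. `plan_requests`, the `orlin/...` tags and the id
`abc123` are hypothetical. It only builds request descriptions and
sends nothing.

```python
import json
from urllib.parse import quote


def create_object(about):
    """Describe the request that creates an object with an about value."""
    return ("POST", "/objects", json.dumps({"about": about}))


def set_tag(object_id, tag, value):
    """Describe the request that sets (creates or updates) one tag value."""
    return ("PUT", "/objects/%s/%s" % (object_id, quote(tag)), json.dumps(value))


def remove_tag(object_id, tag):
    """Describe the request that removes one tag from an object."""
    return ("DELETE", "/objects/%s/%s" % (object_id, quote(tag)), None)


def plan_requests(object_id, current, desired):
    """List the requests that turn the current tags into the desired ones.

    Objects are never deleted and the about value is never updated, so only
    tag-level PUT and DELETE requests are produced.
    """
    requests = []
    for tag in sorted(desired):
        if tag not in current or current[tag] != desired[tag]:
            requests.append(set_tag(object_id, tag, desired[tag]))
    for tag in sorted(set(current) - set(desired)):
        requests.append(remove_tag(object_id, tag))
    return requests


current = {"orlin/title": "Dune", "orlin/shelf": "B3"}
desired = {"orlin/title": "Dune", "orlin/creator": "Frank Herbert"}
plan = plan_requests("abc123", current, desired)
```

Keeping the http details in three small functions is the point: if the
api changes, only those need fixing.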

How do we get the rdf of a specific application converted to the
fluiddb rdf described above? Personally, I think with SPIN
(http://www.spinrdf.org/). Here is an example post about such a
transformation of data from one ontology to another:
http://composing-the-semantic-web.blogspot.com/2009/08/ontology-mapping-with-spin-templates.html
So we can use sparql for this and more. I'm sure there are other ways
too.

An application api can make this doubling of data transparent. It
puts it in both places at once. Synchronously, in parallel with http
calls, or with amqp, or possibly just update one of the stores and
then bring the other up to date with a background job. The fluiddb
import service could be an extraction of the first such app,
refactored, and a basis for all further semantic apps.
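
The last option (update one store, bring the other up to date later)
could look roughly like this. It is only a sketch: plain dicts stand
in for the triple store and for fluiddb, `DualWriter` is a made-up
name, and a real version would make http calls or publish to amqp
instead of touching dicts.

```python
from collections import deque


class DualWriter:
    """Write to a primary store now and replay to a secondary store later."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = deque()  # writes not yet copied to the secondary

    def put(self, about, tag, value):
        """Store the value in the primary and queue it for the secondary."""
        self.primary[(about, tag)] = value
        self.pending.append((about, tag, value))

    def sync(self):
        """Background job: replay queued writes, return how many were copied."""
        copied = 0
        while self.pending:
            about, tag, value = self.pending.popleft()
            self.secondary[(about, tag)] = value
            copied += 1
        return copied


triple_store, fluiddb = {}, {}
writer = DualWriter(primary=triple_store, secondary=fluiddb)
writer.put("http://example.org/book/1", "orlin/title", "Dune")
writer.put("http://example.org/book/1", "orlin/creator", "Frank Herbert")
copied = writer.sync()
```

The application only ever calls `put`, so which store is primary can be
swapped later without touching application code.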

Btw, the data can originate from a relational database - check
http://esw.w3.org/topic/Rdb2RdfXG if interested in that. The rdf
representation comes for free.

Best of all worlds,

Orlin

Terry Jones

Sep 16, 2009, 10:57:36 PM
to fluidd...@googlegroups.com
Hi Orlin

Sorry for the amazingly slow reply to this :-(

>>>>> "Orlin" == orlin <orlin...@gmail.com> writes:

Orlin> I've also thought about how triples could map on fluiddb. There are
Orlin> several ways to do it. Almost got sucked into it. At some point
Orlin> though I stepped back asking "why". Now I ask you. Except for its
Orlin> being interesting, fluiddb as a triple store, among all the other
Orlin> ways to look at it... Why? What value does it bring? How would
Orlin> you compete against stores that exist solely for the purpose of
Orlin> storing massive amounts of triples and are optimized to query such.
Orlin> Lots of production-ready, highly-competitive opensource projects.
Orlin> Isn't this taking away from the focus of what fluiddb is really good
Orlin> at? Something that semantic technologies can't (easily) do. To me,
Orlin> the social aspect and the query language is what set fluiddb apart
Orlin> and simplicity is what will make it succeed like no other.

It's interesting that you see fundamental value in the query language. You
might be interested in a blog posting I wrote on the advantages of a simple
query language http://bit.ly/ipQmZ

Orlin> So object:tag and triples can peacefully coexist, yet they seem like
Orlin> competing representations, each with their specific software / tools.
Orlin> There is another way.

Orlin> Planning to use semantic technologies (if your data is public, and
Orlin> perhaps even if not - I'd recommend http://talis.com/platform)
Orlin> together with fluiddb. A hybrid approach.

I didn't know about Talis. I just spent some time looking through their web
site. Looks good. Seeing as we'd obviously fall into the non-open data
category, we'd have to become a commercial customer of theirs. We might be
able to do that in the longer term. You might also be interested to know
that we're (currently) using S3 for underlying storage of data. That should
give some peace of mind. Nevertheless, you're right that we're in an alpha
stage and just because we use S3 underneath doesn't mean we don't have work
to do to make sure data actually gets to S3 from an operating FluidDB
instance.

Anyway, I like the general idea, and Talis in particular looks nice.

Orlin> How do we get the rdf of a specific application converted to the
Orlin> fluiddb rdf described above? Personally, I think with
Orlin> http://www.spinrdf.org/. Here is an example
Orlin> http://composing-the-semantic-web.blogspot.com/2009/08/ontology-mapping-with-spin-templates.html
Orlin> post about such transformation of data from one ontology to another.
Orlin> So we can use sparql for this or more. I'm sure there are other
Orlin> ways too.

Yet another technology for me to learn about! :-) There seems to be no end
of relevant stuff that I've never heard of. Thanks.

Orlin> An application api can make this doubling of data transparent. It
Orlin> puts it in both places at once. Synchronously, in parallel with
Orlin> http calls, or with amqp, or possibly just update one of the stores
Orlin> and then bring the other up to date with a background job.

Agreed. Periodic background serialization and dumping of data into S3 is
the current mode.

Orlin> Btw, the data can originate from a relational database - check
Orlin> http://esw.w3.org/topic/Rdb2RdfXG if interested in that. The rdf
Orlin> representation comes for free.

I'll go look at this too.

Orlin> Best of all worlds,

Yes, that would be great. FluidDB is not trying to be everything, so the
better it fits into the existing ecology of data solutions, the better.
Like you, we're sure it has its niche and sweet spot. That's probably due
in some combination to the query language, the model of control, the
simplicity, etc.

Thanks a lot & sorry for the slow reply. I've been neglecting this mailing
list recently in favor of working on code. I hope to strike a better
balance now.

Terry

orlin

Sep 20, 2009, 8:21:33 AM
to FluidDB Users, o...@soundsapiens.com
Hi Terry,

Slow replies are no problem. It's usually better to think things
over. Big changes rarely happen fast anyway.

I wanted a column store exactly for search, but had given up looking
by the time FluidDB showed up. It's so awesome! Both Google's
BigTable and Amazon's SimpleDB were too limiting with their query
languages. Apart from the missing simple arithmetic and composition of
stored queries (which could be worked around), FluidDB's seems perfect.

You actually took my idea a step further, which took me by surprise.
For one, I wasn't sure how much of your past Semantic Web criticism
still holds :) Also, I didn't realize it could be read that way (at
the lower level). Talis is similar to FluidDB as both are _pushing
the envelope_ with data APIs. I was actually thinking at the
application level and the conversation went to FluidDB
serialization / S3 storage, which is great. There probably are
advantages to serializing in RDF from a FluidDB perspective. There
are also some Talis-specific ones like deltas for example -- so you
could easily keep history of changes (for those who need it). The
biggest one though (from my point of view) is to get representations
of FluidDB queries in JSON-RDF, one of the many rdf formats, which
are interchangeable (e.g. with http://triplr.org/ or your own
Redland / Raptor). API calls could perhaps also take an rdf payload
in...
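
To make the JSON-RDF idea concrete, here is a small sketch that turns
a query result into the RDF/JSON shape (subject -> predicate -> list
of value objects). The input shape, the function name `to_rdf_json`
and the http://example.org/fluiddb/ URIs are all assumptions; FluidDB
defines no such URIs as far as I know. Every value is emitted as a
plain literal, without datatypes.

```python
import json


def to_rdf_json(results, base="http://example.org/fluiddb/"):
    """Convert {object_id: {tag: value}} into an RDF/JSON-style dict.

    Subjects and predicates are built from a hypothetical base URI, and each
    tag value becomes one plain literal (datatypes are left out here).
    """
    doc = {}
    for object_id, tags in results.items():
        subject = base + "objects/" + object_id
        properties = doc.setdefault(subject, {})
        for tag, value in tags.items():
            predicate = base + "tags/" + tag
            properties.setdefault(predicate, []).append(
                {"value": str(value), "type": "literal"}
            )
    return doc


# Hypothetical query result: one object with two tags.
results = {"abc123": {"orlin/title": "Dune", "orlin/rating": 5}}
doc = to_rdf_json(results)
payload = json.dumps(doc, indent=2, sort_keys=True)
```

From there a converter such as triplr or Raptor could produce the other
serializations.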

Semantic web apps could thrive on FluidDB! One would just need to
set up the mapping between their application vocabulary and the one of
FluidDB. It all integrates using semantic web standards at the data
level. In our example, if you were to go with Talis, there would be
two kinds of Talis we are talking about. The Talis I keep my semantic
application data in and the Talis you serialize FluidDB data to.
There can also be two kinds of Talis from your business perspective.
Not all data is private / protected and therefore commercial. Talis
loves open linked data, and has a couple of public data licenses for
that. They are also very active with emerging technologies (not
all-semantic), for example through Paul Miller's podcast talks at
http://planet.talis.com/talkingwithtalis/. The Connected Commons
offers 50 million triples (a year) for free
(http://www.talis.com/platform/cc/) if the data is public domain - I'm
pretty sure you can qualify with some of FluidDB's data... It seems
this could cover the readable (by all) FluidDB tags for quite a
while - how many objects / tags are there right now?
Furthermore, applications could have their own Talis stores where you
put the readable data (based on tags namespace) on their behalf, which
helps keep FluidDB's Talis quota low. And why not their private data
too. Perhaps some apps will have private, commercial Talis stores to
put all their tags in. All my Talis data will be public and I plan to
also put it in FluidDB - mostly so that it can be better searched (the
dsl stuff). This can be free for both you and such apps.

Once FluidDB matures, perhaps I would keep the data that overlaps in
FluidDB as a source rather than a destination. In either case,
integration is straightforward, for example with SPIN transformations,
which could easily go both ways. Only the data that doesn't fit well
in the FluidDB model, or doesn't make use of the FluidDB advantages,
would be separate. In time, the overlapping rest could be mirrored
automatically. I dream of semantic replication (e.g. extensions)
between FluidDB and various triple stores :) More realistically, if
FluidDB serialized some (or all) of its own data in rdf, I could take
it with webhooks and do whatever else with it. Even better, if both
FluidDB and my app are storing the data in Talis, I could give you
authentication credentials so you can post the data to my store as well.
The point is not just for backup (though that is a nice feature -
regardless of how stable FluidDB is), but also to have inferred data
that's different from the FluidDB model of object / namespace / tag /
etc. If you could run user-stored scripts for these transformations
(using jena's spin inference engine or perhaps some owl reasoner),
that would be even more awesome! Having webhooks first will be
better, as they are more generic and useful for all kinds of
purposes.

I'm having hopeful thoughts about such exciting possibilities :)

Orlin