RDF vs RDBMS


revis...@yahoo.com

Aug 28, 2006, 8:56:56 PM
to Semantic Web
I'm a newbie to RDF and have been facing a fundamental question as I read
more about RDF. RDF positions itself away from plain XML
representations of data, saying XML is suited to representing data with
containment hierarchies, and where "order" is important, whereas RDF
has a flatter structure and represents only references among different
entities. That sounds just like what a relational database is supposed
to do, and those are the criteria when deciding whether to use an XML DB
or a relational DB to store your data.

Where does RDF fit in, and how does it compare to relational databases?
I keep hearing that databases are not good for "semi-structured" data,
but am not yet able to understand how RDF addresses that. Mozilla, for
example, uses RDF for very structured (table of contents) data.

What would be points of comparison where RDF is better suited to store
and query my data?

Revi S.

Danny Ayers

Aug 29, 2006, 6:50:27 AM
to semant...@googlegroups.com, Semantic web list
On 8/29/06, revis...@yahoo.com <revis...@yahoo.com> wrote:
>
> I'm a newbie to RDF and have been facing a fundamental question as I read
> more about RDF. RDF positions itself away from plain XML
> representations of data, saying XML is suited to representing data with
> containment hierarchies, and where "order" is important, whereas RDF
> has a flatter structure and represents only references among different
> entities. That sounds just like what a relational database is supposed
> to do, and those are the criteria when deciding whether to use an XML DB
> or a relational DB to store your data.

I'll dodge the XML question, save to say that RDF and XML can be seen
as fulfilling different roles, RDF providing a data model and XML
providing a syntax for serialising data. I'll have a crack at this
though:

> Where does RDF fit in, and how does it compare to relational databases.
> I keep hearing that databases are not good for "semi-structured" data,
> but am not yet able to understand how RDF addresses that. Mozilla for
> example uses RDF for very structured (table of content) data.

I think it can be useful to think of RDF as *a* relational model, just
not the same one that SQL DBs are based on (Codd's). Individual
statements in RDF are expressed as subject, predicate, object triples.
Sets of these with a common predicate can be mapped to binary
relations in the relational model or, in common parlance,
2-column tables, e.g.

==== foaf:name ====
---subject------object--
_:personA | "John"
_:personB | "Jane"
_:personC | "Fred"
...

Here the subjects are bnodes, which can be viewed as ID fields/keys in
the local store.
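
A rough sketch of that mapping in Python, for illustration only. The
triples are plain tuples, the foaf:nick row is made up to give a second
predicate, and group_by_predicate is a hypothetical helper name rather
than part of any RDF library:

```python
from collections import defaultdict

# Triples as (subject, predicate, object) tuples; the data is illustrative.
triples = [
    ("_:personA", "foaf:name", "John"),
    ("_:personB", "foaf:name", "Jane"),
    ("_:personC", "foaf:name", "Fred"),
    ("_:personA", "foaf:nick", "Johnny"),  # made-up extra statement
]

def group_by_predicate(triples):
    """Map each predicate to its binary relation: a set of (subject, object) rows."""
    tables = defaultdict(set)
    for s, p, o in triples:
        tables[p].add((s, o))
    return dict(tables)

tables = group_by_predicate(triples)

# Each predicate now has its own 2-column "table".
for row in sorted(tables["foaf:name"]):
    print(row)
```

Using a set for each relation matches the idea that a relation is a set
of rows, so a repeated statement adds nothing new.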

Going back to your suggestion that RDF is flatter than XML, well yes,
when viewed as a set of triples it is. But the subject of one triple
can be (and often is) the object of another, and vice versa:

==== foaf:knows ====
---subject------object--
_:personA | _:personB
_:personB | _:personC
_:personC | _:personA
...

So another view of a set of statements is the node (subject/object) &
directed arc (predicate) graph. There's a loop in this example. In
this sense RDF is actually less flat than XML, which (without
assistance) just has a hierarchical tree structure.
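
To make the loop concrete, here is a small Python sketch that treats the
foaf:knows rows above as directed edges and checks for a cycle with a
depth-first search. The has_cycle function is a hypothetical helper
written for this example, not something an RDF toolkit provides under
that name:

```python
# foaf:knows as a set of (subject, object) pairs, taken from the table above.
knows = {
    ("_:personA", "_:personB"),
    ("_:personB", "_:personC"),
    ("_:personC", "_:personA"),
}

def has_cycle(edges):
    """Return True if the directed graph given by `edges` contains a loop."""
    graph = {}
    for s, o in edges:
        graph.setdefault(s, []).append(o)

    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))

print(has_cycle(knows))
```

A well-formed XML element tree would never make this check succeed,
since each element has exactly one parent.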

In the directed graph structure there's an obvious analogy to
the interlinked structure of the Web. But almost certainly the most
important point of RDF in regards to the Web is that the subject,
predicate and object can be resources in the Web sense, things
identified with URIs. This means that they can act as ID fields/keys
not just in the local store but anywhere they appear. In other words,
through relational glasses the (Semantic) Web as a whole can be
considered a single database. In this view an individual RDF store or
file is just a cache of a little bit of the data in the Semantic Web.
The graph view of RDF is more than just an analogy to the interlinked
structure of the Web, it's an extension of it.

In the relational model, a row in a table is actually an assertion
that the relation is true for the values in the row. A SELECT query is
a filter on the assertions that are true for the given conditions. An
RDBMS will maintain logical consistency across all the data it
contains. In these (and other) ways a relational DB is a reasoning
engine. But another significant difference between relational DBs and
RDF is that in the former, for a certain set of values a relation is
considered either true (there is a corresponding row in the
table) or false (there isn't). In the RDF model in the general case,
if a set of values isn't in the "row" (i.e. you don't have a
particular statement), then it's not false, just unknown. (This is the
open world assumption; see "Missing isn't broken",
http://rdfweb.org/mt/foaflog/archives/000047.html). In practice, when
querying either programmatically or with SPARQL, you will only be
looking at a certain set of data, so this is treated as the universe
(the whole graph) and hence closed.
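
The difference between the two readings can be sketched in Python. This
is only an illustration: the store is a plain set of tuples, and using
None to stand for "unknown" is a choice made for this example, not
something defined by RDF or SPARQL:

```python
# A small store of known statements (illustrative data).
store = {
    ("_:personA", "foaf:knows", "_:personB"),
}

def closed_world(store, triple):
    """Relational-style reading: a missing row means the statement is false."""
    return triple in store

def open_world(store, triple):
    """RDF-style reading: a missing statement is unknown, shown here as None."""
    return True if triple in store else None

query = ("_:personB", "foaf:knows", "_:personA")
print(closed_world(store, query))  # False: no such row
print(open_world(store, query))    # None: not asserted, so unknown
```

Both functions agree on statements that are present; they differ only
in what they conclude from absence.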

Where things start to get really interesting is that the predicates
can appear in "tables" too:

==== rdf:type ====
---subject------object--
foaf:name | rdf:Property
...

At this point it may be easier to stop thinking in terms of the
relational model; the object-oriented model - the inheritance bits at
least - is probably closer conceptually (though still very different).

> What would be points of comparison where RDF is better suited to store
> and query my data?

I'll leave that part for someone else ;-)

Spot on questions, hopefully some day the answers will find their way
into the FAQs here:
http://esw.w3.org/topic

(Somewhere around there you should also find material on mapping
between RDBMSs and RDF, the stuff above is just one way).

Cheers,
Danny.

--

http://dannyayers.com

ch...@bizer.de

Aug 29, 2006, 12:54:51 PM
to Semantic Web
My two cents.

I completely agree with Sören that one strength of the RDF data model
is its flexibility.

Two things that appear at least equally important to me, and that are
not provided by the relational model, are globally unique identifiers
and links.

By using globally unique identifiers, everybody can add information
about a resource. By using links, you can refer from your resource to
somebody else's resource. This means you can set a link from one
database (repository) to another, which clearly isn't possible with
classical relational database technology.
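
A minimal Python sketch of why shared identifiers make this easy. The
example.org and example.net URIs are made up, and describe is a
hypothetical helper; the only real vocabulary used is the FOAF
namespace:

```python
# Two independently maintained sources describing the same resource.
# The person URIs are invented for illustration.
ALICE = "http://example.org/people/alice"
FOAF = "http://xmlns.com/foaf/0.1/"

source_a = {
    (ALICE, FOAF + "name", "Alice"),
}
source_b = {
    # A link from one source's resource to a resource held elsewhere.
    (ALICE, FOAF + "knows", "http://example.net/staff/bob"),
}

# Because both sources use the same URI as the key, merging is a set union.
merged = source_a | source_b

def describe(graph, subject):
    """Return every (predicate, object) pair known about `subject`."""
    return sorted((p, o) for s, p, o in graph if s == subject)

for predicate, obj in describe(merged, ALICE):
    print(predicate, obj)
```

No key remapping step is needed here, which is the part that has no
direct counterpart when two separate relational databases each mint
their own local primary keys.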

Tim's Tabulator browser [1] shows nicely how these links can be
followed by using URI dereferencing and the good old rdfs:seeAlso
property. I also like Bastian's work on federated SPARQL queries [2],
which shows how globally unique identifiers enable queries over
multiple data sources. So the Semantic Web community is getting closer
to having the access paradigms of the classical web - browse and
search - also work for the Semantic Web.

Thus, I think where the RDF model really starts to play to its
strengths is data integration and data linkage. We are currently
exploring data linkage in the context of D2R Server [3], a tool for
publishing the content of classical relational databases on the Web.
D2R Server allows you to query relational databases with SPARQL.
Currently, we are extending the server with URI dereferencing
features, meaning that you can retrieve RDF representations of the
objects in your relational database. This will allow you to set links
between different relational databases, refer from your webpage or
blog to an object within a relational database, or use a tool like
Tabulator to transparently browse from the content of one relational
database to the content of another.

So my guess: If RDF is good for something, it is good for data
integration and data linkage.

Cheers,

Chris

[1] http://www.w3.org/2005/ajar/About.html
[2] http://darq.sourceforge.net/
[3] http://www.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/


----- Original Message -----
From: "Sören Auer" <au...@informatik.uni-leipzig.de>
To: <semant...@googlegroups.com>
Cc: "revi s." <revis...@yahoo.com>; <www-rdf-...@w3.org>
Sent: Tuesday, August 29, 2006 4:39 PM
Subject: Re: RDF vs. relational databases

From my point of view (next to the distribution aspect) flexibility is
a crucial difference between the RDF based Semantic Web representation
techniques and databases:

With databases, schema changes are very time-consuming operations - the
whole repository and keys have to be reorganized. Triple stores, by
contrast, don't distinguish between data changes and ontology schema
changes - both are ultimately just additions or deletions of triples.
However, triple stores will probably never be able to compete with
optimized database schemas with respect to query speed. So if you need
high-speed querying and you don't expect many schema changes, use an
RDBMS; if you want to be very flexible with your schema/ontology, use a
triple store. The RDF paradigm is also a bit more holistic in the sense
that everything from data and schema to metadata is encoded in triples,
while databases usually have different encoding techniques for each of
these.
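
As a rough illustration of schema and data sharing one encoding, here
is a Python sketch with the store modelled as a plain set of tuples.
The ex:nickname property is hypothetical and invented for this example:

```python
# Data and schema statements live in the same set of triples.
graph = {
    ("foaf:name", "rdf:type", "rdf:Property"),
    ("_:personA", "foaf:name", "John"),
}

# "Schema change": declare a new property. No table is altered or rebuilt.
graph.add(("ex:nickname", "rdf:type", "rdf:Property"))

# "Data change": start using the new property straight away.
graph.add(("_:personA", "ex:nickname", "Johnny"))

# The schema can be queried in exactly the same way as the data.
properties = sorted(
    s for s, p, o in graph if p == "rdf:type" and o == "rdf:Property"
)
print(properties)
```

This says nothing about query speed, where the caveat above about
optimized relational schemas still applies.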

Regards,

Sören

Danny Ayers

Aug 30, 2006, 5:26:11 AM
to semant...@googlegroups.com
It goes without saying I agree completely with all the positive things
said about RDF :-)

But as is implicit in Chris' work on D2R, it's not an either/or
situation. There are plenty of good reasons to exploit the decades of
work on RDBMSs with a local DB, not least because they're already
massively deployed. The benefits of the RDF model are also available
through either exposing an RDF (/SPARQL) view of an existing RDB, or
through using an RDF store that uses a relational DB under the hood.
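
To give a feel for the first option, here is a small Python sketch that
reads rows from an in-memory SQLite table and presents them as triples.
The table, its columns and the base URI are all hypothetical, and this
is not how D2R Server itself is implemented or configured:

```python
import sqlite3

# A made-up base URI used to turn primary keys into resource identifiers.
BASE = "http://example.org/person/"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "John"), (2, "Jane")],
)

def rows_as_triples(conn):
    """Yield one foaf:name triple per row, using the primary key to mint a URI."""
    for row_id, name in conn.execute("SELECT id, name FROM person ORDER BY id"):
        yield (f"{BASE}{row_id}", "foaf:name", name)

triples = list(rows_as_triples(conn))
for triple in triples:
    print(triple)
```

The relational data stays where it is; only the view handed to other
systems changes.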

Lots of relational DBs are already connected to the web, but this is
usually through a very narrow filter (to the human-friendly HTML web
site). There's a lot to be gained from widening that communication
path and allowing systems to talk to each other more directly.

Cheers,
Danny.

Wide Boy

Aug 30, 2006, 3:57:13 PM
to Semantic Web
Naturally, I wholeheartedly agree with all the positive things that
have been said about RDF and the enormous potential it offers as an
enabling technology for the semantic web. However, I get an uneasy
feeling when people start to make comparisons with a database, whether
it be relational or any other in fact. I feel uneasy because my view
of a database is that it should have structure, a language for altering
that structure and a language for querying the data contained within
these structures. When I say querying I mean read, write and update,
not just read. All comparisons of RDF to RDB, in this discussion and
others, simply refer to the likeness of RDF to RDB on the basis of data
being read/accessed. I've not seen any discussions that would try to
suggest how data in an RDF store may be updated, and managed in terms
of data consistency, integrity, privacy, etc., the sort of things
typically provided as a de facto feature set in _all_ databases, not just
RDBMSs. I'm not sure RDF is designed for this sort of functionality and
without it I don't think it could qualify as a DBMS.

Just my two cents worth, FWIW.

Naran

Danny Ayers

Aug 30, 2006, 6:31:04 PM
to semant...@googlegroups.com
On 8/30/06, Wide Boy <naran....@gmail.com> wrote:
>
> Naturally, I wholeheartedly agree with all the positive things that
> have been said about RDF and the enormous potential it offers as an
> enabling technology for the semantic web. However, I get an uneasy
> feeling when people start to make comparisons with a database, whether
> it be relational or any other in fact. I feel uneasy because my view
> of a database is that it should have structure, a language for altering
> that structure and a language for querying the data contained within
> these structures. When I say querying I mean read, write and update,
> not just read. All comparisons of RDF to RDB, in this discussion and
> others, simply refer to the likeness of RDF to RDB on the basis of data
> being read/accessed.

Hmm, RDF isn't short of structure, it just tends towards less rigid
structures than most traditional approaches to data.

I don't disagree that having some conventions for update closely
associated with the SPARQL query language would be very desirable, in
fact I nagged for it quite a bit myself. But on reflection I think the
approach taken by the DAWG does make sense: let the query/read side
settle down for a while, letting toolbuilders work on their own
solutions to try and get some idea of what's *really* required. After
all, the basic facilities are already there in HTTP (put, post, delete
graphs, authentication).
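
As a very rough sketch of what graph-level update over plain HTTP could
look like from Python, the code below only builds the requests and
sends nothing. The graph URL is hypothetical, and whether a given store
accepts PUT or DELETE on a graph, and which media types it understands,
is an assumption that depends entirely on the store:

```python
from urllib.request import Request

# Hypothetical URL for a graph held by some RDF store; nothing is sent here.
GRAPH_URL = "http://example.org/store/graphs/people"

# One statement serialised as a single line of Turtle.
turtle = b'_:a <http://xmlns.com/foaf/0.1/name> "John" .\n'

# Replace the whole graph with PUT, or remove it with DELETE.
put_request = Request(
    GRAPH_URL,
    data=turtle,
    method="PUT",
    headers={"Content-Type": "text/turtle"},
)
delete_request = Request(GRAPH_URL, method="DELETE")

print(put_request.get_method(), put_request.full_url)
print(delete_request.get_method(), delete_request.full_url)
```

Authentication would sit on top of this in the usual HTTP ways and is
left out of the sketch.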

> I've not seen any discussions that would try to
> suggest how data in an RDF store may be updated, and managed in terms
> of data consistency, integrity, privacy, etc., the sort of things
> typically provided as a de facto feature set in _all_ databases, not just
> RDBMSs. I'm not sure RDF is designed for this sort of functionality and
> without it I don't think it could qualify as a DBMS.

Data consistency, integrity and privacy are all very much relevant
questions for Semantic Web technologies, and I have no doubt that RDF
store implementors aim for the best RDBs can offer. However the scope
is rather different - the problems are complicated somewhat by the
fact this stuff is primarily designed for a distributed environment,
the Web. In a massive-scale distributed system where total consistency
and integrity just isn't realistic, robustness and fault tolerance
become more significant considerations. There are plenty of other
technical and social issues in joining together data from completely
independent sources, but I believe RDF is considerably more suitable
than any other approaches coming from the database world. I can't
imagine RDF stores supplanting RDBs in the foreseeable future, but
then again nor can I imagine a traditional RDB (no matter how
distributed) scaling to the size of the Web, with the concomitant
diversity of data.

I'm not sure the notion of qualifying as a DBMS makes much sense -
like the relational model, RDF is just a model. But like the
relational model it does have a logical formalism, so there is a sound
basis on which consistency and integrity can be implemented. (It's
also worth noting that RDBMS implementations aren't above criticism in
such respects, cf. Database Debunking).

But as I mentioned before, it's not an either/or situation. There are
scenarios where an RDF store makes more sense, there are scenarios
where an RDBMS makes more sense. In a lot of the latter cases, RDF the
language can also make sense in integration strategies, or as previous
posters in this thread have mentioned, as a tool that introduces
extreme flexibility into developing data-based applications. I think
the sane thing to do is hedge your bets, the strategy that companies
like Oracle and IBM appear to be following.
