On using DataMapper for an RDF ORM


Ben Lavender

Jun 18, 2010, 12:25:33 AM
to datam...@googlegroups.com
Hi all,

I'm Ben Lavender, author of Spira[1], an ORM for RDF.rb [2]. I
recently wrote a blog post [3] about Spira in which I briefly mention
why I didn't go with a DataMapper adapter. Not long after that, I
mused on Twitter about DataMapper again [4], and afterwards dkubb
messaged me asking for more information (it seems a lot of support is
done there). I said I'd email a response, and that response turned
into the following novella.

I'm not sure if the tweet was offering help, hunting for feedback,
trying to start a discussion, or what, but I decided it didn't
matter--I'm very grateful for having had DataMapper as a sane source
of inspiration several times now, and if nobody's really interested in
this then consider this some detailed, hopefully useful feedback on
adapting DataMapper to another data model.

Anyway, my ORM, Spira, is based on RDF. I won't get too detailed about
what RDF is, but briefly, it's a W3C standard data model which is
roughly the same as an entity-attribute-value system, or a directed
graph with named edges, or triples: subject, predicate, object. The
cool part is that every identifier--every subject, every predicate,
and every object that is not a data literal--is globally unique (a
URI). RDF really took off this last year, and lots of websites you use
every day are publishing it. A decent intro is at [5].
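
To make that concrete, here's roughly what the data model looks like
through RDF.rb. The URIs are made up and I'm typing from memory, so
treat it as a sketch rather than gospel:

  require 'rdf'

  # A triple: subject, predicate, object. Subjects and predicates are
  # globally unique URIs; the object here happens to be a data literal.
  subject   = RDF::URI.new("http://example.org/people/ben")
  predicate = RDF::URI.new("http://xmlns.com/foaf/0.1/name")
  statement = RDF::Statement.new(subject, predicate, "Ben Lavender")

  # A repository is, conceptually, just a set of such triples.
  repo = RDF::Repository.new
  repo << statement

  # Querying is pattern matching against the graph; nil acts as a wildcard.
  repo.query([subject, predicate, nil]).each do |st|
    puts st.object   # => "Ben Lavender"
  end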

RDF is synonymous with Linked Data, the Semantic Web, and tons of
other useless buzzwords, and other buzzwords like RDFa are
serializations of RDF. Spira is an attempt at an ORM which embraces
the semantics of this data model, which are well-captured by the RDF
library for Ruby that Arto Bendiken and I wrote, RDF.rb. Spira is
currently built on top of this with minimal other requirements.

Spira's DSL looks an awful lot like DataMapper when it's being used,
so it would seem pretty boneheaded not to just use DataMapper (and
indeed, Arto made a sorta-working dm-rdf adapter a few months ago
[6]). But there are problems with the DataMapper API that prevent
this. So, as was requested on Twitter, here are my pros and cons for
DataMapper for RDF. I must warn that I have learned what I know of
DataMapper in bits and pieces over the last year, so some of it may be
out of date.
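
To give a sense of just how similar they look, here's a rough
side-by-side. The DataMapper model is the usual thing; the Spira one
is a from-memory sketch of the DSL, so don't hold me to the exact
option names (and pretend the two classes live in separate projects):

  require 'dm-core'
  require 'spira'

  # DataMapper:
  class Person
    include DataMapper::Resource
    property :id,   Serial
    property :name, String
  end

  # Spira (sketch; exact options may differ):
  class Person
    include Spira::Resource
    base_uri "http://example.org/people"
    property :name, :predicate => RDF::URI.new("http://xmlns.com/foaf/0.1/name"),
                    :type      => String
  end

  # Instead of Person.get(1) or Person.create, Spira hands out an
  # instance for any URI you care to ask about:
  bob = Person.for(RDF::URI.new("http://example.org/people/bob"))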

DataMapper wins:

* Tons and tons and tons of 'paperwork'. Dirty tracking, a useful
validation set, lots of field types, eager/lazy loading semantics, and
sane semantics for identity maps, just to name a few. This is the
biggest win for me--any decent ORM will have to recreate most of this,
and this stuff is all well-written, and, importantly, incredibly
well-tested. I can't express enough how undelightful it feels to
consider redoing most of this when it all exists and is done so well.
And where it only mostly works, it's sane enough that I can imagine
making it work completely. For example, I can see a path to map
existing RDF data literal types to the base set of DataMapper types
without doing anything insane (a sketch of what I mean follows this
list). This is actually a number of smaller wins, lumped into one.

* Query filtering! DataMapper's filter_records is awesome, providing
tons of useful filter methods entirely at the Ruby level if one so
chooses. RDF has some standard query languages, but the data model
itself does not have one, and I would love to be able to hand this off
to DataMapper initially and implement this huge feature set
incrementally.

* Some useful plugins. dm-remixable, for example, is an interesting
take on module inclusion, and having small-but-useful things like
dm-trimmer is excellent.
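
To be concrete about the type-mapping point in the first item above,
this is the kind of thing I have in mind. It isn't code from either
project, just an illustration that the common literal datatypes line
up with Ruby classes DataMapper already accepts as property types:

  require 'rdf'
  require 'date'

  XSD = "http://www.w3.org/2001/XMLSchema#"

  # Hypothetical mapping from RDF literal datatypes to DataMapper-friendly
  # Ruby classes; nothing exotic is required.
  LITERAL_TYPES = {
    RDF::URI.new(XSD + "string")   => String,
    RDF::URI.new(XSD + "integer")  => Integer,
    RDF::URI.new(XSD + "double")   => Float,
    RDF::URI.new(XSD + "dateTime") => DateTime,
    RDF::URI.new(XSD + "date")     => Date,
  }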

However, DataMapper has a lot of things that make mapping it to RDF a
rough fit at present.

* Primary key assumptions. RDF operates in the open world--if you
don't have a resource, you can't definitively say it doesn't exist,
just that you don't know whether it exists, because the primary key is
globally unique. This disconnect is pervasive in DataMapper. For
example, the storage adapter shared specs are based on an adapter that
can use an integer primary key, but in RDF, data literals explicitly
*cannot* be a key--only URIs are allowed.

This has affected the semantics in Spira quite a bit. RDF resources
only exist in terms of their known, asserted, non-nil properties. To
'create' a resource and not set any property values isn't a meaningful
operation. Similarly, while querying for resources matching certain
property values makes sense, 'finding' a resource by its primary key
cannot meaningfully return 'false'. Further, any valid URI can be
used as an identifier for any set of properties (represented,
hypothetically, by a DataMapper resource).

In Spira, I opted to eschew 'find(key)' and 'create(key)' for
'for(uri)' and 'RDF::URI.as(klass)'. Making this change would be
exceedingly difficult in DataMapper. While it can probably be done at
the user level, it would be very tough to get away from
internally--DataMapper seems to conflate finding, identity, and
existence.

* Collection semantics. In RDF, all properties are n-ary sets of
values. While Spira defines a way to ignore all but one value of a
property for simplicity's sake, I don't see how DataMapper could
represent a multi-valued property. Making it a property with a
Set object value is a problem, since dirty checking is done by object
identity, so updating said set won't do. And the 'has x' DSL is only
for relations, not normal properties.

* Relation semantics. DataMapper relation semantics are an exact
mirror of relational database semantics and are not the same as the
semantics used in graph databases. For example, 'has n' looks for a
key on the child object that refers to the parent, and many to many
requires a linking table/class. In the scope of RDF, all of this is
relational database voodoo, and unnecessary; relations are just a
property pointing to the URI of another resource, and there can be a
set of them (a sketch of this at the triple level follows this list).
Spira handles relations with one extra case in the case
statement where it checks property type. It really couldn't be
simpler.

* Many plugins won't work, often because of the above problems with
the idea of 'primary key'. For example, dm-taggable lets you define a
table name for tags via a symbol, then attach them to a model. But
the RDF equivalent of an RDB table would be a globally unique RDF type
URI, and symbols won't cut it. This plugin, and many others, are
simply not going to work, and none of them will have test cases
relevant to this, so it will be very difficult to tell what works and
what doesn't.

* The query and filter modeling is sweet, but I am having trouble
finding documentation on how to implement an adapter that knows what
to do with these. This is perhaps just me not digging deep enough
yet, but I suspect it was a contributing factor to the demise of the
apparently-defunct CouchDB adapter, which had a completely separate
interface for searching when it needed to use Couch's JavaScript
map/reduce functionality (an overview of the problems adapting a
non-SQL query format to DataMapper is at [7]).

* Because RDF is a consistent data model with multiple
implementations, it's natural to define an interface to an RDF data
store, which is just what we've done in RDF.rb (RDF is much more
consistent than SQL, so this is a much easier problem than the one
projects like DataObjects have taken on). This has some unfortunate
overlap with DataMapper repositories; since RDF repositories are truly
semantically equivalent, one can do things like create a union of
repositories that acts like a normal repository, and then pass this on
to whatever ORM lives above the data model layer. DataMapper has a
sizable chunk of semantics dealing with repositories, and meeting them
all will probably mean re-treading ground we already covered in the
core RDF.rb library.

* DataMapper has me writing a plugin instead of an entire framework.
This severely limits agile redefinition of semantics--if significant
chunks of DataMapper need retrofitting to support one thing or another
that I want to try, it's much harder to experiment as I go.

* The DataMapper project is rather large and imposing. I only
submitted a patch once, a pending test for an open suggestion ticket,
and nothing happened. Now, for a project as large and complicated as
this, I don't imagine that every random guy who has a patch or problem
will get a response, but if I want to make an awesome RDF ORM, and I
need to make patches, and there's a hill to climb before patches I
produce will be accepted, that worries me--I want to be solving the
RDF ORM problem, not incidental ones.
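
To illustrate the collection and relation points from above at the
data level (made-up URIs, and this is the raw model rather than any
particular API): a 'has n'-style relation and a multi-valued property
are both just additional triples about the same subject.

  require 'rdf'

  post   = RDF::URI.new("http://example.org/posts/1")
  author = RDF::URI.new("http://example.org/terms/author")
  tag    = RDF::URI.new("http://example.org/terms/tag")

  repo = RDF::Repository.new

  # The "relation": a triple whose object is another resource's URI.
  # No foreign key on the child, no join table.
  repo << RDF::Statement.new(post, author,
                             RDF::URI.new("http://example.org/people/ben"))

  # A multi-valued property: just assert more than one triple.
  repo << RDF::Statement.new(post, tag, "rdf")
  repo << RDF::Statement.new(post, tag, "datamapper")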

The last two reasons were my deciding factors. I've been working on
this RDF stuff for a few years now, and I really wanted to focus on
the questions raised by an RDF ORM. DataMapper has some things that
RDF just does not need; the relation semantics described above are a
good example. I think it's key that an RDF ORM not make compromises
on semantics--unlike most NoSQL stores, RDF is a cross-vendor,
cross-implementation data model [8]. If a workable solution comes
about, it's not a one-off adapter, but a link to a whole ecosystem.

That being said, I've come to appreciate just how well-done a lot of
the 'baseline' stuff in DataMapper is. And I've played with the
semantics for a few months now and come across some of the bigger
questions on the RDF semantics front. So if someone in the DataMapper
community has some RDF interest and knows their way around the
internals better than I do, help with a roadmap for approaching some
of these issues would be much appreciated.

In any case, I hope these comments are useful, and please accept my
delving into DataMapper to this degree as a compliment to what you
guys have done.

[1]: http://spira.rubyforge.org
[2]: http://rdf.rubyforge.org
[3]: http://blog.datagraph.org/2010/05/spira
[4]: http://twitter.com/dkubb/status/16095960423
[5]: http://www.rdfabout.com/quickintro.xpd
[6]: http://github.com/bendiken/dm-rdf
[7]: http://holmwood.id.au/~lindsay//2009/02/15/everything-old-is-new-again/
[8]: http://blog.datagraph.org/2010/04/rdf-nosql-diff

Piotr Solnica

Jun 18, 2010, 11:18:28 AM
to DataMapper
On Jun 18, 6:25 am, Ben Lavender <blaven...@gmail.com> wrote:
>  * Primary key assumptions.  RDF operates in the open world--if you
> don't have a resource, you can't definitively say it doesn't exist,
> just that you don't know whether it exists, because the primary key
> is globally unique.  This disconnect is pervasive in DataMapper.  For
> example, the storage adapter shared specs are based on an adapter that
> can use an integer primary key, but in RDF, data literals explicitly
> *cannot* be a key--only URIs are allowed.

These assumptions will go away soon. We're going to introduce a new
type of property that will represent a generic object id, called
ObjectID, Identity, Identifier, or maybe something else :) The point
is that Serial (an Integer-derived primary key field) will no longer
be the default primary key property in the shared specs (and other
places too!).
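
Just for context, this is the convention being referred to--nothing
new, it's the standard declaration today:

  require 'dm-core'

  class Article
    include DataMapper::Resource

    # Serial is an auto-incrementing Integer primary key. It's what the
    # shared adapter specs currently assume, and it's exactly the kind
    # of key an RDF adapter can't provide, since keys there must be URIs.
    property :id,    Serial
    property :title, String
  end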

> * Collection semantics.  In RDF, all properties are n-ary sets of
> values.  While Spira defines a way to ignore all but one value of a
> property for simplicity's sake, I don't see how DataMapper could
> represent a multi-valued property.  Making it a property with a
> Set object value is a problem, since dirty checking is done by object
> identity, so updating said set won't do.  And the 'has x' DSL is only
> for relations, not normal properties.

If I understand correctly, you're talking about EmbeddedValue here,
which will be introduced in the DM 1.x series. See:
http://wiki.github.com/datamapper/dm-core/roadmap

> * Many plugins won't work, often because of the above problems with
> the idea of 'primary key'.  For example, dm-taggable lets you define a
> table name for tags via a symbol, then attach them to a model.  But
> the RDF equivalent of an RDB table would be a globally unique RDF type
> URI, and symbols won't cut it.  This plugin, and many others, are
> simply not going to work, and none of them will have test cases
> relevant to this, so it will be very difficult to tell what works and
> what doesn't.

Some plugins may provide behavior that works with all adapters;
however, many plugins will need a per-adapter implementation
(dm-aggregates, for example), and I don't think we can avoid that.

Thanks for your fantastic feedback. I've faced a lot of similar
problems while working on the MongoDB adapter, so I perfectly
understand your decisions. Anyway, I'm sticking with DataMapper, as
most of the problems with NoSQL adapters will be resolved in the 1.x
series, AND DataMapper 2.0 will probably solve ALL the issues you've
had, so I guess pretty soon you should be able to come back and see
whether writing a new RDF adapter makes more sense than working on a
dedicated ORM where you have to implement ~75% of the functionality
that DataMapper already provides :)

Cheers,

// solnic