Some thoughts on DataMapper's Property API

Anthony Williams

Feb 1, 2010, 11:45:00 AM

to DataMapper

Hi all,

I've been thinking a little about defining properties in DataMapper,
particularly relating to how they work within repository blocks. I've
posted some ideas to my site: http://antw.me/thoughts/datamapper-property-api.html

If you prefer to read them here instead (sans links and syntax
highlighting), I've included the full message below.

I'd love to get some feedback; let me know what you think...

Anthony.

TRANSCRIPT:

In my own dm-core fork on Github I've recently been experimenting with
ways to trim down both Resource and Model, extracting specific
functionality out to separate classes and modules. I've started by
relieving Resource of the need to take care of attributes, creating an
AttributeSet class which holds all of a Resource's attributes, tracks
when they've been updated (marking them as dirty), and lazy-loading
attributes when needed.
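To make the idea concrete, here is a rough plain-Ruby sketch of an attribute store that tracks dirty values and lazy-loads missing ones. The class and method names are hypothetical, chosen for illustration; this is not the code in the fork, and the loader block is an assumed stand-in for a query against the repository.

```ruby
# Hypothetical sketch of an AttributeSet (not the fork's implementation).
# The loader block stands in for a query that fetches a lazy attribute.
class AttributeSet
  def initialize(&loader)
    @values = {}
    @dirty  = []
    @loader = loader
  end

  # Returns a value, calling the loader the first time an unset attribute
  # is read. Reading never marks an attribute as dirty.
  def [](name)
    @values[name] = @loader&.call(name) unless @values.key?(name)
    @values[name]
  end

  # Stores a value, marking it dirty only if it actually changed.
  def []=(name, value)
    changed = !@values.key?(name) || @values[name] != value
    @dirty << name if changed && !@dirty.include?(name)
    @values[name] = value
  end

  def dirty?
    !@dirty.empty?
  end

  # The attributes a save would need to write, with their current values.
  def dirty_attributes
    @dirty.map { |name| [name, @values[name]] }.to_h
  end

  # Called after a successful save.
  def mark_clean!
    @dirty.clear
  end
end

loads = []
attributes = AttributeSet.new { |name| loads << name; "loaded #{name}" }
attributes[:name] = "Michael Scarn"
attributes.dirty?  # => true
attributes[:notes] # => "loaded notes"
```

Note that the set of attribute names here is open-ended; nothing ties it to a particular repository, which is exactly the gap described next.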

Although not yet pushed to Github, my latest commits pass the full dm-core
spec suite against SQLite3 and PostgreSQL, but have 3-4 failures
with the InMemory and Yaml adapters. I eventually tracked this down to
an example where a Property is defined on a Model within the context
of a specific repository. For example:

require 'dm-core'

DataMapper.setup(:default, 'in_memory://localhost/one')
DataMapper.setup(:second, 'in_memory://localhost/two')

class Person
  include DataMapper::Resource

  property :id, Serial
  property :name, String

  repository(:second) do
    property :external_id, Integer
  end
end

This creates a Person model with two attributes: `id` and `name`, and
a third `external_id` attribute which applies only when using the
model within the `:second` repository context:

DataMapper.repository(:second) do
  Person.create(:name => 'Michael Scarn', :external_id => 1)
end

I then realised my AttributeSet implementation didn't account for the
properties of a Model changing depending on the current repository
context. This led me to think a little more about the purpose--and
usefulness--of being able to define models in this way.
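A minimal plain-Ruby model of the behaviour in question may help. This is a hypothetical sketch, not dm-core code: the `RepositoryContext` and `ScopedProperties` modules are invented for illustration, and it assumes the current repository is kept on a thread-local stack, with non-default repositories seeing the default properties plus their own.

```ruby
# Hypothetical sketch (not dm-core): the visible property list of a model
# depends on the current repository, held on a thread-local stack.
module RepositoryContext
  def self.stack
    Thread.current[:repository_stack] ||= [:default]
  end

  def self.current
    stack.last
  end

  # Runs the block with `name` as the current repository, then restores the
  # previous one; returns whatever the block returns.
  def self.with(name)
    stack.push(name)
    yield
  ensure
    stack.pop
  end
end

module ScopedProperties
  def properties_by_repository
    @properties_by_repository ||= Hash.new { |hash, key| hash[key] = [] }
  end

  # Properties declared without a repository belong to :default.
  def property(name, repository: :default)
    properties_by_repository[repository] << name
  end

  # Other repositories see the default properties plus their own additions.
  def properties
    defaults = properties_by_repository[:default]
    current  = RepositoryContext.current
    return defaults.dup if current == :default
    defaults + properties_by_repository[current]
  end
end

class Person
  extend ScopedProperties

  property :id
  property :name
  property :external_id, repository: :second
end

Person.properties                                     # => [:id, :name]
RepositoryContext.with(:second) { Person.properties } # => [:id, :name, :external_id]
```

Any per-resource attribute store therefore has to ask "which repository am I in right now?" before it knows which attributes exist.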

I'd like to--perhaps a little presumptuously--suggest that this
functionality isn't as nice as it first seems, provide an alternative
means for achieving the same result, and elaborate on how I think such
repository blocks should work.

### Inconsistent instance API

Allowing a user to wrap properties in a repository block results in a
model changing its behaviour depending on external state (the current
repository context). At one moment the resource has an `external_id`
attribute, and in the next the attribute seems to disappear.

DataMapper.repository(:second) do
  Person.new(:external_id => 1)
end
# => #<Person @id=nil @name=nil>

Wait... where did the `external_id` attribute go? In fact the
attribute was set, it just doesn't appear since `Person#inspect` was
called outside of the repository block...

DataMapper.repository(:second) do
  puts Person.new(:external_id => 1).inspect
end
# => #<Person @id=nil @name=nil @external_id=1>

Aha! There it is. Trying to set the `external_id` attribute outside of
the `:second` repository context will also (rightly) fail.

### Ambiguity as to where a resource is saved

DataMapper's repository context allows you to save any Resource to any
defined repository (provided they support the same features).

person = Person.new(:name => 'Samuel L. Chang')

# Now that I have my resource, I can save it to wherever I
# want... By default, the resource will be saved in the
# :default repository
person.save

# Alternatively, I can specify a different repository...
DataMapper.repository(:second) do
  person.save
end

While this is an interesting feature, I'm struggling to come up with a
reason why you'd _want_ to do this. To me it just introduces
ambiguity:

> Erm, where did I save that person instance? I'm sure it's around here
> somewhere... Where are you little person instance? Peekaboo!
>
> <cite>Me, a year later.</cite>

In reality, so long as you're explicit about wrapping parts of your
application in the correct repository blocks, this is not a problem.
But wherever you have to be explicit there is the possibility that
someone will forget; forgetting _just once_ might be enough to cause
obscure bugs.

If the `external_id` attribute was set to disallow nil, the second
call to `person.save` in the above example would fail, since no value
was set. (In fact, the above example would fail anyway, since the
first call to `person.save` would mark the resource as clean, thus the
second call would do nothing.)
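The "second save does nothing" behaviour can be illustrated with a small plain-Ruby sketch. It is hypothetical: `TrackedResource` is an invented class, and plain Arrays stand in for the two repositories. It assumes, as described above, that a save writes only dirty attributes and then marks the resource clean.

```ruby
# Hypothetical sketch: a resource that persists only its dirty attributes
# and then marks itself clean. Arrays stand in for the two repositories.
class TrackedResource
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes.dup
    @dirty = attributes.keys # everything set at construction is dirty
  end

  def []=(name, value)
    @attributes[name] = value
    @dirty << name unless @dirty.include?(name)
  end

  # Writes the dirty attributes to `store`; returns false if nothing changed.
  def save(store)
    return false if @dirty.empty?
    store << @attributes.slice(*@dirty)
    @dirty = []
    true
  end
end

default_store = []
second_store  = []

person = TrackedResource.new(name: "Samuel L. Chang")
person.save(default_store) # => true
person.save(second_store)  # => false: the resource is already clean
second_store.empty?        # => true
```

Only a further change (for example `person[:name] = "Michael Scarn"`) would make a later save to the second store write anything.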

## A better way?

I'm of the belief that each model should be associated with one--and
only one--repository. This would be the `:default` repository, except
where a user explicitly declares otherwise when setting up their
model. A `Person` would be associated with the default repository
_always_, regardless of the current repository context. In the example
below, the person would be persisted to the default repository even
though it's wrapped in another repo.

person = Person.new(:name => 'Michael Scarn')

DataMapper.repository(:second) do
  person.save
end

DataMapper could provide a method for changing the default repository:

class Person
  include DataMapper::Resource

  # Tells DM that the Person model should be persisted
  # to the :second repository.
  set_repository :second

  property :id, Serial
  property :name, String
end

By doing this, users would never need to worry about repository
context outside of their models, making their domain objects much more
straightforward.
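A plain-Ruby sketch of how such a `set_repository` macro might behave follows. `set_repository` is the proposal above, not an existing dm-core method; the `FixedRepository` module, its `STORES` hash and the `Note` model are hypothetical stand-ins. Each model records one repository name, defaulting to `:default`, and `save` takes no repository argument at all.

```ruby
# Hypothetical sketch of the proposed `set_repository` macro (not dm-core).
module FixedRepository
  # Stand-in for the configured repositories: name => list of saved rows.
  STORES = Hash.new { |hash, name| hash[name] = [] }

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def set_repository(name)
      @repository_name = name
    end

    def repository_name
      @repository_name || :default
    end
  end

  # No repository argument and no ambient context: the model decides.
  def save
    STORES[self.class.repository_name] << to_h
    true
  end
end

Note = Struct.new(:text) do
  include FixedRepository
end

Person = Struct.new(:name) do
  include FixedRepository
  set_repository :second
end

Note.new("hello").save
Person.new("Michael Scarn").save
```

Because the repository is a property of the class, there is no way for calling code to "forget" the right context.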

As far as I'm concerned, `Person` and `repository(:second) { Person }` are
two different models, with different interfaces, different properties,
and are stored in different repositories. The second Person should
probably be represented as another model, distinct from the first.

Since DataMapper doesn't conflate class inheritance with Single Table
Inheritance, we could use inheritance to achieve the same effect as
the current API:

class Person
  include DataMapper::Resource

  property :id, Serial
  property :name, String
end

# Inherits properties from Person, but adds its own
# custom properties, and persists to another repo.
class HRPerson < Person
  set_repository :second
  property :external_id, Integer
end
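The inheritance half of this can be sketched in plain Ruby. This is a hypothetical illustration, not dm-core internals: the `ModelDefinition` module is invented, and it assumes a subclass starts with a copy of its parent's properties and repository name, then adds or overrides its own without touching the parent.

```ruby
# Hypothetical sketch: subclasses copy their parent's property list and
# repository name, then extend or override them independently.
module ModelDefinition
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Runs when a subclass is defined, before its class body executes.
    def inherited(subclass)
      super
      subclass.instance_variable_set(:@properties, properties.dup)
      subclass.instance_variable_set(:@repository_name, repository_name)
    end

    def property(name, type)
      properties[name] = type
    end

    def properties
      @properties ||= {}
    end

    def set_repository(name)
      @repository_name = name
    end

    def repository_name
      @repository_name || :default
    end
  end
end

class Person
  include ModelDefinition
  property :id, Integer
  property :name, String
end

class HRPerson < Person
  set_repository :second
  property :external_id, Integer
end

Person.properties.keys   # => [:id, :name]
HRPerson.properties.keys # => [:id, :name, :external_id]
HRPerson.repository_name # => :second
```

The parent model is unaffected by the subclass, so the two "Person" shapes can coexist without any ambient repository state.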

### Problems with this approach...

`Model#copy` would break. Well... it wouldn't just break. The entire
concept of copying resources across repositories would become
redundant.

## An alternative meaning for repository blocks

By doing away with the current meaning of repository blocks within
model instances, we free up the API to do something I think is much
more interesting: models which persist _across_ multiple repositories.

Let's take a (slightly contrived) example...

DataMapper.setup(:default, 'yaml://localhost/main')
DataMapper.setup(:human_resources, 'yaml://localhost/hr')

class Employee
  include DataMapper::Resource

  property :id, Serial
  property :name, String
  property :username, String
  property :password, String

  repository(:human_resources) do
    property :salary, Integer
    property :pay_on, Date
  end
end

Our employee model has six properties: `name`, `username`, and
`password` will be persisted to the default repository, while `salary`
and `pay_on` will be persisted to the human resources repository.
`id`, since it is a key, is used in _both_.

Let's create an employee...

Employee.create(
  :name => 'Michael Scarn',
  :username => 'mscarn',
  :password => '12345',
  :salary => 2000,
  :pay_on => Date.today
)

Here's what would happen "under the hood":

1. We assume that the key is generated by the model's default
repository. In the absence of a `set_repository` statement, DataMapper
assumes `:default`.

2. DataMapper then saves the resource to the default repository. In
this example it persists the name, username, and password, and returns
the ID which was generated.

3. It then proceeds to persist the salary and pay_on attributes to the
human resources repository with the ID returned by the default repo.
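The three steps above can be sketched in plain Ruby. This is a hypothetical illustration, not a DataMapper adapter: Hashes stand in for the `:default` and `:human_resources` repositories, and the `PROPERTY_REPOSITORIES` map, the `SplitStore` class and its sequential id counter are all assumptions made for the example.

```ruby
require "date"

# Hypothetical map of which repository owns each non-key property.
PROPERTY_REPOSITORIES = {
  name: :default, username: :default, password: :default,
  salary: :human_resources, pay_on: :human_resources
}.freeze

class SplitStore
  attr_reader :tables

  def initialize
    @tables  = Hash.new { |hash, repo| hash[repo] = {} }
    @next_id = 0
  end

  def create(attributes)
    # Group the attributes by the repository that owns each property.
    by_repo = attributes
      .group_by { |name, _value| PROPERTY_REPOSITORIES.fetch(name) }
      .transform_values(&:to_h)

    # Steps 1 and 2: the default repository generates the key and stores
    # its share of the attributes.
    id = (@next_id += 1)
    @tables[:default][id] = by_repo.fetch(:default, {})

    # Step 3: every other repository stores its share under the same key.
    by_repo.each do |repo, values|
      @tables[repo][id] = values unless repo == :default
    end
    id
  end
end

store = SplitStore.new
id = store.create(
  name: "Michael Scarn", username: "mscarn", password: "12345",
  salary: 2000, pay_on: Date.new(2010, 2, 1)
)
```

Saving to the default repository first is what makes its generated key available to the others; a real implementation would also need to decide what happens if step 3 fails after step 2 succeeded.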

Our storage ends up looking a little like this:

# default/employees.yaml
- id: 95143
  name: "Michael Scarn"
  username: "mscarn"
  password: "12345"

# hr/employees.yaml
- id: 95143
  salary: 2000
  pay_on: 2010-02-01

### Lazy loading from multiple repositories

Loading a resource without specifying which fields you want to load
would work in a way similar to lazy loading.

employee = Employee.get(95143)

This loads the Employee with `id`, `name`, `username`, and `password` from
the default repository. Calling `employee.salary` would load all of the
attributes which belong to the human resources repository.

employee = DataMapper.repository(:human_resources) do
  Employee.get(95143)
end

# ... or ...
employee = Employee.get(95143, :repository => :human_resources)

This loads the Employee with `id`, `salary`, and `pay_on` from the human
resources repository. Calling `employee.name` would load all of the
attributes which belong to the default repository.
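A plain-Ruby sketch of this group-at-a-time lazy loading follows. It is hypothetical: `MultiRepoResource`, `GROUPS` and `STORES` are invented for illustration, with nested Hashes standing in for the two repositories. Reading any attribute from a repository that hasn't been queried yet loads that repository's whole group.

```ruby
# Hypothetical sketch: a resource loaded from one repository fetches the
# whole attribute group of another repository on first access.
class MultiRepoResource
  attr_reader :loaded

  def initialize(id, groups, stores, loaded_from)
    @id     = id
    @groups = groups # repository name => attribute names it owns
    @stores = stores # repository name => { id => { attribute => value } }
    @values = {}
    @loaded = []
    load_group(loaded_from)
  end

  def [](name)
    repo = @groups.find { |_, names| names.include?(name) }&.first
    raise KeyError, "unknown attribute #{name}" unless repo
    load_group(repo) unless @loaded.include?(repo)
    @values[name]
  end

  private

  # One "query" per repository: merge its whole row into the resource.
  def load_group(repo)
    @values.merge!(@stores.fetch(repo).fetch(@id))
    @loaded << repo
  end
end

GROUPS = {
  default: [:name, :username, :password],
  human_resources: [:salary, :pay_on]
}
STORES = {
  default: { 95143 => { name: "Michael Scarn", username: "mscarn", password: "12345" } },
  human_resources: { 95143 => { salary: 2000, pay_on: "2010-02-01" } }
}

employee = MultiRepoResource.new(95143, GROUPS, STORES, :default)
employee.loaded   # => [:default]
employee[:salary] # => 2000 (loads the :human_resources group)
employee.loaded   # => [:default, :human_resources]
```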

### Finishing up

I think this behaviour has a lot of potential: In many web
applications developers have made the compromise of denormalising data
in order to improve performance. DataMapper could instead provide an
API to store these denormalised "cache" attributes in a fast key/value
store.

class Journey
  include DataMapper::Resource

  property :id, Serial
  property :start_at, String
  property :end_at, String

  repository(:redis) do
    property :really_expensive_computation, String
  end
end

Tony Mann

Feb 1, 2010, 2:18:18 PM

to datam...@googlegroups.com

Anthony,

I read your post, but I need time to process it. In general, DM's notion of repositories has always seemed a little "off" to me, and your proposal strikes me as a huge step in the right direction.

In our code, we use multiple repositories, and have never been able to get the identity map working correctly because of this. The fact that the identity map is not "automatic" is a clear sign that something needs to be cleaned up design-wise.

..tony..

--
You received this message because you are subscribed to the Google Groups "DataMapper" group.
To post to this group, send email to datam...@googlegroups.com.
To unsubscribe from this group, send email to datamapper+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/datamapper?hl=en.

Xavier Shay

Mar 6, 2010, 12:36:54 AM

to DataMapper

Your proposal fits with all of my multi-repository use cases; I can't
think of one for the existing behaviour. As you mention, you can
achieve it with inheritance (I'd probably use mixins, but whatever).

Ted Han

Mar 6, 2010, 6:39:42 PM

to datam...@googlegroups.com

I think I disagree with the general motivation here.

The ability to arbitrarily map models and properties over data sources is a vital tool when pulling data out of legacy stores.

What we need to figure out is a way to separate the notion of repositories as scopes/contexts (which I use regularly) from the ability to give fine-grained control over how a model is defined across stores. Basically, I don't like the idea of coupling models to specific repositories. Models are mutable for a reason. That's one of the things that impressed me so much about DM when I first started using it. You just write your description of what data you want access to, and boom, you've got querying over your data.

I think also that part of the lack of clarity may arise from the fact that once your data is in Ruby land (i.e. you've got an instance), you've lost all sense of how that data was mapped into Ruby. Perhaps if there were better ways to interrogate repositories, and/or drill down into how results were gathered from repositories, some of the clarity issues would be alleviated.

Regarding the identity map, Tony, that seems to me like something that needs to be fixed in the identity map itself. But again, maybe making DM smarter about repositories would make that easier.

Xavier Shay

Mar 6, 2010, 7:01:07 PM

to DataMapper

On Mar 7, 12:39 pm, Ted Han <t...@knowtheory.net> wrote:
> The ability to arbitrarily map models and properties over data sources is a
> vital tool when pulling data out of legacy stores.

You disagree with the inheritance/mixin approach? Could you give an
example of what you are doing? I imagine something like this:

old_model = repository(:old) { Model.get(1) }
repository(:new) { Model.create(old_model.attributes) }

I find the scoping clunky; I much prefer:

old_model = Old::Model.get(1)
New::Model.create(old_model.attributes)

Or is this not the case you're talking about?

Ted Han

Mar 6, 2010, 7:39:11 PM

to datam...@googlegroups.com

Actually... I think I'd misread Anthony's email.

His last suggested API is something I could get behind: taking the notion of repositories as scopes, and pushing that down into the model as well. The only thing I'd want to see is the ability to inject or alter the repositories on the fly somehow.

I will give it more thought.

Incidentally, dkubb has talked some about abstracting away knowledge of repositories from models, which is something I could get behind. Something like an inverted mapping pattern would be interesting to see (i.e. you ask a model to save, it looks up somewhere how to route itself to stores, using defaults which you could override).
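One hypothetical way to picture that inverted mapping in plain Ruby: models hold no repository knowledge, and a separate router resolves `[model, property]` pairs to a repository name. The most specific rule wins, with a global default underneath, and rules can be changed at runtime. `RepositoryRouter` and its methods are invented for illustration and are not part of DataMapper.

```ruby
# Hypothetical sketch of an "inverted mapping": routing rules live outside
# the models, with a global default that can be overridden at runtime.
class RepositoryRouter
  def initialize(default: :default)
    @default         = default
    @model_routes    = {}
    @property_routes = {}
  end

  def route_model(model, repository)
    @model_routes[model] = repository
  end

  def route_property(model, property, repository)
    @property_routes[[model, property]] = repository
  end

  # Most specific rule wins: property, then model, then the global default.
  def repository_for(model, property)
    @property_routes.fetch([model, property]) do
      @model_routes.fetch(model, @default)
    end
  end
end

router = RepositoryRouter.new
router.repository_for(:Employee, :name)   # => :default
router.route_property(:Employee, :salary, :human_resources)
router.repository_for(:Employee, :salary) # => :human_resources
router.route_model(:Employee, :legacy)
router.repository_for(:Employee, :name)   # => :legacy
```

Keeping the rules in one place like this would also cover the wish above to "inject or alter the repositories on the fly", since changing a route never touches the model definition.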
