I've been thinking a little about defining properties in DataMapper,
particularly relating to how they work within repository blocks. I've
posted some ideas to my site: http://antw.me/thoughts/datamapper-property-api.html
If you prefer to read them here instead (sans links and syntax
highlighting), I've included the full message below.
I'd love to get some feedback; let me know what you think...
Anthony.
TRANSCRIPT:
In my own dm-core fork on Github I've recently been experimenting with
ways to trim down both Resource and Model, extracting specific
functionality out to separate classes and modules. I've started by
relieving Resource of the need to take care of attributes, creating an
AttributeSet class which holds all of a Resource's attributes, tracks
when they've been updated (marking them as dirty), and lazy-loads
attributes when needed.
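To make the idea concrete, here's a minimal plain-Ruby sketch of what such an AttributeSet might look like. The class and method names (`dirty?`, `clean!`, the lazy-loader block) are my own illustrative assumptions, not dm-core's actual API:

```ruby
# Illustrative sketch: holds a resource's attributes, tracks which have
# changed since the last load ("dirty"), and defers loading of missing
# attributes until they are first read.
class AttributeSet
  def initialize(values = {}, &lazy_loader)
    @values      = values
    @dirty       = {}            # name => original value before the change
    @lazy_loader = lazy_loader   # called the first time a missing attribute is read
  end

  def [](name)
    if !@values.key?(name) && @lazy_loader
      @values[name] = @lazy_loader.call(name)
    end
    @values[name]
  end

  def []=(name, value)
    @dirty[name] = @values[name] unless @dirty.key?(name)
    @values[name] = value
  end

  def dirty?(name = nil)
    name ? @dirty.key?(name) : !@dirty.empty?
  end

  def clean!  # would be called after a successful save
    @dirty.clear
  end
end

set = AttributeSet.new(:name => 'Michael Scarn') { |name| "loaded-#{name}" }
set[:name] = 'Samuel L. Chang'
set.dirty?(:name)  # => true
set[:salary]       # => "loaded-salary" (lazy-loaded on first read)
```

The point of isolating this in one object is that Resource no longer needs to know how dirty-tracking or lazy loading work; it just delegates.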
Although I haven't yet pushed them to Github, my latest commits pass
the full dm-core spec suite against SQLite3 and PostgreSQL, but have
3-4 failures with the InMemory and Yaml adapters. I eventually tracked
this down to
an example where a Property is defined on a Model within the context
of a specific repository. For example:
    require 'dm-core'

    DataMapper.setup(:default, 'in_memory://localhost/one')
    DataMapper.setup(:second,  'in_memory://localhost/two')

    class Person
      include DataMapper::Resource

      property :id,   Serial
      property :name, String

      repository(:second) do
        property :external_id, Integer
      end
    end
This creates a Person model with two attributes: `id` and `name`, and
a third `external_id` attribute which applies only when using the
model within the `:second` repository context:
    DataMapper.repository(:second) do
      Person.create(:name => 'Michael Scarn', :external_id => 1)
    end
I then realised my AttributeSet implementation didn't account for the
properties of a Model changing depending on the current repository
context. This led me to think a little more about the purpose--and
usefulness--of being able to define models in this way.
I'd like to--perhaps a little presumptuously--suggest that this
functionality isn't as nice as it first seems, provide an alternative
means for achieving the same result, and elaborate on how I think such
repository blocks should work.
### Inconsistent instance API
Allowing a user to wrap properties in a repository block results in a
model changing its behaviour depending on external state (the current
repository context). At one moment the resource has an `external_id`
attribute, and in the next the attribute seems to disappear.
    DataMapper.repository(:second) do
      Person.new(:external_id => 1)
    end
    # => #<Person @id=nil @name=nil>
Wait... where did the `external_id` attribute go? In fact the
attribute _was_ set; it just doesn't appear because `Person#inspect`
was called outside of the repository block...
    DataMapper.repository(:second) do
      puts Person.new(:external_id => 1).inspect
    end
    # => #<Person @id=nil @name=nil @external_id=1>
Aha! There it is. Trying to set the `external_id` attribute outside of
the `:second` repository context will also (rightly) fail.
### Ambiguity as to where a resource is saved
DataMapper's repository context allows you to save any Resource to any
defined repository (provided they support the same features).
    person = Person.new(:name => 'Samuel L. Chang')

    # Now that I have my resource, I can save it to wherever I
    # want... By default, the resource will be saved in the
    # :default repository.
    person.save

    # Alternatively, I can specify a different repository...
    DataMapper.repository(:second) do
      person.save
    end
While this is an interesting feature, I'm struggling to come up with a
reason why you'd _want_ to do this. To me it just introduces
ambiguity:
> Erm, where did I save that person instance? I'm sure it's around here
> somewhere... Where are you little person instance? Peekaboo!
>
> <cite>Me, a year later.</cite>
In reality, so long as you're explicit about wrapping parts of your
application in the correct repository blocks, this is not a problem.
But wherever you have to be explicit there is the possibility that
someone will forget; forgetting _just once_ might be enough to cause
obscure bugs.
If the `external_id` attribute was set to disallow nil, the second
call to `person.save` in the above example would fail, since no value
was set. (In fact, the above example would fail anyway, since the
first call to `person.save` would mark the resource as clean, thus the
second call would do nothing.)
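That clean/dirty subtlety can be sketched in a few lines of plain Ruby. `SketchResource` is a hypothetical stand-in, not dm-core code; here an ignored save returns `false` purely to make the no-op visible:

```ruby
# Minimal sketch: save persists only when the resource is dirty, then
# marks it clean, so an immediate second save does nothing.
class SketchResource
  attr_reader :saves

  def initialize
    @dirty = true   # a brand-new resource has unsaved changes
    @saves = 0
  end

  def save
    return false unless @dirty   # clean resource: nothing to persist
    @saves += 1
    @dirty = false               # first save marks the resource clean
    true
  end
end

person = SketchResource.new
person.save  # => true: the resource is dirty, so it is persisted
person.save  # => false: no-op, because the first save marked it clean
```

So even wrapping the second `save` in a different repository block wouldn't help: from the resource's point of view there is nothing left to write.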
## A better way?
I'm of the belief that each model should be associated with one--and
only one--repository. This would be the `:default` repository, except
where a user explicitly declares otherwise when setting up their
model. A `Person` would be associated with the default repository
_always_, regardless of the current repository context. In the example
below, the person would be persisted to the default repository even
though the save is wrapped in another repository block.
    person = Person.new(:name => 'Michael Scarn')

    DataMapper.repository(:second) do
      person.save
    end
DataMapper could provide a method for changing the default repository:
    class Person
      include DataMapper::Resource

      # Tells DM that the Person model should be persisted
      # to the :second repository.
      set_repository :second

      property :id,   Serial
      property :name, String
    end
By doing this, users would never need to worry about repository
context outside of their models, making their domain objects much more
straightforward.
As far as I'm concerned, `Person` and `repo(:second) { Person }` are
two different models, with different interfaces, different properties,
and are stored in different repositories. The second Person should
probably be represented as another model, distinct from the first.
Since DataMapper doesn't conflate class inheritance with Single Table
Inheritance, we could use inheritance to achieve the same effect as
the current API:
    class Person
      include DataMapper::Resource

      property :id,   Serial
      property :name, String
    end

    # Inherits properties from Person, but adds its own
    # custom properties, and persists to another repo.
    class HRPerson < Person
      set_repository :second

      property :external_id, Integer
    end
### Problems with this approach...
`Model#copy` would break. Well... it wouldn't just break: the entire
concept of copying resources across repositories would become
redundant.
## An alternative meaning for repository blocks
By doing away with the current meaning of repository blocks within
model instances, we free up the API to do something I think is much
more interesting: models which persist _across_ multiple repositories.
Let's take a (slightly contrived) example...
    DataMapper.setup(:default, 'yaml://localhost/main')
    DataMapper.setup(:human_resources, 'yaml://localhost/hr')

    class Employee
      include DataMapper::Resource

      property :id,       Serial
      property :name,     String
      property :username, String
      property :password, String

      repository(:human_resources) do
        property :salary, Integer
        property :pay_on, Date
      end
    end
Our employee model has six properties: `name`, `username`, and
`password` will be persisted to the default repository, while `salary`
and `pay_on` will be persisted to the human resources repository.
`id`, since it is a key, is used in _both_.
Let's create an employee...
    Employee.create(
      :name     => 'Michael Scarn',
      :username => 'mscarn',
      :password => '12345',
      :salary   => 2000,
      :pay_on   => Date.today
    )
Here's what would happen "under the hood":
1. We assume that the key is generated by the model's default
repository. In the absence of a `set_repository` statement, DataMapper
assumes `:default`.
2. DataMapper then saves the resource to the default repository. In
this example it persists the name, username, and password, and returns
the ID which was generated.
3. It then proceeds to persist the salary and pay_on attributes to the
human resources repository with the ID returned by the default repo.
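The three steps above can be sketched in plain Ruby, with simple arrays standing in for real adapters. `REPOS`, `next_id`, and `create_employee` are illustrative assumptions, not proposed dm-core API:

```ruby
# Two "repositories", each just an array of record hashes.
REPOS = {
  :default         => [],
  :human_resources => []
}

def next_id(repo)
  # Step 1: the key is generated by the model's default repository.
  repo.size + 1
end

def create_employee(attributes)
  default_fields = [:name, :username, :password]
  hr_fields      = [:salary, :pay_on]

  # Step 2: persist the default-repository fields and obtain the new id.
  id = next_id(REPOS[:default])
  REPOS[:default] <<
    attributes.select { |k, _| default_fields.include?(k) }.merge(:id => id)

  # Step 3: persist the HR fields under the same id.
  REPOS[:human_resources] <<
    attributes.select { |k, _| hr_fields.include?(k) }.merge(:id => id)

  id
end

create_employee(
  :name     => 'Michael Scarn',
  :username => 'mscarn',
  :password => '12345',
  :salary   => 2000,
  :pay_on   => '2010-02-01'
)
```

The key point is that the shared `id` is the only thing linking the two halves of the record.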
Our storage ends up looking a little like this:
    # default/employees.yaml
    - id: 95143
      name: "Michael Scarn"
      username: "mscarn"
      password: "12345"

    # hr/employees.yaml
    - id: 95143
      salary: 2000
      pay_on: 2010-02-01
### Lazy loading from multiple repositories
Loading a resource without specifying which fields you want to load
would work in a way similar to lazy loading.
    employee = Employee.get(95143)

This loads the Employee with `id`, `name`, `username`, and `password`
from the default repository. Calling `employee.salary` would then load
all of the attributes which belong to the human resources repository.
    employee = DataMapper.repository(:human_resources) do
      Employee.get(95143)
    end

    # ... or ...

    employee = Employee.get(95143, :repository => :human_resources)

This loads the Employee with `id`, `salary`, and `pay_on` from the
human resources repository. Calling `employee.name` would load all of
the attributes which belong to the default repository.
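A rough plain-Ruby illustration of this repository-aware lazy loading, with hashes standing in for the two repositories. All of the names here (`STORES`, `FIELD_REPO`, `LazyEmployee`) are assumptions for the sake of the sketch:

```ruby
# Records keyed by id, one hash per "repository".
STORES = {
  :default         => { 95143 => { :name => 'Michael Scarn', :username => 'mscarn' } },
  :human_resources => { 95143 => { :salary => 2000, :pay_on => '2010-02-01' } }
}

# Which repository owns which field.
FIELD_REPO = {
  :name   => :default,         :username => :default,
  :salary => :human_resources, :pay_on   => :human_resources
}

class LazyEmployee
  def initialize(id, loaded_from)
    @id = id
    # Only the fields from the repository we were fetched through.
    @attributes = STORES[loaded_from][id].dup
  end

  def [](field)
    unless @attributes.key?(field)
      # Load every attribute belonging to that field's repository in one go,
      # mirroring how lazy loading pulls in a whole group of fields.
      @attributes.merge!(STORES[FIELD_REPO[field]][@id])
    end
    @attributes[field]
  end
end

employee = LazyEmployee.new(95143, :default)
employee[:name]    # already in memory, loaded from the default repository
employee[:salary]  # triggers a single load from :human_resources
```

Whichever repository the resource was fetched through, reading a field owned by the other one transparently fills in the missing half.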
### Finishing up
I think this behaviour has a lot of potential: in many web
applications developers have made the compromise of denormalising data
in order to improve performance. DataMapper could instead provide an
API to store these denormalised "cache" attributes in a fast key/value
store.
    class Journey
      include DataMapper::Resource

      property :id,       Serial
      property :start_at, String
      property :end_at,   String

      repository(:redis) do
        property :really_expensive_computation, String
      end
    end
--
You received this message because you are subscribed to the Google Groups "DataMapper" group.
To post to this group, send email to datam...@googlegroups.com.
To unsubscribe from this group, send email to datamapper+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/datamapper?hl=en.
On Mar 7, 12:39 pm, Ted Han <t...@knowtheory.net> wrote:
> The ability to arbitrarily map models and properties over data sources is a
> vital tool when pulling data out of legacy stores.
You disagree with the inheritance/mixin approach? Could you give an
example of what you are doing? I imagine something like this:

    old_model = repository(:old) { Model.get(1) }
    repository(:new) { Model.create(old_model) }

I find the scoping clunky; I much prefer:

    old_model = Old::Model.get(1)
    New::Model.create(old_model)

Or is this not the case you're talking about?