Repository-session save/dirty?/valid?


Ashley Moran

Jul 7, 2009, 3:57:49 AM
to DataMapper, Andy Shipman
Hi all,

Between me and mrship, we've probably spent the best part of a month
battling with DataMapper. While there have been (quite) a few boring
old this-doesn't-quite-work-right bugs, the biggest issue was my
misunderstanding of the level of abstraction DataMapper provides
over persistence. Specifically, I thought it was possible to do much
more in memory than it actually is. By assuming DataMapper does not
suffer some of the limitations of ActiveRecord, I led us unwittingly
into a labyrinth of strange and often downright surreal situations.

The first thing to expose this was dm-is-list. We lost days
figuring out what was going wrong with our code - only to find it was
a combination of dm-is-list attempting to save on any list update, and
DataMapper 0.9's save order (children before parents). DataMapper
0.10RC improves the situation, but eventually, writing the specs for a
dm-is-list-based queue structure (and, later, a compound object that
manipulated two queues) became so unwieldy due to artificial save
calls that we ripped it out. In the end we decided to purge
dm-is-list from the whole application.

This led us to believe that we could move to entirely in-memory data
structures. For example, we re-implemented the queue described above
by loading the elements into a plain Array, and recreating every join
table row (destroy all existing, save all in memory) upon a save.
This is not ideal for large structures, but is suitable for our ~20
elements. As we expanded the in-memory nature of the specs to cover
the whole app, we uncovered more subtle issues. For example, the way
you associate a parent and child affects what objects can be
saved[939]. We tried to get more of the object graph saved[940] in
one go, which has been improved for the next release, but still leaves
us not knowing how much of the graph will be saved. And even if an
object does attempt to save the full connected graph, we found an edge
case where the children of non-dirty parents would not get saved[950].
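The destroy-and-rebuild save described above can be sketched in plain Ruby, with an Array of hashes standing in for the join table (the class and method names here are hypothetical, not our actual code or DataMapper's API):

```ruby
# Sketch of the "rebuild the join table on save" pattern: the queue
# lives entirely in memory as a plain Array, and persisting it means
# destroying every existing join row and re-creating one row per
# element, preserving order. O(n) writes per save - acceptable for
# the ~20 elements we deal with.
class InMemoryQueue
  attr_reader :elements

  def initialize(elements = [])
    @elements = elements
  end

  # `store` stands in for the join table: destroy all existing rows,
  # then save one row per in-memory element.
  def save(store)
    store.clear
    @elements.each_with_index do |element, position|
      store << { element: element, position: position }
    end
    store
  end
end

store = []                              # the pretend join table
queue = InMemoryQueue.new(%w[a b c])
queue.save(store)
```

The trade-off is deliberate: simplicity of the in-memory model over write efficiency.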

We actually now have (unless I missed something) specs that
manipulate our entire domain layer in memory - I'm fairly sure you
could run our app in memory if it weren't over HTTP, with issues
possibly only occurring with created_at/updated_at fields, if these
are set at save time rather than at object initialisation time. This
data-store freedom is, IMO, a Good Thing. In every situation where
the need to make persistence calls has been imposed on us, the code
has become significantly harder to spec and understand.

Not everyone agrees that a call to Resource#save should save the whole
connected object graph. In my situation this would be ideal
behaviour, but it may cause problems for other people (though without
seeing code, I don't know why). In either case, I think a better
solution is needed.

One of the features of SQLAlchemy[SA] I pine for most in a Ruby ORM is
"Commit entire graphs of object changes in one step". This is a holy
grail to me. DataMapper already has an identity map, so it knows
about every resource in memory. How much work would it be to use this
to implement Repository#save, Repository#dirty? and Repository#valid??
(In the crudest case, you could surely just iterate over the identity
map?)
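The crudest case really is just an iteration. A hedged sketch, using stand-in classes rather than DataMapper's real IdentityMap and Resource (all names here are hypothetical):

```ruby
# Crude sketch of repository-level save/dirty?/valid? driven by an
# identity map: since the repository knows about every resource in
# memory, it can answer graph-wide questions by iterating over them.
class FakeResource
  attr_reader :key

  def initialize(key, dirty: true, valid: true)
    @key, @dirty, @valid, @saved = key, dirty, valid, false
  end

  def dirty?; @dirty; end
  def valid?; @valid; end
  def saved?; @saved; end

  def save
    @saved = true
    @dirty = false
  end
end

class Repository
  def initialize
    @identity_map = {}   # key => resource, as an identity map would hold
  end

  def track(resource)
    @identity_map[resource.key] = resource
  end

  # Any tracked resource with unsaved changes makes the repository dirty.
  def dirty?
    @identity_map.values.any?(&:dirty?)
  end

  # The repository is valid only if every tracked resource is.
  def valid?
    @identity_map.values.all?(&:valid?)
  end

  # Save every dirty resource; report whether everything is now clean.
  def save
    @identity_map.values.select(&:dirty?).each(&:save)
    !dirty?
  end
end

repo = Repository.new
a = FakeResource.new("a")                # dirty, needs saving
b = FakeResource.new("b", dirty: false)  # clean, should be skipped
repo.track(a)
repo.track(b)
```

A real implementation would need to order the saves (parents before children) and handle resources created outside a repository block, but the identity map gives it everything it needs to enumerate.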

Having repository-based methods would solve our problem of wanting to
manipulate the entire object graph in memory, but not impose this on
people who can't do this due to the nature of their domain models.

WDYAT?

Thanks for any feedback,
Ashley


[939] http://datamapper.lighthouseapp.com/projects/20609/tickets/939-assigning-a-parent-to-a-child-does-not-work-the-same-as-assigning-a-child-to-a-parent

[940] http://datamapper.lighthouseapp.com/projects/20609/tickets/940-datamapper-010-does-not-recursively-save-parent-resources

[950] http://datamapper.lighthouseapp.com/projects/20609/tickets/950-resourcedirty-does-not-consider-associations

[SA] http://www.sqlalchemy.org/


--
http://www.patchspace.co.uk/
http://www.linkedin.com/in/ashleymoran
http://aviewfromafar.net/
http://twitter.com/ashleymoran


dbussink

Jul 7, 2009, 5:31:05 AM
to DataMapper
On Jul 7, 9:57 am, Ashley Moran <ashley.mo...@patchspace.co.uk> wrote:

> Not everyone agrees that a call to Resource#save should save the whole  
> connected object graph.  In my situation, this would be ideal  
> behaviour, but it may cause problems for other people.  (Without  
> seeing code though, I don't know why.)  In either case, though, I  
> think a better solution is needed.

My biggest issue with this is clarity. Why call save on a specific
resource anyway if it always saves everything? This alone would mean
to me that it should move to another object because it apparently
isn't Resource's concern. Naming methods and putting them in the right
place is of major importance, especially for a framework like
DataMapper.

> One of the features of SQLAlchemy[SA] I pine for most in a Ruby ORM is  
> "Commit entire graphs of object changes in one step".  This is a holy  
> grail to me.  DataMapper already has an identity map, so it knows  
> about every resource in memory.  How much work would it be to use this  
> to implement Repository#save, Repository#dirty and Repository#valid ?  
> (In the crudest case, you could surely just iterate over the identity  
> map?)

I agree with providing a Repository#save. IMHO it more correctly
conveys what it does: if I encountered this method for the first
time, I'd probably guess correctly what it means :). It could use the
IdMap for a first try; I don't know what the best approach is, though,
for handling code outside a repository block.

--
Regards,

Dirkjan Bussink

Ashley Moran

Jul 7, 2009, 9:36:21 AM
to DataMapper

On 7 Jul 2009, at 10:31, dbussink wrote:

> My biggest issue with this is clarity. Why call save on a specific
> resource anyway if it always saves everything? This alone would mean
> to me that it should move to another object because it apparently
> isn't Resource's concern. Naming methods and putting them in the right
> place is of major importance, especially for a framework like
> DataMapper.

Yes, I agree. I now think that Dan's change as described in [940] is
appropriate. An object should only save as much of the object graph
as it *needs* in order to save itself. However, there are still
problems with this involving infinite loops and quadratic-time saves,
at least when before/after filters are used. And, of course, we still
need an elegant way to save *everything*.
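The infinite-loop risk is easy to see with a parent and child that reference each other: a naive "save what I need" traversal recurses forever. A visited set is one way to make it terminate, sketched here with an illustrative Struct rather than real DataMapper resources:

```ruby
require 'set'

# Illustrative sketch of cycle-safe graph traversal: each node "saves"
# the neighbours it needs, but a shared visited set prevents the
# parent -> child -> parent recursion from looping forever, and keeps
# the whole walk linear in the number of nodes.
Node = Struct.new(:name, :neighbours) do
  def save_graph(saved = Set.new)
    return saved if saved.include?(name)
    saved << name                        # mark BEFORE recursing: breaks cycles
    neighbours.each { |n| n.save_graph(saved) }
    saved
  end
end

parent = Node.new("parent", [])
child  = Node.new("child", [])
parent.neighbours << child
child.neighbours << parent               # the cycle that would otherwise loop
```

This doesn't address the quadratic-time problem with before/after filters, but it shows the two concerns (which objects must be saved, and in what order) are separable from the save itself.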


> I agree with providing a Repository#save. Imho it more correctly
> confers what it does, since if I would encounter this method for the
> first time I'd probably think that it means what it probably does :).
> It should possibly use the IdMap for a first try, don't know what the
> best case is though for handling code outside a repository block.


I don't think this can be solved when not using a repository block.
Or at least, my first impression is that it's not worth trying to
solve. But I've had a quick go at this using a repository block, and
documented the effort in [959]. WDYT?

Ashley


[940] http://datamapper.lighthouseapp.com/projects/20609-datamapper/tickets/940-datamapper-010-does-not-recursively-save-parent-resources#ticket-940-13

[959] http://datamapper.lighthouseapp.com/projects/20609-datamapper/tickets/959-repositorysave
