Best practices for using objects across Transaction/UOW borders


jtuchel

unread,
Nov 5, 2020, 2:32:57 AM11/5/20
to glorp-group
As you may have read in my previous question about commits for Transactions with lots of objects loaded from the database, we are having issues with large numbers of objects in the caches.

This leads me to an old question I have always had about using objects effectively across Transaction borders.

Example:
  1. We load a list of objects into memory in one dialog. Here we commit some change to some object (like the time somebody last took a look at this list).
  2. The user double-clicks one of these objects, opens a dialog to make some changes to this object, commits these changes, and closes the dialog.
  3. The initial list of objects is updated and the user can choose to work on the details of another object.
I guess this is not an unusual scenario, and yet it is complicated from the perspective of an ORM. How do we make sure none of the Transactions (UOWs) gets out of sync with the database?

What is the best practice for such a scenario?
  1. Do we use long UOWs by using #commitUnitOfWorkAndContinue?
  2. Is it best to use #commitUnitOfWork and make sure the dialog re-reads its object in a new UOW in order to track the changes? Simply handing the object to the dialog after a #commitUnitOfWork for the logging of last access would detach the object from the database (the new Transaction doesn't know this object because it is not in the new Transaction's undoMap).
  3. Do people use #inTransactionDo: and take the full responsibility of all the bookkeeping?
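For reference, the three variants might be sketched like this (just a sketch; `session` and `anObject` stand in for the real receivers):

```smalltalk
"1. Long UOW: commit but keep the UOW and its bookkeeping alive."
session commitUnitOfWorkAndContinue.

"2. Short UOWs: commit, then re-read the object inside a fresh UOW."
session commitUnitOfWork.
session beginUnitOfWork.
session refresh: anObject.

"3. Manual bookkeeping inside a raw database transaction."
session inTransactionDo:
    [ "issue reads and writes and track changes by hand" ].
```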
We chose to use the ...AndContinue variants, because we thought this was the best compromise between speed and comfort. But it seems this approach has its limits when the number of objects gets large (see the other post).
We also thought that approach #2 might be dangerous, because something like

self session commitUnitOfWork.
self showDetailsDialogFor: self selectedObject.

has the danger of detaching the selected object, or some objects it references, from the database. It would then probably be much better to just call other dialogs with the id of the "root object" to reload for editing, just to make sure the dialog reads the object and all its referenced objects in its own Transaction...
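In other words, something along these lines (a sketch; `#showDetailsDialogForId:`, `Car`, and `anId` are made-up names):

```smalltalk
self session commitUnitOfWork.
self showDetailsDialogForId: self selectedObject id.

"...and inside the dialog, re-read the root in its own Transaction:"
session beginUnitOfWork.
root := session readOneOf: Car where: [:each | each id = anId].
```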

I hope I managed to communicate my concerns and questions. I'd be interested in other people's approaches to this tricky problem. So far we thought we had chosen the best possible path, but I am not so sure anymore...

I look forward to a fruitful discussion and hope to read your comments and ideas soon.

Joachim




Alan Knight

unread,
Nov 5, 2020, 8:47:55 AM11/5/20
to glorp...@googlegroups.com
I always tried to keep transactions short. So my pattern would have been that if you have a list it's not in a transaction. When someone goes to edit an object, start a unit of work, refresh that object within the unit of work and open the dialog on it. By refreshing you should force it to re-read proxies as well, so it's a little bit slower to open but known to be consistent with the DB. And try to have the dialog take a single object, and to have changes that might happen within it not affect the rest of the list. Otherwise it gets a bit more complicated.
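That pattern might look roughly like this (a sketch; `Order` and the dialog protocol are made up):

```smalltalk
"List view: no unit of work active, plain reads only."
orders := session read: Order.

"On double-click: start a UOW, refresh the object, then edit it."
session beginUnitOfWork.
session refresh: selectedOrder.
(self openEditDialogOn: selectedOrder)
    ifTrue: [session commitUnitOfWork]
    ifFalse: [session rollbackUnitOfWork].
```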


jtuchel

unread,
Nov 6, 2020, 3:44:18 AM11/6/20
to glorp-group
I am currently trying to do this in one of our dialogs as a prototype. But there are still a few things I don't really get my head around.


As a first attempt, I just started my details/edit dialog with

self dbSession rollbackUnitOfWork; beginUnitOfWork; refresh: myModelObject.

This does work. Every object I navigate to is re-read, so it seems the proxifying of the transitiveClosure etc. works as expected.

There is a problem, however: My dialog has an entry field (actually a couple of them) which reads data from the database. Lots of objects are read in order to make suggestions in this entry field. And there are certain business checks going on on this object and potentially lots of referenced objects. They all get read from the database. That is not a performance problem per se, but they all end up in the unitOfWork's cache and undoMap and whatnot. So the effect of this is close to zero, as long as these objects are read within the currentUnitOfWork.

The solution to this sounds easy: postpone the beginUnitOfWork until something is actually changed in the business object. But....

  • this also means postponing the refresh of the business object until we actually change it
  • this also means we need some kind of transient memento for the business object and all objects that may be modified along the way while the user works in this dialog. We'd have to do something like #registerTransitiveClosure:, just to find out which objects to copy as transient proxies of their persistent counterparts. Or we create such mementos by hand, which is likely to be error-prone and may open the floodgates for errors that go undetected for a while (how could you possibly write tests for this? "Is there any object we modified that is detached at the moment?" I guess a WriteBarrier could come in handy for this problem...)
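A lazy-UOW guard for the postponing idea might be sketched like this (hypothetical; `hasActiveUnitOfWork` is not a real Glorp selector, just shorthand for whatever test is available):

```smalltalk
ensureUnitOfWork
    "Start the UOW, and refresh the model, only on the first mutation."
    self hasActiveUnitOfWork ifFalse:
        [session beginUnitOfWork.
         session refresh: self modelObject]
```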
I'd be interested in opinions and ideas. How do you guys handle these issues?

I know this is not a new problem at all. This is a question every user of an ORM has to ask herself. We thought we had found a good solution that carries us a long way: using ...AndContinue and just keeping everything, because our users typically have small numbers of objects. Until they didn't.
Do you have pointers to literature on this complex of questions? I guess "Object Transaction Framework" is the keyword here. Has anybody built such a framework on top of Glorp? Is everybody implementing their own handmade ad-hoc version of this?

Joachim

Tom Robinson

unread,
Nov 6, 2020, 7:01:17 AM11/6/20
to glorp...@googlegroups.com, jtuchel
Hi Joachim,

At Washington Mutual, we had this problem with GemStone. We ended up building our own mementos for objects before doing the rollback. The VAST write barrier was involved, but it's been 18 years and I don't remember the details.

Store has this problem as well. It really doesn't need the undoMap because all of the modified Store objects are derived from image resident objects of a different class. Niall and I found this shortly before I left Cincom. I don't know whether he has done something to fix it or not.  What he did is likely to be Store specific though, and not part of Glorp (I think).

I would ask Alan and yourself if it might make sense to create ReadOnlyGlorpSession as a subclass of GlorpSession. ReadOnlyGlorpSession would retrieve objects from the database and instantiate them as immutable using the VA or VW immutability. It would not register the objects in the undoMap and it would not allow units of work or transactions. It would cache objects to avoid performance issues with recursive data structures and duplicate references. This assumes that there is an obvious line between objects being modified and objects never being modified. It could be problematic if you tried to read an object in using a read only session and then tried to modify it subsequently. I probably haven't sniffed out all of the implementation complications...

Tom

Alan Knight

unread,
Nov 6, 2020, 9:11:23 AM11/6/20
to glorp...@googlegroups.com, jtuchel
In the TOPLink world we definitely had people who kept a separate session around for 'constants'. So you had a lot of things that were read from the database, because they might change, but not within the lifetime of a user login. Lists of countries, postal codes, enumerated types, etc. So that session would never have a unit of work active, and would never refresh. From the point of view of the main session, they weren't database objects. It would be pretty easy to formalize that by making a session subclass that wouldn't let you start a unit of work and would mark everything immutable on read.
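Such a "constants" session might be sketched like this (the setup and the immutability selector vary by dialect and are only indicative; `myDescriptorSystem`, `myLogin`, and `Country` are made-up names):

```smalltalk
"A second session used only for reference data; never starts a UOW."
constantsSession := myDescriptorSystem sessionForLogin: myLogin.
countries := constantsSession read: Country.
"Mark everything immutable, using whatever the dialect offers."
countries do: [:each | each beImmutable].
```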

If the objects shouldn't be all in-memory, e.g. they're somewhat related to the object being edited, but not something that can change, then you could conceivably run a separate session for them, read them more directly instead of through proxies and just discard that whole session afterwards. The danger with that is that if you let objects leak between sessions or stay around afterward their session is gone or no longer knows about them, you can get errors that are very painful to debug.

If you can't segregate the objects at all, that's trickier. I guess you could run the whole dialog in a separate session, which had different descriptors for the objects that could be modified. I don't even remember now if Glorp had the ability to just mark a whole descriptor read-only, but it seems like a good ability. In fact, I think it would be pretty useful to have a feature of startUnitOfWorkWhereWeWillOnlyModify: [ User, UserContactInfo, UserPreferences] and all other descriptors are automatically read-only for that context.
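The proposed API, which does not exist in Glorp today, could look like:

```smalltalk
"Hypothetical: all descriptors outside the list become read-only
 for the duration of this unit of work."
session startUnitOfWorkWhereWeWillOnlyModify:
    (Array with: User with: UserContactInfo with: UserPreferences).
"... edit users ..."
session commitUnitOfWork.
```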

Esteban Maringolo

unread,
Nov 8, 2020, 8:05:44 PM11/8/20
to GLORP Mailing List, jtuchel
Hi,

I would favor an immutability-based read-only session; this would save both memory and time by not keeping the rowmaps around.

For an ORM we built at another company (and that is still in use), we used Dolphin's immutability exception handling: after reading, an object was marked as immutable, and when there was an attempt to mutate it, we handled the signalled exception and used that to mark the object as dirty, etc. Of course no rowmap was available in memory, nor were in-memory rollbacks, but it was seldom the case that we couldn't handle that situation by reloading the object from the DB and doing a one-way become if extremely necessary.

Regards,

ps: I'm currently in need of something like this because I have a small web app that uses GLORP and Seaside. Each Seaside session [1] has its own GLORP session [2], and this in turn has its own DB connection [3] as well. For the end user the data never changes (it's read-only); only the backend UI/APIs modify the data. My only concern is that the read-only sessions should show the most recent data committed to the DB from other sessions. At most I have 200 concurrent sessions at peak load, but saving me from keeping all that state that I'm not going to use would be useful.

[1] This isn't harmful, and it is lightweight.
[2] I might need to pool these sessions, but I don't want to get into situations where GLORP "phases" overlap each other
[3] Pharo currently doesn't have a Pooled Database Accessor, so I'm relying on pg_bounce on the server side, and so far works transparently.

Esteban A. Maringolo


Esteban Maringolo

unread,
Nov 9, 2020, 12:21:35 PM11/9/20
to Tom Robinson, GLORP Mailing List
Hi Tom,

But aren't these lookups going through the cache first when read? [1]

Last time I checked the `ObjectBuilder>>buildObjectsForRow:` checked whether the instance was in the cache first via `#findInstanceForRow:proxyType:`.

In any case I'm not at the point where that memory use is the next factor to optimize. As I said, pooling Glorp sessions and having "fresh" instances all the time without having to "rebuild the world" on each query are my current next things to optimize.
I have an explicit #inReadOnlyUnitOfWorkDo: to avoid any accidental mutation being stored at the end of the block; it simply does a #rollbackUnitOfWork instead of committing.
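Based on that description, such a helper might look like (a sketch, as a method on the session):

```smalltalk
inReadOnlyUnitOfWorkDo: aBlock
    "Run aBlock inside a unit of work, then discard all changes."
    self beginUnitOfWork.
    ^[aBlock value] ensure: [self rollbackUnitOfWork]
```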

Thanks!

[1] I actually have a "modified" version of the Cache to properly handle hierarchical type resolvers (so I have a single cache for all the hierarchy).

Esteban A. Maringolo


On Mon, Nov 9, 2020 at 1:48 PM Tom Robinson <zxrob...@gmail.com> wrote:
Hi Esteban,

I would think that you need to keep the rowmaps or, alternatively, a cache of all read-only primary keys for each class of read-only objects. Simple schemas might not require this, but as soon as you have many-to-many relationships, you need a way to avoid loading duplicate instances. Persons with multiple addresses and addresses with multiple persons are a good example. If you're only loading a few objects this way, duplication might not matter, but if you have lots, it could lead to more bloat than the rowmaps do.

Tom

jtuchel

unread,
Nov 10, 2020, 1:01:06 PM11/10/20
to glorp-group
Esteban,

we have a very similar architecture: every Seaside session has its own GlorpSession. They all share a single DB connection. This has been working for quite a while and has never shown itself to be a problem. I am aware that we can hit a wall here, but we run enough images in parallel that we have some way to go before that.

I agree: a modus operandi in which an object realizes "Oops, they're changing me, I'd probably better register myself with the active GlorpSession", and which avoids keeping thousands of rowMaps around, would be ideal. Unfortunately, afaik, VAST has no WriteBarrier or #beMutable: false. So this wouldn't work easily in VAST.

The second best would probably be a way to send out a Query to the database that doesn't register the objects. This would still leave me with the heavy burden of keeping track of what may have to be registered before committing. This is close to the second-session approach. Both have the big downside that they may lead to duplicate inserts because the developer might forget to register an object. Example: I read a wheel in a read-only session and add it to my currently edited car. When I commit without explicitly registering this wheel in the current session as old, it will most likely decide this wheel is new and has to be INSERTed. What a mess... It is way too easy to make such mistakes, and they are quite hard to find once you have magically duplicating objects in your database.
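The wheel example as code (names made up; the point is that the object lives in the wrong session's bookkeeping):

```smalltalk
wheel := readOnlySession readOneOf: Wheel where: [:w | w id = 42].
session inUnitOfWorkDo:
    [car wheels add: wheel.
     "wheel is not in this session's undoMap, so on commit
      Glorp treats it as new and INSERTs a duplicate row"].
```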

I can think of a workaround for this, but am not sure if that's really a good idea. I'll just draft it here.
What if we define a pair of Sessions, one for committing (let's call it the active session) and one that is kind of like the purchasing department or a spare parts warehouse for the active one. Let's call it passive.
The passive session does keep its registered objects, but does not commit or rollback (or maybe it does rollback, but that would only proxify all objects or clear the caches).

When these two know each other, you could think of a variant of #registerTransitiveClosure: and friends. Whenever the active session needs to decide whether an object is to be inserted or updated, it first looks at its own registeredObjects. If the object is there, everything works like it does now. But if not, it asks the passive session if the object is registered there. If so, it registers the object with itself and continues like before.
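The lookup could be sketched like this (all method names are hypothetical, not existing Glorp protocol):

```smalltalk
shouldUpdate: anObject
    "Answer true if anObject is already persistent, consulting
     the passive session before treating it as new."
    (self registeredObjects includes: anObject) ifTrue: [^true].
    (passiveSession registeredObjects includes: anObject)
        ifTrue: [self registerAsOld: anObject. ^true].
    ^false
```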

Sounds easy to me at first glance, but I am sure I am not seeing all the consequences this may have. Not sure about deletions, for example. Or differing lock keys (these will probably work like they do now?).

Ideas? Counter-arguments?

Joachim

jtuchel

unread,
Nov 10, 2020, 1:09:40 PM11/10/20
to glorp-group
Tom,

what WriteBarrier are you referring to? Are you sure it was in VAST? I can only find #abrInstancesAreImmutable, which just marks ALL instances of a class immutable. I can find no implementors of #beMutable: or the like.

A ReadOnlyGlorpSession sounds like a good idea. Especially if we find a way to automate the registration of objects that cross the border to a "normal" session and want to be managed in the other session as well (is it new? is it changed?). This would greatly reduce the load on the "normal" GlorpSession's rowMaps. The ReadOnlySession should, however, keep track of the objects it read.

Regards

Joachim

Esteban Maringolo

unread,
Nov 10, 2020, 9:07:36 PM11/10/20
to GLORP Mailing List
Hi Joachim,


On Tue, Nov 10, 2020 at 3:01 PM jtuchel <jtu...@objektfabrik.de> wrote:
>
> Esteban,
>
> we have a very similar architecture: every Seaside session has its own GlorpSession. They all share a single DB connection. This has been working for quite a while and has never shown to be a problem. I am aware that we can hit a wall here, but we run enough images in parallel, so that we have some way to go before that.

I never thought of sharing a single connection among different
GlorpSessions. Is it a single connection or a "transparent"
connection pool? I guess that since it's read-only you don't have to
deal with transactions happening in different sessions (seaside/glorp).

> I can think of a workaround for this, but am not sure if that's really a good idea. I'll just draft it here.

> When these two know each other, you could think of a variant of registerTransitiveClosure: and friends. Whenever the active session needs to decide whether an object is to be inserted or updated, it first looks at its own registeredObjects. If the object is there, everything works like it does now. But if not, it asks the passive Session if the object is registered there. If so, it registers the object with itself and continues like before.

> Sounds easy for me at first glance, but I am sure I am not understanding all the consequences this may have. Not sure about deletions, for example. Or differing lock keys (these will probably work like they do now?).
>
> Ideas? Counter-arguments?

Well... it doesn't sound easy to me. That's not a counter-argument
in itself, but sometimes you hit limits with ORMs. GLORP is
programmed in a way to make everything transparent (saves, deletes,
etc.), but I find doing such stuff overwhelming for my use case.

In the context of a long-lived stateful session, such as a desktop
app or a continuations-based Seaside app, that model might work
well. But in a request/response world, having a good balance between
"recreate the world" and "have everything in memory" is paramount;
what seems to be missing is a "layering" of sessions for such
scenarios.

As a little contextual information: when doing this "in-house" ORM
(~2005) I did look at other popular ORMs. I remember that I was
impressed with Java's Hibernate because it has this concept of an
intermediate cache/session, where the end-user session was connected
to the "global" cache, but it let each user session work as you
currently do with Glorp, with the benefit that if you modified an
object in one session, all other sessions were notified about it.
How it did that is beyond my understanding, and if the
DescriptorSystem is hell sometimes, well... doing the same with XML
was like hitting your fingers with a hammer.

Of course this is a pipe dream for the current user base of Glorp;
any modification in this direction would mean a complete overhaul
with an open-ended outcome.

Regards,