Tracking instance commit status

22 views
Skip to first unread message

Wibowo Arindrarto

unread,
Aug 11, 2016, 3:30:39 AM8/11/16
to sqlalchemy
Dear all,

I have a little problem with getting object states that have been flushed but not committed. As far as I understand, SQLAlchemy does not have an official object state corresponding to this. However, in my application I need to track such objects. My current approach is to listen to the ``after_flush`` session event, similar to the one outlined here. However, that gets messy quite quickly if I want to track different instances from different tables as the approach uses a global db.session.info state.

For the record, I'm also open to suggestions / critiques on whether tracking flushed but not committed objects are a good idea. The reason I am doing this is because I need to do something like ``get_or_create`` (get a database record if present, create it otherwise). But I'm doing this multiple times, and some objects may be a container of multiple other objects. To make the ``get_or_create`` query of these container objects simpler, I store the hash of the object IDs that it contains. Doing so means that I need to have all the contained object IDs, which means I have to at least flush the contained objects first before I can run ``get_or_create`` on the container object.

Now, my question is, is it possible to extend my models such that I can track the commit status of it? If so, how do I do it? If not, is there a better way to tackle my underlying problem (i.e. doing ``get_or_create`` on a container object)? I am aware of the ``persistent`` state, but it does not make the distinction on the commit status of an object.

Thank you in advance,
Bow

Simon King

unread,
Aug 11, 2016, 4:33:02 AM8/11/16
to sqlal...@googlegroups.com
I don't really understand the question here - it's not clear to me why
get_or_create for the container objects is any different than
get_or_create for any other object, or what that has to do with
flushed-but-not-committed records. Why do you need to treat objects
that have been flushed but not committed any differently to committed
ones?

Simon

Wibowo Arindrarto

unread,
Aug 11, 2016, 4:54:09 AM8/11/16
to sqlal...@googlegroups.com
Hi Simon,

The purpose was to flag those contained objects. These objects are created by the application's users (both container and contained). The user can either create the contained objects manually and let the application infer the container, or they can also create the containers manually. In cases where the contained objects are created manually (i.e. not in the context of a container), the behavior should be to reject on existing records. Otherwise, the behaviour should be to use existing records and only reject on the duplicates of the container itself (this is where I use the hash of the member IDs, to detect container duplicates). Since a container object can contain multiple objects, I want to be able to flag all of the contained objects which are preexisting.

Best regards,
Bow



--
You received this message because you are subscribed to a topic in the Google Groups "sqlalchemy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sqlalchemy/bUGk1QTaRIE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Gombas, Gabor

unread,
Aug 11, 2016, 5:08:36 AM8/11/16
to sqlal...@googlegroups.com
Well, it is somewhat similar to my use case: there is a part (kind of a library) of the application which may load an object or it may transparently create one if it did not exist before. And there is a completely different part of the application which wants to do so something only if the object was freshly created (because it would be too expensive to do unconditionally) - but it has no idea where the object is coming from. What I've done is adding an attribute which gets set to True in __init__() and False by the @reconstructor - a more complete implementation would probably want to reset the flag to False on objects which are alive when a transaction commits, but my use case does not need that complexity.
--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


--------------------------------------------------------------------------------

NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions or views contained herein are not intended to be, and do not constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall Street Reform and Consumer Protection Act. If you have received this communication in error, please destroy all electronic and paper copies and notify the sender immediately. Mistransmission is not intended to waive confidentiality or privilege. Morgan Stanley reserves the right, to the extent permitted under applicable law, to monitor electronic communications. This message is subject to terms available at the following link: http://www.morganstanley.com/disclaimers If you cannot access these links, please notify us by reply message and we will send the contents to you. By communicating with Morgan Stanley you consent to the foregoing and to the voice recording of conversations with personnel of Morgan Stanley.

Mike Bayer

unread,
Aug 11, 2016, 12:13:44 PM8/11/16
to sqlal...@googlegroups.com


On 08/11/2016 03:30 AM, Wibowo Arindrarto wrote:
> Dear all,
>
> I have a little problem with getting object states that have been
> flushed but not committed. As far as I understand, SQLAlchemy does not
> have an official object state corresponding to this.

The Session is intended to only be working with objects within a
transaction. Outside of a transaction, there is nothing. Therefore
there's no such state as "this object exists outside of the
transaction", because in fact we have no idea. It could have been
deleted since we last loaded it. We can't get to that state unless we
started a new transaction.

Now I know what someone might be getting at with this, that is, they
want to access the object concurrently from some other transaction,
assuming isolation levels are compatible with this. But those patterns
are outside of the Session's built-in functionality, so your approach of
using the events to do what you want is fine.


However, in my
> application I need to track such objects. My current approach is to
> listen to the ``after_flush`` session event, similar to the one outlined
> here <https://groups.google.com/forum/#!topic/sqlalchemy/eGxpQBChXQw>.

sounds great.

> However, that gets messy quite quickly if I want to track different
> instances from different tables as the approach uses a global
> db.session.info state.

organize the objects into a dictionary that is keyed to their class and
the value is a further collection of the objects themselves. Or just
key them to (obj.__class__, obj.id) in a single dictionary.


>
> For the record, I'm also open to suggestions / critiques on whether
> tracking flushed but not committed objects are a good idea. The reason I
> am doing this is because I need to do something like ``get_or_create``
> (get a database record if present, create it otherwise). But I'm doing
> this multiple times, and some objects may be a container of multiple
> other objects. To make the ``get_or_create`` query of these container
> objects simpler, I store the hash of the object IDs that it contains.
> Doing so means that I need to have all the contained object IDs, which
> means I have to at least flush the contained objects first before I can
> run ``get_or_create`` on the container object.

this sounds like you need the primary keys, which you get from flush().
That's fine but has nothing to do with "committed".


>
> Now, my question is, is it possible to extend my models such that I can
> track the commit status of it? If so, how do I do it? If not, is there a
> better way to tackle my underlying problem (i.e. doing ``get_or_create``
> on a container object)? I am aware of the ``persistent`` state, but it
> does not make the distinction on the commit status of an object.

Like other replies here I'm not understanding the link between "commit"
and "get_or_create". Having a dictionary of objects that already
exists is a fine pattern to use for get_or_create but if you have the
object and a primary key, it's in. If this is a collection that is used
concurrently by many sessions, then I'd build a session-local copy of
your object map and then push it out to the "global" one within the
after_commit() event.




>
> Thank you in advance,
> Bow
>
> --
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.

Wibowo Arindrarto

unread,
Aug 15, 2016, 1:45:13 PM8/15/16
to sqlal...@googlegroups.com
Dear everyone,

On Thu, Aug 11, 2016 at 11:08 AM Gombas, Gabor <Gabor....@morganstanley.com> wrote:
Well, it is somewhat similar to my use case: there is a part (kind of a library) of the application which may load an object or it may transparently create one if it did not exist before. And there is a completely different part of the application which wants to do so something only if the object was freshly created (because it would be too expensive to do unconditionally) - but it has no idea where the object is coming from. What I've done is adding an attribute which gets set to True in __init__() and False by the @reconstructor - a more complete implementation would probably want to reset the flag to False on objects which are alive when a transaction commits, but my use case does not need that complexity.

I have not looked at @reconstructor before, thanks for pointing it out :).


On Thu, Aug 11, 2016 at 6:13 PM Mike Bayer <mik...@zzzcomputing.com> wrote:
Like other replies here I'm not understanding the link between "commit"
and "get_or_create".   Having a dictionary of objects that already
exists is a fine pattern to use for get_or_create but if you have the
object and a primary key, it's in.  If this is a collection that is used
concurrently by many sessions, then I'd build a session-local copy of
your object map and then push it out to the "global" one within the
after_commit() event.


The requirement mainly boils down to the container/contained status of the object and that I need to flag them as preexisting if they already exist in the database permanently. This is where I need the flush/commit distinction: I will always need to flush first to get the object's IDs (so I can calculate its container unique ID and determine whether the container already exists or not). However at this point, looking at the 'persistent' status alone to determine whether an object is preexisting or not is not enough since it will return True for my newly flushed object and my previously committed records in the database. I can't then flag the object appropriately.

`get_or_create` was simply the function where I need to do this. I explained it hoping that it would give more context, but I see now that maybe I should've kept it to `commit` vs `flush` status :).

But anyway, thanks for the explanation on the local-session copy, Mike. I think that would also be good for me to solve my problem.

Best regards,
Bow

Jonathan Vanasco

unread,
Aug 15, 2016, 7:10:15 PM8/15/16
to sqlalchemy
You may be handle this at the database level using triggers.

There are some widely used techniques for emulating a "trigger on commit" for postgres and oracle (which don't have that), and that can be used to toggle a flag on the table.

But on most databases, you could also track onto the row the transaction_id for the original create or last update via a create/update trigger.  Then you can query the current transaction id from the server, and compare it to the object's columns.  That might be the easiest.  Have the create_id and upate_id as read-only fields in sqlalchey; they'll be populated on the query.

All you have to check is if the current transaction id is the same as the create_id on the row. 

   object | transaction_id | create_id | update_id | status       | misc
   a      | 1              | 1         |           | new          | new object
   b      | 1              | 1         | 1         | new          | new + edit
   c      | 2              | 1         | 2         | existed      | existed + edit
   d      | 3              | 1         | 2         | existed      | existed + no change
   
After you flush, you just compare the ids.  if the create id matches the transaction id, you know it's new.  If they don't, it's an edit.
Reply all
Reply to author
Forward
0 new messages