TypeDecorators don't know which database session to use


Chris Wilson

unread,
May 28, 2019, 9:10:19 AM5/28/19
to sqlal...@googlegroups.com

Dear Michael,

I have discovered a limitation of TypeDecorators (custom column types): any one that uses the database (e.g. to load objects serialised in a custom way) has no way to know which database session to use. During the initial load one can use a global session object, but expired attributes may need to be refreshed at any time, when the current session is not necessarily the one that loaded the attribute or its parent object.

For example, consider a TypeDecorator that stores a large number of Cats by ID, compressed to save space in the database:

import struct
import zlib

from sqlalchemy import Column, Integer, LargeBinary, Text, TypeDecorator, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import object_session, sessionmaker

Base = declarative_base()
current_session = None


class Cat(Base):
    __tablename__ = 'cat'
    id = Column(Integer, primary_key=True)
    name = Column(Text)


class CompressedCats(TypeDecorator):
    ''' A house can have so many cats that we need to compress them. '''

    impl = LargeBinary
    _struct_protocol_id = struct.Struct('<i')

    def process_bind_param(self, value, _dialect):
        # Typically the column can be nullable, so you can write None to it: support that.
        if value is None:
            return None

        # FIXME: allow specification of a different protocol id, and
        # use the id to determine the protocol argument to pass to serialize.
        return zlib.compress(b''.join(self._struct_protocol_id.pack(cat.id) for cat in value))

    def process_result_value(self, value, _dialect):
        # Typically the column can be nullable, so you can read None from it: support that.
        if value is None:
            return None

        # FIXME: unpack and use the protocol id to determine the protocol argument to pass to deserialize.
        decompressed = zlib.decompress(value)
        cat_ids = [self._struct_protocol_id.unpack(decompressed[i:i + 4])[0] for i in range(0, len(decompressed), 4)]
        # BUG: Which session can we use here?
        global current_session
        return [current_session.query(Cat).filter_by(id=cat_id).one() for cat_id in cat_ids]


class House(Base):
    __tablename__ = 'house'

    id = Column(Integer, primary_key=True)
    # A house can have so many cats that we need to compress them:
    cats = Column(CompressedCats)


engine = create_engine('sqlite://')
engine.echo = True
Base.metadata.create_all(engine)

DBSession = sessionmaker(bind=engine)
session_1 = DBSession(autocommit=True, expire_on_commit=True)

with session_1.begin():
    current_session = session_1
    with session_1.begin_nested():
        cats = [Cat(id=i, name=f'Cat #{i}') for i in range(100)]
        session_1.add_all(cats)
        house = House(cats=cats)
        session_1.add(house)

session_2 = DBSession(autocommit=True)
with session_2.begin():
    current_session = session_2
    assert len(house.cats) == 100
    assert object_session(house.cats[0]) is session_1  # fails because the Cats were loaded from session_2 instead

    # We don't even need to do this to trigger the bug, unless we construct session_1 with expire_on_commit=False
    session_1.expire(house, ['cats'])
    assert len(house.cats) == 100
    assert object_session(house.cats[0]) is session_1

 

At this point, house.cats needs to be reloaded, but if we use the current session (session_2) then it will be incorrect.

 

Arguably this is the fault of the TypeDecorator in question, but it seems impossible to write one that is correct, because the session is not passed to process_result_value. _populate_full has access to it (in state.session), but does not pass it to the getter (TypeDecorator.result_processor).

Please let me know if you can think of a solution or workaround.

Thanks, Chris.

 

 






Jonathan Vanasco

unread,
May 28, 2019, 12:02:59 PM5/28/19
to sqlalchemy


On Tuesday, May 28, 2019 at 9:10:19 AM UTC-4, Chris Wilson wrote:


During initial load one can use a global session object,


You should not do that.  Global sessions are widely considered an anti-pattern.

 I have discovered a limitation of TypeDecorators (custom column types): any one that uses the database (e.g. to load objects serialised in a custom way) has no way to know which database session to use.

While TypeDecorators are designed to offer control over how data is encoded/decoded, they aren't really designed to do what you're attempting (bootstrapping an ORM relationship into a column, instead of a table).  This is the first time I've seen anyone try to do this, and I am a long-time abuser of SQLAlchemy myself.

Mike or Simon may have other suggestions, but my immediate thought is that you can quickly accomplish what you're trying to do by having a TypeDecorator for `.cat_ids` serialize/deserialize a list of ids, and then having a Python @property for `def cats(self):` perform the SQL needed to turn `self.cat_ids` into `self.cats`.  You'd be able to determine the session for the actual object via `object_session`.


@property
def cats(self):
    if self._cats is None:
        if self.cat_ids:
            _session = object_session(self)
            self._cats = _session.query(Cat).filter(Cat.id.in_(self.cat_ids)).all()
        else:
            self._cats = []
    return self._cats




Mike Bayer

unread,
May 28, 2019, 4:35:32 PM5/28/19
to sqlal...@googlegroups.com


On Tue, May 28, 2019, at 12:03 PM, Jonathan Vanasco wrote:


On Tuesday, May 28, 2019 at 9:10:19 AM UTC-4, Chris Wilson wrote:


During initial load one can use a global session object,



You should not do that.  Global sessions are widely considered an anti-pattern.

Additionally, running SQL inside of a TypeDecorator is not the intended usage, nor is using ORM features inside of a TypeDecorator.


 I have discovered a limitation of TypeDecorators (custom column types): any one that uses the database (e.g. to load objects serialised in a custom way) has no way to know which database session to use.

While TypeDecorators are designed to offer control on how data is encoded/decoded, they aren't really designed to do what you're attempting (bootstrap an ORM relationship into a column, instead of a table).  This is the first time I've seen anyone try to do this, and I am a long time abuser of SqlAlchemy myself.
Mike or Simon may have other suggestions, but my immediate thought is that you can quickly accomplish what you're trying to do by just having a TypeDecorator for `.cat_ids` serialize/deserialize a list of ids, and then have a python @property for `def cats(self):` perform the sql needed to turn `self.cat_ids` into `self.cats`.  You'd be able to determine the session for the actual object via `object_session`


@property
def cats(self):
    if self._cats is None:
        if self.cat_ids:
            _session = object_session(self)
            self._cats = _session.query(Cat).filter(Cat.id.in_(self.cat_ids)).all()
        else:
            self._cats = []
    return self._cats

yup, use a @property. This could also work with the load/refresh events (see https://docs.sqlalchemy.org/en/13/orm/events.html?highlight=load#sqlalchemy.orm.events.InstanceEvents.load ), which was my first idea, but the @property is possibly better since it's very simple.
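Roughly, such a listener could look like this (an untested sketch; it assumes the column is changed to store plain ids in a hypothetical `cat_ids` attribute, and it uses the session attached to the load context):

from sqlalchemy import event

def _rebuild_cats(target, context):
    # context.session is the Session performing the load / refresh
    ids = target.cat_ids or []
    target.cats = (
        context.session.query(Cat).filter(Cat.id.in_(ids)).all() if ids else []
    )

@event.listens_for(House, "load")
def on_load(target, context):
    _rebuild_cats(target, context)

@event.listens_for(House, "refresh")
def on_refresh(target, context, attrs):
    if attrs is None or "cat_ids" in attrs:
        _rebuild_cats(target, context)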

There is also the issue of the bigger pattern in use here: why a denormalized list of integers would be stored in a single column, and how that serves any purpose beyond what a regular association table would achieve.  Assuming this is not a legacy schema, I would advise using regular relational patterns.
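For comparison, the conventional relational pattern would look roughly like this (a sketch only; the association table and names are illustrative, not from the original schema):

from sqlalchemy import Column, ForeignKey, Integer, Table
from sqlalchemy.orm import relationship

house_cat = Table(
    "house_cat", Base.metadata,
    Column("house_id", ForeignKey("house.id"), primary_key=True),
    Column("cat_id", ForeignKey("cat.id"), primary_key=True),
)

class House(Base):
    __tablename__ = "house"
    id = Column(Integer, primary_key=True)
    # ordinary many-to-many; the ORM loads it using the object's own Session
    cats = relationship("Cat", secondary=house_cat)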








Jonathan Vanasco

unread,
May 28, 2019, 6:42:59 PM5/28/19
to sqlalchemy


On Tuesday, May 28, 2019 at 4:35:32 PM UTC-4, Mike Bayer wrote:

Additionally, running SQL inside of a TypeDecorator is not the intended usage, nor is using ORM features inside of a TypeDecorator.


thanks, mike. I was 99.99% sure that was the case, but didn't want to risk being wrong.

Chris Wilson

unread,
May 29, 2019, 6:22:48 AM5/29/19
to sqlalchemy
Hi all,

Thanks for the replies! Sorry, perhaps I wasn't clear: this is just a minimal example. We are actually storing serialized objects in a column, which can be e.g. dicts or lists of (dehydrated) SQLAlchemy objects, numpy arrays, etc. It's much faster (for both reads and writes) to store a complex structure serialized into a single column than to split it across many tables. Generally this works fine for us, but I just discovered this unusual case, and couldn't see how to solve it.

Changing how we serialize objects is not realistically going to happen, so we are going to continue using TypeDecorators unless there is a better option that enables us to continue doing what we are currently doing.

If there were pre-load hooks as well as post-load, then we could set a global variable to object_session(parent_object) for the duration of the load. Alternatively, if the state or context was passed to the TypeDecorator then we could use it to get the session. But right now it appears that there is no hook that does what we want to do. Please would you accept my request to implement something like this?

Thanks, Chris.

Mike Bayer

unread,
May 29, 2019, 10:30:01 AM5/29/19
to sqlal...@googlegroups.com


On Wed, May 29, 2019, at 6:22 AM, Chris Wilson wrote:
Hi all,

Thanks for the replies! Sorry, perhaps I wasn't clear, this is just a minimal example. We are actually storing serialized objects in a column, which can be e.g. dicts or lists of (dehydrated) SQLAlchemy objects, numpy arrays, etc. It's much faster to store (both read and write) a complex structure serialized into a single column than to split it across many tables. Generally this works fine for us, but I just discovered this unusual case, and couldn't see how to solve it.

Note that we suggested approaches that work in conjunction with the serialized approach you gave: the @property approach and the mapper load() event approach.

However, if you are storing the fully serialized object in the column, like the whole Cat object, you don't need to emit a SQL query to restore it; for caching objects in serialized form you'd want to merge() it back into the Session with load=False so that no SQL is emitted.  You still need your Session, though, and of course using a threadlocal variable is the best way to make that happen right now without changing the type API.
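For example, reattaching a reconstructed instance without any SQL looks roughly like this (a sketch; the rebuilt Cat instance and the current `session` are assumed):

from sqlalchemy.orm import make_transient_to_detached

cat = Cat(id=42, name="Whiskers")     # rebuilt from the serialized form (assumed)
make_transient_to_detached(cat)       # mark it as an already-persisted, detached row
cat = session.merge(cat, load=False)  # attach to the current Session without a SELECT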

If you are in fact looking to emit a SQL query for every occurrence of a value in a row, you will lose any performance gains you're getting here.    If you get 500 rows back and there are three of these columns in each row, that's 1500 additional SQL queries, and ORM-level object queries in particular are quite slow.   The @property approach suggested would mitigate this by only emitting a SQL query when called upon.   I would advise that any architecture that is taken on in the name of performance be measured, anytime I do performance work I use cProfile to validate and measure the positive effect of the change.
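A quick cProfile harness for such a measurement can be as simple as this (the profiled query is just a placeholder):

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
houses = session.query(House).all()  # placeholder for the code path being measured
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)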




If there were pre-load hooks as well as post-load, then we could set a global variable to object_session(parent_object) for the duration of the load.

You shouldn't need "object_session(parent_object)",  there's only one Session in play at a time within a thread so just assign the Session to a thread local variable. 

The best hook to use here is simply the transaction-level hooks to set the current Session onto a global thread-local variable.  The ones that are SessionTransaction level should work well:




if you have multiple sessions per thread at a time, like an extra Session used for logging, the above events can swap out the previous Session already present and then restore it, to handle nesting of session contexts.
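A rough sketch of that nesting-safe variant, pushing and popping the current Session on a thread-local stack from the SessionTransaction-level events (untested, illustrative only):

import threading
from sqlalchemy import event
from sqlalchemy.orm import Session

_current = threading.local()

@event.listens_for(Session, "after_transaction_create")
def _push_session(session, transaction):
    stack = getattr(_current, "stack", None)
    if stack is None:
        stack = _current.stack = []
    stack.append(session)

@event.listens_for(Session, "after_transaction_end")
def _pop_session(session, transaction):
    stack = getattr(_current, "stack", [])
    if stack and stack[-1] is session:
        stack.pop()

def current_session():
    # the innermost Session that currently has a transaction in this thread
    stack = getattr(_current, "stack", [])
    return stack[-1] if stack else None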




Alternatively, if the state or context was passed to the TypeDecorator then we could use it to get the session.

The context available is the ExecutionContext, however this isn't passed to the TypeEngine bind/result processor methods right now.  That might not be a bad idea in the future but for the moment would require a major breaking API change that cannot be made quickly or trivially.    An example of passing information between a Session and the execution-level context is at https://github.com/sqlalchemy/sqlalchemy/wiki/SessionModifiedSQL but this doesn't give you a way to get inside the TypeDecorator methods without using a global threadlocal.    If we did have the ExecutionContext passed to TypeEngine processors, this example would be part of how you'd be using it.   But we don't, so I think the threadlocal Session approach above should be very straightforward.









But right now it appears that there is no hook that does what we want to do. Please would you accept my request to implement something like this?

Thanks, Chris.

On Tuesday, 28 May 2019 23:42:59 UTC+1, Jonathan Vanasco wrote:


On Tuesday, May 28, 2019 at 4:35:32 PM UTC-4, Mike Bayer wrote:

Additionally, running SQL inside of a TypeDecorator is not the intended usage, nor is using ORM features inside of a TypeDecorator.


thanks, mike. I was 99.99% sure that was the case, but didn't want to risk being wrong.



Chris Wilson

unread,
May 30, 2019, 8:42:08 AM5/30/19
to sqlalchemy
Hi Mike,

On Wed, 29 May 2019 at 15:30, Mike Bayer wrote:
Note that we suggested approaches that work in conjunction with the serialized approach you gave, the @property approach and the mapper.load() event approach.     

Unfortunately I think they would both require massive code changes. Everywhere that we use Column(CompressedBinary), we'd have to rename the column and add property getters and setters. Unless there's a way to hook into the instrumentation machinery to do that automatically?
 
However, if you are storing the fully serialized object in the column, like the whole Cat object, you don't need to emit a SQL query to restore it, for caching objects in serialized form you'd want to merge() it back into the Session with load=False so that no SQL is emitted.

We are not serializing the whole object, only the PK, but merging it back into the session with all its attributes expired, so that any attempt to access them triggers a load. That works well for us, unless the current session changes in the mean time (not the problem that I originally asked about, but a related one, that luckily isn't biting us right now).
 
   You still need your Session though and of course, using a threadlocal variable is the best way to make that happen right now without changing the type API.    

The problem isn't that we're passing objects between threads, it's that we can use multiple sessions in the same thread. I noticed it while trying to create a reproducible test case in the debugger, which was switching to our test/scratch database, setting a local variable, and then exiting the context (back to our main database) with that local variable still in scope (with expired attributes). When the debugger rendered the repr() of the local variable, it caused its relationships to be loaded from the live database, which didn't compare equal to objects in the test database, so I had to restart the debugger every time this happened.

It's not critical but it was annoying because it made debugging much harder and slower than I thought it could/should be. I admit that this is a niche use case, so I consider this a feature request instead of a bug.
If there were pre-load hooks as well as post-load, then we could set a global variable to object_session(parent_object) for the duration of the load.

You shouldn't need "object_session(parent_object)",  there's only one Session in play at a time within a thread so just assign the Session to a thread local variable.  

The problem is with objects that have escaped the scope of one active Session, and are being refreshed when another is in scope.
 
The best hook to use here is simply the transaction-level hooks to set the current Session onto a global thread-local variable.  The ones that are SessionTransaction level should work well:



We already have a global variable for the current Session, but I need to get back to the one that was used to load the object (containing the TypeDecorated attribute) instead of the current one.
 
The context available is the ExecutionContext, however this isn't passed to the TypeEngine bind/result processor methods right now.  That might not be a bad idea in the future but for the moment would require a major breaking API change that cannot be made quickly or trivially.

Could it be added as an optional argument that is only passed if the recipient method is expecting it?
 
    An example of passing information between a Session and the execution-level context is at https://github.com/sqlalchemy/sqlalchemy/wiki/SessionModifiedSQL but this doesn't give you a way to get inside the TypeDecorator methods without using a global threadlocal.

Unfortunately even that doesn't help, I think, because the TypeDecorator is called after the statement has been executed, so there's no concept of a "current" load statement, only the last one, and I don't know for sure if that was really the same context/object/session that loaded the data being processed.

I'm looking at whether something like the mutable extension would have access to the parent object, to coerce data structures containing SQLAlchemy objects into serializable form on the way into the database, and coerce them back to SQLAlchemy objects after a load event.

Thanks again, Chris.

Mike Bayer

unread,
May 30, 2019, 9:21:02 AM5/30/19
to sqlal...@googlegroups.com


On Thu, May 30, 2019, at 8:42 AM, Chris Wilson wrote:
Hi Mike,

On Wed, 29 May 2019 at 15:30, Mike Bayer wrote:
Note that we suggested approaches that work in conjunction with the serialized approach you gave, the @property approach and the mapper.load() event approach.     

Unfortunately I think they would both require massive code changes. Everywhere that we use Column(CompressedBinary), we'd have to rename the column and add property getters and setters. Unless there's a way to hook into the instrumentation machinery to do that automatically?

All ORM events support propagation across all mappers; for InstanceEvents, of which load() is a member, this is documented at https://docs.sqlalchemy.org/en/13/orm/events.html?highlight=event%20load#instance-events


The load() event would be applied to all mapped classes, or to a specific hierarchy / mixin hierarchy, if you prefer:

@event.listens_for(Base, "load", propagate=True)
def my_event(target, context):
    ...


The load event is per instance so you would need to emit your Session.query within the event.     There are many ways that Core and ORM events are applied automatically so this is not a problem.


As for the Column(), this is where the proposed pattern is the most broken, that a database Column would return an ORM mapped object bound to the Session directly.  This is not how the ORM works and I can't support this as a pattern going forward.    When an ORM object links to another object or a collection, relationship() is used for that.   The whole world of folks that have ORM objects that link to other ORM objects are using a Column to set up the database-level linkage and relationship() to represent the load of the object.   

In this case you need something slightly different from a relationship(): you probably want just a plain list of some kind to which your load() handler appends. The addition of the @property can also be automated by using the mapper_configured event.
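That automation could look roughly like this (a sketch; `cat_ids` and the generated `cats` property are assumed, illustrative names):

from sqlalchemy import event
from sqlalchemy.orm import mapper, object_session

@event.listens_for(mapper, "mapper_configured")
def _add_cats_property(mapper_, class_):
    # attach a read-only `cats` property to every mapped class with a `cat_ids` column
    if "cat_ids" in mapper_.columns.keys() and not hasattr(class_, "cats"):
        def cats(self):
            ids = self.cat_ids or []
            if not ids:
                return []
            return object_session(self).query(Cat).filter(Cat.id.in_(ids)).all()
        class_.cats = property(cats)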



 
However, if you are storing the fully serialized object in the column, like the whole Cat object, you don't need to emit a SQL query to restore it, for caching objects in serialized form you'd want to merge() it back into the Session with load=False so that no SQL is emitted.

We are not serializing the whole object, only the PK, but merging it back into the session with all its attributes expired, so that any attempt to access them triggers a load. That works well for us, unless the current session changes in the mean time (not the problem that I originally asked about, but a related one, that luckily isn't biting us right now).
 
   You still need your Session though and of course, using a threadlocal variable is the best way to make that happen right now without changing the type API.    

The problem isn't that we're passing objects between Threads, it's that we can use multiple sessions in the same thread. I noticed it while trying to create a reproducible test case in the debugger, which was switching to our test/scratch database, setting a local variable, and then exiting the context (back to our main database) with that local variable still in scope (with expired attributes). When the debugger rendered the repr() of the local variable, it causes its relationships to be loaded from the live database, which didn't compare equal to objects in the test database, so I had to restart the debugger every time this happened.


There are other ways to get at the context.  You can use the before_execute() and after_execute() events, scan the compiled objects for the tables that contain the datatypes in question, and assign the connection being invoked to the global variable; within connection.info you would have the Session.  This is many steps but can be done.  Unfortunately I don't have time today to write a proof of concept for this; I'd sooner write one for you using the load() event, but again, I don't have time today to attend to this.



It's not critical but it was annoying because it made debugging much harder and slower than I thought it could/should be. I admit that this is a niche use case, so I consider this a feature request instead of a bug.

There has yet to be a mention of what "the feature" is exactly; I know how such a thing would have to work, but I assume you have a different idea of it.



 
The context available is the ExecutionContext, however this isn't passed to the TypeEngine bind/result processor methods right now.  That might not be a bad idea in the future but for the moment would require a major breaking API change that cannot be made quickly or trivially.

Could it be added as an optional argument that is only passed if the recipient method is expecting it?

That's how the change would have to be made, which involves using getargspec() way ahead of time, emitting deprecation warnings, documenting, testing, all of that.  Additionally, this is a feature that nobody will ever use; this is the worst-case kind of situation where the feature is breaking, complex, and exists for exactly one application that is doing something which we disagree is even a good idea.
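Purely as a generic illustration of the "only pass it if the method accepts it" dispatch being discussed (this is not SQLAlchemy code, just the shape of the change):

import inspect

def call_result_processor(fn, value, dialect, context):
    # pass the extra `context` argument only to processors whose signature accepts it
    if "context" in inspect.signature(fn).parameters:
        return fn(value, dialect, context=context)
    return fn(value, dialect)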

SQLAlchemy is moving away from adding APIs that support broken use cases, as they spread confusion and maintenance burden, so I really don't think this is a good idea.  You need at least the instance load() event, and at best you need some new hooks to emit queries at the end of an ORM load; for example, you could gather up all the Cat ids in one sweep at the end of a query and load them all.  An "after_query_results_loaded" event would be *VASTLY* more feasible.





 
    An example of passing information between a Session and the execution-level context is at https://github.com/sqlalchemy/sqlalchemy/wiki/SessionModifiedSQL but this doesn't give you a way to get inside the TypeDecorator methods without using a global threadlocal.

Unfortunately even that doesn't help, I think, because the TypeDecorator is called after the statement has been executed, so there's no concept of a "current" load statement, only the last one, and I don't know for sure if that was really the same context/object/session that loaded the data being processed.

I'm looking at whether something like the mutable extension would have access to the parent object, to coerce data structures containing SQLAlchemy objects into serializable form on the way into the database, and coerce them back to SQLAlchemy objects after a load event.

Thanks again, Chris.



Mike Bayer

unread,
May 30, 2019, 9:40:54 AM5/30/19
to sqlal...@googlegroups.com


On Thu, May 30, 2019, at 9:21 AM, Mike Bayer wrote:


On Thu, May 30, 2019, at 8:42 AM, Chris Wilson wrote:
Hi Mike,

On Wed, 29 May 2019 at 15:30, Mike Bayer wrote:
Note that we suggested approaches that work in conjunction with the serialized approach you gave, the @property approach and the mapper.load() event approach.     

Unfortunately I think they would both require massive code changes.



re: "massive code changes", note that changing the TypeDecorator or TypeEngine API would mean that thousands of people would need to change their custom datatypes and third-party dialects to accommodate this change, which would mean a deprecation warning as well as additional overhead.  I believe we can adjust your application without a "massive" code change, and that the new patterns can be installed using up-front automation.  However, consider that asking me to change the library's API so that you don't have to change your approach is still adding great inconvenience to other people not involved with this problem.




Everywhere that we use Column(CompressedBinary), we'd have to rename the column and add property getters and setters. Unless there's a way to hook into the instrumentation machinery to do that automatically?

All ORM events support propagation across all mappers; for InstanceEvents, of which load() is a member, this is documented at https://docs.sqlalchemy.org/en/13/orm/events.html?highlight=event%20load#instance-events


The load() event would be applied to all mapped classes, or to a specific hierarchy / mixin hierarchy, if you prefer:

@event.listens_for(Base, "load", propagate=True)
def my_event(target, context):
    ...


The load event is per instance so you would need to emit your Session.query within the event.     There are many ways that Core and ORM events are applied automatically so this is not a problem.


As for the Column(), this is where the proposed pattern is the most broken, that a database Column would return an ORM mapped object bound to the Session directly.  This is not how the ORM works and I can't support this as a pattern going forward.    When an ORM object links to another object or a collection, relationship() is used for that.   The whole world of folks that have ORM objects that link to other ORM objects are using a Column to set up the database-level linkage and relationship() to represent the load of the object.   

In this case you need something slightly different from a relationship(): you probably want just a plain list of some kind to which your load() handler appends. The addition of the @property can also be automated by using the mapper_configured event.


Try using relationship() with viewonly=True and lazy="noload". Have your load() handler assign a list to this relationship attribute at load time using attributes.set_committed_value(). On the persistence side, use a before_insert() / before_update() event to read the contents of this list and update the column.
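A rough sketch of those two halves (the relationship configuration itself is omitted; `cats_blob`, `pack_ids` and `unpack_ids` are illustrative names, not from the thread):

from sqlalchemy import event
from sqlalchemy.orm import attributes

@event.listens_for(House, "load")
def _restore_cats(target, context):
    ids = unpack_ids(target.cats_blob)  # hypothetical helper
    cats = context.session.query(Cat).filter(Cat.id.in_(ids)).all() if ids else []
    # populate the viewonly relationship without marking the attribute dirty
    attributes.set_committed_value(target, "cats", cats)

@event.listens_for(House, "before_insert")
@event.listens_for(House, "before_update")
def _persist_cats(mapper, connection, target):
    # serialize the current relationship contents back into the column
    target.cats_blob = pack_ids([cat.id for cat in target.cats])  # hypothetical helper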




Mike Bayer

unread,
May 30, 2019, 10:41:05 AM5/30/19
to sqlal...@googlegroups.com
Here's how to attach the correct Session to a threading local within the scope of the execution of each statement:

import threading

from sqlalchemy import create_engine, event
from sqlalchemy.orm import Session

global_session = threading.local()

@event.listens_for(Session, "after_begin")
def attach_session_to_connection(session, transaction, connection):
    connection.info["session"] = session

engine = create_engine("sqlite://")

@event.listens_for(engine, "before_execute")
def attach_session(conn, clauseelement, multiparams, params):
    if "session" in conn.info:
        global_session.session = conn.info['session']

then inside your type handler:

        current_session = global_session.session

        return [
            current_session.query(Cat).filter_by(id=cat_id).one()
            for cat_id in cat_ids
        ]


Below is your program from above passing its test using this approach; I hope this solves your problem in a simple way:

import struct
import threading
import zlib

from sqlalchemy import Column
from sqlalchemy import create_engine
from sqlalchemy import event
from sqlalchemy import Integer
from sqlalchemy import LargeBinary
from sqlalchemy import Text
from sqlalchemy import TypeDecorator
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import object_session
from sqlalchemy.orm import Session
from sqlalchemy.orm import sessionmaker


Base = declarative_base()

global_session = threading.local()

@event.listens_for(Session, "after_begin")
def attach_session_to_connection(session, transaction, connection):
    connection.info["session"] = session

engine = create_engine("sqlite://")

@event.listens_for(engine, "before_execute")
def attach_session(conn, clauseelement, multiparams, params):
    if "session" in conn.info:
        global_session.session = conn.info['session']



class Cat(Base):

    __tablename__ = "cat"
    id = Column(Integer, primary_key=True)
    name = Column(Text)


class CompressedCats(TypeDecorator):

    """ A house can have so many cats that we need to compress them. """

    impl = LargeBinary

    _struct_protocol_id = struct.Struct("<i")

    def process_bind_param(self, value, _dialect):
        if value is None:
            return None
        return zlib.compress(
            b"".join(self._struct_protocol_id.pack(cat.id) for cat in value)
        )

    def process_result_value(self, value, _dialect):
        if value is None:
            return None
        decompressed = zlib.decompress(value)

        cat_ids = [
            self._struct_protocol_id.unpack(decompressed[i : i + 4])[0]
            for i in range(0, len(decompressed), 4)
        ]

        current_session = global_session.session

        return [
            current_session.query(Cat).filter_by(id=cat_id).one()
            for cat_id in cat_ids
        ]


class House(Base):

    __tablename__ = "house"

    id = Column(Integer, primary_key=True)
    cats = Column(CompressedCats)


engine.echo = True
Base.metadata.create_all(engine)


DBSession = sessionmaker(bind=engine)

session_1 = DBSession(autocommit=True, expire_on_commit=True)


with session_1.begin():
    current_session = session_1

    with session_1.begin_nested():
        cats = [Cat(id=i, name=f"Cat #{i}") for i in range(100)]
        session_1.add_all(cats)
        house = House(cats=cats)
        session_1.add(house)


session_2 = DBSession(autocommit=True)

with session_2.begin():

    current_session = session_2
    assert len(house.cats) == 100

    assert (
        object_session(house.cats[0]) is session_1
    )  # previously failed because the Cats were loaded from session_2; passes with this approach

    session_1.expire(house, ["cats"])

    assert len(house.cats) == 100

    assert object_session(house.cats[0]) is session_1




 


