Dear Michael,
I have discovered a limitation of TypeDecorators (custom column types): any one that uses the database (e.g. to load objects serialised in a custom way) has no way to know which database session to use. During initial load one can use a global session object, but expired attributes may need to be refreshed at any time, when the current session is not necessarily the one that loaded the attribute, or its parent object.
For example, consider a TypeDecorator that stores a large number of Cats by ID, compressed to save space in the database:
import struct
import zlib
from sqlalchemy import Column, Integer, LargeBinary, Text, TypeDecorator, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import object_session, sessionmaker
Base = declarative_base()
current_session = None
class Cat(Base):
__tablename__ = 'cat'
id = Column(Integer, primary_key=True)
name = Column(Text)
class CompressedCats(TypeDecorator):
''' A house can have so many cats that we need to compress them. '''
impl = LargeBinary
_struct_protocol_id = struct.Struct('<i')
def process_bind_param(self, value, _dialect):
# Typically the column can be nullable, so you can write None to it: support that.
if value is None:
return None
# FIXME: allow specification of different protocol id, and
# use the id to determine the protocol argument to pass to serialize.
return zlib.compress(b''.join(self._struct_protocol_id.pack(cat.id) for cat in value))
def process_result_value(self, value, _dialect):
# Typically the column can be nullable, so you can read None from it: support that.
if value is None:
return None
# FIXME: unpack and use protocol id to determine protocol argument to pass to deserialize.
decompressed = zlib.decompress(value)
cat_ids = [self._struct_protocol_id.unpack(decompressed[i:i + 4])[0] for i in range(0, len(decompressed), 4)]
# BUG: Which session can we use here?
global current_session
return [current_session.query(Cat).filter_by(id=cat_id).one() for cat_id in cat_ids]
class House(Base):
__tablename__ = 'house'
id = Column(Integer, primary_key=True)
# A house can have so many cats that we need to compress them:
cats = Column(CompressedCats)
engine = create_engine('sqlite://')
engine.echo = True
Base.metadata.create_all(engine)
DBSession = sessionmaker(bind=engine)
session_1 = DBSession(autocommit=True, expire_on_commit=True)
with session_1.begin():
current_session = session_1
with session_1.begin_nested():
cats = [Cat(id=i, name='Cat #{i}') for i in range(100)]
session_1.add_all(cats)
house = House(cats=cats)
session_1.add(house)
session_2 = DBSession(autocommit=True)
with session_2.begin():
current_session = session_2
assert len(house.cats) == 100
assert object_session(house.cats[0]) is session_1 # fails because the Cats were loaded from session_2 instead
# We don't even need to do this to trigger the bug, unless we construct session_1 with expire_on_commit=False
session_1.expire(house, ['cats'])
assert len(house.cats) == 100
assert object_session(house.cats[0]) is session_1
At this point, house.cats needs to be reloaded, but if we use the current session (session_2) then it will be incorrect.
Arguably this is the fault of the TypeDecorator in question, but it seems impossible to write one which is correct, because the session is not passed to process_result_value. _populate_full has access to it (in state.session), but does not pass it to getter (TypeDecorator.result_processor).
Please let me know if you can think of a solution or workaround.
Thanks, Chris.
During initial load one can use a global session object,
I have discovered a limitation of TypeDecorators (custom column types): any one that uses the database (e.g. to load objects serialised in a custom way) has no way to know which database session to use.
On Tuesday, May 28, 2019 at 9:10:19 AM UTC-4, Chris Wilson wrote:
During initial load one can use a global session object,
You should not do that. Global sessions are widely considered an anti-pattern.
I have discovered a limitation of TypeDecorators (custom column types): any one that uses the database (e.g. to load objects serialised in a custom way) has no way to know which database session to use.
While TypeDecorators are designed to offer control on how data is encoded/decoded, they aren't really designed to do what you're attempting (bootstrap an ORM relationship into a column, instead of a table). This is the first time I've seen anyone try to do this, and I am a long time abuser of SqlAlchemy myself.Mike or Simon may have other suggestions, but my immediate thought is that you can quickly accomplish what you're trying to do by just having a TypeDecorator for `.cat_ids` serialize/deserialize a list of ids, and then have a python @property for `def cats(self):` perform the sql needed to turn `self.cat_ids` into `self.cats`. You'd be able to determine the session for the actual object via `object_session`@propertydef cats(self):if self._cats is None:if self.cat_ids:_session = object_session(self)
self._cats = _session.query(Cats).filter(Cat.id.in_(self.cat_ids)).all()return self._cats
--SQLAlchemy -The Python SQL Toolkit and Object Relational MapperTo post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.---You received this message because you are subscribed to the Google Groups "sqlalchemy" group.To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.To post to this group, send email to sqlal...@googlegroups.com.Visit this group at https://groups.google.com/group/sqlalchemy.To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/daddca0f-c413-45e2-94cd-3fdb53b89a3a%40googlegroups.com.For more options, visit https://groups.google.com/d/optout.
additionally, running SQL inside of a TypeDecorator is not the intended usage, as well as using ORM features inside of a TypeDecorator is also not the intended usage.
Hi all,Thanks for the replies! Sorry, perhaps I wasn't clear, this is just a minimal example. We are actually storing serialized objects in a column, which can be e.g. dicts or lists of (dehydrated) SQLAlchemy objects, numpy arrays, etc. It's much faster to store (both read and write) a complex structure serialized into a single column than to split it across many tables. Generally this works fine for us, but I just discovered this unusual case, and couldn't see how to solve it.
If there were pre-load hooks as well as post-load, then we could set a global variable to object_session(parent_object) for the duration of the load.
Alternatively, if the state or context was passed to the TypeDecorator then we could use it to get the session.
But right now it appears that there is no hook that does what we want to do. Please would you accept my request to implement something like this?Thanks, Chris.On Tuesday, 28 May 2019 23:42:59 UTC+1, Jonathan Vanasco wrote:On Tuesday, May 28, 2019 at 4:35:32 PM UTC-4, Mike Bayer wrote:additionally, running SQL inside of a TypeDecorator is not the intended usage, as well as using ORM features inside of a TypeDecorator is also not the intended usage.thanks, mike. I was 99.99% sure that was the case, but didn't want to risk bring wrong.
--SQLAlchemy -The Python SQL Toolkit and Object Relational MapperTo post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.---You received this message because you are subscribed to the Google Groups "sqlalchemy" group.To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.To post to this group, send email to sqlal...@googlegroups.com.Visit this group at https://groups.google.com/group/sqlalchemy.
To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/7850da06-e4a1-4864-8b92-26a817041c6f%40googlegroups.com.
Note that we suggested approaches that work in conjunction with the serialized approach you gave, the @property approach and the mapper.load() event approach.
However, if you are storing the fully serialized object in the column, like the whole Cat object, you don't need to emit a SQL query to restore it, for caching objects in serialized form you'd want to merge() it back into the Session with load=False so that no SQL is emitted.
You still need your Session though and of course, using a threadlocal variable is the best way to make that happen right now without changing the type API.
If there were pre-load hooks as well as post-load, then we could set a global variable to object_session(parent_object) for the duration of the load.You shouldn't need "object_session(parent_object)", there's only one Session in play at a time within a thread so just assign the Session to a thread local variable.
The best hook to use here is simply the transaction-level hooks to set the current Session onto a global thread-local variable. The ones that are SessionTransaction level should work well:
The context available is the ExecutionContext, however this isn't passed to the TypeEngine bind/result processor methods right now. That might not be a bad idea in the future but for the moment would require a major breaking API change that cannot be made quickly or trivially.
An example of passing information between a Session and the execution-level context is at https://github.com/sqlalchemy/sqlalchemy/wiki/SessionModifiedSQL but this doesn't give you a way to get inside the TypeDecorator methods without using a global threadlocal.
Hi Mike,On Wed, 29 May 2019 at 15:30, Mike Bayer wrote:Note that we suggested approaches that work in conjunction with the serialized approach you gave, the @property approach and the mapper.load() event approach.Unfortunately I think they would both require massive code changes. Everywhere that we use Column(CompressedBinary), we'd have to rename the column and add property getters and setters. Unless there's a way to hook into the instrumentation machinery to do that automatically?
However, if you are storing the fully serialized object in the column, like the whole Cat object, you don't need to emit a SQL query to restore it, for caching objects in serialized form you'd want to merge() it back into the Session with load=False so that no SQL is emitted.We are not serializing the whole object, only the PK, but merging it back into the session with all its attributes expired, so that any attempt to access them triggers a load. That works well for us, unless the current session changes in the mean time (not the problem that I originally asked about, but a related one, that luckily isn't biting us right now).You still need your Session though and of course, using a threadlocal variable is the best way to make that happen right now without changing the type API.The problem isn't that we're passing objects between Threads, it's that we can use multiple sessions in the same thread. I noticed it while trying to create a reproducible test case in the debugger, which was switching to our test/scratch database, setting a local variable, and then exiting the context (back to our main database) with that local variable still in scope (with expired attributes). When the debugger rendered the repr() of the local variable, it causes its relationships to be loaded from the live database, which didn't compare equal to objects in the test database, so I had to restart the debugger every time this happened.
It's not critical but it was annoying because it made debugging much harder and slower than I thought it could/should be. I admit that this is a niche use case, so I consider this a feature request instead of a bug.
The context available is the ExecutionContext, however this isn't passed to the TypeEngine bind/result processor methods right now. That might not be a bad idea in the future but for the moment would require a major breaking API change that cannot be made quickly or trivially.Could it be added as an optional argument that is only passed if the recipient method is expecting it?
An example of passing information between a Session and the execution-level context is at https://github.com/sqlalchemy/sqlalchemy/wiki/SessionModifiedSQL but this doesn't give you a way to get inside the TypeDecorator methods without using a global threadlocal.Unfortunately even that doesn't help, I think, because the TypeDecorator is called after the statement has been executed, so there's no concept of a "current" load statement, only the last one, and I don't know for sure if that was really the same context/object/session that loaded the data being processed.I'm looking at whether something like the mutable extension would have access to the parent object, to coerce data structures containing SQLAlchemy objects into serializable form on the way into the database, and coerce them back to SQLAlchemy objects after a load event.Thanks again, Chris.
--SQLAlchemy -The Python SQL Toolkit and Object Relational MapperTo post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.---You received this message because you are subscribed to the Google Groups "sqlalchemy" group.To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.To post to this group, send email to sqlal...@googlegroups.com.Visit this group at https://groups.google.com/group/sqlalchemy.
To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/81bb9420-3f48-408e-b5d4-04ebc3a6ea01%40googlegroups.com.
On Thu, May 30, 2019, at 8:42 AM, Chris Wilson wrote:Hi Mike,On Wed, 29 May 2019 at 15:30, Mike Bayer wrote:Note that we suggested approaches that work in conjunction with the serialized approach you gave, the @property approach and the mapper.load() event approach.Unfortunately I think they would both require massive code changes.
Everywhere that we use Column(CompressedBinary), we'd have to rename the column and add property getters and setters. Unless there's a way to hook into the instrumentation machinery to do that automatically?All ORM events support propagation across all mappers, for InstanceEvents of wihch load() is a member this is documented at https://docs.sqlalchemy.org/en/13/orm/events.html?highlight=event%20load#instance-eventsThe load() event would be applied to all mapped classes, or to a specific hierarchy / mixin hierarchy, if you prefer:@event.listens_for(Base, "load", propagate=True)def my_event(...):The load event is per instance so you would need to emit your Session.query within the event. There are many ways that Core and ORM events are applied automatically so this is not a problem.As for the Column(), this is where the proposed pattern is the most broken, that a database Column would return an ORM mapped object bound to the Session directly. This is not how the ORM works and I can't support this as a pattern going forward. When an ORM object links to another object or a collection, relationship() is used for that. The whole world of folks that have ORM objects that link to other ORM objects are using a Column to set up the database-level linkage and relationship() to represent the load of the object.In this case, you need something slightly different than a relationship(), you probably want to just have a plain list of some kind to which your load() handler appends, the addition of the @property can also be automated by using the mapper_configured event.
To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/d50a0395-6116-4bdd-8354-2724e51c944d%40www.fastmail.com.
Arguably this is the fault of the TypeDecorator in question, but it seems impossible to write one which is correct, because the session is not passed to process_result_value. _populate_full has access to it (in state.session), but does not pass it to getter (TypeDecorator.result_processor).
Please let me know if you can think of a solution or workaround.Thanks, Chris.
This email is confidential. If you are not the intended recipient, please advise us immediately and delete this message. The registered name of Cantab- part of GAM Systematic is Cantab Capital Partners LLP. See - http://www.gam.com/en/Legal/Email+disclosures+EU for further information on confidentiality, the risks of non-secure electronic communication, and certain disclosures which we are required to make in accordance with applicable legislation and regulations. If you cannot access this link, please notify us by reply message and we will send the contents to you.
GAM Holding AG and its subsidiaries (Cantab – GAM Systematic) will collect and use information about you in the course of your interactions with us. Full details about the data types we collect and what we use this for and your related rights is set out in our online privacy policy at https://www.gam.com/en/legal/privacy-policy. Please familiarise yourself with this policy and check it from time to time for updates as it supplements this notice
--SQLAlchemy -The Python SQL Toolkit and Object Relational MapperTo post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.---You received this message because you are subscribed to the Google Groups "sqlalchemy" group.To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.To post to this group, send email to sqlal...@googlegroups.com.Visit this group at https://groups.google.com/group/sqlalchemy.
To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/BCCA73C2165E8947A2E786EC482564DE013E32B1E4%40ccpmaildag02.cantab.local.