Issue with bulk updates on ORM and PostgreSQL

114 views
Skip to first unread message

João Miguel Neves

unread,
Jun 14, 2019, 10:30:39 AM6/14/19
to sqlalchemy
Hi,

I have a situation where an update tries to update the wrong table on when a column comes from the parent table and is not on the current table. I'll grant I didn't quite understand all the caveats in https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.update so let me apologise if it's written there - then I just need a pointer in the right direction. Here is some sample code (mostly adapted from https://stackoverflow.com/questions/44183500/problems-with-update-and-table-inheritance-with-sqlalchemy - the code won't work on sqlite).

import os
import sys

from sqlalchemy import Column, create_engine, ForeignKey, Integer, String, DateTime

from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import func, update
from sqlalchemy.ext.declarative import declarative_base


try:
   os.remove('test.db')
except FileNotFoundError:
   pass

engine = create_engine('postgresql+psycopg2://user:pass@server/database', echo=True)
Session = sessionmaker(engine)

Base = declarative_base()


class People(Base):
    __tablename__ = 'people'
    discriminator = Column('type', String(50))
    __mapper_args__ = {'polymorphic_on': discriminator}

    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    updated = Column(DateTime, server_default=func.now(), onupdate=func.now())

class Engineer(People):
    __tablename__ = 'engineer'
    __mapper_args__ = {'polymorphic_identity': 'engineer'}
    id = Column(Integer, ForeignKey('people.id'), primary_key=True)
    kind = Column(String(100), nullable=True)

Base.metadata.create_all(engine)

session = Session()

e = Engineer()
e.name = 'Mike'
session.add(e)
session.flush()
session.commit()

# works when updating the object
e.name = "Doug"
session.add(e)
session.commit()


# works using the base class for the query
count = session.query(People).filter(
                           People.name == 'Doug').update({People.name: 'James'})

# fails when using the derived class
count = session.query(Engineer).filter(Engineer.id == People.id,
                           Engineer.name == 'James', ).update({Engineer.name: 'Mary'})

session.commit()
print("Count: {}".format(count))

----

The error message is:

Traceback (most recent call last):
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
    cursor, statement, parameters, context
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 550, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedColumn: column "name" of relation "engineer" does not exist
LINE 1: UPDATE engineer SET name='Mary', updated=now() FROM people W...
                            ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "update-inheritance.py", line 63, in <module>
    Engineer.name == 'James', ).update({Engineer.name: 'Mary'})
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3824, in update
    update_op.exec_()
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 1673, in exec_
    self._do_exec()
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 1866, in _do_exec
    self._execute_stmt(update_stmt)
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 1678, in _execute_stmt
    self.result = self.query._execute_crud(stmt, self.mapper)
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3356, in _execute_crud
    return conn.execute(stmt, self._params)
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 988, in execute
    return meth(self, multiparams, params)
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
    distilled_params,
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    e, statement, parameters, cursor, context
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1466, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 383, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 128, in reraise
    raise value.with_traceback(tb)
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
    cursor, statement, parameters, context
  File "/home/joao/testes/sa-bugs/.ve/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 550, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column "name" of relation "engineer" does not exist
LINE 1: UPDATE engineer SET name='Mary', updated=now() FROM people W...
                            ^

[SQL: UPDATE engineer SET name=%(people_name)s, updated=now() FROM people WHERE engineer.id = people.id AND people.name = %(name_1)s]
[parameters: {'people_name': 'Mary', 'name_1': 'James'}]
(Background on this error at: http://sqlalche.me/e/f405)

I can get the correct table to update through:

table = Engineer.name.property.columns[0].table

And the following update works (albeit with a slightly different semantics):

update(table).where(table.c.name == 'James').values({table.c.name : 'Mary'})

From what I've read, adding the People.id == Engineer.id should be enough, but it seems like I missed some detail. Can you help?

Thanks in advance for any help/pointers,
João

Mike Bayer

unread,
Jun 14, 2019, 11:04:59 AM6/14/19
to sqlal...@googlegroups.com


On Fri, Jun 14, 2019, at 10:30 AM, João Miguel Neves wrote:
Hi,

I have a situation where an update tries to update the wrong table on when a column comes from the parent table and is not on the current table. I'll grant I didn't quite understand all the caveats in https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.update so let me apologise if it's written there - then I just need a pointer in the right direction. Here is some sample code (mostly adapted from https://stackoverflow.com/questions/44183500/problems-with-update-and-table-inheritance-with-sqlalchemy - the code won't work on sqlite).


Hi there -

yes the caveats refer to the joined inheritance situation as where there are intricacies here.

Your UPDATE statement refers to the "name" column which is part of People, so you need to format your UPDATE statement in terms of the "people" table, which means the second query is not supported.   It at least has to be against "People" otherwise the UPDATE is only targeting the "engineer" table.

if you want to limit the "name" updates to only those rows that are Engineer, use the discriminator column to filter to People.type == "engineer".




zi


--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

João Miguel Neves

unread,
Jun 14, 2019, 11:48:59 AM6/14/19
to sqlal...@googlegroups.com
Hi Mike,

Thank you very much for the quick response!

Is there any other way to find the right table from the model other than somthing like Engineer.name.property.columns[0].table? (I'm trying to do an update in a codepath that can have several different models passed to it)

TIA,
João

Mike Bayer

unread,
Jun 14, 2019, 12:07:28 PM6/14/19
to sqlal...@googlegroups.com


On Fri, Jun 14, 2019, at 11:49 AM, João Miguel Neves wrote:
Hi Mike,

Thank you very much for the quick response!

Is there any other way to find the right table from the model other than somthing like Engineer.name.property.columns[0].table? (I'm trying to do an update in a codepath that can have several different models passed to it)


you're doing query.update() for performance reasons, right?  yeah you probably have to do something like that, or just use regular session patterns to update data.



João Miguel Neves

unread,
Jun 14, 2019, 12:22:12 PM6/14/19
to sqlal...@googlegroups.com
Not performance, actually to avoid a race condition with key/values written to JSONB fields. Ended up with the following function that we use for when multiple updates at the same time can occur (from the frontend mostly). It's based on your recommendations from https://groups.google.com/forum/#!topic/sqlalchemy/hjjIyEC8KHQ

When we got 2 requests in parallel (one updating key1 with value1 and another updating key2 with value2), the first would write value1, but as the second had loaded the JSONB field before, it would write to key2 and then save the whole content of the JSONB field without the update to key1 effectively losing data in our case.

def atomic_jsonb_set(context, dynamic_column_name, column_id, value):
    id_key = [col.name for col in inspect(context.__class__).primary_key]
    identifier = {key: getattr(context, key) for key in id_key}

    DBSession.query(context.__class__).filter_by(**identifier).update(
        {
            dynamic_column_name: func.jsonb_set(
                getattr(context.__class__, dynamic_column_name),
                f"{{{column_id}}}",
                config["sqlalchemy.json_serializer"](value),
            )
        },
        synchronize_session="fetch",
    )

Mike Bayer

unread,
Jun 14, 2019, 12:46:38 PM6/14/19
to sqlal...@googlegroups.com


On Fri, Jun 14, 2019, at 12:22 PM, João Miguel Neves wrote:
Not performance, actually to avoid a race condition with key/values written to JSONB fields. Ended up with the following function that we use for when multiple updates at the same time can occur (from the frontend mostly). It's based on your recommendations from https://groups.google.com/forum/#!topic/sqlalchemy/hjjIyEC8KHQ

When we got 2 requests in parallel (one updating key1 with value1 and another updating key2 with value2), the first would write value1, but as the second had loaded the JSONB field before, it would write to key2 and then save the whole content of the JSONB field without the update to key1 effectively losing data in our case.

def atomic_jsonb_set(context, dynamic_column_name, column_id, value):
    id_key = [col.name for col in inspect(context.__class__).primary_key]
    identifier = {key: getattr(context, key) for key in id_key}

    DBSession.query(context.__class__).filter_by(**identifier).update(
        {
            dynamic_column_name: func.jsonb_set(
                getattr(context.__class__, dynamic_column_name),
                f"{{{column_id}}}",
                config["sqlalchemy.json_serializer"](value),
            )
        },
        synchronize_session="fetch",
    )


OK so the patterns you can use with that are the versioning column, or if you want the UPDATE to be atomic there might be complications if you are updating columns in both "person" and "engineer".    I might use SELECT..FOR UPDATE for a pessimistic approach.



Mike Bayer

unread,
Jun 14, 2019, 12:50:55 PM6/14/19
to sqlal...@googlegroups.com


On Fri, Jun 14, 2019, at 12:46 PM, Mike Bayer wrote:


On Fri, Jun 14, 2019, at 12:22 PM, João Miguel Neves wrote:
Not performance, actually to avoid a race condition with key/values written to JSONB fields. Ended up with the following function that we use for when multiple updates at the same time can occur (from the frontend mostly). It's based on your recommendations from https://groups.google.com/forum/#!topic/sqlalchemy/hjjIyEC8KHQ

When we got 2 requests in parallel (one updating key1 with value1 and another updating key2 with value2), the first would write value1, but as the second had loaded the JSONB field before, it would write to key2 and then save the whole content of the JSONB field without the update to key1 effectively losing data in our case.

def atomic_jsonb_set(context, dynamic_column_name, column_id, value):
    id_key = [col.name for col in inspect(context.__class__).primary_key]
    identifier = {key: getattr(context, key) for key in id_key}

    DBSession.query(context.__class__).filter_by(**identifier).update(
        {
            dynamic_column_name: func.jsonb_set(
                getattr(context.__class__, dynamic_column_name),
                f"{{{column_id}}}",
                config["sqlalchemy.json_serializer"](value),
            )
        },
        synchronize_session="fetch",
    )


OK so the patterns you can use with that are the versioning column, or if you want the UPDATE to be atomic there might be complications if you are updating columns in both "person" and "engineer".    I might use SELECT..FOR UPDATE for a pessimistic approach.

oh and by versioning column i mean https://docs.sqlalchemy.org/en/13/orm/versioning.html?highlight=versioning, which is also using regular Session.add() types of patterns, not query.update().


João Miguel Neves

unread,
Jun 14, 2019, 12:52:31 PM6/14/19
to sqlal...@googlegroups.com
Cool, wasn't aware of that feature! Thanks!

João Miguel Neves

unread,
Jun 17, 2019, 12:02:06 PM6/17/19
to sqlal...@googlegroups.com
Hi,

Ok, versioning adds a different requirement level, as it fails if the version being updated has changed. I was looking for a situation where updating 2 keys inside a JSONB field wouldn't lose one of them. Using versioning it raises an exception when writing one of them (which is better than the previous situation where it silently lost one of the updates, but slightly worst than the update with synchronize_session if it would work with inheritance). I'm assuming there's no other alternative?

Thanks in advance,
João

Mike Bayer

unread,
Jun 17, 2019, 12:05:30 PM6/17/19
to sqlal...@googlegroups.com

Mike Bayer

unread,
Jun 17, 2019, 12:07:05 PM6/17/19
to sqlal...@googlegroups.com


On Mon, Jun 17, 2019, at 12:02 PM, João Miguel Neves wrote:
Hi,

Ok, versioning adds a different requirement level, as it fails if the version being updated has changed. I was looking for a situation where updating 2 keys inside a JSONB field wouldn't lose one of them. Using versioning it raises an exception when writing one of them (which is better than the previous situation where it silently lost one of the updates, but slightly worst than the update with synchronize_session if it would work with inheritance). I'm assuming there's no other alternative?


the ORM will see your JSONB field as one value, so if you are trying to do an UPDATE that is per element then you'd need to stick with using explicit UPDATE statements and you would need to ensure you are targeting the correct table in an inheritance scenario, as this is not supplied automatically.   If you were updating attributes across two tables you'd need to emit two separate UPDATE statements, for example.






Reply all
Reply to author
Forward
0 new messages