Changing a primary key's value in a many-to-many relation


Zsolt Ero

Feb 14, 2017, 8:15:17 PM
to sqlalchemy

I would like to change a primary key's value so that it is deterministic, based on a many-to-many relation. So I'm populating the tables with temporary ids (just random strings), then calculating the right, unique id and changing it afterwards.

I have the following models:

class Image(Base):
    id = Column(String, primary_key=True, default=lambda: random_string(16))
    collections = relationship('Collection', secondary='collections_images', back_populates='images')


class Collection(Base):
    id = Column(String, primary_key=True, default=lambda: random_string(16))
    images = relationship('Image', secondary='collections_images', back_populates='collections', order_by='desc(Image.date_created)', lazy='dynamic')


Table('collections_images', Base.metadata,
    Column('collection_id', ForeignKey('collections.id'), primary_key=True),
    Column('image_id', ForeignKey('images.id'), primary_key=True)
)

My problem is the following:

  1. Out of the 3 examples below, only one triggers an integrity exception; the other two do not.

    Why?

    In all three I'm trying to write to this primary key, which is referenced, so it should produce an exception. Yet in B. and C. it seems nothing happens when collection.id = is set. When I debug by watching the SQL queries, absolutely nothing is emitted for the collection.id = line.

  2. How can I solve this problem? That is, how can I change a primary key's value which is also used in a many-to-many relation?

The DB is PostgreSQL 9.5 with psycopg2.

The examples are:


A. triggers exception:

with transaction.manager:
    collections = dbsession.query(Collection).all()

    for collection in collections:
        image_ids = [i.id for i in collection.images.all()]
        image_ids_string = ','.join(sorted(image_ids)) + collection.name
        collection.id = md5(image_ids_string)[:16]


B. does not trigger exception

collections = dbsession.query(Collection).all()
# ^ and v only these two lines are swapped 
with transaction.manager:
    for collection in collections:
        image_ids = [i.id for i in collection.images.all()]
        image_ids_string = ','.join(sorted(image_ids)) + collection.name
        collection.id = md5(image_ids_string)[:16]


C. also does not trigger exception

collections = dbsession.query(Collection).all()

for collection in collections:
    image_ids = [i.id for i in collection.images.all()]
    image_ids_string = ','.join(sorted(image_ids)) + collection.name
    with transaction.manager:
        collection.id = md5(image_ids_string)[:16]


The exception for the first one is:

sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) update or delete on table "collections" violates foreign key constraint "fk_collections_images_collection_id_collections" on table "collections_images" DETAIL: Key (id)=(jC3sN8952urTGrqz) is still referenced from table "collections_images".


I've also tried onupdate='CASCADE' for both columns in collections_images, but it didn't change anything.

mike bayer

Feb 14, 2017, 10:17:09 PM
to sqlal...@googlegroups.com


On 02/14/2017 08:15 PM, Zsolt Ero wrote:
> I would like to change a primary key's value, to be deterministic, based
> on a multi-to-multi relation. Thus I'm populating the tables with a
> temporary ids (just random strings), then calculating the right, unique
> id, and changing it afterwards.

the examples seem to move "transaction.manager" around, which we assume
is the Zope transaction manager, and that by using the context manager
the Session.commit() method is ultimately called, which raises this
error. One guess is that in the second two examples the Session is
not actually getting committed, because no invocation of "dbsession" is
present within the "with transaction.manager" block, and I have a vague
recollection that zope.transaction might work this way. Another guess
is that in the second two examples, maybe you already changed the data
in the DB and the operation you're doing has no net change to the rows.

In any case, in all three examples you should echo the SQL emitted so
you can see what it's doing. Setting up the onupdate="CASCADE" should
fix this problem. As to why that didn't work for you, keep in mind that
this is a CREATE TABLE directive: if you just changed it in your model
and didn't recreate the tables, or at least drop and re-create the
foreign key constraints using ALTER with the CASCADE rule set up, it
will have no effect; this is a server-side rule.
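
Roughly, the ALTER approach would look like this (a sketch only, not
from the original post; the constraint name is taken from your error
message, and the URL is the one from the MCVE below):

from sqlalchemy import create_engine, text

e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)

with e.begin() as conn:
    # drop the existing FK, then re-create it with the server-side rule
    conn.execute(text(
        "ALTER TABLE collections_images "
        "DROP CONSTRAINT fk_collections_images_collection_id_collections"))
    conn.execute(text(
        "ALTER TABLE collections_images "
        "ADD CONSTRAINT fk_collections_images_collection_id_collections "
        "FOREIGN KEY (collection_id) REFERENCES collections (id) "
        "ON UPDATE CASCADE"))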

Here's the MCVE to demonstrate:

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
import md5 as _md5
import random
import string


def md5(text):
    return _md5.md5(text).hexdigest()


def random_string(num):
    return ''.join(random.choice(
        string.ascii_uppercase + string.digits) for _ in range(num))


Base = declarative_base()


class Image(Base):
    __tablename__ = 'images'
    id = Column(String, primary_key=True, default=lambda: random_string(16))
    collections = relationship(
        'Collection', secondary='collections_images',
        back_populates='images')
    date_created = Column(DateTime, default=func.now())


class Collection(Base):
    __tablename__ = 'collections'
    id = Column(String, primary_key=True, default=lambda: random_string(16))
    name = Column(String)
    images = relationship(
        'Image', secondary='collections_images',
        back_populates='collections', order_by='desc(Image.date_created)',
        lazy='dynamic')


collections_images = Table(
    'collections_images', Base.metadata,
    Column('collection_id',
           ForeignKey('collections.id', onupdate="CASCADE"),
           primary_key=True),
    Column('image_id', ForeignKey('images.id'), primary_key=True)
)

e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)

Base.metadata.drop_all(e)
Base.metadata.create_all(e)

s = Session(e)

with s.transaction:
    s.add(Collection(name='c1', images=[Image(), Image(), Image()]))

with s.transaction:
    collections = s.query(Collection).all()

    for collection in collections:
        image_ids = [i.id for i in collection.images.all()]
        image_ids_string = ','.join(sorted(image_ids)) + collection.name
        collection.id = md5(image_ids_string)[:16]



>
> I have the following models:
>
> |classImage(Base):id
> =Column(String,primary_key=True,default=lambda:random_string(16))collections
> =relationship('Collection',secondary='collections_images',back_populates='images')classCollection(Base):id
> =Column(String,primary_key=True,default=lambda:random_string(16))images
> =relationship('Image',secondary='collections_images',back_populates='collections',order_by='desc(Image.date_created)',lazy='dynamic')Table('collections_images',Base.metadata,Column('collection_id',ForeignKey('collections.id'),primary_key=True),Column('image_id',ForeignKey('images.id'),primary_key=True))|
>
> My problem is the following:
>
> 1.
>
> Out of the 3 examples below, only one triggers an integrity
> exception, the other two does not.
>
> Why?
>
> In all three I'm trying to write to this primary key, which is
> referenced, thus should produce an exception. Yet, in 2. and 3. it
> seem nothing is happening when |collection.id =| is set. When I
> debug via SQL queries it shows absolutely nothing called for
> the |collection.id =| line.
>
> 2.
>
> How can I solve this problem? I mean how can I change a primary
> key's value which is also used in a multi-to-multi relation?
>
> The DB is PostgreSQL 9.5 with psycopg2.
>
> The examples are:
>
>
> A. triggers exception:
>
> |withtransaction.manager:collections
> =dbsession.query(Collection).all()forcollection incollections:image_ids
> =[i.id fori incollection.images.all()]image_ids_string
> =','.join(sorted(image_ids))+collection.name collection.id
> =md5(image_ids_string)[:16]|
>
>
> B. does not trigger exception
>
> |collections =dbsession.query(Collection).all()# ^ and v only these two
> lines are swapped withtransaction.manager:forcollection
> incollections:image_ids =[i.id fori
> incollection.images.all()]image_ids_string
> =','.join(sorted(image_ids))+collection.name collection.id
> =md5(image_ids_string)[:16]|
>
>
> C. also does not trigger exception
>
> |collections =dbsession.query(Collection).all()forcollection
> incollections:image_ids =[i.id fori
> incollection.images.all()]image_ids_string
> =','.join(sorted(image_ids))+collection.name
> withtransaction.manager:collection.id =md5(image_ids_string)[:16]|
>
>
> The exception for the first one is:
>
> |
> sqlalchemy.exc.IntegrityError:(psycopg2.IntegrityError)update ordeleteon
> table "collections"violates foreign key constraint
> "fk_collections_images_collection_id_collections"on table
> "collections_images"DETAIL:Key(id)=(jC3sN8952urTGrqz)isstill referenced
> fromtable "collections_images".
>
> |
>
> I've also tried onupdate='CASCADE' for both columns
> in collections_images but didn't change anything.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Zsolt Ero

Feb 15, 2017, 9:46:14 AM
to sqlal...@googlegroups.com
Thanks Mike for looking into this.

I've created a minimal program which reproduces my error. For context,
this is an init script for a Pyramid app, but it is run from the
command line, not in a request loop. The tables are dropped and
recreated at the start. Inklesspen helped me figure out the transaction
manager over IRC, and I've simplified it into a single block which is
both simpler and more reliable.

So about SQLAlchemy's behaviour:

1. I do not see anything related to CASCADE in the echoed SQL when I
use onupdate='CASCADE'.
2. The update does not work in my case; logs attached.
3. A manual hack of creating a new collection and deleting the old one
does work. It means that delete does not need CASCADE, but SQLAlchemy
can calculate the order of the calls, if I understand right?
4. An interesting thing is that SQLAlchemy does 3 SELECT calls in the
delete case, even though 1 would be enough. This can be seen in the logs.

Zsolt
Attachments: delete.log, test.py, update.log

mike bayer

Feb 15, 2017, 10:52:00 AM
to sqlal...@googlegroups.com
onupdate=CASCADE is an option of ForeignKey, not Column:
Table(
    'collections_images', Base.metadata,
    Column('collection_id',
           ForeignKey('collections.id', onupdate='CASCADE'),
           primary_key=True),
    Column('image_id', ForeignKey('images.id'), primary_key=True))




Zsolt Ero

Feb 15, 2017, 11:03:02 AM
to sqlal...@googlegroups.com
Thanks a lot! I would still be interested in your answer to 3. and 4.

In particular, what is the difference between update and delete
behaviour here? Why does SQLAlchemy know how to "cascade" a delete
purely on the client side, while for an update it needs server-side
CASCADE support?

mike bayer

Feb 15, 2017, 11:08:04 AM
to sqlal...@googlegroups.com
a "delete" cascade can be done on the client side because it involves a
simple ordering of steps:

1. DELETE the rows that depend on the target row

2. DELETE the target row.


an "update" cascade, OTOH, can't work this way. Because the dependent
row remains existing and needs to have a primary key value at all times,
and that primary key needs to correspond to a row in the target table.
The only way to do this client side (without disabling or dropping the
constraints themselves) would be:

1. INSERT a new row into the target table with the new primary key value

2. UPDATE the rows that depend on the target row

3. DELETE the old row from the target table

For a simple UPDATE of the target table, this is not feasible;
INSERT/DELETE is a very different operation from an UPDATE at many
levels: data-wise, isolation/concurrency-wise, etc. Only "ON UPDATE
CASCADE" configured at the server level can accommodate the flow as an
UPDATE on the target table, where PostgreSQL internally handles the
cascading of the primary key change to all dependent tables without
violating referential integrity.
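
Just to spell those three client-side steps out by hand, purely for
illustration (this helper and its name are made up; it is not what the
ORM does for you when you assign collection.id):

from sqlalchemy import text

def rekey_collection(conn, old_id, new_id):
    # 1. INSERT a copy of the target row under the new primary key
    conn.execute(text(
        "INSERT INTO collections (id, name) "
        "SELECT :new_id, name FROM collections WHERE id = :old_id"),
        {"new_id": new_id, "old_id": old_id})
    # 2. UPDATE the dependent association rows to point at the new key
    conn.execute(text(
        "UPDATE collections_images SET collection_id = :new_id "
        "WHERE collection_id = :old_id"),
        {"new_id": new_id, "old_id": old_id})
    # 3. DELETE the old target row, which nothing references any more
    conn.execute(text(
        "DELETE FROM collections WHERE id = :old_id"),
        {"old_id": old_id})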

Zsolt Ero

Feb 15, 2017, 11:15:53 AM
to sqlal...@googlegroups.com
Thank you for the explanation! It is indeed deeper than I first
thought, but I understand it now.

Finally, consider question 4. (the multiple SELECTs on delete) to be a
bug report, even if it's harmless.

mike bayer

Feb 15, 2017, 11:26:04 AM
to sqlal...@googlegroups.com


On 02/15/2017 09:45 AM, Zsolt Ero wrote:
> 4. An interesting thing is that SQLAlchemy does 3 select calls in the
> delete case, even if 1 would be enough. Can be seen in the logs.

the ORM only deletes rows one at a time, based on primary key match. So
if you have a relationship() that is configured to cascade deletes, and
you have not instructed the system that "ON DELETE CASCADE" will take
care of those collections, it will need to ensure these collections are
present in memory (hence the SELECT) and target each row for deletion
individually. You see only one DELETE statement, but you'll note it has
multiple parameter sets to indicate every row being deleted.
Background on how to optimize this is at
http://docs.sqlalchemy.org/en/latest/orm/collections.html#using-passive-deletes
.

In this specific case there seem to be two SELECT statements, but that
appears to be because of the awkward production of a new object that
has the same primary key as another object. In the logs you'll see an
UPDATE, but this is actually a special-case "hack"; normally we'd see a
DELETE of the old row and an INSERT of the new one, however the unit of
work does not support this process. There could be an option to allow
it to work this way in specific cases, although such a feature is not
present in SQLAlchemy at this time. In the absence of that feature, the
behavior is that if the same primary key is present on one object being
deleted and another one being added in the same flush, they are rolled
into an UPDATE. On top of that, the collection is being deleted and
re-added again too, so this is a bit of a crazy example; using a
straight UPDATE with correct cascade rules is obviously much more
efficient.

Zsolt Ero

Feb 15, 2017, 10:08:24 PM
to sqlal...@googlegroups.com
I'm starting to understand it! Just a few very quick questions:

1. Is it a good strategy to always use onupdate='CASCADE',
ondelete='CASCADE' for all foreign keys if the db supports it (in my
case Postgres 9.5 + psycopg2)?

2. If I'd like to use it on an already populated db, can I "alter
table" to use these CASCADE options (for example in an alembic
migration)?

3. For delete, is this simply an optimisation, and is the default
behaviour also perfectly fine?

4. Simply adding the above parameters does not automatically change
SQLAlchemy's delete strategy; I also need to tell it to use passive
deletes. Do I also need the cascade="all, delete-orphan" option like
in the docs?

Zsolt

Zsolt Ero

Feb 28, 2017, 9:37:36 AM
to sqlal...@googlegroups.com
Can anyone help with these questions? Nothing urgent, I'm just
interested in understanding more about SQLAlchemy.

mike bayer

Feb 28, 2017, 10:22:39 AM
to sqlal...@googlegroups.com
sorry, lost track of this one.

On 02/15/2017 10:07 PM, Zsolt Ero wrote:
> I'm starting to understand it! Just a few very quick questions:
>
> 1. Is it a good strategy to always use onupdate='CASCADE',
> ondelete='CASCADE' for all foreign keys if the db supports it (in my
> case Postgres 9.5 + psycopg2)?

if you want those child objects to actually be deleted or modified on a
PK change like that, as opposed to it being prevented from happening,
then yes.


>
> 2. If I'd like to use it on an already populated db, can I "alter
> table" to use these CASCADE options (for example in an alembic
> migration)?

yes, you need to DROP the foreign key constraints and re-create them.
Alembic autogenerate *should* be able to generate this script for you
these days without screwing up.
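
A hand-written migration would be along these lines (a sketch only; the
constraint name comes from the error message earlier in the thread, and
whether you also want ondelete="CASCADE" is your call):

from alembic import op

def upgrade():
    # drop the existing FK and re-create it with the CASCADE rules
    op.drop_constraint(
        'fk_collections_images_collection_id_collections',
        'collections_images', type_='foreignkey')
    op.create_foreign_key(
        'fk_collections_images_collection_id_collections',
        'collections_images', 'collections',
        ['collection_id'], ['id'],
        onupdate='CASCADE', ondelete='CASCADE')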


>
> 3. For delete, is this simply an optimisation step and the default
> behaviour is also perfectly fine?

it depends. For one level of "parent -> child", the cascade is just an
optimization. However, if you really have "parent -> child ->
grandchild -> great-grandchild", and you need operations to cascade all
the way through, the ORM may not be able to support a long chain of
"onupdate" if they all share the same foreign key value that's
changing, and as far as "ondelete" goes I actually don't remember if it
loads everything in :) I'd have to try it - but it would be inefficient
in any case. I'd be using CASCADE for a long chain like that regardless.


>
> 4. Simply adding the above parameters is not automatically changing
> SQLAlchemy's delete strategy, I also need to tell it to use passive
> deletes. Do I also need the cascade="all, delete-orphan" option like
> in the docs?

To have the ORM fully handle cascading deletes, you need the
"all, delete-orphan" cascade so the ORM knows what's going on, the FKs
need ondelete="CASCADE", and then passive_deletes=True so that the ORM
doesn't needlessly load in collections just to delete them.
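
Putting those together against the association table from this thread
(a sketch only, reusing Base / Image / random_string from the MCVE
above; whether you actually want the Image rows themselves removed,
which is what "all, delete-orphan" would imply, depends on your app):

Table('collections_images', Base.metadata,
    Column('collection_id',
           ForeignKey('collections.id', onupdate='CASCADE', ondelete='CASCADE'),
           primary_key=True),
    Column('image_id',
           ForeignKey('images.id', ondelete='CASCADE'),
           primary_key=True)
)

class Collection(Base):
    __tablename__ = 'collections'
    id = Column(String, primary_key=True, default=lambda: random_string(16))
    name = Column(String)
    images = relationship(
        'Image', secondary='collections_images',
        back_populates='collections',
        passive_deletes=True)   # rely on the server-side cascades above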

Zsolt Ero

Feb 28, 2017, 10:26:49 AM
to sqlal...@googlegroups.com
Thanks for the detailed explanation!