SQLAlchemy is trying to insert duplicate rows in an M2M Relationship

2,412 views
Skip to first unread message

Donald Stufft

unread,
Dec 14, 2012, 10:22:41 PM12/14/12
to sqlal...@googlegroups.com
I have 2 tables with a third intermediary table. These tables are (shorted): https://gist.github.com/9ff8afa793c9150c6b70

Using this the association_proxy correctly reuses existing rows in the database if they already exist. However if I do this:

v = Version.query.first()
v.classifiers = [u"Foo"]
db.session.commit()  # A Classifier with trove=u"Foo" is either retrieved or created

v.classifiers = [u"Foo", u"Bar"]
db.session.commit()

An error occurs because SQLAlchemy tried to insert a second row in the intermediary table for this Version + Classifier(u"Foo").

So my question is how can I get it to properly handle the fact that there should only ever be 1 mapping of a particular Version
to a particular Classifier?

Note, I can work around this by doing:

classifiers = [u"Foo", u"Bar"]
for c in classifiers:
    if not c in v.classifiers:
        v.classifiers.append(c)

But I would really rather have a solution that is contained inside of my Model.

Michael Bayer

unread,
Dec 14, 2012, 11:30:57 PM12/14/12
to sqlal...@googlegroups.com
I've cobbled together a complete and simplified test case given your mapping and example code and I cannot reproduce with either 0.7 or 0.8 - the count of rows in the association table is one on the first commit, and two on the second.

You need to adapt the attached test case into a full reproducing case so that the specific trigger is illustrated...thanks.

test.py

Donald Stufft

unread,
Dec 14, 2012, 11:38:26 PM12/14/12
to sqlal...@googlegroups.com
Hrm. I'll see what I can do. Though looking at what you posted it works for me with that too.. So the problem must either be with Flask-SQLAlchemy or with my own app code.

Michael Bayer

unread,
Dec 14, 2012, 11:43:19 PM12/14/12
to sqlal...@googlegroups.com
its probably some subtlety to the data that's already loaded and how the collection is being mutated - it's unlikely Flask has anything to do with it.   There may or may not be some less-than-ideal or buggy behavior in association proxy, or it might be a premature flushing issue, but if you can come up with how to reproduce that would be very helpful.


--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/Xn2eZ0gifLgJ.
To post to this group, send email to sqlal...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.

Donald Stufft

unread,
Dec 14, 2012, 11:52:21 PM12/14/12
to sqlal...@googlegroups.com
Ok, so what I've got so far. I believe it's related to the association_proxy as (Using my application code, not the test case):

v = Version.query.first()
v._classifiers = [Classifier.get_or_create(u"Foo")]
db.session.commit()
v._classifiers = [Classifier.get_or_create(u"Foo"), Classifier.get_or_create(u"Bar")]
db.session.commit()

Works, but if I then do:

v.classifiers = [u"Foo"]
db.session.commit()

I get IntegrityError: (IntegrityError) duplicate key value violates unique constraint "idx_version_classifiers"

So now i'm seeing what I can do with the test case you sent me to get it to exhibit the same behaviors :)

Donald Stufft

unread,
Dec 14, 2012, 11:59:17 PM12/14/12
to sqlal...@googlegroups.com
I'm not all that familar with SQLA internals. But here's something interesting.

>>> import test
>>> test.v.classifiers = [u"Foo"]
2012-12-14 23:53:15,291 INFO sqlalchemy.engine.base.Engine DELETE FROM version_classifiers WHERE version_classifiers.classifier_id = ? AND version_classifiers.version_id = ?
2012-12-14 23:53:15,291 INFO sqlalchemy.engine.base.Engine ((1, 1), (2, 1))
2012-12-14 23:53:15,291 INFO sqlalchemy.engine.base.Engine SELECT classifiers.id AS classifiers_id, classifiers.trove AS classifiers_trove 
FROM classifiers 
WHERE classifiers.trove = ?
2012-12-14 23:53:15,291 INFO sqlalchemy.engine.base.Engine (u'Foo',)

>>> from warehouse.models import Version
>>> v = Version.query.first()
>>> v.classifiers = [u"Foo"]
2012-12-14 23:55:00,951 INFO sqlalchemy.engine.base.Engine SELECT classifiers.id AS classifiers_id, classifiers.trove AS classifiers_trove 
FROM classifiers 
WHERE classifiers.trove = %(trove_1)s
2012-12-14 23:55:00,951 INFO sqlalchemy.engine.base.Engine {'trove_1': u'Foo'}

Notice how the test code is executing a delete statement beforehand while my app's copy is only selecting the classifier.

Donald Stufft

unread,
Dec 15, 2012, 12:06:28 AM12/15/12
to sqlal...@googlegroups.com
So it appears the problem is with UniqueConstraint vs Index(..., unique=True).

Donald Stufft

unread,
Dec 15, 2012, 12:12:26 AM12/15/12
to sqlal...@googlegroups.com
Ugh nevermind me. It's late and I forgot to name the Index :/

Donald Stufft

unread,
Dec 15, 2012, 11:31:23 AM12/15/12
to sqlal...@googlegroups.com
Fooled around with this some more. And i'm pretty sure it's got to be something with Flask-SQLAlchemy now. I spent a few hours futzing with the test case and was unable to make it do anything, but as soon as I switched it to Flask-SQLAlchemy (https://gist.github.com/7f15df7a2d20d9736fed) The IntegrityError came back. So now I have a new place to go bother to figure out why :)

Michael Bayer

unread,
Dec 16, 2012, 2:12:40 PM12/16/12
to sqlal...@googlegroups.com
Flask-SQLAlchemy sets the Session to autoflush=False.   That's pretty much the difference here, and "lazy='dynamic'" works poorly with this setting.  There is sort of a bug-like behavior I can pull out of it here where I see the history is getting set incorrectly, as I suspected, and I should look into that.

But for now just flushing within the get_or_create() step (or setting autoflush=True, or not using lazy='dynamic') will resolve this issue:

    @classmethod
    def get_or_create(cls, trove):
        try:
            sess.flush()
            obj = sess.query(cls).filter_by(trove=trove).one()
        except NoResultFound:
            obj = cls(trove)
        return obj


 

To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/w7AbNMBU6VIJ.

Donald Stufft

unread,
Dec 16, 2012, 2:53:45 PM12/16/12
to sqlal...@googlegroups.com
Hey thanks a ton!

Looking at the Flask-SQLAlchemy history, it seems autoflush=False has been the Flask-SQLAlchemy default since the initial checkin. Not being all that educated on when you'd want false/true is there any good reason for me to not just restore the SQLAlchemy default of autoflush=True?

Michael Bayer

unread,
Dec 18, 2012, 7:06:31 PM12/18/12
to sqlal...@googlegroups.com
I stick with autoflush=True, and flask probably shouldn't make a decision here.  But there really shouldn't be any major behavioral changes with autoflush=False other than data stays pending longer, so dynamic here definitely needs some fixes.

Michael Bayer

unread,
Dec 22, 2012, 6:46:42 PM12/22/12
to sqlal...@googlegroups.com
I finally figured out the real cause of the issue here, a method was missing from the "dynamic" attribute implementation that was needed for collections.    A fix + test for that as well as some other backref-related fixes are all ready to go for the next 0.8 release.    Part of the fix here is potentially not hard to backport to 0.7 but I haven't decided if I want to get into that here.

Donald Stufft

unread,
Dec 22, 2012, 6:47:42 PM12/22/12
to sqlal...@googlegroups.com
Awesome :) Glad my problems could help make SQLAlchemy a little bit better :)
Reply all
Reply to author
Forward
0 new messages