SQLAlchemy is trying to insert duplicate rows in an M2M Relationship

Donald Stufft

Dec 14, 2012, 10:22:41 PM
to sqlal...@googlegroups.com
I have two tables plus a third, intermediary table. These tables are (shortened): https://gist.github.com/9ff8afa793c9150c6b70
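
For reference, a minimal sketch of roughly what such a mapping looks like. The real models live in the gist above, so the column types, the declarative/Session setup, and the lazy="dynamic" note here are assumptions pieced together from the rest of the thread:

from sqlalchemy import Column, ForeignKey, Index, Integer, Table, Unicode
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, scoped_session, sessionmaker
from sqlalchemy.orm.exc import NoResultFound

Base = declarative_base()
Session = scoped_session(sessionmaker())  # bind/configure to an engine before use

# association table with a composite unique index
version_classifiers = Table(
    "version_classifiers", Base.metadata,
    Column("version_id", Integer, ForeignKey("versions.id")),
    Column("classifier_id", Integer, ForeignKey("classifiers.id")),
)
Index("idx_version_classifiers",
      version_classifiers.c.version_id,
      version_classifiers.c.classifier_id,
      unique=True)

class Classifier(Base):
    __tablename__ = "classifiers"
    id = Column(Integer, primary_key=True)
    trove = Column(Unicode, unique=True)

    def __init__(self, trove):
        self.trove = trove

    @classmethod
    def get_or_create(cls, trove):
        # reuse an existing Classifier row for this trove string, else build a new one
        try:
            return Session.query(cls).filter_by(trove=trove).one()
        except NoResultFound:
            return cls(trove)

class Version(Base):
    __tablename__ = "versions"
    id = Column(Integer, primary_key=True)

    # (the original mapping apparently also used lazy="dynamic" here)
    _classifiers = relationship("Classifier", secondary=version_classifiers)

    # exposes the trove strings; the creator is what reuses existing Classifier rows
    classifiers = association_proxy("_classifiers", "trove",
                                    creator=Classifier.get_or_create)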

With this setup the association_proxy correctly reuses existing Classifier rows in the database. However, if I do this:

v = Version.query.first()
v.classifiers = [u"Foo"]
db.session.commit()  # A Classifier with trove=u"Foo" is either retrieved or created

v.classifiers = [u"Foo", u"Bar"]
db.session.commit()

An error occurs because SQLAlchemy tries to insert a second row into the intermediary table for this Version + Classifier(u"Foo") pair.

So my question is: how can I get it to properly handle the fact that there should only ever be one mapping of a particular Version to a particular Classifier?

Note, I can work around this by doing:

classifiers = [u"Foo", u"Bar"]
for c in classifiers:
    if c not in v.classifiers:
        v.classifiers.append(c)

But I would really rather have a solution that is contained inside of my Model.
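
For what it's worth, the workaround above can at least live on the model as a small helper. This is just a sketch of the same loop with a made-up method name (set_classifiers); it doesn't address the underlying cause that comes up later in the thread:

    # hypothetical method on Version, wrapping the dedup loop above
    def set_classifiers(self, names):
        # append only the trove strings that aren't already present...
        for name in names:
            if name not in self.classifiers:
                self.classifiers.append(name)
        # ...and drop any that were removed from the new list
        for name in list(self.classifiers):
            if name not in names:
                self.classifiers.remove(name)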

Michael Bayer

Dec 14, 2012, 11:30:57 PM
to sqlal...@googlegroups.com
I've cobbled together a complete and simplified test case from your mapping and example code, and I cannot reproduce it with either 0.7 or 0.8: the count of rows in the association table is one after the first commit and two after the second.

You'll need to adapt the attached test case into a fully reproducing case so that the specific trigger is illustrated. Thanks.

test.py
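
The attachment itself isn't reproduced here, but a driver along these lines (reusing the assumed names from the mapping sketch in the first message) exercises the same sequence and checks the association-table row counts:

from sqlalchemy import create_engine, func, select

engine = create_engine("sqlite://", echo=True)
Base.metadata.create_all(engine)
Session.configure(bind=engine)  # plain SQLAlchemy Session, autoflush=True by default

sess = Session()
sess.add(Version())
sess.commit()

v = sess.query(Version).first()
v.classifiers = [u"Foo"]
sess.commit()
print(sess.execute(select([func.count()]).select_from(version_classifiers)).scalar())  # 1

v.classifiers = [u"Foo", u"Bar"]
sess.commit()
print(sess.execute(select([func.count()]).select_from(version_classifiers)).scalar())  # 2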

Donald Stufft

Dec 14, 2012, 11:38:26 PM
to sqlal...@googlegroups.com
Hrm. I'll see what I can do. Though looking at what you posted, it works for me with that too, so the problem must be either with Flask-SQLAlchemy or with my own app code.

Michael Bayer

Dec 14, 2012, 11:43:19 PM
to sqlal...@googlegroups.com
It's probably some subtlety in the data that's already loaded and how the collection is being mutated; it's unlikely Flask has anything to do with it. There may or may not be some less-than-ideal or buggy behavior in the association proxy, or it might be a premature flushing issue, but if you can come up with a way to reproduce it, that would be very helpful.



Donald Stufft

Dec 14, 2012, 11:52:21 PM
to sqlal...@googlegroups.com
OK, here's what I've got so far. I believe it's related to the association_proxy, as (using my application code, not the test case):

v = Version.query.first()
v._classifiers = [Classifier.get_or_create(u"Foo")]
db.session.commit()
v._classifiers = [Classifier.get_or_create(u"Foo"), Classifier.get_or_create(u"Bar")]
db.session.commit()

Works, but if I then do:

v.classifiers = [u"Foo"]
db.session.commit()

I get IntegrityError: (IntegrityError) duplicate key value violates unique constraint "idx_version_classifiers"

So now I'm seeing what I can do with the test case you sent me to get it to exhibit the same behavior :)

Donald Stufft

Dec 14, 2012, 11:59:17 PM
to sqlal...@googlegroups.com
I'm not all that familiar with SQLA internals, but here's something interesting.

>>> import test
>>> test.v.classifiers = [u"Foo"]
2012-12-14 23:53:15,291 INFO sqlalchemy.engine.base.Engine DELETE FROM version_classifiers WHERE version_classifiers.classifier_id = ? AND version_classifiers.version_id = ?
2012-12-14 23:53:15,291 INFO sqlalchemy.engine.base.Engine ((1, 1), (2, 1))
2012-12-14 23:53:15,291 INFO sqlalchemy.engine.base.Engine SELECT classifiers.id AS classifiers_id, classifiers.trove AS classifiers_trove 
FROM classifiers 
WHERE classifiers.trove = ?
2012-12-14 23:53:15,291 INFO sqlalchemy.engine.base.Engine (u'Foo',)

>>> from warehouse.models import Version
>>> v = Version.query.first()
>>> v.classifiers = [u"Foo"]
2012-12-14 23:55:00,951 INFO sqlalchemy.engine.base.Engine SELECT classifiers.id AS classifiers_id, classifiers.trove AS classifiers_trove 
FROM classifiers 
WHERE classifiers.trove = %(trove_1)s
2012-12-14 23:55:00,951 INFO sqlalchemy.engine.base.Engine {'trove_1': u'Foo'}

Notice how the test code executes a DELETE statement beforehand, while my app's copy only runs the SELECT for the classifier.

Donald Stufft

Dec 15, 2012, 12:06:28 AM
to sqlal...@googlegroups.com
So it appears the problem is with UniqueConstraint vs Index(..., unique=True).

Donald Stufft

Dec 15, 2012, 12:12:26 AM
to sqlal...@googlegroups.com
Ugh, never mind me. It's late and I forgot to name the Index :/
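
For anyone comparing the two forms: Index() takes its name as the first positional argument, while UniqueConstraint() takes name as a keyword, and either one enforces the same composite uniqueness. A sketch, using the table and column names from the SQL in the logs above:

from sqlalchemy import Column, ForeignKey, Index, Integer, MetaData, Table, UniqueConstraint

metadata = MetaData()

version_classifiers = Table(
    "version_classifiers", metadata,
    Column("version_id", Integer, ForeignKey("versions.id")),
    Column("classifier_id", Integer, ForeignKey("classifiers.id")),
    # a named unique index -- the name is the first positional argument
    Index("idx_version_classifiers", "version_id", "classifier_id", unique=True),
    # ...or, alternatively, a table-level unique constraint:
    # UniqueConstraint("version_id", "classifier_id", name="uq_version_classifiers"),
)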

Donald Stufft

Dec 15, 2012, 11:31:23 AM
to sqlal...@googlegroups.com
Fooled around with this some more, and I'm pretty sure it has to be something with Flask-SQLAlchemy now. I spent a few hours futzing with the test case and was unable to make it misbehave, but as soon as I switched it to Flask-SQLAlchemy (https://gist.github.com/7f15df7a2d20d9736fed) the IntegrityError came back. So now I have a new place to go bother to figure out why :)

Michael Bayer

Dec 16, 2012, 2:12:40 PM
to sqlal...@googlegroups.com
Flask-SQLAlchemy sets the Session to autoflush=False. That's pretty much the difference here, and lazy='dynamic' works poorly with that setting. There is a sort of bug-like behavior I can pull out of it here, where I see the history getting set incorrectly, as I suspected, and I should look into that.

But for now just flushing within the get_or_create() step (or setting autoflush=True, or not using lazy='dynamic') will resolve this issue:

    @classmethod
    def get_or_create(cls, trove):
        # "sess" is the active Session (db.session under Flask-SQLAlchemy);
        # NoResultFound is sqlalchemy.orm.exc.NoResultFound
        try:
            # flush first so pending Classifiers are visible to the query below
            sess.flush()
            obj = sess.query(cls).filter_by(trove=trove).one()
        except NoResultFound:
            obj = cls(trove)
        return obj


 


Donald Stufft

Dec 16, 2012, 2:53:45 PM
to sqlal...@googlegroups.com
Hey thanks a ton!

Looking at the Flask-SQLAlchemy history, it seems autoflush=False has been the Flask-SQLAlchemy default since the initial check-in. Not being all that educated on when you'd want False vs. True: is there any good reason for me not to just restore the SQLAlchemy default of autoflush=True?
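
If you do want to restore it, a sketch of what that might look like, assuming a Flask-SQLAlchemy version whose SQLAlchemy() constructor accepts session_options (check your version; the old flask.ext import path is shown because that's what this era used):

from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy  # "from flask_sqlalchemy import SQLAlchemy" on newer releases

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite://"

# pass session options through to the underlying sessionmaker
db = SQLAlchemy(app, session_options={"autoflush": True})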

Michael Bayer

Dec 18, 2012, 7:06:31 PM
to sqlal...@googlegroups.com
I stick with autoflush=True, and Flask probably shouldn't be making a decision here. But there really shouldn't be any major behavioral changes with autoflush=False other than data staying pending longer, so 'dynamic' here definitely needs some fixes.

Michael Bayer

Dec 22, 2012, 6:46:42 PM
to sqlal...@googlegroups.com
I finally figured out the real cause of the issue here: a method needed for collections was missing from the "dynamic" attribute implementation. A fix and test for that, as well as some other backref-related fixes, are ready to go for the next 0.8 release. Part of the fix is potentially not hard to backport to 0.7, but I haven't decided if I want to get into that.

Donald Stufft

Dec 22, 2012, 6:47:42 PM
to sqlal...@googlegroups.com
Awesome :) Glad my problems could help make SQLAlchemy a little bit better :)