Trying to understand FlushErrors when merging

52 views
Skip to first unread message

Jacob Pavlock

unread,
Sep 10, 2020, 9:44:04 PM9/10/20
to sqlalchemy
Hello! I am new to sqlalchemy and was hoping to get some insight on the following issue:
import sqlalchemy
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy.schema import ForeignKeyConstraint

Session = sessionmaker()
Base = declarative_base()

class Album(Base):

    __tablename__ = "albums"

    artist = Column(String, nullable=False, primary_key=True)
    title = Column(String, nullable=False, primary_key=True)
    year = Column(Integer, nullable=False, primary_key=True)

    tracks = relationship(
        "Track", back_populates="_album_obj", cascade="all, delete-orphan"
    )

    def __init__(self, artist: str, title: str, year: int):
        self.artist = artist
        self.title = title
        self.year = year

class Track(Base):  # noqa: WPS230
    __tablename__ = "tracks"

    # track_num + Album = unique track
    track_num = Column(Integer, nullable=False, primary_key=True, autoincrement=False)
    _albumartist = Column(String, nullable=False, primary_key=True)
    _album = Column(String, nullable=False, primary_key=True)
    _year = Column(Integer, nullable=False, primary_key=True, autoincrement=False)

    artist = Column(String, nullable=False, default="")
    title = Column(String, nullable=False, default="")

    _album_obj = relationship("Album", back_populates="tracks")

    __table_args__ = (
        ForeignKeyConstraint(
            [_albumartist, _album, _year],  # type: ignore
            [Album.artist, Album.title, Album.year],
        ),
    )

    @hybrid_property
    def album(self) -> str:
        return self._album_obj.title

    @hybrid_property
    def albumartist(self) -> str:
        return self._album_obj.artist

    @hybrid_property
    def year(self) -> int:
        return self._album_obj.year

    def __init__(self, album: str, albumartist: str, track_num: int, year: int):
        self._album_obj = Album(artist=albumartist, title=album, year=year)
        self.track_num = track_num
        # self._album = album
        # self._albumartist = albumartist
        # self._year = year

engine = sqlalchemy.create_engine("sqlite:///:memory:", echo=True)
Session.configure(bind=engine)
Base.metadata.create_all(engine)  # creates tables
session = Session()

track1 = Track(album="tpab", albumartist="K.dot", track_num=1, year=2015)
track2 = Track(album="tpab", albumartist="K.dot", track_num=1, year=2015)

session.merge(track1)
session.merge(track2)
session.commit()
When executing the above code, I get
Traceback (most recent call last):
  File "test_merge.py", line 81, in <module>
    session.commit()
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1042, in commit
    self.transaction.commit()
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 504, in commit
    self._prepare_impl()
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 483, in _prepare_impl
    self.session.flush()
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2523, in flush
    self._flush(objects)
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2664, in _flush
    transaction.rollback(_capture_exception=True)
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2624, in _flush
    flush_context.execute()
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
    rec.execute(self)
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 586, in execute
    persistence.save_obj(
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 205, in save_obj
    for (
  File "/home/jacob/src/moe/.venv/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 405, in _organize_states_for_save
    raise orm_exc.FlushError(
sqlalchemy.orm.exc.FlushError: New instance <Track at 0x7f72ccd17f70> with identity key (<class '__main__.Track'>, (1, 'K.dot', 'tpab', 2015), None) conflicts with persistent instance <Track at 0x7f72ccc901c0>
But, If I uncomment
        # self._album = album
        # self._albumartist = albumartist
        # self._year = year
The code works fine. I don't quite understand why the above code is needed, and happened to add it just by random testing to see if I could pin down the error. I think the  section in the docs on merge tips is the closet thing I could find, but I can't quite understand if the same thing there is happening here. Any insight would be greatly appreciated!

Mike Bayer

unread,
Sep 10, 2020, 11:44:09 PM9/10/20
to noreply-spamdigest via sqlalchemy
When you create a Track object and don't assign the primary key attributes directly, the instance key of the object is all nulls:

Track.__mapper__.identity_key_from_instance(track1)
(<class '__main__.Track'>, (1, None, None, None), None)

Track.__mapper__.identity_key_from_instance(track2)
(<class '__main__.Track'>, (1, None, None, None), None)

this is because while the assignment of _album_obj will set the primary key attributes, that doesn't happen until the flush, so right now Track has no primary key.

merge() sees the first object, puts it in as a new object that's pending without a primary key.

then the second merge() call occurs. a flush proceeds first, which flushes track1, assigning the _album_obj which sets up the primary key attributes appropriately on track1.    track2 comes in, and the apparent primary key is again all nulls, so it again goes in as a pending object to be inserted.   the commit tries to flush and it fails because the object's primary key once calculated by _album_obj conflicts with what's in the database.  merge() had no idea you intended this to be the same primary key identity because those attributes are not set up front.

When you explicitly set up the three pk attributes on Track, then the merge() process can see the apparent primary key you had in mind and it can correctly match "track2" with the "track1" that's already in the database.

hope this helps!
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

Jacob Pavlock

unread,
Sep 11, 2020, 2:26:25 AM9/11/20
to sqlalchemy

Mike,

Thanks for the great explanation and the quick response! I think am I starting to get it now. I guess the same or similar idea though leads to the following (somewhat confusing) behavior:

(I added a repr and setters for the hybrid properties)
track1 = Track(album="tpab", albumartist="K.dot", track_num=1, year=2015)
track2 = Track(album="tpab", albumartist="K.dot", track_num=1, year=2015
)

track2.album = "gkmc"
session.merge(track1)
session.merge(track2)
session.commit()

print(session.query(Track).all())
Results in
[K.dot - gkmc (2015): 1]

But
track1 = Track(album="tpab", albumartist="K.dot", track_num=1, year=2015)
track2 = Track(album="tpab", albumartist="K.dot", track_num=1, year=2015
)

track1.album = "gkmc"
session.merge(track1)
session.merge(track2)
session.commit()

print(session.query(Track).all())
Results in
[K.dot - gkmc (2015): 1, K.dot - tpab (2015): 1]

I feel like I'm starting to tread in non-standard waters here, and I am beginning to question whether my use of `session.merge()` is appropriate for my usecase. Any recommendations?

Mike Bayer

unread,
Sep 11, 2020, 8:02:51 AM9/11/20
to noreply-spamdigest via sqlalchemy


On Fri, Sep 11, 2020, at 2:26 AM, Jacob Pavlock wrote:

Mike,

Thanks for the great explanation and the quick response! I think am I starting to get it now. I guess the same or similar idea though leads to the following (somewhat confusing) behavior:

(I added a repr and setters for the hybrid properties)

track1 = Track(album="tpab", albumartist="K.dot", track_num=1, year=2015)
track2 = Track(album="tpab", albumartist="K.dot", track_num=1, year=2015)

track2.album = "gkmc"
session.merge(track1)
session.merge(track2)
session.commit()

print(session.query(Track).all())



Results in

[K.dot - gkmc (2015): 1]




But

track1 = Track(album="tpab", albumartist="K.dot", track_num=1, year=2015)
track2 = Track(album="tpab", albumartist="K.dot", track_num=1, year=2015)

track1.album = "gkmc"
session.merge(track1)
session.merge(track2)
session.commit()

print(session.query(Track).all())



Results in

[K.dot - gkmc (2015): 1, K.dot - tpab (2015): 1]



I feel like I'm starting to tread in non-standard waters here, and I am beginning to question whether my use of `session.merge()` is appropriate for my usecase. Any recommendations?

merge() is just a little weird in this regard, it's not my first go-to method for sure. it's based on the same idea in Hibernate.  I would not have ever thought of such a method on my own.

The specific issue in your example above is that you are setting "track1.album = "gkmc"", but you have this attribute set to be populated from that of your Album() object which I would assume is overwriting it during the flush.

that is, you have an Album object on your Track, it doesn't actually make sense to change track1.album = "foo" without changing that whole object.  

I guess what most people would do here would be to not use natural primary keys (e.g. use a surrogate primary key, like an integer).    your table rows would use less data and the model would be simpler to manipulate.


Jacob Pavlock

unread,
Sep 11, 2020, 3:10:54 PM9/11/20
to sqlalchemy

Awesome, thanks so much for the help and the ideas. I think I have some good info to go off of and definitely have a better understanding of how my application works wrt to sqlalchemy.
Reply all
Reply to author
Forward
0 new messages