Copying Between Databases - Can I modify part of a Composite Key?


Shawn Wheatley

Sep 26, 2012, 1:33:39 PM9/26/12
to sqlal...@googlegroups.com
I have the following ORM class defined:

class CiAcctK(DeclarativeBase):
    __tablename__ = 'ci_acct_k'

    __table_args__ = {}

    #column definitions
    acct_id = Column(u'acct_id', CHAR(length=10), ForeignKey('ci_acct.acct_id'), primary_key=True, nullable=False)
    env_id = Column(u'env_id', NUMERIC(precision=6, scale=0), primary_key=True, nullable=False)

I'm trying to use this class in a script to copy selected records from my production database (PROD) into my testing/development (DEV) instances. The "env_id" column is defined per environment--for ease we'll say PROD is "4" and DEV is "1". I was looking for the quickest way to set all "env_id" values to "1", either as they are read from PROD or just before they are inserted into DEV.

The issue arises when an account already exists in DEV. If DEV already holds the record {acct_id: 1, env_id: 1}, I could read the row {acct_id: 1, env_id: 4} from PROD, and attempting to insert it will trigger a unique constraint violation (a separate constraint from the key constraints defined above).

My first attempt was to write a before_insert/before_update event handler:

@event.listens_for(CiAcctK, "before_update")
@event.listens_for(CiAcctK, "before_insert")
def CiAcctK_gen_default(mapper, connection, instance):
    instance.env_id = ENV_ID   #ENV_ID=1

I saw "env_id" being updated to 1, but SQLAlchemy isn't aware of the non-key constraint and still tries to INSERT the updated row. Now there is a key constraint violation--both "acct_id" and "env_id" are identical to an existing row.

Searching through the documentation, I found another event "translate_row" that looks like what I would want to use, but the "row" object passed to the event handler method is a RowProxy and doesn't allow the row to be changed.

Which is the right approach, or is there a better approach? What am I doing wrong? If I add additional code to the database declaration for the additional constraint, will SA know to use an UPDATE instead?

Thanks,
Shawn

Michael Bayer

Sep 26, 2012, 1:53:24 PM9/26/12
to sqlal...@googlegroups.com

On Sep 26, 2012, at 1:33 PM, Shawn Wheatley wrote:

> I have the following ORM class defined:
>
> class CiAcctK(DeclarativeBase):
>     __tablename__ = 'ci_acct_k'
>
>     __table_args__ = {}
>
>     #column definitions
>     acct_id = Column(u'acct_id', CHAR(length=10), ForeignKey('ci_acct.acct_id'), primary_key=True, nullable=False)
>     env_id = Column(u'env_id', NUMERIC(precision=6, scale=0), primary_key=True, nullable=False)
>
> I'm trying to use this class in a script to copy selected records from my production database (PROD) into my testing/development (DEV) instances. The "env_id" column is defined per environment--for ease we'll say PROD is "4" and DEV is "1". I was looking for the quickest way to set all "env_id" values to "1" as they are either read from PROD or before they are inserted into DEV.
>
> The issue is when I have an account that already exists in the DEV. If I had the record {acct_id: 1, env_id: 1} in DEV, I could read in a row from PROD with { acct_id: 1, env_id: 4} and attempting to insert will trigger a unique constraint violation (a separate constraint from the key constraints defined above).
>
> My first attempt was to write a before_insert/before_update event handler:
>
> @event.listens_for(CiAcctK, "before_update")
> @event.listens_for(CiAcctK, "before_insert")
> def CiAcctK_gen_default(mapper, connection, instance):
>     instance.env_id = ENV_ID   # ENV_ID = 1
>
> I saw "env_id" being updated to 1, but SQLAlchemy isn't aware of the non-key constraint and still tries to INSERT the updated row. Now there is a key constraint violation--both "acct_id" and "env_id" are identical to an existing row.

the most idiomatic way to handle this is to merge the objects in:

obj = session.merge(existing_object)

this will emit a SELECT for the existing row, then copy the state of "existing_object" to an object located for that primary key, if found. It ensures that the correct choice of "pending" or "persistent" is made depending on if the row already exists.
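[Editor's note: to illustrate the insert-or-update decision merge makes, here is a plain-Python analogy--not SQLAlchemy itself, and all names are illustrative. The "database" is a dict keyed by primary key, and merge copies state onto an existing row only when the key matches:]

```python
# A toy model of the merge decision: the "database" is a dict keyed by
# primary key. merge() copies incoming state onto an existing row if the
# key is found (persistent -> UPDATE), otherwise stages a new row
# (pending -> INSERT). It never consults any other unique constraint.

def toy_merge(table, pk, state):
    if pk in table:
        table[pk].update(state)   # existing row: behaves like an UPDATE
        return "update"
    table[pk] = dict(state)       # no row under this key: behaves like an INSERT
    return "insert"

dev = {("1", 1): {"acct_id": "1", "env_id": 1, "name": "old"}}

# Same primary key -> state is copied onto the existing row.
assert toy_merge(dev, ("1", 1), {"name": "new"}) == "update"

# Different primary key (env_id 4 from PROD) -> treated as a brand new
# row, which is exactly why a separate unique constraint on acct_id
# would then fire at INSERT time.
assert toy_merge(dev, ("1", 4), {"acct_id": "1", "env_id": 4}) == "insert"
```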

Shawn Wheatley

Sep 27, 2012, 9:21:57 AM9/27/12
to sqlal...@googlegroups.com
 
> the most idiomatic way to handle this is to merge the objects in:
>
> obj = session.merge(existing_object)
>
> this will emit a SELECT for the existing row, then copy the state of "existing_object" to an object located for that primary key, if found. It ensures that the correct choice of "pending" or "persistent" is made depending on if the row already exists.

Thanks for your response, Michael. It wasn't clear from my original post, but I am already using merge to copy from PROD to DEV. My merge loop looks something like this (simplified; in reality I'm copying multiple entities):

session_dest.merge(entity)
session_dest.commit()
# Large object graphs were causing me to run low on memory, so I merge
# them one at a time and then clear the local cache.
session_dest.expunge_all()
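[Editor's note: the merge-then-expunge pattern above generalizes to batches. A small stdlib sketch of the chunking half--`session_dest` and `entities` are stand-ins from the post, not real names:]

```python
from itertools import islice

def batches(iterable, size):
    """Yield lists of up to `size` items, so only one batch is in memory."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Usage sketch: for each batch, merge into the destination session,
# commit, then expunge so the identity map does not grow unboundedly.
#
# for chunk in batches(entities, 100):
#     for entity in chunk:
#         session_dest.merge(entity)
#     session_dest.commit()
#     session_dest.expunge_all()

assert [list(b) for b in batches(range(5), 2)] == [[0, 1], [2, 3], [4]]
```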

So, assuming DEV has a single record {acct_id: 1, env_id: 1}  and I'm copying a record {acct_id: 1, env_id: 4} from PROD, it incorrectly thinks that this record should be INSERTed, when in fact there is a constraint (acct_id must be unique) that prevents this.

The more I evaluate this, the more I think that correctly modeling the unique constraint will fix my problem. Then my before_update handler would fire and properly UPDATE the record.

Shawn
 

Shawn Wheatley

Sep 27, 2012, 9:34:22 AM9/27/12
to sqlal...@googlegroups.com

I updated my class to include a UniqueConstraint on acct_id:


class CiAcctK(DeclarativeBase):
    __tablename__ = 'ci_acct_k'

    # The unique constraint goes in __table_args__ so declarative
    # actually attaches it to the table (a bare class attribute is ignored).
    __table_args__ = (UniqueConstraint(u'acct_id', name='uix_1'),)

    #column definitions
    acct_id = Column(u'acct_id', CHAR(length=10), ForeignKey('ci_acct.acct_id'), primary_key=True, nullable=False)
    env_id = Column(u'env_id', NUMERIC(precision=6, scale=0), primary_key=True, nullable=False)

I also separated out my event handlers:

@event.listens_for(CiAcctK, "before_update")
def CiAcctK_gen_default_upd(mapper, connection, instance):
    print "got here! update"
    instance.env_id = ENV_ID

@event.listens_for(CiAcctK, "before_insert")
def CiAcctK_gen_default_ins(mapper, connection, instance):
    print "got here! insert"
    instance.env_id = ENV_ID

Sure enough, when executing session.merge, SA still tries to INSERT the record {acct_id: 1, env_id: 4} even though acct_id: 1 already exists. Any thoughts on what I'm doing wrong?

Shawn

Simon King

Sep 27, 2012, 9:59:55 AM9/27/12
to sqlal...@googlegroups.com
session.merge only looks at the primary key of the instance you are
inserting (it pays no attention to unique constraints). In your
example, the table contains a single row with PK (1, 1), and you are
merging an instance with PK (1, 4). SA sees these as different, so it
tries to INSERT the new row.

If (1, 4) is actually the new version of (1, 1), then I would say that
your primary key should just be the acct_id column, not the pair of
(acct_id, env_id). SA will then see them as the same row, and UPDATE
rather than INSERT.

Simon

Shawn Wheatley

Sep 27, 2012, 1:06:14 PM9/27/12
to sqlal...@googlegroups.com
 
> session.merge only looks at the primary key of the instance you are
> inserting (it pays no attention to unique constraints). In your
> example, the table contains a single row with PK (1, 1), and you are
> merging an instance with PK (1, 4). SA sees these as different, so it
> tries to INSERT the new row.
>
> If (1, 4) is actually the new version of (1, 1), then I would say that
> your primary key should just be the acct_id column, not the pair of
> (acct_id, env_id). SA will then see them as the same row, and UPDATE
> rather than INSERT.
>
> Simon

While I agree with you philosophically, Simon, this utility is for an off-the-shelf application, so I can't make changes to the underlying data structure. I could, however, drop "env_id" from the primary key in the mapping:


class CiAcctK(DeclarativeBase):
    __tablename__ = 'ci_acct_k'

    __table_args__ = {}

    #column definitions
    acct_id = Column(u'acct_id', CHAR(length=10), ForeignKey('ci_acct.acct_id'), primary_key=True, unique=True, nullable=False)
    env_id = Column(u'env_id', NUMERIC(precision=6, scale=0), primary_key=False, nullable=False)

Will this mapping cause any issues, given that the acct_id/env_id pair really *is* the primary key?

Shawn

Shawn Wheatley

Sep 27, 2012, 1:24:42 PM9/27/12
to sqlal...@googlegroups.com
Clarification: I can't make changes to the underlying *database* (i.e. change the actual PK), but I can change my SA mapping. I made the mapping change and it now runs like a champ, updating env_id before either inserting or updating the row. Thanks everyone for your help!

Shawn

Michael Bayer

Sep 27, 2012, 2:02:50 PM9/27/12
to sqlal...@googlegroups.com
To answer the actual subject of the thread: SQLAlchemy will also update the PK itself if you do in fact change the primary key attributes on an instance as loaded. But you'd have to roll a pseudo-merge-like function to do this:

old_object = session.query(Class).get((acct_id, env_id))

old_object.env_id = <new id>

session.flush()  # will do an UPDATE

The key above is that old_object carries a "key" internally that represents the primary key it was loaded with. When the attribute values diverge from that key, the flush changes the primary key in place.
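[Editor's note: a plain-Python sketch of that internal bookkeeping--illustrative only, not SQLAlchemy's actual implementation. The object remembers the key it was loaded under, and a flush that sees diverged attribute values emits an UPDATE against the *old* key:]

```python
class ToyRow:
    def __init__(self, acct_id, env_id):
        self.acct_id = acct_id
        self.env_id = env_id
        # Remember the identity this row was loaded under.
        self._loaded_key = (acct_id, env_id)

    def flush(self):
        current = (self.acct_id, self.env_id)
        if current != self._loaded_key:
            # Attributes diverged from the loaded key: UPDATE the PK itself,
            # locating the row by the identity it was loaded with.
            stmt = ("UPDATE ci_acct_k SET acct_id=%r, env_id=%r "
                    "WHERE acct_id=%r AND env_id=%r"
                    % (current + self._loaded_key))
            self._loaded_key = current
            return stmt
        return None

row = ToyRow("1", 4)   # as if loaded via .get(("1", 4))
row.env_id = 1         # change the PK attribute on the loaded instance
print(row.flush())     # the UPDATE is keyed on the old identity (env_id=4)
```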



