duplicate an object

3,814 views
Skip to first unread message

moo...@posteo.org

unread,
Mar 6, 2015, 9:17:43 PM3/6/15
to sqlalchemy
I am using SQLAlchemy with PostgreSQL and Python3.

Is there a in-build-function to duplicate the instance of a data object
(except the primary key)?

Somebody suggested me this self-build function but I would like to use
the SQLAlchemy-way if there is one.

From <http://www.python-forum.de/viewtopic.php?p=273016#p273016>

def duplicate(self):
arguments = dict()
for name, column in self.__mapper__.columns.items():
if not (column.primary_key or column.unique):
arguments[name] = getattr(self, name)
return self.__class__(**arguments)

moo...@posteo.org

unread,
Mar 8, 2015, 10:41:02 AM3/8/15
to sqlal...@googlegroups.com
On 2015-03-07 03:17 <moo...@posteo.org> wrote:
> Is there a in-build-function to duplicate the instance of a data
> object (except the primary key)?

Isn't there a way for this?

Does the sqla-devs reading this list?

Michael Bayer

unread,
Mar 8, 2015, 11:17:51 AM3/8/15
to sqlal...@googlegroups.com
there’s no particular “SQLAlchemy way” to do this, there’s many ways to copy
an object in Python in general as well as in SQLAlchemy, it depends on your
needs. If the object has lots of attributes that aren’t mapped, then clearly
that's outside the realm of SQLAlchemy. If you want to build something based
on inspection of attributes that are mapped, you can build something using
the inspection system where you go through properties like mapper.attrs
(docs are down at the moment):

from sqlalchemy import inspect

mapper = inspect(MyClass)

new_obj = MyClass()
for attr in mapper.attrs:
setattr(new_obj, attr.key, getattr(old_obj, attr.key))

that’s obviously not a very slick approach but mapper.attrs,
mapper.column_attrs, etc. give you access to every attribute that the mapper
knows about.

For me I usually keep things extremely simple and just build a copy method:

class MyClass(Base):
def copy(self):
return MyClass(
a = self.a,
b = self.b,
# ...
)

That’s how I’d normally do this, but of course nobody likes that system. The
copy method has the advantage in that it can easily accommodate your special
use cases without trying to make it guess everything.

There’s also ways to do it with copy.copy() though some special steps are
needed at the moment to give the new object an independent state, probably
hold off on that method until we add some APIs to support that use case.

Docs are down for a linode maintenance window but will be up within an hour
or two.

> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

moo...@posteo.org

unread,
Mar 11, 2015, 10:28:07 PM3/11/15
to sqlal...@googlegroups.com
On 2015-03-08 11:17 Michael Bayer <mik...@zzzcomputing.com> wrote:
> there’s no particular “SQLAlchemy way” to do this,

What is about
make_transient()
?

I don't understand this function 100%tly.

moo...@posteo.org

unread,
Mar 11, 2015, 11:30:58 PM3/11/15
to sqlal...@googlegroups.com
On 2015-03-08 11:17 Michael Bayer <mik...@zzzcomputing.com> wrote:
> new_obj = MyClass()
> for attr in mapper.attrs:
> setattr(new_obj, attr.key, getattr(old_obj, attr.key))

This would copy everything including primary keys and unique members.

I have hard problems with the SQLA-docu. I know that 'attr' is from
type 'class ColumnProperty'. But I am not able to found out the members
of it. I can not see a 'primary_key' or a 'unique' member inside of it.

How can I "ask" if it is a primary key or unique?

And much better for the future: How do I have to use the SQLA-docu to
solve such problems myself? What do I understand wrong in that docu?

Michael Bayer

unread,
Mar 12, 2015, 9:53:50 AM3/12/15
to sqlal...@googlegroups.com


moo...@posteo.org wrote:

> On 2015-03-08 11:17 Michael Bayer <mik...@zzzcomputing.com> wrote:
>> new_obj = MyClass()
>> for attr in mapper.attrs:
>> setattr(new_obj, attr.key, getattr(old_obj, attr.key))
>
> This would copy everything including primary keys and unique members.

which is why, “copy an object” is not an out of the box thing. Because
nobody really wants a full “copy”, it’s something custom.

> I have hard problems with the SQLA-docu. I know that 'attr' is from
> type 'class ColumnProperty'. But I am not able to found out the members
> of it. I can not see a 'primary_key' or a 'unique' member inside of it.
>
> How can I "ask" if it is a primary key or unique?

mapper = inspect(MyClass)

is_pk = mapper.attrs.somekey.columns[0].primary_key

for “unique”, there are multiple definitions for “unique”. Do you consider a
column to be “unique” if it is part of a composite unique constraint or
index (e.g. has other columns)? I don’t know. Let’s assume you do:

for constraint in mapper.local_table.constraints:
if isinstance(constraint, UniqueConstraint):
if constraint.columns.contains_column(mapper.attrs.somekey.columns[0]):
is_unique = True

for index in mapper.local_table.indexes:
if index.unique and index.columns.contains_column(mapper.attrs.somekey.columns[0]):
is_unique = True


>
> And much better for the future: How do I have to use the SQLA-docu to
> solve such problems myself? What do I understand wrong in that docu?

use the search feature and familiarize with all the public accessors on the
handful of objects that are always involved in these kinds of things, Table,
Mapper, ColumnProperty, ColumnCollection, Constraint (that last one is missing, will add).

Michael Bayer

unread,
Mar 12, 2015, 10:28:54 AM3/12/15
to sqlal...@googlegroups.com
it changes the state of an object from persistent to transient. the states
are documented at
http://docs.sqlalchemy.org/en/rel_0_9/orm/session_state_management.html#quickie-intro-to-object-states.

I think the concept that might not be clear here is that the “foo.id” attribute on your object (assuming “id” is your primary key) and the “identity key” referred to in the documentation are not the same thing. The “identity key” is the primary key of an object, *in the database*. I can make an object: Foo(id=5), “5” is the value of the “id” attribute which corresponds to what would be the primary key column in the database, but there’s no “5” in the database. No identity key. I persist it, INSERT INTO table VALUES (5), now there’s a “5” in the database, now there’s an identity key.

this is a copy routine I recently implemented for openstack, perhaps it will
be helpful:

class Base(object):

def __copy__(self):
"""Implement a safe copy.copy().

SQLAlchemy-mapped objects travel with an object
called an InstanceState, which is pegged to that object
specifically and tracks everything about that object. It's
critical within all attribute operations, including gets
and deferred loading. This object definitely cannot be
shared among two instances, and must be handled.

The copy routine here makes use of session.merge() which
already essentially implements a "copy" style of operation,
which produces a new instance with a new InstanceState and copies
all the data along mapped attributes without using any SQL.

The mode we are using here has the caveat that the given object
must be "clean", e.g. that it has no database-loaded state
that has been updated and not flushed. This is a good thing,
as creating a copy of an object including non-flushed, pending
database state is probably not a good idea; neither represents
what the actual row looks like, and only one should be flushed.

"""
session = orm.Session()

copy = session.merge(self, load=False)
session.expunge(copy)
return copy

moo...@posteo.org

unread,
Mar 13, 2015, 4:26:57 PM3/13/15
to sqlal...@googlegroups.com
On 2015-03-12 09:53 Michael Bayer <mik...@zzzcomputing.com> wrote:
> which is why, “copy an object” is not an out of the box thing. Because
> nobody really wants a full “copy”, it’s something custom.

As I described I don't want a "full copy". I only want to copy the "real
data" handled by the user. All organizing/protocol data from SQLA or
from me (e.g. the oid) as the developer shouldn't be copied.

The use-case is simple:
The user want to add new data which just differ a little from the last
time data was added. So it is more ergonomic to offer a new-data-dialog
with preset values from the last data-entry instead of an clean empty
new-data-dialog. In the latter the user need to type in all data again
but e.g. 80% of it is the same then yesterday. Understand?

What do you think about this solution:
<https://stackoverflow.com/questions/29039635/how-to-duplicate-an-sqlalchemy-mapped-object-the-correct-way>

Jonathan Vanasco

unread,
Mar 13, 2015, 4:58:17 PM3/13/15
to sqlal...@googlegroups.com, moo...@posteo.org
Use a mixin on the classes to create a dict of the column values.  Then create a new object with that dict. 

You can customize the mixin to include or exclude columns as you wish.

I use this approach for revision tracking.  


Class Duplicable(object)

@property
def columns_as_dict(self):
return = dict((col.name, getattr(self, col.name)) for col in sqlalchemy_orm.class_mapper(self.__class__).mapped_table.c)

# ---- declare which columns to copy

_columns_duplicable = []

@property
def columns_as_dict(self):
cols = dict((col.name, getattr(self, col.name)) for col in sqlalchemy_orm.class_mapper(self.__class__).mapped_table.c)
return [c for c in cols if c in self._columns_duplicable]

# ---- declare which columns to exclude

_columns_notduplicable = []

@property
def columns_as_dict(self):
cols = dict((col.name, getattr(self, col.name)) for col in sqlalchemy_orm.class_mapper(self.__class__).mapped_table.c)
return [c for c in cols if c not in self._columns_notduplicable]
class MyClass(Base, Duplicable):
_columns_duplicable = [ 'a', 'b', 'c', ]

object_data = object.columns_as_dict

Michael Bayer

unread,
Mar 13, 2015, 5:24:33 PM3/13/15
to sqlal...@googlegroups.com


moo...@posteo.org wrote:

> On 2015-03-12 09:53 Michael Bayer <mik...@zzzcomputing.com> wrote:
>> which is why, “copy an object” is not an out of the box thing. Because
>> nobody really wants a full “copy”, it’s something custom.
>
> As I described I don't want a "full copy". I only want to copy the "real
> data" handled by the user. All organizing/protocol data from SQLA or
> from me (e.g. the oid) as the developer shouldn't be copied.

SQLA knows nothing about what data in the row is that which you care about
and that which you don’t, so the names of attributes you care about have to
be be arrived at in some way, either hardcoded in a list, or using an algorithmic
approach as you have in your stack overflow answer. Even a primary key
should not be assumed to be “protocol” data, lots of databases use
natural primary keys.


> The use-case is simple:
> The user want to add new data which just differ a little from the last
> time data was added. So it is more ergonomic to offer a new-data-dialog
> with preset values from the last data-entry instead of an clean empty
> new-data-dialog. In the latter the user need to type in all data again
> but e.g. 80% of it is the same then yesterday. Understand?

I understand completely - you have a custom copy use case where you only
care about particular attributes. But what I’m hoping you also understand is that
SQLAlchemy will never make a decision like this for you. You have to
come up with a system on your end to identify which ones you care about and which
ones you don’t.
if it works for you, then it’s great. It’s not a generalized solution for
the whole world, though. It doesn’t take into account columns that are part
of unique Index objects or columns that are mentioned in standalone
UniqueConstraint objects. I’d also imagine a real-world GUI attempting to
present an object with default values filled in would have a lot more
columns in the DB that aren’t exposed. The fact that you want to block only
“PK” and single-column “unique” constraints is quite arbitrary.

moo...@posteo.org

unread,
Mar 14, 2015, 5:59:44 AM3/14/15
to sqlal...@googlegroups.com
On 2015-03-13 17:24 Michael Bayer <mik...@zzzcomputing.com> wrote:
> if it works for you, then it’s great. It’s not a generalized solution
> for the whole world, though. It doesn’t take into account columns
> that are part of unique Index objects or columns that are mentioned
> in standalone UniqueConstraint objects.

Thank you very much for bringing my view "beyond my own nose". ;)
I know nothing about "unique Index objects". :D
I added and linked your comment to the stackoverflow-answer. Hope this
is ok for you?

moo...@posteo.org

unread,
Mar 14, 2015, 6:01:58 AM3/14/15
to sqlal...@googlegroups.com
On 2015-03-13 13:58 Jonathan Vanasco <jona...@findmeon.com> wrote:
> Use a mixin on the classes to create a dict of the column values.
> Then create a new object with that dict.
> [..]

Looks complex. My solution has only one function, work for nearly all
simple declarative_base() derived classes.

When I understand your solution correct I have to create a dict for
each new Class I add.

Can you please explain, what you mean with "revision tracking"?

Jonathan Vanasco

unread,
Mar 14, 2015, 11:40:42 AM3/14/15
to sqlal...@googlegroups.com, moo...@posteo.org
Thats a mixin class that can be added onto any object inheriting from declared_base.   You only need one of the `columns_as_dict` options.  I showed 3 variations: one that gives all columns, one that lets you specify which columns to include, and another that shows which columns to exclude.    You could adjust the defs to return a new object instead of a dict.

I find this solution to be more flexible than using entirely automatic properties, because I often have multiple-column primary keys and misc columns that are defined by the backend and computed by the application (ie, not editable or used by admins or users).  

In your situation you could have a class that does this:

     class YourClassA(Base, Duplicable):
           _columns_notduplicable = ['primary key', 'other private column', 'another private column']

     class YourClassB(Base, Duplicable):
           _columns_notduplicable = ['primary key', 'other private column', 'another private column']

     a = YourClassA()
     as_dict = a.columns_as_dict

     a = YourClassB()
     as_dict = b.columns_as_dict

I use something similar when I do Revision Tracking of objects in the database -- I call "columns_as_dict" to get a dict of all the relevant columns, then serialize that dict to the database with a timestamp.

Reply all
Reply to author
Forward
0 new messages