Quick way to deep copy an object?

10,792 views
Skip to first unread message

picoplex

unread,
Sep 9, 2008, 11:59:09 AM9/9/08
to sqlalchemy
Is there a quick way to copy an object (including all its
relationships as well as its attributes)?

Regards,

Tim

a...@svilendobrev.com

unread,
Sep 9, 2008, 2:16:00 PM9/9/08
to sqlal...@googlegroups.com
u can try:
from copy import copy
dest = copy(src)
it has pickle semantics, YMMV - i have not tried on SA objects.
i use my own, see my post in this thread with some (complex) code:
http://groups.google.com/group/sqlalchemy/tree/browse_frm/thread/2369b78e0f8e21cb/4dd4979ca8501fba

Michael Bayer

unread,
Sep 9, 2008, 5:14:54 PM9/9/08
to sqlal...@googlegroups.com
copy.copy() is going to have issues since you'd be copying the
_sa_state of the instance as well. "copying" an object that is SQLA
mapped implies a particular use case in mind, depending on this use
case is what would determine how you get your "copy".

Would you like to get a "copy" that is simply the same object with the
same identity, just as present in another session (in which case use
merge()) ?

Or would you like to get a "copy" that is a transient (no database
identity) copy of that object, with all attributes except primary key
attributes copied over ? ( in which case you'd use a copy constructor
which possibly uses mapping metadata to traverse ) ?

Tim Jones

unread,
Sep 10, 2008, 4:39:43 AM9/10/08
to sqlal...@googlegroups.com
Michael,

Thanks for your reply.

The use case is configuration management.

We have complex (SQLAlchemy) objects with many relationships which we wish to up-version, i.e. we want an exact persistent copy (attribute values and relationships) of the original object, but with a different identity, which we will then modify.

The original object and all its relationships should remain unchanged (other than a relationship to its successor version).

We would want the copied object and all its relationships saved to the database before making changes to it.

Regards,

Tim


2008/9/9 Michael Bayer <mik...@zzzcomputing.com>

a...@svilendobrev.com

unread,
Sep 10, 2008, 5:43:33 AM9/10/08
to sqlal...@googlegroups.com
this is close to what i have - bitemporal versions of same object.
i dont have relation to succesors/predecessor (this is assumed from
object_id being same).

the practice shows there are attributes
- which u dont want to copy at all (dontcopy=list e.g. timestamps),
- which u dont want to copy in depth (dontcopyme=true, e.g. refs to
external things),
- which u dont want the particular op to copy - e.g. because have
been copied/changed by hand (dontcopy_now=list).
There also the case of explicit m2m assoc_objects which are handled
differently of all else.

so it has to be some manual copy, or a very special setup of
pickle-protocol-controls.

seems this is a common requirement, so i may put mine somewhere in
dbcook/.

have fun
svil

Michael Bayer

unread,
Sep 10, 2008, 10:59:55 AM9/10/08
to sqlal...@googlegroups.com

On Sep 10, 2008, at 4:39 AM, Tim Jones wrote:

> Michael,
>
> Thanks for your reply.
>
> The use case is configuration management.
>
> We have complex (SQLAlchemy) objects with many relationships which
> we wish to up-version, i.e. we want an exact persistent copy
> (attribute values and relationships) of the original object, but
> with a different identity, which we will then modify.
>
> The original object and all its relationships should remain
> unchanged (other than a relationship to its successor version).
>
> We would want the copied object and all its relationships saved to
> the database before making changes to it.

You'd build a copy function of your own which iterates through
attributes and creates new instances. How you go about locating the
attributes to be copied depends on if you want only SQLAlchemy-
instrumented attributes, or other attributes as well which are not
known to SQLAlchemy.

To get a list of all SQLAlchemy attributes for a class, use this
expression:

[p.key for p in class_mapper(class).iterate_properties]

you might want to exclude primary key mappings:

pk_keys = set([c.key for c in class_mapper(class).primary_key])

[p.key for p in class_mapper(class).iterate_properties if p.key not in
pk_keys]

the value for each attribute can be retrieved from an instance using
getattr(instance, key), and you can construct new instances without
calling __init__ by saying class.__new__(class).

maybe someone here can flesh this out for me, im a little busy today....


a...@svilendobrev.com

unread,
Sep 10, 2008, 11:20:02 AM9/10/08
to sqlal...@googlegroups.com

i'll be doing it anyway.. maybe after a day or two.

some more details for above:
- references *toone are the props with p.use_list = False
- manytomany when prop.direction==orm.interfaces.MANYTOMANY
- explicit m2m has to be determined manualy

btw i've run into interesting case today, where some attribute (the
object_identity) needs to be shallow-copied if the copy is to be used
as another version of same object, or not be copied at all if it's to
be used for another object. but i guess this would be postprocessing
as the copy() wouldnot know such things.


ah yes, another related thing we're doing is a ORM-level dump, e.g.
like pickle-the-whole-or-partial-object-storage into lists/dict/sets
etc - for simple object replication. might be interesting to someone.

svil

Tim Jones

unread,
Sep 10, 2008, 12:56:37 PM9/10/08
to sqlal...@googlegroups.com
Michael,

I get the idea thanks.

Before I embarked on this - I wanted to check that there was not a simple way that I had overlooked.

Thanks for your time.

Regards,

Tim

2008/9/10 Michael Bayer <mik...@zzzcomputing.com>

Mark Sternig

unread,
Nov 30, 2015, 5:46:31 PM11/30/15
to sqlalchemy
I am in need of doing something similar (a deep copy of related objects). Has anyone embarked on this adventure yet and mind sharing any tips, code, etc? My use case is to copy a retail Order, and all OrderLineItems, OrderNotes, OrderStatus, etc. Thank you in advance for any help.

Jonathan Vanasco

unread,
Nov 30, 2015, 6:40:34 PM11/30/15
to sqlalchemy

Rudolf Cardinal

unread,
Mar 25, 2016, 6:30:26 PM3/25/16
to sqlalchemy
My method is as follows; for an SQLAlchemy ORM object x, call: y = deepcopy_sqla_object(x, session), with functions defined as below. It walks relationships (as defined by each ORM class; if there is no "relationship" then it won't proceed, which can be helpful), copying objects, and then rebuilds the relationships within the copies. However, it requires that all ORM classes so traversed accept an __init__() call with no parameters [so in practice: def __init__(self, *kwargs): ... self.something = kwargs.pop('something', somedefault)].

from sqlalchemy.inspection import inspect
from sqlalchemy.orm import class_mapper


def copy_sqla_object(obj, omit_fk=True):
    """
    Given an SQLAlchemy object, creates a new object (FOR WHICH THE OBJECT
    MUST SUPPORT CREATION USING __init__() WITH NO PARAMETERS), and copies
    across all attributes, omitting PKs, FKs (by default), and relationship
    attributes.
    """
    cls = type(obj)
    mapper = class_mapper(cls)
    newobj = cls()  # not: cls.__new__(cls)
    pk_keys = set([c.key for c in mapper.primary_key])
    rel_keys = set([c.key for c in mapper.relationships])
    prohibited = pk_keys | rel_keys
    if omit_fk:
        fk_keys = set([c.key for c in mapper.columns if c.foreign_keys])
        prohibited = prohibited | fk_keys
    for k in [p.key for p in mapper.iterate_properties
              if p.key not in prohibited]:
        try:
            value = getattr(obj, k)
            setattr(newobj, k, value)
        except AttributeError:
            pass
    return newobj


def deepcopy_sqla_object(startobj, session, flush=True):
    """
    For this to succeed, the object must take a __init__ call with no
    arguments. (We can't specify the required args/kwargs, since we are copying
    a tree of arbitrary objects.)
    """
    objmap = {}  # keys = old objects, values = new objects
    # Pass 1: iterate through all objects. (Can't guarantee to get
    # relationships correct until we've done this, since we don't know whether
    # or where the "root" of the PK tree is.)
    stack = [startobj]
    while stack:
        oldobj = stack.pop(0)
        if oldobj in objmap:  # already seen
            continue
        newobj = copy_sqla_object(oldobj)
        session.add(newobj)
        objmap[oldobj] = newobj
        insp = inspect(oldobj)
        for relationship in insp.mapper.relationships:
            related = getattr(oldobj, relationship.key)
            if relationship.uselist:
                stack.extend(related)
            elif related is not None:
                stack.append(related)
    # Pass 2: set all relationship properties.
    for oldobj, newobj in objmap.items():
        insp = inspect(oldobj)
        # insp.mapper.relationships is of type
        # sqlalchemy.utils._collections.ImmutableProperties, which is basically
        # a sort of AttrDict.
        for relationship in insp.mapper.relationships:
            # The relationship is an abstract object (so getting the
            # relationship from the old object and from the new, with e.g.
            # newrel = newinsp.mapper.relationships[oldrel.key],
            # yield the same object. All we need from it is the key name.
            related_old = getattr(oldobj, relationship.key)
            if relationship.uselist:
                related_new = [objmap[r] for r in related_old]
            elif related_old is not None:
                related_new = objmap[related_old]
            else:
                related_new = None
            setattr(newobj, relationship.key, related_new)
    # Done
    if flush:
        session.flush()
    return objmap[startobj]  # returns the new object matching startobj

Rudolf Cardinal

unread,
Mar 26, 2016, 3:39:44 PM3/26/16
to sqlalchemy

Whoops - occasionally a session autoflush can break the function above - a better version delays the session.add() calls until the relationships have been built:

from sqlalchemy.inspection import inspect
from sqlalchemy.orm import class_mapper
import logging
log = logging.getLogger(__name__)


def copy_sqla_object(obj, omit_fk=True):
    """
    Given an SQLAlchemy object, creates a new object (FOR WHICH THE OBJECT
    MUST SUPPORT CREATION USING __init__() WITH NO PARAMETERS), and copies
    across all attributes, omitting PKs, FKs (by default), and relationship
    attributes.
    """
    cls = type(obj)
    mapper = class_mapper(cls)
    newobj = cls()  # not: cls.__new__(cls)
    pk_keys = set([c.key for c in mapper.primary_key])
    rel_keys = set([c.key for c in mapper.relationships])
    prohibited = pk_keys | rel_keys
    if omit_fk:
        fk_keys = set([c.key for c in mapper.columns if c.foreign_keys])
        prohibited = prohibited | fk_keys
    log.debug("copy_sqla_object: skipping: {}".format(prohibited))
    for k in [p.key for p in mapper.iterate_properties
              if p.key not in prohibited]:
        try:
            value = getattr(obj, k)
            log.debug("copy_sqla_object: processing attribute {} = {}".format(
                k, value))
            setattr(newobj, k, value)
        except AttributeError:
            log.debug("copy_sqla_object: failed attribute {}".format(k))
            pass
    return newobj


def deepcopy_sqla_object(startobj, session, flush=True):
    """
    For this to succeed, the object must take a __init__ call with no
    arguments. (We can't specify the required args/kwargs, since we are copying
    a tree of arbitrary objects.)
    """
    objmap = {}  # keys = old objects, values = new objects
    log.debug("deepcopy_sqla_object: pass 1: create new objects")
    # Pass 1: iterate through all objects. (Can't guarantee to get
    # relationships correct until we've done this, since we don't know whether
    # or where the "root" of the PK tree is.)
    stack = [startobj]
    while stack:
        oldobj = stack.pop(0)
        if oldobj in objmap:  # already seen
            continue
        log.debug("deepcopy_sqla_object: copying {}".format(oldobj))
        newobj = copy_sqla_object(oldobj)
        # Don't insert the new object into the session here; it may trigger
        # an autoflush as the relationships are queried, and the new objects
        # are not ready for insertion yet (as their relationships aren't set).
        # Not also the session.no_autoflush option:
        # "sqlalchemy.exc.OperationalError: (raised as a result of Query-
        # invoked autoflush; consider using a session.no_autoflush block if
        # this flush is occurring prematurely)..."
        objmap[oldobj] = newobj
        insp = inspect(oldobj)
        for relationship in insp.mapper.relationships:
            log.debug("deepcopy_sqla_object: ... relationship: {}".format(
                relationship))
            related = getattr(oldobj, relationship.key)
            if relationship.uselist:
                stack.extend(related)
            elif related is not None:
                stack.append(related)
    # Pass 2: set all relationship properties.
    log.debug("deepcopy_sqla_object: pass 2: set relationships")
    for oldobj, newobj in objmap.items():
        log.debug("deepcopy_sqla_object: newobj: {}".format(newobj))
        insp = inspect(oldobj)
        # insp.mapper.relationships is of type
        # sqlalchemy.utils._collections.ImmutableProperties, which is basically
        # a sort of AttrDict.
        for relationship in insp.mapper.relationships:
            # The relationship is an abstract object (so getting the
            # relationship from the old object and from the new, with e.g.
            # newrel = newinsp.mapper.relationships[oldrel.key],
            # yield the same object. All we need from it is the key name.
            log.debug("deepcopy_sqla_object: ... relationship: {}".format(
                relationship.key))
            related_old = getattr(oldobj, relationship.key)
            if relationship.uselist:
                related_new = [objmap[r] for r in related_old]
            elif related_old is not None:
                related_new = objmap[related_old]
            else:
                related_new = None
            log.debug("deepcopy_sqla_object: ... ... adding: {}".format(
                related_new))
            setattr(newobj, relationship.key, related_new)
    # Now we can do session insert.
    log.debug("deepcopy_sqla_object: pass 3: insert into session")
    for newobj in objmap.values():
        session.add(newobj)
    # Done
    log.debug("deepcopy_sqla_object: done")
Reply all
Reply to author
Forward
0 new messages