session memory management

kris

Dec 3, 2011, 2:31:10 PM
to sqlal...@googlegroups.com
I have a database migration using sqlalchemy-migrate that needs to migrate several million rows.
After running for several days, I see the job has grown to consume all available memory and
is swapping.

Can someone point me to a description of the best way to manage memory in the session
during a long transaction?
i.e.

1.  Does session.flush() remove instances of objects from the session?
2.  Should I break the migration into several subtransactions?
3.  Should I use expunge() at certain points?

Any help appreciated.

Thanks

Michael Bayer

Dec 3, 2011, 4:40:04 PM
to sqlal...@googlegroups.com

You should at least be calling flush() every thousand records or so, and ensure that none of the objects involved in that flush remain strongly referenced afterwards. When an object is "clean", it is only weakly referenced by the Session, so it will fall out of scope naturally. The Session also maintains a list of objects that came into the current transaction as "new", so that if a rollback occurs they can be evicted, but that list is also self-cleaning once the newly inserted objects are dereferenced.

Making sure you flush() periodically and don't maintain references in memory should give you an app that runs indefinitely without memory growth.
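A minimal sketch of that pattern, assuming a simple declarative model - the Node class, connection URL and row counts here are made up for illustration:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Node(Base):
    __tablename__ = 'nodes'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

for i in xrange(2000000):
    # nothing outside the Session holds a strong reference to the new object
    session.add(Node(name='node %d' % i))
    if i % 1000 == 0:
        # emits the pending INSERTs; once flushed ("clean"), the objects are
        # only weakly referenced by the Session and can be garbage collected
        session.flush()

session.commit()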

The other thing to watch out for is very large selects. Most DBAPIs fully buffer a result set in memory, and the ORM by default does this as well. If you select tens of thousands of rows in one result set, the memory usage of the Python interpreter will grow to fit that space, and that memory is generally never returned to the OS (CPython behavior). So try to avoid very large single result sets - there are ways to get an ORM result to "stream", but they come with a bunch of limitations (only certain backends, can't use eager loading, etc.), so it's usually better to avoid them.
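If you really do need to walk a big table, the "streaming" route is Query.yield_per(); a rough sketch, reusing the illustrative Node model above (process() is just a placeholder), with the backend and eager-loading caveats in mind:

# fetches rows in chunks of 1000 instead of buffering the whole result in the
# ORM; the DBAPI may still buffer unless the backend supports server-side
# cursors (e.g. psycopg2 with the stream_results execution option)
for node in session.query(Node).yield_per(1000):
    process(node)
    # don't accumulate references to node in a list/dict out here, or you're
    # back to holding the whole result in memory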


kris

Dec 5, 2011, 1:40:31 AM
to sqlal...@googlegroups.com
I have added the flush every 1024 loads, and also set DBSession.autoflush = False, but I see that I have
used up all available memory again. There are no references to the objects other than the ones in the session.

A bit more information: I am getting stuck in a routine that adds a 'root' pointer to a set of records,
so that I can use a loading pattern similar to the elementtree/optimized_al example but without
the separate document table. It visits all the nodes in the hierarchy and sets the 'root' node on each.
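The routine is roughly along these lines (a simplified sketch, not my exact code; find_roots() stands in for however the top-level nodes are selected):

count = 0
for root in find_roots(DBSession):
    stack = [root]
    while stack:
        node = stack.pop()
        node.root_id = root.id        # set the 'root' pointer via the FK column
        stack.extend(node.children)   # walking the tree loads each children collection
        count += 1
        if count % 1024 == 0:
            DBSession.flush()

The table and mapper definitions look like: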

nodes = Table('nodes', metadata,
    ...
    Column('parent_id', Integer, ForeignKey('nodes.id')),
    Column('root_id', Integer, ForeignKey('nodes.id')),
)

mapper(Node, nodes, properties={
    ...
    'children': relation(Node, lazy=True, cascade="all, delete-orphan",
                         backref=backref('parent', enable_typechecks=False,
                                         remote_side=[nodes.c.id])),
    'docnodes': relation(Node, lazy=True,
                         cascade="all, delete-orphan",
                         post_update=True,
                         primaryjoin=(nodes.c.id == nodes.c.root_id),
                         backref=backref('document', post_update=True,
                                         enable_typechecks=False,
                                         remote_side=[nodes.c.id])),
})

Could this be making a cycle that the session cannot get rid of?       
 

Michael Bayer

Dec 5, 2011, 11:36:17 AM
to sqlal...@googlegroups.com


Mmm, not if you're on a recent 0.5 version or later. While there certainly can be reference cycles, the Session/collection system is extremely carefully designed and tested like crazy not to produce any uncollectable structures.

Though dealing with a hierarchical structure can certainly make it tough to keep only a part of it in memory at a time. You'd probably want to make sure you don't ever load any "children" collection since that ultimately refers to the whole list of subnodes, recursively down through the whole thing. It might be worth it to see what happens if you change those relationships to 'lazy="dynamic"', but overall you'd want to be aware of the structure you have in memory and how much of it is loaded. Sometimes I will pdb through obj.__dict__ to see what's present without tripping off any lazyloaders.
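Something like this sketch, for instance - it's your mapping from above, stripped down to just the loader strategy change (other options omitted), with some_node standing in for whatever instance you're inspecting:

from sqlalchemy.orm import mapper, relation, backref

mapper(Node, nodes, properties={
    # 'dynamic' makes the collection a Query rather than loading the whole
    # subtree into memory when it's first touched
    'children': relation(Node, lazy='dynamic',
                         backref=backref('parent', remote_side=[nodes.c.id])),
})

# the collection is now a Query; iterate or slice it to load only what you need
for child in some_node.children.limit(100):
    pass

# and to see what's actually loaded on an instance without tripping any
# lazy loaders, look at __dict__ directly:
print(some_node.__dict__)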

