SQLAlchemy Memory


Anoop K

Jan 13, 2013, 12:40:14 AM
to sqlal...@googlegroups.com
Hi,

This is related to session/engine cleanup.

After fixing the engine cleanup, I can see that the engine and its associated objects are GCed properly. However,
process memory is gradually increasing. I am creating and destroying a session/engine every 30 seconds.

SQLAlchemy usage:

Every 30 seconds, the following happens:

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine(...)          # connection URL elided
sn = Session(bind=engine)
try:
    sn.merge(obj)                    # obj stands in for whatever instance is merged
    sn.commit()
finally:
    sn.close()
    sn.bind.dispose()
# sn and engine have no remaining references after this point


GC object counts after some runs are listed below. It looks like the counts for some items keep increasing.
Is this some kind of leak?

<class 'sqlalchemy.util._collections.OrderedDict'>: last:176 curr:198 diff:22
<type 'builtin_function_or_method'>: last:900 curr:914 diff:14
<class 'sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2'>: last:31 curr:38 diff:7
<class 'sqlalchemy.types.NullType'>: last:32 curr:39 diff:7
<type 'instancemethod'>: last:605 curr:627 diff:22
<class 'sqlalchemy.types.Text'>: last:177 curr:254 diff:77
<class 'sqlalchemy.dialects.postgresql.psycopg2._PGArray'>: last:120 curr:197 diff:77
<class 'sqlalchemy.sql.expression._BindParamClause'>: last:658 curr:954 diff:296
<class 'sqlalchemy.util._collections.PopulateDict'>: last:60 curr:82 diff:22
<type 'list'>: last:3308 curr:3462 diff:154
<type 'instance'>: last:139 curr:146 diff:7
<class 'sqlalchemy.dialects.postgresql.base.PGTypeCompiler'>: last:31 curr:38 diff:7
<class 'sqlalchemy.dialects.postgresql.psycopg2.PGIdentifierPreparer_psycopg2'>: last:31 curr:38 diff:7
<type 'dict'>: last:10930 curr:12172 diff:1242
<class 'sqlalchemy.dialects.postgresql.psycopg2.PGCompiler_psycopg2'>: last:60 curr:82 diff:22
<class 'sqlalchemy.types.String'>: last:571 curr:753 diff:182
<class 'sqlalchemy.types.DateTime'>: last:92 curr:113 diff:21
<type 'weakref'>: last:2204 curr:2582 diff:378
<type 'tuple'>: last:29850 curr:30231 diff:381
<type 'cell'>: last:747 curr:978 diff:231
<class 'sqlalchemy.types.Integer'>: last:52 curr:59 diff:7
<type 'function'>: last:8077 curr:8392 diff:315

Thanks
Anoop


Michael Bayer

Jan 13, 2013, 1:15:11 AM
to sqlal...@googlegroups.com
First off, there's no way to know without a real, reproducing test case.   Providing this would give us something tangible to examine.

As the pattern of creating and destroying engines many times over is not the primary use case for the Engine, it's hard to say definitively, though we do have memory tests which exercise the pattern of engines being created and disposed many times, checking the length of gc.get_objects() at the same time to confirm no growth in memory. In your example, the objects in question are mostly type objects. These objects do have a behavior whereby, as they are created, they cache information about themselves within a dialect that is local to an Engine. But as each Engine is disposed, so is the Dialect, and so is the cache. The cache itself is also a WeakKeyDictionary, so no strong reference to the type object is created, and the cache does not prevent the type objects from being garbage collected. There are tests which ensure this behavior as well.
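To make the caching behavior concrete, here is a minimal standalone sketch (not SQLAlchemy's actual internals; Dialect and SomeType here are stand-ins) of a dialect-local WeakKeyDictionary keyed on type objects:

import gc
import weakref

class Dialect(object):
    def __init__(self):
        # per-dialect cache keyed on type objects; weak keys mean the
        # cache never keeps a type object alive on its own
        self._type_memos = weakref.WeakKeyDictionary()

    def memoize(self, type_obj):
        if type_obj not in self._type_memos:
            self._type_memos[type_obj] = "expensive per-type information"
        return self._type_memos[type_obj]

class SomeType(object):
    pass

dialect = Dialect()
t = SomeType()
dialect.memoize(t)
del t
gc.collect()
print(len(dialect._type_memos))   # 0 -- the cached entry went away with the type

When the Dialect itself is disposed along with the Engine, the whole dictionary goes with it.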

In your counts, there are 31 Dialect instances in one run and then 38 in another. This suggests that there are 31 and then 38 Engine instances, a growth of 7, and that the Engine references are not in fact being cleaned out, or that gc.collect() is not being called within the test. Depending on what you're doing with SQL expressions within each engine, it is plausible that a few hundred type objects have accumulated in the cache.

Note that it is absolutely necessary to call gc.collect() in any Python program that is attempting to get an accurate count of how many objects are still strongly referenced, as unreachable cycles can remain for a significant period of time if one waits for cyclic GC to happen automatically.
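For what it's worth, a sampling loop along these lines gives much more stable numbers (a sketch; the create/use/dispose cycle in the middle is whatever your application actually does):

import gc
from collections import Counter

def sample_counts():
    # force collection of unreachable cycles before counting, otherwise
    # garbage that merely hasn't been collected yet inflates the numbers
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())

before = sample_counts()
# ... run one engine create / use / dispose cycle here ...
after = sample_counts()
for name, count in (after - before).most_common(10):
    print("%s +%d" % (name, count))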





Anoop K

Jan 13, 2013, 1:51:10 AM
to sqlal...@googlegroups.com
gc.collect() was not called for the last sample. Here is one where gc.collect() is also called:


<class 'sqlalchemy.util._collections.OrderedDict'>: last:131 curr:134 diff:3
<type 'builtin_function_or_method'>: last:862 curr:868 diff:6
<class 'sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2'>: last:18 curr:21 diff:3
<class 'sqlalchemy.types.NullType'>: last:19 curr:22 diff:3
<type 'instancemethod'>: last:561 curr:564 diff:3
<class 'sqlalchemy.types.Text'>: last:234 curr:264 diff:30
<class 'sqlalchemy.dialects.postgresql.psycopg2._PGArray'>: last:180 curr:210 diff:30
<class 'sqlalchemy.sql.expression._BindParamClause'>: last:585 curr:678 diff:93
<class 'sqlalchemy.util._collections.PopulateDict'>: last:18 curr:21 diff:3
<type 'list'>: last:2962 curr:2983 diff:21
<type 'instance'>: last:197 curr:200 diff:3
<class 'sqlalchemy.dialects.postgresql.base.PGTypeCompiler'>: last:18 curr:21 diff:3
<class 'sqlalchemy.dialects.postgresql.psycopg2.PGIdentifierPreparer_psycopg2'>: last:18 curr:21 diff:3
<type 'dict'>: last:10470 curr:10854 diff:384
<class 'sqlalchemy.dialects.postgresql.psycopg2.PGCompiler_psycopg2'>: last:18 curr:21 diff:3
<class 'sqlalchemy.types.String'>: last:493 curr:550 diff:57
<class 'sqlalchemy.types.DateTime'>: last:53 curr:56 diff:3
<type 'weakref'>: last:2165 curr:2297 diff:132
<type 'tuple'>: last:29338 curr:29473 diff:135
<type 'cell'>: last:933 curr:1023 diff:90
<class 'sqlalchemy.types.Integer'>: last:57 curr:63 diff:6
<type 'function'>: last:8139 curr:8262 diff:123

Anoop K

Jan 13, 2013, 11:24:32 AM
to sqlal...@googlegroups.com
I tried the memory tests reusing a single engine, and memory stays steady. I feel some memory is held
even after the engine is released.

Michael Bayer

Jan 13, 2013, 5:20:35 PM
to sqlal...@googlegroups.com
I think if you truly want to measure unlimited memory growth, you need to watch the total memory of the process, under load, over time. It will grow to a certain degree, then should stay constant. In this case, there is more caching going on that will make it seem like memory is growing in the short term, and that also explains why the Dialect and some expression-related constructs are held around for a bit. The Dialect is a stateful object that accumulates information about a particular database URL, and is also used to cache information about various dialect-specific constructs like SQL statements and types. The "many engine" use case might someday be helped by gaining the ability to share the same Dialect among many engines that connect to identically-behaving databases, since it's really the Dialect that has the larger memory footprint.

The attached test case illustrates the size management of another cache called mapper._compiled_cache, which maps compiled forms of SQL statements to Dialects. This cache is an LRU cache that manages its size at around 150 items. If you run the script, it prints these sizes, which you'll see grow to about 150 and then get chopped down by 50. The test also illustrates the size of gc.get_objects(), which grows over the first few dozen iterations, but then levels out into a stepwise pattern:

sample gc sizes: [16939, 16997, 17045, 17093, 17141, 17189, 17237, 17285, 17333, 17381, 17429, 17477, 17525, 17573, 17621, 17669, 17717, 17765, 17813, 17861, 17909, 17957, 18005, 18053, 18101, 18149, 18197, 18245, 18293, 18341, 18389, 18437, 18485, 18533, 18581, 18629, 18677, 18725, 18773, 18821, 18869, 18917, 18965, 19013, 19061, 19109, 19157, 19205, 19253, 19301, 19349, 19397, 19445, 19493, 19541, 19589, 19637, 19685, 19733, 19781, 19829, 19877, 19925, 19973, 20021, 20069, 20117, 20165, 20213, 20261, 20309, 20357, 20405, 20453, 20501, 20549, 20597, 20645, 20693, 20741, 20789, 20837, 20885, 20933, 20981, 21029, 21077, 21125, 21173, 21221, 21269, 21317, 21365, 21413, 21461, 21509, 21557, 21605, 21653, 21701, 21749, 21797, 21845, 21893, 21941, 21989, 22037, 22085, 22133, 22181, 22229, 22277, 22325, 22373, 22421, 22469, 22517, 22565, 22613, 22661, 22709, 22757, 22805, 22853, 22901, 22949, 22997, 23045, 23093, 23141, 23189, 23237, 23285, 23333, 23381, 23429, 23477, 23525, 23573, 23621, 23669, 23717, 23765, 23813, 23861, 23909, 23957, 24005, 24053, 21683, 21731, 21779, 21827, 21875, 21923, 21971, 22019, 22067, 22115, 22163, 22211, 22259, 22307, 22355, 22403, 22451, 22499, 22547, 22595, 22643, 22691, 22739, 22787, 22835, 22883, 22931, 22979, 23027, 23075, 23123, 23171, 23219, 23267, 23315, 23363, 23411, 23459, 23507, 23555, 23603, 23651, 23699, 23747, 23795, 23843, 23891, 23939, 23987, 24035, 24083, 21683, 21731, 21779, 21827, 21875, 21923, 21971, 22019, 22067, 22115, 22163, 22211, 22259, 22307, 22355, 22403, 22451, 22499, 22547, 22595, 22643, 22691, 22739, 22787, 22835, 22883, 22931, 22979, 23027, 23075, 23123, 23171, 23219, 23267, 23315, 23363, 23411, 23459, 23507, 23555, 23603, 23651, 23699, 23747, 23795, 23843, 23891, 23939, 23987, 24035]

The number of objects gets to around 24000, then bounces down to 21000 again, twice, and continues to do so as the LRU cache grows and then reduces its size (the numbers are slightly different in 0.7, but it's the same idea). So this is not a leak.
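For reference, the overall shape of such a test is roughly the following (a sketch only, using a hypothetical model and an in-memory SQLite URL; this is not the attached test_memory.py):

import gc
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session

Base = declarative_base()

class Thing(Base):
    __tablename__ = "thing"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

sizes = []
for i in range(250):
    # one full engine lifecycle per iteration
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    sn = Session(bind=engine)
    try:
        sn.merge(Thing(id=1, name="x"))
        sn.commit()
    finally:
        sn.close()
        engine.dispose()
    gc.collect()
    sizes.append(len(gc.get_objects()))

print("sample gc sizes: %s" % sizes)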

If you have other cases to test, you can send them back here by modifying the attached script to illustrate your scenario.

test_memory.py

Anoop K

Jan 14, 2013, 12:17:51 AM
to sqlal...@googlegroups.com
I tried the test script using merge/add. I am also seeing similar behavior, where the object count comes back down after reaching 150.

However, I can see that total process RSS increases, as shown below. When I try with a single engine, process RSS remains constant once it reaches a certain limit.

iter count     process RSS (KB)
-------------------------------
2000           21884
3000           21980
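(One way to sample the same RSS figure from inside the test loop, if it helps correlate with the iteration count — a sketch that assumes a Linux /proc filesystem:)

import re

def rss_kb():
    # parse VmRSS from /proc/self/status (Linux only)
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(re.search(r"(\d+)", line).group(1))
    return None

print("RSS: %s KB" % rss_kb())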

I did try this in long-running setups where a session/engine is created and destroyed every 30 seconds, and observed an increase of ~35 MB after 12 hours (with no explicit call to gc.collect()).
Could the caching result in numbers this high?

Thanks
Anoop

Michael Bayer

Jan 14, 2013, 9:34:03 AM
to sqlal...@googlegroups.com
I'm not seeing that. Script attached again; it runs in a fixed loop. Memory under top is not moving.

So I need to know:

- exact SQLAlchemy version
- exact DBAPI version
- are the C extensions in use? there have been leaks fixed as recently as 0.7.9 in the C extensions
- can you modify this script so that it shows memory growth?

(A quick way to report the first three is sketched below.)
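A sketch of such a report (the sqlalchemy.cprocessors module name is the one the 0.7-series C extensions build as, so treat that import check as an assumption rather than a guaranteed API):

import sqlalchemy
import psycopg2

print("SQLAlchemy: %s" % sqlalchemy.__version__)
print("psycopg2:   %s" % psycopg2.__version__)
try:
    # importable only when the C extensions were built and installed
    from sqlalchemy import cprocessors
    print("C extensions: in use")
except ImportError:
    print("C extensions: not in use")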

test_memory.py

Anoop K

Jan 14, 2013, 10:25:33 AM
to sqlal...@googlegroups.com
I did some long-running tests, increasing the rate of engine creation/deletion (200,000 cycles in total). Memory increased and then remained constant.

With the attached script, top did show constant memory at MB granularity. However, 'ps -o rss <PID>' indicated growth at KB granularity. But again, I could see the memory settling down.

So it looks like there are no leaks.

Package versions.
  • SQLAlchemy-0.7.8-0.x86_64
  • psycopg2-2.4.5-0.x86_64
  • python-psycopg2-2.0.14-1.el6_2.1.x86_64

Do I need to upgrade any of the packages? Are there any other memory-leak fixes in newer versions (like the 0.7.9 C extension case mentioned)?

Thanks for the excellent support.
Anoop

Michael Bayer

Jan 14, 2013, 12:44:47 PM
to sqlal...@googlegroups.com
The only leak fix not yet released is the one you found with sessionmaker(). The C extension leak affected only pyodbc and was fixed in 0.7.8.

