Memory management in SQLAlchemy


Adrien Blin

Mar 4, 2020, 12:01:15 PM
to sqlalchemy
Hello,

I'm having trouble understanding how to deal with memory management in SQLAlchemy.

Here is my issue: I have a big script requesting data from a PostgreSQL DB. I process this data and insert the generated data back into my DB.

At the beginning of the process, I request an object from my DB:

Line #    Mem usage    Increment   Line Contents
================================================
   223    144.2 MiB    144.2 MiB   @profile(stream=fp)
   224                             def compute():
   226    155.4 MiB     11.2 MiB       obj = session.query(Table).filter(Table.idTable == some_id).scalar()

As you can see, this object takes roughly 11 MiB of RAM. At the end of my script, I try to free the memory held by this object, but nothing seems to work. (Other requests are done using session.query() or engine.execute(), and the issue is the same for them; this example is just the most significant.)

   589    161.4 MiB      0.0 MiB       session.flush(obj)
   590    161.4 MiB      0.0 MiB       session.expunge_all()
   591    161.4 MiB      0.0 MiB       session.close()
   592    161.4 MiB      0.0 MiB       s.close_all_sessions()
   593    161.4 MiB      0.0 MiB       obj = None
   594    161.4 MiB      0.0 MiB       gc.collect()


I call this function several thousand times in the process, which makes the RAM usage grow without bound until the process breaks.
I also tried expiring the object through the session, deleting its reference, and garbage collecting it.

How can I request this object, extract the information I need from it, and then remove it from memory, so that my process won't run out of memory?
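
For reference, the pattern I'm aiming for is roughly this (a sketch; the extracted attribute stands in for whatever my real model exposes):

obj = session.query(Table).filter(Table.idTable == some_id).scalar()

# copy what I need into plain Python values so that nothing
# keeps the mapped instance alive
extracted = {'id': obj.idTable}

# then have the session forget the object and drop my own reference
session.expunge(obj)
del obj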

Mike Bayer

Mar 4, 2020, 1:13:37 PM
to noreply-spamdigest via sqlalchemy
How are you measuring memory? Is that a Python memory profiler? Note that the Python interpreter is not very consistent about returning memory to the OS. If the growth is within Python, then you'd just have a garbage collection issue, and you'd want to look at GC debugging. As always, the process is to create a script that looks similar to what you have, show that it does not have the memory issue, then incrementally add elements to it from your memory-leaking script until it does. A self-contained script which illustrates the issue can be shared here.
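
As a starting point, a minimal sketch of GC debugging with nothing but the stdlib gc module (the class name here is just an example):

import gc

gc.set_debug(gc.DEBUG_STATS)   # print per-collection statistics to stderr

# ... run one iteration of the suspect code ...

gc.collect()
print(gc.garbage)   # uncollectable objects, if any

# count live instances of a suspect mapped class by name
print(sum(1 for o in gc.get_objects() if type(o).__name__ == 'Table'))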

Adrien Blin

Mar 5, 2020, 5:17:55 AM
to sqlalchemy
I am using memory_profiler (https://pypi.org/project/memory-profiler/) to measure the memory usage.

I reduced the code to just the query, to see if the issue comes only from there:


import gc

import sqlalchemy.orm.session as s
from MyDatabase.model.Table import Table
from memory_profiler import profile
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

HOST = 'myhost'

engine = create_engine(
    'postgresql+psycopg2://postgres:myaccount@{}:5432'.format(HOST),
    connect_args={'connect_timeout': 6000})

Session = sessionmaker(bind=engine)
session = Session()  # type: Session

fp = open('memory_profiler_elem.log', 'w+')

@profile(stream=fp)
def calculate():
    obj = session.query(Table).filter(Table.idTable == 1).scalar()
    # everything I have tried so far to release the object:
    session.flush(obj)
    session.expunge_all()
    session.close()
    s.close_all_sessions()
    obj = None
    gc.collect()

if __name__ == '__main__':
    calculate()

which reports the following memory usage:

Line #    Mem usage    Increment   Line Contents
================================================
    19    140.8 MiB    140.8 MiB   @profile(stream=fp)
    20                             def calculate():
    21    154.2 MiB     13.4 MiB       obj = session.query(Table).filter(Table.idTable == 1).scalar()
    22    154.2 MiB      0.0 MiB       session.flush(obj)
    23    154.2 MiB      0.0 MiB       session.expunge_all()
    24    154.2 MiB      0.0 MiB       session.close()
    25    154.2 MiB      0.0 MiB       s.close_all_sessions()
    26    154.2 MiB      0.0 MiB       obj = None
    27    154.2 MiB      0.0 MiB       gc.collect()

For comparison, when I do a simple loop, the memory_profiler output for this code:

from memory_profiler import profile

fp = open('memory_profiler_elem.log', 'w+')

@profile(stream=fp)
def calculate():
    test = []
    for i in range(1000000):
        test.append(1)
    test = None

if __name__ == '__main__':
    calculate()

returns:

Line #    Mem usage    Increment   Line Contents
================================================
     4     51.2 MiB     51.2 MiB   @profile(stream=fp)
     5                             def calculate():
     6     51.2 MiB      0.0 MiB       test = []
     7     58.9 MiB      0.0 MiB       for i in range(1000000):
     8     58.9 MiB      0.8 MiB           test.append(1)
     9     51.3 MiB      0.0 MiB       test = None

So the library is tracking memory usage correctly: the list's memory is returned once its reference is dropped.

I'll try to recreate the issue so you can test it, but my real question is: is there something I haven't understood about memory usage in SQLAlchemy? That is, is it normal for it to retain objects in memory, and if so, how can we get past it?

Mike Bayer

Mar 5, 2020, 9:26:11 AM
to noreply-spamdigest via sqlalchemy


On Thu, Mar 5, 2020, at 5:17 AM, Adrien Blin wrote:


I'll try to recreate the issue so you can test it, but my real question is: is there something I haven't understood about memory usage in SQLAlchemy? That is, is it normal for it to retain objects in memory, and if so, how can we get past it?

No, it's not normal. The ORM Session tracks your objects using Python weak references, so that if your own application loses its references to those objects, the Session does as well. If an object has pending changes on it, meaning you altered the state of one or more of its mapped attributes, the Session will temporarily hold a strong reference to it, so it won't be garbage collectable until the next flush proceeds.

So if you have turned off autoflush, are reading a large number of objects, modifying them all so that they have pending, flushable changes, and then never flushing, committing, or otherwise closing / deleting that Session object, that will grow memory. Otherwise, no.

Additionally, if you run the Python interpreter with garbage collection disabled, that will grow memory as well, since the gc collector is necessary to collect unreachable cycles.
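
You can verify the weak-referencing behavior directly; a minimal sketch (using your Table model and a Session with default settings):

import gc

obj = session.query(Table).filter(Table.idTable == 1).scalar()
print(len(session.identity_map))   # 1 -- the object is tracked

del obj        # drop the only strong reference
gc.collect()   # collect any reference cycles

print(len(session.identity_map))   # 0 -- the Session let go of it as well

If the second count isn't zero, something is still holding a reference: either a pending change in the Session, or another object in your application.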




Adrien Blin

Mar 5, 2020, 10:17:47 AM
to sqlalchemy
I tried strongly referencing the objects stored in the session using:

def strong_reference_session(session):
    @event.listens_for(session, "pending_to_persistent")
    @event.listens_for(session, "deleted_to_persistent")
    @event.listens_for(session, "detached_to_persistent")
    @event.listens_for(session, "loaded_as_persistent")
    def strong_ref_object(sess, instance):
        if 'refs' not in sess.info:
            sess.info['refs'] = refs = set()
        else:
            refs = sess.info['refs']

        refs.add(instance)
 
as specified in the docs. Still, I have no way to get control over my object, and in the example I sent you I stay in the same scope, so the session shouldn't lose its reference to the object anyway.

Moreover, in this example I am only requesting the object; I don't make any modifications to it. Yet I can't flush, expire, or expunge it, and finally session.close() doesn't free the memory.

I don't get why, in the simple example I sent you, I can't flush my object. If I were to loop over this function, my memory would grow without bound, and that's precisely what's happening when my script runs.
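
To double-check the growth outside of memory_profiler, I can loop the function and watch the interpreter's own counters, roughly like this (a sketch; tracemalloc only sees Python-level allocations):

import tracemalloc

tracemalloc.start()
for i in range(100):
    calculate()
    current, peak = tracemalloc.get_traced_memory()
    print('iteration {}: current={:.1f} MiB, peak={:.1f} MiB'.format(
        i, current / 2**20, peak / 2**20))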

Mike Bayer

Mar 5, 2020, 10:27:45 AM
to noreply-spamdigest via sqlalchemy


On Thu, Mar 5, 2020, at 10:17 AM, Adrien Blin wrote:
I tried strongly referencing the objects stored in the session using:

def strong_reference_session(session):
    @event.listens_for(session, "pending_to_persistent")
    @event.listens_for(session, "deleted_to_persistent")
    @event.listens_for(session, "detached_to_persistent")
    @event.listens_for(session, "loaded_as_persistent")
    def strong_ref_object(sess, instance):
        if 'refs' not in sess.info:
            sess.info['refs'] = refs = set()
        else:
            refs = sess.info['refs']

        refs.add(instance)
 
as specified in the docs.

I'm sorry, what document is this? Are you referring just to how to use the events? Why are you trying to create strong references to objects if the goal here is to reduce memory use?

Still, I have no way to get control over my object, and in the example I sent you I stay in the same scope, so the session shouldn't lose its reference to the object anyway.

Sorry, still not following here. I thought a memory leak was the issue.



Moreover, in this example I am only requesting the object; I don't make any modifications to it. Yet I can't flush, expire, or expunge it, and finally session.close() doesn't free the memory.
I don't get why, in the simple example I sent you, I can't flush my object. If I were to loop over this function, my memory would grow without bound, and that's precisely what's happening when my script runs.

Why can't you flush it? Is there some error message? I am not following any of what you are trying to do.



