SQLAlchemy memory usage

Nikunj Yadav

Jan 14, 2017, 3:06:21 AM
to sqlalchemy
Hey guys,
I have a Flask app using SQLAlchemy. It is a read-intensive app; writes are not frequent.
SQLAlchemy==0.9.0
Going through the documentation I know about the various APIs I can use. But I am interested in knowing: assuming the dumbest setup I could have, is it possible that SQLAlchemy is keeping a lot of references in memory?
What is the best practice around this?

mike bayer

Jan 14, 2017, 12:09:11 PM
to sqlal...@googlegroups.com
There are a few angles to the "running out of memory" issue.

The most basic one is this: somewhere in your application, just once,
something did a SELECT that returns lots of rows, like 100 thousand or
maybe even a million. Maybe you only fetched one, or ten, of those
rows, and you thought that this didn't use a lot of memory. However,
all the Python DBAPIs by default will load *all* the rows returned by
that SELECT into memory before they give SQLAlchemy even the first one.
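
As a rough sketch of that trap - assuming a hypothetical mapped class
"User" and an existing Session, neither of which is from the original
post - both of these look like "just one row" in Python, but only the
second limits what the database actually sends back:

    # SELECTs every row; the DBAPI buffers the entire result set
    # client-side before SQLAlchemy sees the first one
    first = session.query(User).all()[0]

    # Query.first() applies LIMIT 1, so only one row comes back
    first = session.query(User).first()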

Once the DBAPI loads in a ton of rows, the Python interpreter has to
expand its memory footprint (e.g. what you'd see in top), and that
memory never gets returned. The Python interpreter generally doesn't
return memory to the OS once it allocates it; this is a complicated
issue that also changes with different Python versions. There's some
good discussion at
http://stackoverflow.com/questions/15455048/releasing-memory-in-python
on this, as well as
http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm,
which is an easier read but probably has outdated information in some
respects.

Continuing on that angle, the "Python VM doesn't return all the memory
to the OS" issue is also a problem if your application does at some
point load many thousands of ORM objects at once and then throws them
away - ORM objects are a lot more heavyweight than a plain row from a
cursor, so if you handle a large query in memory like that, you'll
blow up the memory used by the interpreter. The ORM Query object,
building on the assumption that the DBAPI has generally loaded all the
rows anyway, also doesn't give you back the first object until it has
processed the entire DBAPI rowset into ORM objects. You can alter this
behavior using the yield_per() API call:
http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
- you'll note it has a lot of caveats with eager loading.
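
A minimal sketch, again assuming a hypothetical "User" mapping -
yield_per() fetches and builds ORM objects in batches instead of
materializing the whole result set up front:

    # process rows 1000 at a time; mind the eager loading caveats
    # described in the linked documentation
    for user in session.query(User).yield_per(1000):
        handle(user)  # hypothetical per-row handler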

The best practice for the above issues is simply to never SELECT more
rows than you'd like to have in memory at once. This means if you are
paging through a million rows, use limiting / windowing / criteria to
ensure you only get a few hundred at a time. The naive approach is to
use LIMIT/OFFSET; however, OFFSET has the problem that the database
has to scan through all the previous rows, limiting performance. For a
better way, you need WHERE criteria that gets just a chunk of rows at
a time - the recipe at
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
shows a technique I use often in reporting-style applications - and
there is also the probably simpler approach detailed at
http://use-the-index-luke.com/no-offset ; make sure you are ordering,
and on each chunk, keep a handle to something you can use to fetch the
next chunk.
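
Here's a sketch of that no-offset idea under the same assumptions (a
"User" mapping with an indexed integer primary key "id"): each chunk's
WHERE clause starts where the previous chunk left off, so the database
never re-scans earlier rows:

    last_id = 0
    while True:
        chunk = (session.query(User)
                 .filter(User.id > last_id)
                 .order_by(User.id)
                 .limit(1000)
                 .all())
        if not chunk:
            break
        for user in chunk:
            handle(user)        # hypothetical per-row handler
        last_id = chunk[-1].id  # the handle to the next chunk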

All of that said, there's another memory angle too, depending on what
you're getting at. Since you are on 0.9, there was a large pass made at
structural memory use in 1.0 which you might be interested in. This
change basically means lots of internal objects used by table metadata,
mappings, and events were made to use __slots__, to dramatically reduce
the memory footprint of a large set of mappings. This won't save you
from using up a lot of memory if you load a million rows, but if you
have a large application with hundreds of mappings, using 1.0 should
save you a significant chunk of memory when your application first
starts up and loads all of its mappings.
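
This isn't SQLAlchemy's internals, just a generic illustration of what
__slots__ buys you - a slotted class has no per-instance __dict__,
which is where most of the per-object overhead lives:

    import sys

    class Plain(object):
        def __init__(self):
            self.a, self.b = 1, 2

    class Slotted(object):
        __slots__ = ("a", "b")
        def __init__(self):
            self.a, self.b = 1, 2

    p, s = Plain(), Slotted()
    print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # instance + dict
    print(sys.getsizeof(s))  # no __dict__ at all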

In general, if you want to do memory profiling on your application I'd
recommend looking at heapy http://guppy-pe.sourceforge.net/ . It's
stylistically old school but it does the job. It's what I used to
target the best areas for the __slots__ overhaul in 1.0.
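
The usual heapy workflow looks roughly like this (hpy(), setrelheap()
and heap() are the actual guppy APIs; the code being exercised is a
placeholder):

    from guppy import hpy

    h = hpy()
    h.setrelheap()      # only count objects allocated from here on
    run_suspect_code()  # placeholder: exercise the leaky code path
    print(h.heap())     # live objects grouped by type, largest first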


Jonathan Vanasco

Jan 15, 2017, 1:14:19 PM
to sqlalchemy


On Saturday, January 14, 2017 at 3:06:21 AM UTC-5, Nikunj Yadav wrote:
But I am interested in knowing: assuming the dumbest setup I could have, is it possible that SQLAlchemy is keeping a lot of references in memory?

Perhaps I can think of a dumber mistake than Mike did...

It is possible that your implementation does not properly create and close/expire/dispose of Sessions and transactions as it should. In an *extremely unlikely* variant of this mistake, you might have a single Session that is started once and then used by every request to read -- without ever being closed. In that scenario, the session's identity map would keep growing. Most integration packages (i.e. flask-sqlalchemy) and tutorials actively guide you away from doing this, so the chances of this happening are incredibly slim.
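
As a sketch of the pattern those packages enforce for you - the engine
URL and app setup are placeholders, not from the original post - a
request-scoped session looks roughly like this:

    from flask import Flask
    from sqlalchemy import create_engine
    from sqlalchemy.orm import scoped_session, sessionmaker

    app = Flask(__name__)
    engine = create_engine("postgresql://...")  # placeholder URL
    Session = scoped_session(sessionmaker(bind=engine))

    @app.teardown_appcontext
    def cleanup_session(exc=None):
        # close out the request's session so its identity map (and
        # its connection) don't outlive the request
        Session.remove()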


Some best practices for managing a Session in web applications like Flask are detailed in these two sections of the documentation:

Nikunj Yadav

Jan 17, 2017, 2:16:39 PM
to sqlalchemy
Hey Mike,
Thanks for the detailed response. I will try looking at my queries and post back soon. Also thanks for the heads-up on the SQLAlchemy update :)

Nikunj

Nikunj Yadav

Jan 24, 2017, 7:14:08 PM
to sqlalchemy
Thank you guys for your responses.
Upgrading to SQLAlchemy >= 1.0 fixed it.