There are a few angles to the "running out of memory" issue.
The most basic one is this: somewhere in your application, just once,
something did a SELECT that returns lots of rows, like 100 thousand
or maybe even a million. Maybe you only fetched one, or ten, of those
rows, and assumed that this didn't use much memory. However,
the Python DBAPIs will by default load *all* the rows returned by
that SELECT into memory before they give SQLAlchemy even the first
row.
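The way around that buffering, where the backend supports it, is a server-side cursor; SQLAlchemy exposes this via the "stream_results" execution option. A minimal sketch, using an in-memory SQLite database just so it's self-contained (on SQLite the option is accepted but has little effect; on drivers like psycopg2 it's what actually switches to a named server-side cursor - the table name here is made up for illustration):

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")

with engine.connect() as conn:
    conn.execute(text("CREATE TABLE t (x INTEGER)"))
    conn.execute(
        text("INSERT INTO t (x) VALUES (:x)"),
        [{"x": i} for i in range(10000)],
    )

    # stream_results asks the DBAPI for a server-side cursor where
    # supported; rows are then fetched in batches rather than being
    # fully buffered before the first row is available.
    result = conn.execution_options(stream_results=True).execute(
        text("SELECT x FROM t ORDER BY x")
    )
    first = result.fetchone()
    result.close()
```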
Once the DBAPI loads in a ton of rows, the Python interpreter has to
expand its memory footprint (e.g. what you'd see in top), and that
memory never gets returned. The Python interpreter generally doesn't
give memory back to the OS once it has allocated it; this is a
complicated issue that also changes across Python versions. There's
some good discussion at
http://stackoverflow.com/questions/15455048/releasing-memory-in-python
on this as well as at
http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm,
which is an easier read but probably outdated in some respects.
Continuing on that angle, the "Python VM doesn't return all the memory
to the OS" issue is also a problem if your application at some point
loads many thousands of ORM objects at once and then throws them
away - ORM objects are a lot more heavyweight than a plain row from
a cursor, so if you handle a large query in memory like that, you'll
blow up the memory used by the interpreter. The ORM Query object,
building on the assumption that the DBAPI generally loaded all the
rows in anyway, also doesn't give you back the first object until it
has processed the entire DBAPI rowset into ORM objects. You can alter
this behavior using the yield_per() API call:
http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
You'll note it has a lot of caveats around eager loading.
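In practice yield_per() just gets dropped into the query chain; a self-contained sketch against an in-memory SQLite database (the User mapping here is made up for illustration):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all(User(name="u%d" % i) for i in range(1000))
    session.commit()

    # yield_per(100) converts rows into ORM objects 100 at a time,
    # instead of materializing the whole rowset before the first one
    names = [
        user.name
        for user in session.query(User).order_by(User.id).yield_per(100)
    ]
```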
The best practice for the above issues is simply to never SELECT more
rows than you'd like to have in memory at once. This means if you are
paging through a million rows, use limiting / windowing / criteria to
ensure you only get a few hundred at a time. The naive approach is
to use LIMIT/OFFSET; however, OFFSET has the problem that the
database has to scan through all the previous rows, which limits
performance. A better way is to use WHERE criteria that fetch just a
chunk of rows at a time. The recipe at
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
shows a technique I use often in reporting-style applications. There
is also the probably simpler approach detailed at
http://use-the-index-luke.com/no-offset : make sure you are ordering,
and on each chunk, get a handle to something you can use to fetch the
next chunk.
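That "keep a handle to get the next chunk" idea can be sketched with the stdlib sqlite3 DBAPI directly (the item table and column names are made up for illustration): order by the key, remember the last key seen, and seek past it with WHERE rather than OFFSET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO item (id, val) VALUES (?, ?)",
    [(i, "v%d" % i) for i in range(1, 1001)],
)

def chunks(conn, size=100):
    # ORDER BY the key and remember the last key seen: the WHERE
    # clause then seeks directly to the next chunk instead of
    # scanning past OFFSET rows.
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, val FROM item WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]

total = sum(len(chunk) for chunk in chunks(conn))
```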
All of that said, there's another memory angle too, depending on what
you're getting at. Since you are on 0.9: a large pass was made at
structural memory use in 1.0, which you might be interested in. Lots
of the internal objects used by table metadata, mappings, and events
were converted to use __slots__, dramatically reducing the memory
footprint of a large set of mappings. This won't save you from using
up a lot of memory if you load a million rows, but if you have a
large application with hundreds of mappings, 1.0 should save you a
significant chunk of memory when your application first starts up and
loads all of its mappings.
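The mechanism is plain Python: a class with __slots__ stores its attributes in fixed slots rather than a per-instance __dict__, which is much cheaper when you have many small objects. A sketch (the class names are made up, not SQLAlchemy's actual internals):

```python
class PlainColumn:
    # ordinary class: every instance carries its own __dict__
    def __init__(self, name, type_):
        self.name = name
        self.type_ = type_

class SlottedColumn:
    # __slots__ replaces the per-instance __dict__ with fixed slots
    __slots__ = ("name", "type_")

    def __init__(self, name, type_):
        self.name = name
        self.type_ = type_

plain = PlainColumn("id", "Integer")
slotted = SlottedColumn("id", "Integer")

has_dict = hasattr(plain, "__dict__")            # True
slotted_has_dict = hasattr(slotted, "__dict__")  # False
```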
In general, if you want to do memory profiling on your application,
I'd recommend looking at heapy:
http://guppy-pe.sourceforge.net/
It's stylistically old school, but it does the job; it's what I used
to target the best areas for the __slots__ overhaul in 1.0.