Large increase in memory use when upgrading from 1.3 to 1.4


Tony Cosentini

Oct 21, 2022, 12:00:50 AM
to sqlalchemy
Hi,

We recently upgraded our application (a Flask web app) from SQLAlchemy 1.3.19 to 1.4.41.

Overall things are stable, but we have noticed a very large increase in memory use:
[Attached screenshot: Screen Shot 2022-10-21 at 11.26.18 AM.png]

Is this from the new query caching feature? I'm planning on getting some heap dumps to see if there is something obvious, but thought I'd ask here as well.
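
(By "heap dumps" I mean roughly a tracemalloc diff along the lines of the sketch below; this is illustrative rather than our actual instrumentation, and the "replay some requests" step is just a placeholder.)

    import tracemalloc

    tracemalloc.start(25)                  # keep 25 frames per allocation for useful tracebacks
    baseline = tracemalloc.take_snapshot()

    # ... exercise the app here, e.g. replay a few hundred representative requests ...

    current = tracemalloc.take_snapshot()
    for stat in current.compare_to(baseline, "lineno")[:20]:
        print(stat)                        # top 20 source lines by allocation growth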

The application is using the Postgres dialect. Nothing else was changed besides the SQLAlchemy version. It's running in a Docker container with 8 GB of RAM allocated to it.

Anyway, I'll continue digging in more, but just asking in case there is something obvious,
Tony

Mike Bayer

Oct 21, 2022, 12:20:07 AM
to noreply-spamdigest via sqlalchemy


On Fri, Oct 21, 2022, at 12:00 AM, 'Tony Cosentini' via sqlalchemy wrote:
Hi,

We recently upgraded our application (a Flask web app) from SQLAlchemy 1.3.19 to 1.4.41.

Overall things are stable, but we have noticed a very large increase in memory use:
[Attached screenshot: Screen Shot 2022-10-21 at 11.26.18 AM.png]

Is this from the new query caching feature? I'm planning on getting some heap dumps to see if there is something obvious, but thought I'd ask here as well.

You would be able to tell by setting query_cache_size=0 and seeing whether that resolves the memory issue.

The cache itself uses memory, which can cause memory increases. However, we have a slight concern for the case of extremely large and highly nested SQL constructs that might be generating unreasonably large cache keys. We had one user with this problem some months ago, and they were not able to give us details in order to reproduce it. query_cache_size=0 would prevent that problem as well, but if you have very nested queries, particularly with CTEs, we'd be curious whether you can isolate particular queries that might have that issue.
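
For reference, disabling the cache engine-wide looks roughly like this (the URL is just an example; use your real one):

    from sqlalchemy import create_engine

    # query_cache_size=0 turns off the compiled statement cache for this engine;
    # if memory then returns to 1.3-era levels, the cache is where the growth is.
    engine = create_engine(
        "postgresql://user:pass@localhost/mydb",
        query_cache_size=0,
    )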



The application is using the Postgres dialect. Nothing else was changed besides the SQLAlchemy version. It's running in a Docker container with 8 GB of RAM allocated to it.

Anyway, I'll continue digging in more, but just asking in case there is something obvious,
Tony




Tony Cosentini

Oct 21, 2022, 8:08:32 AM
to sqlalchemy
Hi Mike,

Thanks for such a fast reply! We tried setting query_cache_size=0 in a canary environment today; we'll be rolling it out widely on our servers on Monday and can report back on whether it has a noticeable impact.

After thinking about this more, I think our situation might exacerbate things a bit, in particular because:
  • We have many engines (about 4 of them are used heavily, but there are like 9 total). Some are for different databases, others have different configurations for a database (for example, one has a more aggressive statement timeout).
  • We're running behind a Gunicorn server with 17 worker processes. Each of these worker processes will have its own caches (rough numbers below).
  • It's a fairly sizable app (at least for the engines that have a lot of throughput) so we might be constantly adding keys to the cache and evicting stale ones (this one is more of a theory though).
I'll report back if we see any changes.
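
Rough back-of-the-envelope on the number of cache entries alone (the 500-entry figure is the documented 1.4 default query_cache_size; per-entry memory obviously varies, so this only bounds entry counts, not bytes):

    workers = 17              # gunicorn worker processes
    engines = 9               # engines created in each process
    default_cache_size = 500  # SQLAlchemy 1.4 default query_cache_size

    print(workers * engines * default_cache_size)  # -> 76500 potential entries host-wide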

Thanks again for the fast reply (and for building such a useful + well documented library),
Tony

Tony Cosentini

Nov 3, 2022, 3:11:09 AM
to sqlalchemy
Hey, sorry for the crazy delay on this!

We ended up turning off the cache everywhere via query_cache_size=0, and memory usage returned to previous levels. We also didn't see any noticeable change in CPU usage on our web servers.

We did see a pretty noticeable performance regression in a worker job that is also very query heavy. For that job we turned the cache back on, and CPU usage ended up being lower than previous levels (more or less as expected, given the caching gains).

I think for our web servers, because of the number of processes and engines, we ended up with a very noticeable jump in memory usage. Additionally, when we did have caching turned on, we never really noticed any CPU usage improvements. My guess is that the cache might have been thrashing a lot; it's a fairly large code base, so the cache might not have been very effective.

Hope this helps for anyone else that runs into these kinds of issues. Thanks again Mike for the helpful response!

Tony

Mike Bayer

Nov 3, 2022, 8:34:15 AM
to noreply-spamdigest via sqlalchemy


On Thu, Nov 3, 2022, at 3:11 AM, 'Tony Cosentini' via sqlalchemy wrote:
Hey, sorry for the crazy delay on this!

We ended up turning off the cache everywhere via query_cache_size=0, and memory usage returned to previous levels. We also didn't see any noticeable change in CPU usage on our web servers.

We did see a pretty noticeable performance regression in a worker job that is also very query heavy. For that job we turned the cache back on, and CPU usage ended up being lower than previous levels (more or less as expected, given the caching gains).

I think for our web servers, because of the number of processes and engines, we ended up with a very noticeable jump in memory usage. Additionally, when we did have caching turned on, we never really noticed any CPU usage improvements. My guess is that the cache might have been thrashing a lot; it's a fairly large code base, so the cache might not have been very effective.


OK, that's too bad, because we'd like to know what might be going on; the cache should not really "thrash" unless you have elements that are not being cached properly. The cache size defaults to 500 entries and is allowed to grow as large as 750. It's unusual for an application to have 750 distinct SQL statements all in constant flow, unless some areas use very custom query-building mechanisms where statement structure is highly variable based on user input (like a search page). You'd get better performance if you turned the cache back on, located those specific queries that have too much structural variability, and disabled the cache just for those queries using the compiled_cache execution option (https://docs.sqlalchemy.org/en/14/core/connections.html#disabling-or-using-an-alternate-dictionary-to-cache-some-or-all-statements).
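
e.g. something along these lines, where "search_stmt" stands in for one of those highly variable statements and "engine" is your application's Engine:

    # keep the cache on engine-wide, but bypass it for just this statement
    with engine.connect() as conn:
        conn = conn.execution_options(compiled_cache=None)
        result = conn.execute(search_stmt)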




Mike Bayer

Nov 11, 2022, 11:42:36 AM
to noreply-spamdigest via sqlalchemy
We've identified a significant source of memory over-use in the 1.4 caching system. In the particular environment where it was discovered, an extremely long and complex query created a cache key that used orders of magnitude more memory than the statement itself. A fix for this issue will be released in version 1.4.44; however, if you have the ability to test ahead of time to see whether it resolves your issues, let me know. I am attempting to improve upon the patch to reduce memory use further. The issue is tracked at https://github.com/sqlalchemy/sqlalchemy/issues/8790.

Mike Bayer

Nov 12, 2022, 9:17:50 AM
to noreply-spamdigest via sqlalchemy
1.4.44 is released with this change. If you can try it out with your application and let me know whether you see improvements in memory use for your memory-intensive case, that would be much appreciated! Thanks.

Tony Cosentini

Nov 14, 2022, 2:43:53 AM
to sqlal...@googlegroups.com
Oh wow, this sounds like it would definitely impact us. I'm out this week travelling, but I will definitely upgrade + re-enable the cache and report back.

Thanks!


Tony Cosentini

Jan 4, 2023, 3:10:13 AM
to sqlalchemy
Somehow I never followed up on this one, sorry :(

We upgraded to 1.4.44 and indeed saw a big decrease in memory usage. However, with how we are using Gunicorn there's still a noticeable jump in memory usage with the cache enabled, though it's not nearly as large as before.

We're still running our web servers with the cache off, mostly because our performance has remained similar to our 1.3.x performance. I'm hoping to enable it selectively for our highest-throughput queries at some point, but alas, there's always a huge backlog of things to do :).

Thanks again for helping look into this,
Tony