Question about Pyramid + ZODB + RelStorage performance


Thierry Florac

Jun 6, 2023, 3:17:33 AM
to pylons-...@googlegroups.com
Hi!
I'm working on a Pyramid application that uses ZODB and RelStorage to store Python objects in a PostgreSQL database.
It's a document management application handling many file blobs, which are stored on the filesystem (not in PostgreSQL), and a catalog is used to index document metadata. The application runs in an Apache process under mod_wsgi.
Here is my question: we have another application that transfers documents from another repository into this one; it uses a single process and a single thread to avoid ZODB conflicts on catalog updates. When it starts, we handle approximately 120 transactions per minute and everything is OK, but after 50 to 60 minutes the transaction rate drops sharply, down to 30 tpm!
Looking at server processes, most of the activity seems to be on the PostgreSQL side; I tried disabling autovacuum but it doesn't change anything! If I reload the Apache service, the transaction rate goes back to 120 tpm, then drops again after another hour of high activity...
So, does anyone have any idea about the origin of this increase in PostgreSQL activity and the drop in performance?

Best regards,
Thierry

Julien Cigar

Jun 6, 2023, 3:25:05 AM
to pylons-...@googlegroups.com

On Tue, Jun 06, 2023 at 09:17:18AM +0200, Thierry Florac wrote:
> Hi!

Hello,
Hard to say. Is it I/O bound? CPU bound? Did you check pg_stat_activity?
Do you log slow queries? What's your server / PostgreSQL configuration?
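
For instance, in psql (the 250 ms threshold below is just an arbitrary example value):

    -- What each non-idle backend is doing and what it is waiting on:
    SELECT pid, state, wait_event_type, wait_event,
           now() - query_start AS running_for,
           left(query, 80) AS query
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY query_start;

    -- Log every statement slower than 250 ms, then reload the config:
    ALTER SYSTEM SET log_min_duration_statement = '250ms';
    SELECT pg_reload_conf();

If the wait_event columns keep showing I/O or lock waits while the rate is dropping, that already narrows things down a lot.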


--
Julien Cigar
Belgian Biodiversity Platform (http://www.biodiversity.be)
PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.

Thierry Florac

Jun 6, 2023, 3:55:58 AM
to jul...@perdition.city, pylons-...@googlegroups.com

Hi Julien,
The server is an SSD-based Hyper-V virtual machine running Debian GNU/Linux,
with Python 3.9.2 and PostgreSQL 13.
We have 32 GB of RAM and only 4 GB are used, so there is no swapping, and
there is no I/O wait when looking at "top".
There is no other application running on this server, and no other users
connected while documents are being loaded.
I tried to look at pg_stat_activity but didn't find any useful
information...
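
Maybe I should also try pg_stat_statements to get aggregated timings instead of point-in-time snapshots; something like this, assuming the extension is preloaded and created (column names as in PostgreSQL 13):

    -- Top statements by cumulative execution time:
    SELECT calls,
           round(total_exec_time::numeric, 1) AS total_ms,
           round(mean_exec_time::numeric, 2) AS mean_ms,
           left(query, 80) AS query
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 20;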

Thierry


Jonathan Vanasco

Jun 6, 2023, 1:30:24 PM
to pylons-discuss
> So, does anyone have any idea about the origin of this increase in PostgreSQL activity and performance decrease?

Since pg_stat_activity is not showing anything, my typical troubleshooting is this:

1. Have your app log the exact queries, along with the time each takes to run.  You can do this yourself, or with the debugtoolbar.  I wrote a few debugtoolbar extensions that can help; they not only track these things, but also expose the data as .csv files via an API. They may not work with the current version of Pyramid; my production environment is a bit behind.

2. Wait for things to get jammed again.

3. When things get jammed, run the slow queries through EXPLAIN in a psql console. ** Make sure you do this while the app is jammed. **  E.g. "EXPLAIN {QUERY}"; there is a sketch after this list.

4. Compare this EXPLAIN plan to the one you get from running the same query on a freshly restarted PostgreSQL, and again with a freshly started app.

5. Also look at the history of where things slowed down, and see if it happens right after a "very different" query.
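
For step 3, a sketch of what I mean. ANALYZE actually runs the query and reports the real plan and timings; BUFFERS shows cache hits vs. disk reads, which is the interesting part if the slowdown is eviction. The object_state table and the zoid value here are just an illustration based on RelStorage's schema:

    -- Real plan, row counts, and timings; BUFFERS adds
    -- "shared hit" (cache) vs "read" (disk) block counts.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT zoid, tid, state
    FROM object_state
    WHERE zoid = 12345;

Compare the hit/read counts between the jammed run and the fresh run; a jump in "read" blocks usually means the data or index was evicted from shared_buffers.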

In my experience, this stuff often happens because of memory and index loading, and can be fixed either by creating partial indexes on the database and hinting the queries to use those indexes, or by increasing the memory allocation.
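
For example, a sketch with hypothetical table and column names (CONCURRENTLY builds the index without blocking writes):

    -- A partial index only covers the rows the hot query touches,
    -- so it stays small enough to remain cached.
    CREATE INDEX CONCURRENTLY documents_pending_idx
        ON documents (updated_at)
        WHERE status = 'pending';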

The most typical cause I've seen: the app runs really fast while a certain query/index is being used (query A), but running a "different" query (query B) evicts the indexes/data for query A from memory, because Postgres needs to load the data for query B. This can often be fixed by minimizing the footprint of the indexes each query uses, and by making sure both queries use indexes.
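
The standard statistics views can show which indexes actually earn their memory footprint, e.g.:

    -- Indexes sorted by on-disk size; a large index with few scans
    -- is paying a memory cost without earning it.
    SELECT relname, indexrelname, idx_scan,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    ORDER BY pg_relation_size(indexrelid) DESC;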

I've also had situations where many "test indexes" were still on a table, and writing 1 row of data was really writing 30+ rows instead of 6.  (The first row is the data; the additional ones are the index entries.)  There have also been situations where the rows had triggers and check constraints involving other rows, but those triggers and check constraints did not use indexes, so they ended up scanning the entire table for that data. That can also evict the indexes used by the main query.  This sort of stuff gets buried from normal view, and people forget to check for it.
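
A quick way to check for leftovers like that, with "documents" as a placeholder table name:

    -- List every index on the table, with its definition:
    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename = 'documents';

    -- List every user-defined trigger on the table:
    SELECT tgname, pg_get_triggerdef(oid)
    FROM pg_trigger
    WHERE tgrelid = 'documents'::regclass
      AND NOT tgisinternal;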
