Hoarded RAM

Massimiliano della Rovere

Jul 2, 2020, 12:22:13 PM
to sqlal...@googlegroups.com
Greetings,
I am using
* SQLAlchemy: 1.3.3
* psycopg2-binary: 2.8.3
* postgresql: 9.6

While running a SQLAlchemy Core script that makes lots of updates (about 300k) and some inserts (about 50k), committing only at the end, the script consumes 8GB+ of RAM (well... we have to stop it to avoid the machine crashing).

I managed to replicate the problem with the simpler script at the end of this email.

I'm just asking whether SQLAlchemy somehow takes "note" of uncommitted data, or whether, in your opinion, this smells like a psycopg2 leak/"feature" problem.
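
(A raw-psycopg2 version of the same loop, sketched below with placeholder credentials and against the same table as the script, should tell the two layers apart: if this also grows, SQLAlchemy is out of the picture.)

import psycopg2

# placeholder DSN; prova_commit is the table created by the script below
conn = psycopg2.connect(
    "dbname=db user=user password=secret host=127.0.0.1 port=5432")
cur = conn.cursor()
for i in range(100000):
    # same pattern as the script: many parameterized UPDATEs,
    # commit deferred to the very end
    cur.execute(
        "UPDATE prova_commit SET h0 = %s WHERE id = %s",
        (i % 24, "1"))
conn.commit()
cur.close()
conn.close()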

Regarding the script below: run it in one console and top/htop in another, and you'll see the script use more and more RAM as time passes.
In PostgreSQL's logs I see the initial INSERT and all the following UPDATEs, so the data is correctly received by the database, giving SQLAlchemy/psycopg2 no reason to keep track of uncommitted data and thus consume RAM.
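
To put a number on it from inside the loop instead of eyeballing top/htop, a standard-library sketch (note that ru_maxrss is kilobytes on Linux and bytes on macOS):

import resource

def rss_kb():
    # peak resident set size of this process so far
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print("peak RSS: {}".format(rss_kb()))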

------------------------------------

# encoding: utf-8

from __future__ import print_function

from random import randint
from sys import getsizeof

from sqlalchemy.engine import create_engine
from sqlalchemy.schema import CreateTable, DropTable
from sqlalchemy.sql.expression import func, insert, update
from sqlalchemy.sql.schema import Column, MetaData, Table
from sqlalchemy.types import Boolean, Date, Integer, String


def main():
    engine = create_engine(
        "postgres://user:pass...@127.0.0.1:5432/db")
    metadata = MetaData(bind=engine)

    table_object = Table(
        "prova_commit",
        metadata,
        Column("id", String(50), primary_key=True),
        Column("data", Date, nullable=False, default=func.now()),
        Column("opzionale", Boolean, nullable=False, default=False),
        *tuple(Column("h{}".format(i), Integer) for i in xrange(24)))

    engine.execute(CreateTable(table_object))
    pk = create_pk()
    try:
        with engine.begin() as transaction:
            transaction.execute(
                insert(table_object)
                .values(
                    id=pk,
                    opzionale=True,
                    **{"h{}".format(i): i for i in xrange(24)}))
            for i in xrange(int(1e5)):
                transaction.execute(
                    update(table_object)
                    .where(table_object.columns.id == pk)
                    .values({
                        "h{}".format(i): randint(0, 23)
                        for i in xrange(24)}))
                if 0 == i % 500:
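                    # nb: this format string is passed as a plain first
                    # argument and never interpolated, so it prints literally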
                    print(
                        "{} => {} :: {}", 
                        i, 
                        getsizeof(transaction),
                        getsizeof(transaction.connection))
    finally:
        engine.execute(DropTable(table_object))


if __name__ == "__main__":
    main()

Mike Bayer

Jul 2, 2020, 3:10:04 PM
to noreply-spamdigest via sqlalchemy


On Thu, Jul 2, 2020, at 12:21 PM, Massimiliano della Rovere wrote:
Greetings,
I am using
* SQLAlchemy: 1.3.3
* psycopg2-binary: 2.8.3
* postgresql: 9.6

While running a SQLAlchemy Core script that makes lots of updates (about 300k) and some inserts (about 50k), committing only at the end, the script consumes 8GB+ of RAM (well... we have to stop it to avoid the machine crashing).

I managed to replicate the problem with the simpler script at the end of this email.

I implemented the missing "create_pk()" as:

    def create_pk():
        return str(randint(0, 1000))
    pk = create_pk()


also noted this is obviously Python 2.

the script prints out lines like this:

{} => {} :: {} 10500 64 64
{} => {} :: {} 11000 64 64
{} => {} :: {} 11500 64 64
{} => {} :: {} 12000 64 64
{} => {} :: {} 12500 64 64
{} => {} :: {} 13000 64 64
{} => {} :: {} 13500 64 64
{} => {} :: {} 14000 64 64


and memory stays constant.
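
As an aside, those sizes will read 64 no matter how much memory the process holds: sys.getsizeof reports only the shallow size of the object passed to it, not anything it references. For example:

from sys import getsizeof

class Holder(object):
    pass

holder = Holder()
print(getsizeof(holder))      # e.g. 64: just the bare instance struct
holder.payload = [0] * 10**6  # attach ~8 MB of list storage
print(getsizeof(holder))      # unchanged: referenced objects aren't counted
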
I'm just asking whether SQLAlchemy somehow takes "note" of uncommitted data,

outside of the ORM, definitely not.
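
(For contrast, the ORM Session does hold on to pending objects until flush/commit; a minimal sketch against in-memory SQLite:)

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session

Base = declarative_base()

class Row(Base):
    __tablename__ = "rows"
    id = Column(Integer, primary_key=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(bind=engine)
for _ in range(1000):
    session.add(Row())
print(len(session.new))  # -> 1000 pending objects tracked by the Session
session.commit()         # the bookkeeping is released only here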



or whether, in your opinion, this smells like a psycopg2 leak/"feature" problem.

I'm not able to reproduce with psycopg2 either. You may want to see if any part of your script is emitting warnings: in Python, warnings that have different text can pile up in a global collection if you have them suppressed.
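
A quick sketch of that effect (Python 2, as in the script above): suppressed warnings whose message text differs each time still leave one entry apiece in the module's __warningregistry__:

import warnings

warnings.simplefilter("ignore")

for i in xrange(1000):
    # unique message text on every iteration
    warnings.warn("unexpected value in row {}".format(i))

# one registry entry per distinct (text, category, lineno) key
print(len(__warningregistry__))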

