Greetings,
I am using
* SQLAlchemy: 1.3.3
* psycopg2-binary: 2.8.3
* postgresql: 9.6
While running a SQLAlchemy CORE script that makes lots of updates (about 300k), some inserts (about 50k) and commits data only at the end, the script consumes about 8GB+ RAM (well... we have to stop it to avoid the machine crashing).
I managed to replicate the problem with the simpler script at the end of this email.
I'm just asking if SQLAlchemy somehow takes "note" of uncommitted data, or if in your opinion this smells like a psycopg2 leak/"feature" problem.
Regarding the script below, run it in a console and run top/htop in another console, you'll see the script uses more and more ram as time passes by.
In PostgreSQL's logs I see the initial insert and all the following updates, so the data is correctly received by the database, giving no reason to the SA/Psycopg2 to keep track of uncommitted data, and thus consume RAM.
------------------------------------
# encoding: utf-8
from __future__ import print_function
from importlib import import_module
from random import randint
from sys import getsizeof
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import CreateTable, DropTable
from sqlalchemy.sql.expression import func, insert, update
from sqlalchemy.sql.schema import Column, Table
from sqlalchemy.types import Boolean, Date, Integer, String
def main():
engine = create_engine(
"postgres://user:pass...@127.0.0.1:5432/db")
metadata = MetaData(bind=engine)
table_object = Table(
"prova_commit",
metadata,
Column("id", String(50), primary_key=True),
Column("data", Date, nullable=False, default=func.now()),
Column("opzionale", Boolean, nullable=False, default=False),
*tuple(Column("h{}".format(i), Integer) for i in xrange(24)))
engine.execute(CreateTable(table_object))
pk = create_pk()
try:
with engine.begin() as transaction:
transaction.execute(
insert(table_object)
.values(
id=pk,
opzionale=True,
**{"h{}".format(i): i for i in xrange(24)}))
for i in xrange(int(1e5)):
transaction.execute(
update(table_object)
.where(table_object.columns.id == pk)
.values({
"h{}".format(i): randint(0, 23)
for i in xrange(24)}))
if 0 == i % 500:
print( "{} => {} :: {}",
i,
getsizeof(transaction),
getsizeof(transaction.connection))
finally:
engine.execute(DropTable(table_object))
if __name__ == "__main__":
main()