Mysql v8.0.23 on ubuntu 20.04
SQLAlchemy 1.3.22
Python 3.8.5
I have an SQL query as shown below that updates row in database.
When run by command line or using DBeaver SQL editor, I get 583k lines updated.
When using SQLAlchemy I get only 450k lines updated. i.e. some rows get updated but not all.
I checked the user permissions, the charset and collation, everything is ok.
I managed to narrow it down to a few IDs where it fails with SQL alchemy but works in IDE or CLI. However the underlying data has absolutely nothing different.
I've also tried to copy the query from the echo log and it works fine in IDE or CLI despite failing in SQLAlchemy python script.
Please see below output logs, script with query:
import time
import sqlalchemy
from datetime import *
from sqlalchemy import create_engine, exc
from dateutil.parser import parse
connect_args = {'init_command':"SET @@collation_connection='utf8mb4_0900_ai_ci'"}
# instantiate db connection
eng = create_engine('mysql+mysqldb://xxxxxxx@localhost:3306/mydb?charset=utf8mb4', connect_args=connect_args, echo=True)
try:
query = """UPDATE magic_users mu
JOIN (
SELECT
min.user_id as user_id,
(max.total_followers - min.total_followers) as progression
FROM(
select
user_id, measurement_date, total_followers
from followers_totals ft
where measurement_date = (
select max(measurement_date)
from followers_totals as f
where f.user_id = ft.user_id
and date_format(ft.measurement_date, '%%Y-%%m-%%d') >= date_format(date_sub(CURDATE(), interval 14 day), '%%Y-%%m-%%d'))
) max
JOIN (
select
user_id,measurement_date,total_followers
from followers_totals ft
where measurement_date = (
select min(measurement_date)
from followers_totals as f
where f.user_id = ft.user_id
and date_format(ft.measurement_date, '%%Y-%%m-%%d') >= date_format(date_sub(CURDATE(), interval 14 day), '%%Y-%%m-%%d'))
) min
ON max.user_id = min.user_id
WHERE min.user_id = '43691' and max.user_id = '43691') progression
ON progression.user_id = mu.user_id
SET mu.followers_count_progress_14D = progression.progression
WHERE progression.user_id is not null;"""
with eng.connect() as con:
rs = con.execute(query)
print(rs.rowcount)
con.close()
except exc.SQLAlchemyError as e:
print (e)
2021-02-14 13:27:59,593 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
2021-02-14 13:27:59,593 INFO sqlalchemy.engine.base.Engine ()
2021-02-14 13:27:59,595 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2021-02-14 13:27:59,595 INFO sqlalchemy.engine.base.Engine ()
2021-02-14 13:27:59,598 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
2021-02-14 13:27:59,598 INFO sqlalchemy.engine.base.Engine ()
2021-02-14 13:27:59,599 INFO sqlalchemy.engine.base.Engine show collation where `Charset` = 'utf8mb4' and `Collation` = 'utf8mb4_bin'
2021-02-14 13:27:59,599 INFO sqlalchemy.engine.base.Engine ()
2021-02-14 13:27:59,601 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
2021-02-14 13:27:59,601 INFO sqlalchemy.engine.base.Engine ()
2021-02-14 13:27:59,602 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
2021-02-14 13:27:59,602 INFO sqlalchemy.engine.base.Engine ()
2021-02-14 13:27:59,602 INFO sqlalchemy.engine.base.Engine SELECT CAST('test collated returns' AS CHAR CHARACTER SET utf8mb4) COLLATE utf8mb4_bin AS anon_1
2021-02-14 13:27:59,602 INFO sqlalchemy.engine.base.Engine ()
2021-02-14 13:27:59,603 INFO sqlalchemy.engine.base.Engine UPDATE magic_users mu JOIN (SELECT min.user_id as user_id, (max.total_followers - min.total_followers) as progression FROM( select user_id, measurement_date, total_followers from followers_totals ft where measurement_date = (select max(measurement_date) from followers_totals as f where f.user_id = ft.user_id and date_format(ft.measurement_date, "%%Y-%%m-%%d") >= date_format(date_sub(CURDATE(), interval 14 day), "%%Y-%%m-%%d"))) max JOIN ( select user_id, measurement_date, total_followers from followers_totals ft where measurement_date = (select min(measurement_date) from followers_totals as f where f.user_id = ft.user_id and date_format(ft.measurement_date, "%%Y-%%m-%%d") >= date_format(date_sub(CURDATE(), interval 14 day), "%%Y-%%m-%%d"))) min ON max.user_id = min.user_id WHERE min.user_id = 43691 and max.user_id = 43691) progression ON progression.user_id = mu.user_id SET mu.followers_count_progress_14D = progression.progression WHERE progression.user_id is not null;
2021-02-14 13:27:59,603 INFO sqlalchemy.engine.base.Engine ()
2021-02-14 13:27:59,617 INFO sqlalchemy.engine.base.Engine COMMIT
0
Any help or idea about how to debug this would be very much appreciated.