On 05/25/2017 01:44 PM, Михаил Доронин wrote:
>> SQLAlchemy batches inserts in the ORM as is possible and with Core you
>> do this explicitly, both make use of cursor.executemany() which is then
>> determined by how the DBAPI handles it.
>
>
> Ummm. If that is true, why this line is used when I pass a list of
> values (its from visit_insert)? Clearly sqlalchemy creates big string
> `VALUES (%s, %s), (%s, %s)....` by itself without executemany.
>
>
>
> elif insert_stmt._has_multi_parameters:
> text += " VALUES %s" % (
> ", ".join(
> "(%s)" % (
> ', '.join(c[1] for c in crud_param_set)
> )
> for crud_param_set in crud_params
> )
> )
>
>
> I've written benchmark again.
>
>
https://gist.github.com/purpleP/35c4920020ad95585419eb1d250200b7
>
> TLDR; alchemy is more than three times slower than executemany for
> inserting 10000 values.
Hello -
I gave you links to detailed documentation that represents the many
years I have spent and developing documenting the performance
characteristics of SQLAlchemy.
The test you have posted here, along with your conclusions, does not
take this guidance into account nor does it indicate awareness of basic
API use as documented in the Core tutorial at:
http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements
Here are the links again, which refer to the performance characteristics
in detail as well as illustrate how to make use of executemany(), along
with benchmarks comparing different API options:
http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow
http://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html.
SQLAlchemy's use of executemany() is a top-level, well documented
feature. It is not an obscure, hidden side effect. Years of effort
have gone into SQLAlchemy's documentation to ensure that behaviors like
these will not be mysterious.
Let's apply it to your script at:
https://gist.github.com/purpleP/35c4920020ad95585419eb1d250200b7#gistcomment-2106299
for 10K results:
----------------------------------------------------------------------------------
benchmark: 3 tests
----------------------------------------------------------------------------------
Name (time in ms) Min Max
Mean StdDev Median IQR
Outliers(*) Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_insert[raw] 141.2799 (1.0) 326.9598 (1.0)
182.5684 (1.0) 80.9658 (1.55) 144.1729 (1.0) 57.0499
(1.16) 1;1 5 1
test_insert[alcehmy_correct] 179.7159 (1.27) 340.2121 (1.04)
224.5844 (1.23) 68.2835 (1.30) 188.8490 (1.31) 79.4864
(1.61) 1;0 5 1
test_insert[alchemy_incorrect] 495.6210 (3.51) 636.7581 (1.95)
550.5236 (3.02) 52.3564 (1.0) 537.3681 (3.73) 49.3491
(1.0) 2;0 5 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
for 100K rows (no OperationalError is raised):
-----------------------------------------------------------------------------
benchmark: 3 tests
-----------------------------------------------------------------------------
Name (time in s) Min Max
Mean StdDev Median IQR
Outliers(*) Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_insert[raw] 1.3214 (1.0) 1.4007 (1.0)
1.3660 (1.0) 0.0307 (1.94) 1.3709 (1.0) 0.0448 (1.95)
2;0 5 1
test_insert[alcehmy_correct] 1.7432 (1.32) 1.7848 (1.27)
1.7645 (1.29) 0.0159 (1.0) 1.7655 (1.29) 0.0229 (1.0)
2;0 5 1
test_insert[alchemy_incorrect] 5.7412 (4.34) 5.9293 (4.23)
5.8330 (4.27) 0.0687 (4.33) 5.8319 (4.25) 0.0831 (3.63)
2;0 5 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQLAlchemy Core is therefore about 1.2 times slower than pure C
mysqlclient when used as documented. Which is actually a little less
than the difference against SQLite's driver vs. Core, that is described
at
http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow,
where Core is more like 1.6 times slower.
The SQLAlchemy project is very welcoming of thoughtful criticism and is
in fact the product of over a decade of critique by a wide range of
users. I encourage you to contribute, but please ensure you take the
time for links that I provide in response to your questions. I am not
the brightest light on the planet but I have been doing this for a very
long time. Thanks!
>
> for a million values sqlalchemy drops with OperationalError.
>
>
> Here's a vmprof visualization for 10000 values
>
http://vmprof.com/#/c7f9a4fd-55f9-469d-b6e4-b7cac04473fa .
>
> vmprof is much better than cProfile give it a try if you still coding in python.
>
>
> If you will need more data then just a visualization - I'll try to provide it. But you have the benchmark code in the gist.
>