how fast can pure sqlalchemy perform?


kk

Aug 7, 2015, 5:17:03 AM8/7/15
to sqlal...@googlegroups.com
Dear all,
I am planning to totally revamp my RDBMS model and migrate entirely to
SQLAlchemy.
Right now we have a lot of stored procedures for obvious performance
benefits. However, it is becoming more and more difficult to maintain
the system, and also difficult to migrate existing users when there are
major changes to our software.
Basically, our stored procedures get created when the database is created
in the deploy phase.
So I wish to know how much performance I will lose if I switch entirely
to using an ORM, specifically SQLAlchemy.
I am also planning to use something like JSON columns in PostgreSQL
tables, so will it really matter with an ORM?
Happy hacking.
Krishnakant.

Ladislav Lenart

Aug 7, 2015, 5:33:27 AM8/7/15
to sqlal...@googlegroups.com
Hello.

ORM is certainly slower. How much depends A LOT on your workload. For example
bulk operations with ORM are an order of magnitude slower than raw SQL. On the
other hand, SQLAlchemy Core lets you write generative SQL queries without ORM
features which are as performant as raw SQL. Overall SQLAlchemy is an excellent
library to work with!
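As a tiny illustration of the Core side: a Core expression compiles straight down to plain SQL with a bound parameter, with no ORM machinery involved. This is just a sketch with an invented `users` table, written against a modern SQLAlchemy (1.4+ `select()` style), so the spelling differs a bit from 2015-era code:

```python
# Minimal sketch: a Core expression over a hypothetical "users" table
# compiles to plain SQL with a bound parameter - no ORM involved.
from sqlalchemy import column, select, table

users = table("users", column("id"), column("name"))

stmt = select(users.c.name).where(users.c.id == 7)
print(stmt)
# SELECT users.name
# FROM users
# WHERE users.id = :id_1
```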

For some numbers, see:

http://docs.sqlalchemy.org/en/latest/faq/performance.html
http://docs.sqlalchemy.org/en/latest/orm/examples.html#examples-performance


HTH,

Ladislav Lenart

kk

Aug 7, 2015, 11:06:38 AM8/7/15
to sqlal...@googlegroups.com


On Friday 07 August 2015 03:03 PM, Ladislav Lenart wrote:
> Hello.
>
> ORM is certainly slower. How much depends A LOT on your workload. For example
> bulk operations with ORM are an order of magnitude slower than raw SQL. On the
> other hand, SQLAlchemy Core lets you write generative SQL queries without ORM
> features which are as performant as raw SQL.

So is there some kind of a method to have some prepared SQL
statements in SQLAlchemy itself?
I have seen that prepared statements in other languages like Java do a
great job.

> Overall SQLAlchemy is an excellent
> library to work with!

So you mean performance will really get hit when pure ORM is used.
So should I use a mixture?
For inserts, let's say, ORM, and for bulk select queries some prepared
statement-like thing (if it exists)?
Happy hacking.
Krishnakant.


Claudio Freire

Aug 7, 2015, 11:18:36 AM8/7/15
to sqlal...@googlegroups.com
On Fri, Aug 7, 2015 at 12:05 PM, kk <krm...@gmail.com> wrote:
> On Friday 07 August 2015 03:03 PM, Ladislav Lenart wrote:
>>
>> Hello.
>>
>> ORM is certainly slower. How much depends A LOT on your workload. For
>> example
>> bulk operations with ORM are an order of magnitude slower than raw SQL. On
>> the
>> other hand, SQLAlchemy Core lets you write generative SQL queries without
>> ORM
>> features which are as performant as raw SQL.
>
>
> So is there some kind of a method to have some prepared sql statements in
> SQLAlchemy itself?
> I have seen that prepared statements in other languages like Java do a great
> job.

Depending on the dialect and driver, you can use executemany().
Not all drivers implement executemany() with prepared statements though,
so check your case.
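In SQLAlchemy, executemany kicks in automatically when you pass a list of parameter dictionaries to a single Core statement. A minimal sketch, assuming a modern SQLAlchemy (1.4+); SQLite in-memory and the table/column names are just stand-ins for the demo:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

engine = create_engine("sqlite://")  # in-memory DB, just for the demo
metadata = MetaData()
accounts = Table(
    "accounts", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
metadata.create_all(engine)

rows = [{"name": "cash"}, {"name": "bank"}, {"name": "sales"}]

with engine.begin() as conn:
    # One INSERT statement, many parameter sets: SQLAlchemy hands the
    # whole list to the driver's executemany() in a single call.
    conn.execute(accounts.insert(), rows)

with engine.connect() as conn:
    names = [row.name for row in conn.execute(accounts.select())]
```

Whether the driver turns that single call into a prepared statement underneath is, as noted above, driver-specific.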

>> Overall SQLAlchemy is an excellent
>> library to work with!
>
>
> So you mean performance will really get hit when pure ORM is used.
> So should I use a mixture?
> For inserts let's say orm and for bulk select queries some prepared
> statement like thing (if it exists )?

It really depends on each use case.

If your concern is whether ORM queries will be efficient, don't worry,
SQLAlchemy is powerful enough that you can make almost any kind of
query with the ORM. Almost all the optimizations you could do to plain
SQL are doable at the ORM level.

If your concern is CPU overhead on the application side, yes, the ORM
does induce quite an overhead, but whether it's a problem or not
greatly depends on your use case: the number of objects your
transactions will be handling, the complexity of the mapping, your
latency and throughput constraints, etc. I've convinced myself over
time that a little overhead is fine in exchange for the benefits the
ORM gives you, mostly in ease of coding, but also robustness (the ORM
solves some issues that are hard to handle correctly and robustly with
raw SQL), and SQLAlchemy is flexible enough that you can usually
escape to raw SQL if/when you need to. You shouldn't optimize
prematurely; the ORM won't become a death trap, as happens with other
ORMs.

So, we're saying we need more information if we're to give a meaningful answer.

Mike Bayer

Aug 7, 2015, 11:52:03 AM8/7/15
to sqlal...@googlegroups.com


On 8/7/15 11:05 AM, kk wrote:
>
>
> On Friday 07 August 2015 03:03 PM, Ladislav Lenart wrote:
>> Hello.
>>
>> ORM is certainly slower. How much depends A LOT on your workload. For
>> example
>> bulk operations with ORM are an order of magnitude slower than raw
>> SQL. On the
>> other hand, SQLAlchemy Core lets you write generative SQL queries
>> without ORM
>> features which are as performant as raw SQL.
>
> So is there some kind of a method to have some prepared sql
> statements in SQLAlchemy itself?
> I have seen that prepared statements in other languages like Java do a
> great job.

That's totally a myth and you can see me ranting on this whole topic of
"explicit prepared statements == SPEED" here:
https://mail.python.org/pipermail/db-sig/2014-December/006147.html - see
the benchmark there. Whatever "performance" we get with prepared
statements is vanishingly small and utterly dwarfed by the
order-of-magnitude-greater latencies we get from Python. The DBAPI
already has a great speed optimization in this area and it is known as
executemany() - it applies only to CRUD statements, not SELECT, but
SQLAlchemy uses executemany() very heavily and to great effect - the
speed gains here are not so much due to prepared statements, as psycopg2
does not use them in any way, but due to the fact that we roll up lots
of data into a single call that psycopg2 can run from pure compiled C code.

Reading that thread overall, you'll learn at the very least that the
Python DBAPI does not expose prepared statements. As you'll note,
I'm entirely against the idea of them being made explicit, for this
exact reason; now everyone's going to want the concept expressed
explicitly in SQLAlchemy, involving that multiple resource-holding
cursors be held onto which then open the doors to all kinds of new
concurrency / memory / connection pool / locking issues that will all be
reported as new bugs that I have to worry about, all for absolutely no
good reason as explicit PS does just about nothing to help performance
in any real way. Yet another chronically misunderstood concept that
everyone is going to demand everywhere even if you show them that it's
pointless (see:
http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
for the reigning king of this phenomenon).

Now, there is something in SQLAlchemy that will give you an *enormous*
boost of speed that is basically doing what everyone thinks a "prepared"
statement will do, which is a Python-side "prepare" of everything.
Compared to the database's time to set up a statement handle, the time
it takes for SQLAlchemy to set up a Core Select from a Query object, as
well as the time to build the Query itself, is very significant. That
feature is known as Baked Queries and it is documented here:
http://docs.sqlalchemy.org/en/rel_1_0/orm/extensions/baked.html. A lot
of work went into this very unique feature and it is also benchmarked in
the example suite, which I would strongly recommend you read and run fully.
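A hedged sketch of the baked-query pattern, written against the `sqlalchemy.ext.baked` extension in a 1.4+ style (the model, names, and SQLite database are invented for the example); see the documentation linked above for the real API:

```python
from sqlalchemy import Column, Integer, String, bindparam, create_engine
from sqlalchemy.ext import baked
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")  # in-memory DB, just for the demo
Base.metadata.create_all(engine)

bakery = baked.bakery()

def lookup(session, name):
    # The construction of this Query is cached by the bakery after the
    # first call; later calls reuse the already-"baked" form and skip
    # most of the Python-side query-building overhead.
    bq = bakery(lambda s: s.query(User.name).filter(User.name == bindparam("name")))
    return bq(session).params(name=name).all()

with Session(engine) as session:
    session.add_all([User(name="alice"), User(name="bob")])
    session.commit()
    rows = lookup(session, "alice")
```

The key point is that the lambda is only a cache key plus a recipe; the expensive query construction happens once, not on every call.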




>
>> Overall SQLAlchemy is an excellent
>> library to work with!
>
> So you mean performance will really get hit when pure ORM is used.
> So should I use a mixture?
> For inserts let's say orm and for bulk select queries some prepared
> statement like thing (if it exists )?
The suite in
http://docs.sqlalchemy.org/en/rel_1_0/orm/examples.html#module-examples.performance
was built in order to provide the answers to these questions. That's
where you need to be.




> Happy hacking.
> Krishnakant.
>
>

Claudio Freire

Aug 7, 2015, 12:35:28 PM8/7/15
to sqlal...@googlegroups.com
It may be a myth most of the time, but there are cases where it is not.

I had one case (in a whole decade of programming, so it is indeed very
rare) in which a very complex query ran very fast, but planning time
was the dominant cost (think 150ms for planning and 15ms for
execution). For that query, preparing it explicitly saved a lot of
runtime. Again, executemany helps when it's implemented with prepared
statements. But I was using psycopg2, which doesn't use prepared
statements, so I had to prepare them explicitly myself. This was with
SQLAlchemy 0.3 and I managed to do it just fine. Some black magic was
needed of course (I had to explicitly compile the query to SQL, bind
parameters, and generate a PREPARE statement from it), but nothing too
complex.
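The "black magic" described above can be approximated roughly like this: compile the Core expression to a SQL string, then wrap it in a PostgreSQL PREPARE. Only a sketch - the names are invented, the `:uid` to `$1` rewrite is a naive textual replace that only works for this one placeholder, and the resulting PREPARE/EXECUTE strings would then be sent over the raw DBAPI connection:

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        bindparam, select)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)

# Compile the Core expression to a SQL string with a named parameter...
stmt = select(users.c.name).where(users.c.id == bindparam("uid"))
sql = str(stmt.compile())

# ...then rewrite it into a PostgreSQL PREPARE statement ($1-style
# positional parameter). "lookup_user" is an arbitrary statement name.
prepare_sql = "PREPARE lookup_user (integer) AS " + sql.replace(":uid", "$1")
# Later, the driver would run e.g. "EXECUTE lookup_user (7)" against it.
```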

Now, pg8000 does use prepared statements, so it may be as simple as
using that driver if the need for prepared statements is there. You
can even have two engines pointing to the same database and use pg8000
only for the queries that really need prepared statements. So
SQLAlchemy (and the DBAPI) has come a long way since I found the case
I mentioned above. I'm sure handling that case today would be even
easier.

kk

Aug 7, 2015, 5:58:43 PM8/7/15
to sqlal...@googlegroups.com
Thanks for the detailed reply.
Now I understand it better.
So you mean to say executemany will, in a nutshell, give me the needed
performance gain.
Is that correct?
Secondly, is executemany good for only inserts, or updates, or both?
And lastly, if I have a big result set through a select statement, more
so from a view, what is the best approach to use if I decide not to use
stored procedures?
I will come back with more questions before the decision is made, so I
may take some more of your valuable time.
happy hacking.
Krishnakant.

Claudio Freire

Aug 10, 2015, 1:06:48 PM8/10/15
to sqlal...@googlegroups.com
In a nutshell, yes.

> Secondly, is executemany good at only Insert, or Update or both?

Both. More precisely, anything that doesn't produce results.

> And lastly if I have a big resultset through a select statement, more so
> from a view, what is the best approach to use if I decide not to use stored
> procedures?

Well, the bigger your result sets are, the less you care about stored
procedures.

Assuming stored procedures speed up queries (which is a big IF that is
very often false), they only speed up the planning phase, not the
execution. There's no execution optimization you cannot accomplish
with raw SQL, so the bigger the result set, the less you care about
planning time, and thus the smaller the relative benefit from using
stored procedures.

Honestly, the benefits of stored procedures are so small, and their
maintenance cost so high, that I would suggest never using them
unless you find a case you've thoroughly analyzed and profiled, and
where you find they'd be a huge help (which won't happen for a good
while).
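Independent of stored procedures, one knob worth knowing for big result sets is streaming rows in chunks instead of buffering everything at once. A sketch assuming SQLAlchemy 1.4+; SQLite in-memory is only a stand-in here - with PostgreSQL/psycopg2, `stream_results=True` is what requests a server-side cursor:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # stand-in DB, just for the demo

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE entries (x INTEGER)"))
    conn.execute(text("INSERT INTO entries (x) VALUES (:x)"),
                 [{"x": i} for i in range(10)])

with engine.connect() as conn:
    # stream_results asks the driver for a server-side cursor where
    # supported (e.g. psycopg2); partitions() then yields fixed-size
    # chunks instead of loading the whole result set into memory.
    result = conn.execution_options(stream_results=True).execute(
        text("SELECT x FROM entries"))
    total = 0
    for chunk in result.partitions(4):
        total += len(chunk)
```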

kk

Aug 10, 2015, 5:28:08 PM8/10/15
to sqlal...@googlegroups.com

Hi,
I have gone through the documentation for baked queries.
It sounds interesting, and I guess the performance boost is surely going
to be worth noticing.
I just wished to ask one thing: if there is a series of queries firing,
with each one of them bringing back bulk recordsets, will this concept
still be helpful?
I am asking because then the most important bottleneck will be the time
the RDBMS takes to parse, compile and execute these queries.
So it kind of becomes a task in itself.
I am not a big ORM expert and don't really know the details under the
hood, so asking this.
Happy hacking.
Krishnakant.

kk

Aug 10, 2015, 5:42:26 PM8/10/15
to sqlal...@googlegroups.com



Hello,
Thanks to you and Mike for the detailed insight. My questions follow
in-line.

On Friday 07 August 2015 08:48 PM, Claudio Freire wrote:
> On Fri, Aug 7, 2015 at 12:05 PM, kk <krm...@gmail.com> wrote:
>> On Friday 07 August 2015 03:03 PM, Ladislav Lenart wrote:
>>> Hello.
>>>
>>> ORM is certainly slower. How much depends A LOT on your workload. For
>>> example
>>> bulk operations with ORM are an order of magnitude slower than raw SQL. On
>>> the
>>> other hand, SQLAlchemy Core lets you write generative SQL queries without
>>> ORM features which are as performant as raw SQL.

I am going to be mostly with PostgreSQL for now, so I don't need
database-independent queries. Perhaps we will shift to NoSQL in the
near future (more on that in some time).
So if I were to write Core queries, I could just as well do it directly
with psycopg2. What advantage will I then get by using SQLAlchemy?
>>>
>>
>> So is there some kind of a method to have some prepared sql statements in
>> SQLAlchemy itself?
>> I have seen that prepared statements in other languages like Java do a great
>> job.
> Depending on the dialect and driver, you can use executemany.
> Not all drivers implement executemany with prepared statements though,
> check your case.
Ok, so is this good for some kind of bulk inserts and updates? Or does
it have any other not-so-obvious performance benefit?

>>> Overall SQLAlchemy is an excellent
>>> library to work with!
>>
>> So you mean performance will really get hit when pure ORM is used.
>> So should I use a mixture?
>> For inserts let's say orm and for bulk select queries some prepared
>> statement like thing (if it exists )?
> It really depends on each use case.
>
> If your concern is whether ORM queries will be efficient, don't worry,
> SQLAlchemy is powerful enough that you can make almost any kind of
> query with the ORM. Almost all the optimizations you could do to plain
> SQL are doable at the ORM level.
I see. So can you give me some example of this kind of optimization
which I should particularly look at? I have seen docs on lazy joins
etc. Are there some other efficiency and performance tricks I can do?
I have many queries that get data from views and sometimes have to do
joins; such data is really bulk, in the magnitude of 50,000 records, and
is repeated 2 or 3 times at a go.
Inserts are not very frequent, and even when they happen, the records in
one insert would be hardly one in the master and 2 in the detail table
from a single user.
And not more than 25 or 30 users are inserting at one time.

>
> If your concern is CPU overhead on the application side, yes, the ORM
> does induce quite an overhead, but whether it's a problem or not
> greatly depends on your use case, the number of objects your
> transactions will be handling, the complexity of the mapping, your
> latency and throughput constraints, etc. I've convinced myself over
> time that a little overhead is fine in exchange for the benefits the
> ORM gives you, in ease of coding mostly, but also robustness (the ORM
> solves some issues that are hard to handle correctly and robustly with
> raw SQL), and SQLAlchemy is flexible enough that you can usually
> escape to raw sql if/when you need to. You shouldn't optimize
> prematurely, the ORM won't be a death trap as it happens with other
> ORMs.
We have a good server and I don't think CPU overhead is too much of a
concern given the use case I have already mentioned.
There is a fair bit of complicated calculation going on over big result
sets, but not as complicated as scientific ones. These are mostly
financial calculations, such as those in preparing a ledger statement in
bookkeeping software.

> So, we're saying we need more information if we're to give a meaningful answer.
So I have provided enough information, I guess.
By the way, we are thinking of moving to either CouchDB or
PostgreSQL's jsonb datatype for most of our work in the near future.
Is there anything you can specially advise me as far as using SQLAlchemy
on such data?


Happy hacking.
Krishnakant.

Claudio Freire

Aug 10, 2015, 6:14:58 PM8/10/15
to sqlal...@googlegroups.com
On Mon, Aug 10, 2015 at 6:42 PM, kk <krm...@gmail.com> wrote:
> Hello,
> Thanks to you and Mike for the detailed insight. My questions follow in-line.
> On Friday 07 August 2015 08:48 PM, Claudio Freire wrote:
>>
>> On Fri, Aug 7, 2015 at 12:05 PM, kk <krm...@gmail.com> wrote:
>>>
>>> On Friday 07 August 2015 03:03 PM, Ladislav Lenart wrote:
>>>>
>>>> Hello.
>>>>
>>>> ORM is certainly slower. How much depends A LOT on your workload. For
>>>> example
>>>> bulk operations with ORM are an order of magnitude slower than raw SQL.
>>>> On
>>>> the
>>>> other hand, SQLAlchemy Core lets you write generative SQL queries
>>>> without
>>>> ORM features which are as performant as raw SQL.
>
>
> I am going to be mostly with Postgresql for now so I don't wish database
> independent queries. Perhaps we will shift to NoSql in near future (more on
> that in some time ).
> So if I were to write core queries then I could as well do directly with
> psycopg2. What advantage I will then get by using SQLAlchemy?

For one benefit, building complex queries programmatically is much
easier with Core than with SQL strings, and less error-prone.
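A small sketch of what that programmatic building looks like (table and filter names invented for the example): conditions are attached generatively, with no string concatenation or quoting to get wrong.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, select

metadata = MetaData()
ledger = Table(
    "ledger", metadata,
    Column("id", Integer, primary_key=True),
    Column("account", String),
    Column("amount", Integer),
)

def build_query(account=None, min_amount=None):
    # Each .where() returns a new statement, so optional filters can
    # be composed one by one depending on the caller's arguments.
    stmt = select(ledger.c.account, ledger.c.amount)
    if account is not None:
        stmt = stmt.where(ledger.c.account == account)
    if min_amount is not None:
        stmt = stmt.where(ledger.c.amount >= min_amount)
    return stmt

sql_all = str(build_query())                                  # no WHERE clause
sql_filtered = str(build_query(account="cash", min_amount=100))  # two ANDed filters
```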

kk

Aug 10, 2015, 8:07:40 PM8/10/15
to sqlal...@googlegroups.com
So essentially, baked statements as Mike was pointing out seem to be a
great solution for bigger result sets.
Is that correct?
happy hacking.
Krishnakant.


kk

Aug 10, 2015, 8:10:39 PM8/10/15
to sqlal...@googlegroups.com
I see. I am sorry, I had misunderstood that with Core I would have to
actually write the select, insert and update statements with all the
concatenation as-is.
If that is the case then I will definitely try experimenting with Core.
happy hacking.
Krishnakant.