SA Performance (some profiling on simple selects)


Henk

Jul 11, 2008, 5:19:26 AM
to sqlalchemy
Hi

I did some profiling of a simple select-type query to check SA
performance for such use cases.

The test consists of a very simple Order record:

order = Table('tblorder', metadata,
    Column('order_id', Integer, primary_key=True),
    Column('order_user_id', Integer),
    Column('order_state', Integer),
    Column('order_price', Integer),
    Column('order_delivery_cost', Integer),
    Column('order_creation_date', DateTime),
    Column('order_modification_date', DateTime),
    Column('order_last_payment_check', DateTime),
    Column('order_delivery_address_id', Integer),
)

and some related log lines that keep track of what happens to the
order:

orderlog = Table('tblorderlog', metadata,
    Column('orderlog_id', Integer, primary_key=True),
    Column('orderlog_order_id', Integer,
           ForeignKey('tblorder.order_id')),
    Column('orderlog_creation_date', DateTime),
    Column('orderlog_message', String),
    Column('orderlog_priority', Integer),
)

On the ORM side of things these two tables have a simple one-to-one
mapping to the classes Order and OrderLog, and the log lines are
mapped to the order using:

'orderLogs': relation(OrderLog, backref='order')
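
For reference, the full mapping setup looks roughly like this (a
sketch using the classic mapper() API; the class bodies here are my
reconstruction, not the actual test code):

from sqlalchemy.orm import mapper, relation

class Order(object):
    pass

class OrderLog(object):
    pass

# classic mapper configuration; the relation ties the log lines
# to their order via the foreign key on tblorderlog
mapper(Order, order, properties={
    'orderLogs': relation(OrderLog, backref='order'),
})
mapper(OrderLog, orderlog)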

The main loop of the test program is the following:

times = 1000
start = time.time()
for i in range(times):
    order = Session.query(Order).get(72244)

    for log in order.orderLogs:
        pass

    Session.clear()

end = time.time()

Note that it is the same Order that I am fetching every time, because
I am interested in SA overhead and not in the speed of the database
(MySQL) itself; this way MySQL will serve the rows from memory and not
be disk bound (I also made sure this test is not network bound).
Also note that the session is cleared on each iteration through the
loop to force SA to perform the two queries and the corresponding
mapping overhead (one query to fetch the order, one to fetch the 17
corresponding orderlog lines) on every iteration.

Profiling with cProfile (Python 2.5.1) gave the following results
(sorted by total time spent in each function):

2923724 function calls (2880671 primitive calls) in 5.965 CPU seconds

Ordered by: internal time
List reduced from 498 to 20 due to restriction <20>

ncalls          tottime  percall  cumtime  percall  filename:lineno(function)
2000            0.585    0.000    0.585    0.000    {method 'query' of '_mysql.connection' objects}
18000           0.192    0.000    1.715    0.000    mapper.py:1407(_instance)
18000           0.160    0.000    0.801    0.000    mapper.py:1381(populate_state)
2000            0.153    0.000    0.259    0.000    base.py:1448(_init_metadata)
20000           0.153    0.000    0.198    0.000    times.py:43(DateTime_or_None)
36000           0.147    0.000    0.147    0.000    identity.py:91(__contains__)
16000           0.139    0.000    0.270    0.000    compiler.py:239(visit_column)
112000          0.130    0.000    0.240    0.000    base.py:1363(__getitem__)
94000           0.127    0.000    0.318    0.000    strategies.py:71(new_execute)
18000           0.122    0.000    0.122    0.000    attributes.py:745(__init__)
42000/2000      0.114    0.000    1.233    0.001    compiler.py:176(process)
162036/162022   0.113    0.000    0.134    0.000    {getattr}
14000           0.106    0.000    0.541    0.000    compiler.py:228(visit_label)
2000            0.103    0.000    1.222    0.001    compiler.py:466(visit_select)
112000          0.103    0.000    0.110    0.000    base.py:1617(_get_col)
32000           0.089    0.000    0.158    0.000    compiler.py:401(_truncated_identifier)
72000           0.086    0.000    0.096    0.000    util.py:806(add)
135101          0.085    0.000    0.085    0.000    {isinstance}
18000           0.073    0.000    0.181    0.000    identity.py:104(add)
46000           0.068    0.000    0.080    0.000    {built-in method sub}

What we see here is that of the total of 5.965 seconds, the test
spent only about 0.585 seconds performing the actual SQL queries, or
about 10% of the total time. The other 90% of the time is spent on
SA overhead (except for the times.py line, which is somewhere in the
MySQLdb driver).

If I look at the cumulative output of the profiler (omitted for
brevity) I see, for instance, that SA spends 1.3 seconds compiling the
SQL expressions (it apparently recompiles the same SQL expression on
each iteration).
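
To illustrate, the compile step can be triggered by hand like this (a
sketch; this select is my own example, not the exact statement the
Query generates):

from sqlalchemy import select

stmt = select([order], order.c.order_id == 72244)
compiled = stmt.compile()  # this traversal is what shows up under compiler.py
print compiled             # e.g. SELECT tblorder.order_id, ... FROM tblorder WHERE ...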

My question to the SA community would be: how much ORM overhead do you
find acceptable? And how does SA compare in this regard to, for
instance, Java's Hibernate or other ORM solutions?

I would love to help out trying to reduce this overhead, but I have no
idea what the SA devs have in mind for the future of SA with regard to
speed (e.g. speed vs. feature completeness). In the tutorials and
documentation there is currently a strong emphasis on use cases
highlighting the very dynamic nature of building complex queries with
SA's ORM tools, and not so much on how to perform simple (and static)
queries fast using the ORM.

Maybe there is a way to memoize the results of the compile step so
that it does not need to be redone all the time while the Query
remains the same?

Michael Bayer

Jul 11, 2008, 11:26:59 AM
to sqlal...@googlegroups.com

On Jul 11, 2008, at 5:19 AM, Henk wrote:

>
> What we see here is that of the total of 5.965 seconds, the test
> spent only about 0.585 seconds performing the actual SQL queries, or
> about 10% of the total time. The other 90% of the time is spent on
> SA overhead (except for the times.py line, which is somewhere in the
> MySQLdb driver).
>
> If I look at the cumulative output of the profiler (omitted for
> brevity) I see, for instance, that SA spends 1.3 seconds compiling the
> SQL expressions (it apparently recompiles the same SQL expression on
> each iteration).
>
> My question to the SA community would be: how much ORM overhead do you
> find acceptable? And how does SA compare in this regard to, for
> instance, Java's Hibernate or other ORM solutions?

As far as Hibernate goes: I work quite a bit with Hibernate and am
currently creating an almost exact port of a Java/Hibernate
application using Pylons and SQLAlchemy. My observation is that
SQLAlchemy is probably a little slower on execution overhead, but is
faster when fetching large numbers of rows. Your test is very much
focused on single execution speed.

The funny thing is, the SQLAlchemy version of the application overall
is *much* faster than the Hibernate version, simply because SQLAlchemy
knows how to issue a lot fewer queries to get something done. The
main example is loading a list of A, each of which has a relation to B
and then a relation to C - you need to display a slice of the "A"
table (using LIMIT/OFFSET) as well as all the B's and C's.

This is probably the most common operation one needs to do with an
ORM, and Hibernate quite simply cannot do this efficiently - see http://www.hibernate.org/117.html#A12
:

"It should be also obvious why resultset row-based "limit" operations,
such as setFirstResult(5) and setMaxResults(10) do not work with these
kind of eager fetch queries. If you limit the resultset to a certain
number of rows, you cut off data randomly. One day Hibernate might be
smart enough to know that if you call setFirstResult() or
setMaxResults() it should not use a join, but a second SQL SELECT. Try
it, your version of Hibernate might already be smart enough. If not,
write two queries, one for limiting stuff, the other for eager
fetching."

As it turns out, SQLAlchemy was "smart enough" to do this correctly
from day one (though I'm not currently aware of any other ORM that
does this). When querying for rows with a limit/offset in conjunction
with eager joins, SQLA wraps the limited results in a subquery, and
applies the eager joins to that subquery. The efficiency of the
query is much greater than what you'd get with Hibernate since the
outer joins are against only a small slice of the total rows, and of
course you get the correct results back, and there's no need to fall
back to a second per-child-object query, which I've found nearly
impossible to prevent Hibernate from doing in many cases. SQLA is
very good at not "dropping back" into lazy loading when eager loads
are requested. Because of reasons like this, most Hibernate
applications speed up significantly when converted to SQLAlchemy,
since you'll almost certainly see at least a few areas where Hibernate
issued a few hundred queries that SQLA issues only one.
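
To make that concrete, here is roughly what requesting the eager load
together with a limit/offset looks like (a sketch against the
Order/OrderLog mapping from the original post):

from sqlalchemy.orm import eagerload

# the LIMIT/OFFSET select is wrapped in a subquery, and the eager
# join against the log table is applied to that subquery, so the
# slice of orders stays correct
orders = Session.query(Order).options(eagerload('orderLogs')) \
                .limit(10).offset(5).all()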

SQLA's very dynamic generation of queries is also a reason we don't
rely upon caching the "compiled" form of a SQL query by default -
the complexity and memory overhead of doing so would negate any speed
improvements. While statement compilation adds some overhead, it's
not really the primary overhead during normal operation; fetching rows
and converting them to object instances, along with the associated
bookkeeping, is where most of the time gets spent. In your own
application, you can cache the "compiled" form of a SQL query by using
statement.compile() and use it in conjunction with an ORM query (via
instances()) if there's a particular critical section you're trying
to optimize; but I think you'll find the aggregate time savings
(versus just issuing one less query) are marginal.
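
Roughly, that pattern looks like this (a sketch only - I'm assuming
an 'engine' name here, and writing the calls from memory of the
0.4/0.5 API, so check the docs for the exact signatures):

from sqlalchemy import bindparam

# compile the statement once, up front
stmt = order.select(order.c.order_id == bindparam('oid'))
compiled = stmt.compile()

# ...then reuse the compiled form inside the critical section
result = engine.execute(compiled, oid=72244)
orders = list(Session.query(Order).instances(result))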

If you're still skeptical of the overhead of statement compilation,
see one very high capacity website which uses it regularly, http://www.reddit.com
. In their recently released source code, you can see at http://code.reddit.com/browser/r2/r2/lib/db/tdb_sql.py#L458
that they are using a SQLAlchemy "select()" construct, built fresh
on each request and compiled, at the base of their SQL database
library. A few hundred function calls here are not the issue; it's
the fact that *any* SQL at all is issued which places a burden on the
application. So Reddit scales the way any website has to scale: by
adding processing power horizontally, both at the database level and
at the webserver level, and by applying a second level caching
solution (more on this below).

> I would love to help out trying to reduce this overhead, but I have no
> idea what the SA devs have in mind for the future of SA with regard to
> speed (e.g. speed vs. feature completeness). In the tutorials and
> documentation there is currently a strong emphasis on use cases
> highlighting the very dynamic nature of building complex queries with
> SA's ORM tools, and not so much on how to perform simple (and static)
> queries fast using the ORM.
> Maybe there is a way to memoize the results of the compile step so
> that it does not need to be redone all the time while the Query
> remains the same?

Well, in your test you are clearing the Session on each iteration. If
you weren't doing that, then the object you loaded would be pulled
straight from the Session each time, so in that case the "memoization"
is already complete. Similarly, a Query issued for any number of
rows runs much faster when the resulting objects are already present
in the Session, since they do not need to be re-created or populated
again.
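
For example (without the Session.clear() from your test loop):

order_a = Session.query(Order).get(72244)  # emits SQL the first time
order_b = Session.query(Order).get(72244)  # identity map hit, no SQL
assert order_a is order_b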

For a more general form of "memoization" of queries and their results,
what you're looking for is a second level cache. Hibernate supplies
a second level caching solution but SQLAlchemy currently does not,
although simple ones are fairly straightforward to create based on a
product such as Memcached, or even just well-maintained dicts. Any
database-driven application which needs very fast display of data
requires a second level cache of some kind, whether or not an ORM is
used; while SQLA does not have a built-in solution to the second
level caching problem, it also didn't create the issue.
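
A trivial dict-based sketch of the idea (the helper below is
illustrative only; a real cache needs invalidation/expiry and has to
consider that cached instances are detached from the Session):

_order_cache = {}

def get_order(order_id):
    # naive second level cache: consult the dict before the ORM
    try:
        return _order_cache[order_id]
    except KeyError:
        o = Session.query(Order).get(order_id)
        _order_cache[order_id] = o
        return o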

Overall, the issue of performance within SQLAlchemy has been addressed
exhaustively and continues to be addressed on a daily basis; at this
point, the path to further speedups lies along two avenues. One is
that we constantly review the code and find small incremental gains
to be made; in particular we've squeezed out 10-20% more speed in
SQLA 0.5 versus 0.4. We have performance tests in our test suite
which run through a full battery of SQL and ORM operations, ensuring
that function call counts fall within an expected range (see
http://www.sqlalchemy.org/trac/browser/sqlalchemy/trunk/test/profiling/zoomark_orm.py
for example).

The other avenue in which we are very interested is applying native
extensions to SQLA at key performance areas. Two that we have in mind
are result set processing and attribute instrumentation. This is a
longer road to travel (in particular since we are in need of a C
developer with some time to kill) but hopefully we'll get a resource
for this. There is another ORM called Storm which is in the process
of creating C extensions for their own product (the development of
which is fully funded by Storm's parent company Canonical, a luxury
SQLAlchemy unfortunately does not have), so that's something else you
might want to check out. A comparison of SQLAlchemy and Storm's
speeds (with the pre-C-extension version of Storm) is at http://techspot.zzzeek.org/?p=17
, where you'll see that Storm is generally a little faster than SQLA
but SQLA loads large numbers of rows more quickly (note that 0.5 is
even faster than the 0.4 tested in that post). I'm very satisfied
with those results, since SQLA is philosophically more feature rich
than Storm, yet the speed of the two products (sans any native
extensions) is not very different.

In general, as long as speed is "decent", our users don't have any
issues with it. For this reason, while the C-based extension route
would be helpful, it's not a super-burning issue either. Any web
application, no matter what platform it's running on, is scaled up to
higher capacity through second level caching and via the addition of
more processing support horizontally. There are plenty of
applications running SQLAlchemy, SQLObject, Storm, etc. at this
point, all of which introduce significant overhead versus raw SQL,
and in general they don't seem to be falling over.


a...@svilendobrev.com

Jul 11, 2008, 11:40:10 AM
to sqlal...@googlegroups.com
Can you elaborate more on the C-level programming? What are the
bottleneck(s) that could be redone in C? Maybe I can help there.

a...@svilendobrev.com

Jul 11, 2008, 3:48:53 PM
to sqlal...@googlegroups.com
About (ways of) query compilation: for example, in dbcook I have
query filters expressed as plain Python functions, and that adds 3 or
4 more levels on top of the usual query building. Here are the
levels:

expr. translating:
 -a: py-func -> makeExpresion -> expr
 -b: expr -> walk( Translator(context) ) -> SA column expression
     (the context needs plain vars, and vars-classes/mappers)
 -c: context binding: classes / mappers
 -d: context binding: plain vars
 -e: apply the column expression as a select over a query
 -f: sql.bindparams

way0: everything, every time: a, c, d, b, e, sql - no bindparams whatsoever
way1: a, c, store; then d, b, e, sql - no bindparams whatsoever
way2: a, b, c, store; then d/bindparams, e, sql
way3: a, b, c, e, store; then d/bindparams, sql
way4: a, b, c, e, sql, store; then d/bindparams

I guess a similar breakdown can be done on the SA query building
itself, with the most important division being query-into-statement,
statement+bindparams+execution, and instantiation.

I admit it's a bit hard to build the model and app in a way that
allows using bindparams to their real power, which is: the whole
query statement is like a function and the bindparams are arguments
to that function. Note I'm not talking about the constants that SA
automatically turns into bindparams; I'm talking about real
parameters.
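
A sketch of what I mean by real parameters (assuming an engine and
the order table from the first post; the statement is built once and
only the arguments vary):

from sqlalchemy import select, bindparam

# build the statement once; 'oid' is a genuine argument of the 'function'
stmt = select([order], order.c.order_id == bindparam('oid'))

# call it many times with different arguments
for oid in (72244, 72245):
    row = engine.execute(stmt, oid=oid).fetchone()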

I also must admit that I'm not much of an SQL-er and try to avoid it
as much as possible; still, if speed means replacing 1000 hierarchical
per-node queries with one huge, awfully twisted query that returns
those 1000 in one long shot, so be it (well, it takes... weeks to
invent). I guess if SA were, say, 10 times faster I would not have
bothered doing it - until I ran it on some networked server, found
that the reality is about 10,000 nodes, and saw that it didn't really
matter whether SA was faster or slower: the per-node approach was
just the wrong way.

Not that I like overheads either... but in the long run I think it's
correctness, consistency and re/usability that are important. What
runs in 5 seconds today will run in 3 seconds next year on
same-price equipment.

It would be quite interesting if you tried these, in various
independent combinations, and compared the results:
a) the query-into-statement "compilation", then binding+executing
that 1000 times
b) keeping the instances cached
c) having just _one_ query returning the same thing that the current
1000 queries do, in one long shot - and running that once.
I guess combining a) with c) is meaningless, hence the useful
combinations seem to be: nothing, a, a+b, c, c+b.
It may show interesting sides of the query execution.

But give them more than 1000 iterations, maybe 3000 (~15 sec).

ciao
svilen
