>
> What we see here is that of the total of 5.965 seconds, the test
> spend only about 0.585 seconds performing the actual SQL querys, or
> about 10% of the total time. The other 90% of the time we are spending
> on SA overhead (except for the times.py line which is somewhere in the
> MySQLdb driver).
>
> If I look at the cummulative output of the profiler (ommitted for
> brevity) I see for instance that SA spends 1.3 seconds compiling the
> SQL expressions (it apparently recompiles the same SQL expression on
> each iteration).
>
> My question to the SA community would be how much ORM overhead do you
> find acceptable? and how does SA compare to for instance to java's
> Hibernate in this regard or other ORM solutions?
As far as Hibernate, I work quite a bit with Hibernate and am
currently creating an almost exact port of a Java/Hibernate
application using Pylons and SQLAlchemy. My observation is that
SQLAlchemy is probably a little slower on execution overhead, but is
faster when fetching large numbers of rows. Your test is very much
focused on single execution speed.
The funny thing is, the SQLAlchemy version of the application overall
is *much* faster than the Hibernate version, simply because SQLAlchemy
knows how to issue a lot fewer queries to get something done. The
main example is loading a list of A, each of which has a relation to B
and then a relation to C - you need to display a slice of the "A"
table (using LIMIT/OFFSET) as well as all the B's and C's.
This is probably the most common operation one needs to do with an
ORM, and Hibernate quite simply cannot do this efficiently - see http://www.hibernate.org/117.html#A12
:
"It should be also obvious why resultset row-based "limit" operations,
such as setFirstResult(5) and setMaxResults(10) do not work with these
kind of eager fetch queries. If you limit the resultset to a certain
number of rows, you cut off data randomly. One day Hibernate might be
smart enough to know that if you call setFirstResult() or
setMaxResults() it should not use a join, but a second SQL SELECT. Try
it, your version of Hibernate might already be smart enough. If not,
write two queries, one for limiting stuff, the other for eager
fetching."
As it turns out, SQLAlchemy was "smart enough" to do this correctly
from day one (though I'm not currently aware of any other ORM that
does this). When querying for rows with a limit/offset in conjunction
with eager joins, SQLA wraps the limited results in a subquery, and
applies the eager joins to that subquery. The efficiency of the
query is much greater than what you'd get with Hibernate since the
outer joins are against only a small slice of the total rows, and of
course you get the correct results back, and there's no need to fall
back to a second per-child-object query, which I've found nearly
impossible to prevent Hibernate from doing in many cases. SQLA is
very good at not "dropping back" into lazy loading when eager loads
are requested. Because of reasons like this, most Hibernate
applications speed up significantly when converted to SQLAlchemy,
since you'll almost certainly see at least a few areas where Hibernate
issued a few hundred queries that SQLA issues only one.
SQLA's very dynamic generation of queries is also a reason we don't
rely upon caching of the "compiled" form of a SQL query by default -
the complexity and memory overhead of doing so would negate any speed
improvements. While statement compilation adds some overhead, its
not really the primary overhead during normal operation; fetching rows
and converting to object instances, along with the associated
bookkeeping, is where most time gets spent. In your own application,
you can cache the "compiled" form of a SQL query by using
statement.compile() and use it in conjunction with an ORM query (via
instances()), if there's a particular critical section in which you're
trying to optimize; but I think you'll find the aggregate time savings
(versus just issuing one less query) are marginal.
If you're still skeptical of the overhead of statement compilation,
see one very high capacity website which uses it regularly, http://www.reddit.com
. In their recently released source code, you can see at http://code.reddit.com/browser/r2/r2/lib/db/tdb_sql.py#L458
that they are using a SQLAlchemy "select()" construct, built fresh
on each request and compiled, at the base of their SQL database
library. A few hundred function calls here are not the issue; it's
the fact that *any* SQL at all is issued which places a burden on the
application. So Reddit scales the way any website has to scale; by
adding processing power horizontally both at the database level and at
the webserver level, and by applying a second level caching solution
(more on this below).
> I would love to help out trying to reduce this overhead, but I have no
> idea what the SA devs have in mind for the future of SA with regard to
> speed (e.g. speed vs. feature completeness). In the tutorials and
> documentation there is currently a strong emphasis on use cases
> highlighting the very dynamic nature of building complex queries using
> SA's ORM tools. Not so much on how to perform simple (and static)
> queries fast using the ORM.
> Maybe there is a way to meoize the results of the compile step so that
> this does not need to be redone all the time while the Query remains
> the same?.
Well in your test, you are clearing the Session on each iteration. If
you weren't doing that, then the object you loaded would be pulled
straight from the Session each time, so in that case the "memoization"
is already complete. Similarly, a Query issued for any number of
rows runs much faster when the resulting objects are already present
in the Session since they do not need to be re-created or populated
again.
For a more general form of "memoization" of queries and their results,
what you're looking for is a second level cache. Hibernate suppllies
a 2nd level caching solution but SQLAlchemy currently does not,
although simple ones are fairly straightforward to create based on a
product such as Memcached, or even just well-maintained dicts. Any
database-driven application which needs very high performing display
of data requires a second level cache of some kind, whether or not an
ORM is used; while SQLA does not have a built in solution to the
second level caching problem, it also didn't create the issue.
Overall, the issue of performance within SQLAlchemy has been addressed
exhaustively and continues to be on a daily basis; at this point, the
path to further speedups lies along two avenues - one is that we
constantly review the code and find small incremental gains to be
made; in particular we've squeezed out 10-20% more speed in SQLA 0.5
versus 0.4. We have performance tests in our test suite which run
through a full battery of SQL and ORM operations, ensuring that
funciton call counts fall within an expected range (http://www.sqlalchemy.org/trac/browser/sqlalchemy/trunk/test/profiling/zoomark_orm.py
for example).
The other avenue in which we are very interested is applying native
extensions to SQLA at key performance areas. Two that we have in mind
are result set processing and attribute instrumentation. This is a
longer road to travel (in particular since we are in need of a C
developer with some time to kill) but hopefully we'll get a resource
for this. There is another ORM called Storm which is in the process
of creating C extensions for their own product (the development of
which is fully funded by Storm's parent company Canonical, a luxury
SQLAlchemy unfortunately does not have), so that's something else you
might want to check out. A comparison of SQLAlchemy and Storm's
speeds (with the pre-C-extension version of Storm) is at http://techspot.zzzeek.org/?p=17
, where you'll see that Storm is generally a little faster than SQLA
but SQLA loads large numbers of rows more quickly (note that 0.5 is
even faster than the 0.4 tested in that post). I'm very satisfied
with those results since SQLA is philosophically more feature rich
than Storm, yet the speed of both products (sans any native
extensions) are not very different.
In general, as long as speed is "decent", our users don't have any
issues with it. For this reason, while the C-based extension route
would be helpful, it's not a super-burning issue either. Any web
application, no matter what platform it's running on, is scaled up to
higher capacity through second level caching and via the addition of
more processing support horizontally. There's plenty of applications
running SQLAlchemy, SQLObject, Storm, etc. at this point, all of which
introduce significant overhead versus raw SQL, and in general they
don't seem to be falling over.
expr.translating:
-a: py-func -> makeExpresion -> expr
-b: expr -> walk( Translator( context) ) -> SA column expression
context needs plain vars, and vars-classes/mappers
-c: context binding: classes / mappers
-d: context binding: plain vars
-e: apply column-expression as select over a query
-f: sql.bindparams
way0: all everytime: a c, d,b,e,sql - no bindparams whatsoever
way1: a c, store; then d,b,e,sql - no bindparams whatsoever
way2: a b c, store; then d/bindparams, e,sql
way3: a b c, e, store; then d/bindparams, sql
way4: a b c, e, sql, store; then d/bindparams
i guess similar breakdown can be done on the SA-query-building itself,
with the most important division being query-into-statement,
statement+bindparams+exec, instantiation.
i admit it's a bit hard to build the model and app in a way that
allows usage of bindparams in their real power, which is: the whole
query statement is like a function and the bindparams are arguments
to that function. note i'm not talking about the constants that SA
automaticaly makes into bindparams, i'm talking about real
parameters.
i also must admit that i'm not too much sql-er and try to avoid it as
much as possible; still, if speed means replacing a 1000 hierarchical
per-node queries with one huge awfuly twisted query that gives those
1000 in one long shot, so be it (well but it takes.. weeks to
invent). i guess if SA was say 10 times faster i would not bother
doing it until i run it on some networked server and find that the
reality is about 10,000 nodes and it didn't realy matter if SA was
faster or slower - it's just a wrong way.
not that i like overheads either... but in the long run i think it's
correctness and consistency and re/usability that is important. What
was running in 5 secs yesterday will run in 3 secs next year on
same-price-equipment.
it would be quite interesting if u try these, in various independent
combinations, and compare the results:
a) the query-into-statement "compilation", then binding+executing
that 1000 times
b) keep the instances cached
c) have just _one_ query returning same thing that current
1000-queries do, in one long shot - and run that once.
i guess combining a) with c) is meaningless, hence the useful
combinations seems: nothing, a, a+b, c, c+b
it may show interesting sides of the query execution
but give them more than 1000, maybe 3000 (~15sec).
ciao
svilen