Performance of data mapping and plain access

david mugnai

unread,

Dec 27, 2006, 2:57:13 PM12/27/06

to sqlalchemy

Hi all,
while trying the ORM side of sqlalchemy, I noticed a remarkable speed
difference
between using the ORM machinery and the "plain" db access.

The attached test consists of 1000 inserts and 1000 selects; its
purpose is
just to test the overhead of the data mapping system.

The ORM version takes about 160% more time than the plain version in
user-land
time, and 60% in wall clock time:

ORM
Starting test
Total time: 77.8519508839

real 1m18.526s
user 0m22.277s
sys 0m0.392s

PLAIN
Starting test
Total time: 47.6633858681

real 0m48.319s
user 0m8.605s
sys 0m0.216s

I'm using the trunk version of SA and PostgreSQL 8.1.4 (Ubuntu Edgy
package).

Is this difference to be expected? What can be done to reduce it, if
anything?

I attach three files:
- userhandling1.py (the plain version)
- userhandling2.py (the ORM version)
- test-userhandling.py (the test program)

Thanks in advance.

david

Michael Bayer

unread,

Dec 27, 2006, 3:29:47 PM12/27/06

to sqlalchemy

the ORM is going to be slower in all cases since there is the overhead
of creating new object instances and populating them, as well as
initializing their attribute instrumentation and also a copy of their
attributes for the purposes of tracking changes when you issue a
flush() statement. this strategy was copied from that of hibernate's,
and provides the greatest stability and predictability with regards to
tracking history changes.

we do have some performance tests in the test/ directory which ive used
in the past for profiling and improving these operations, and they are
actually a lot better than they've been in the past. if your tests are
useful, I might add them as well (but note that your attachments didnt
come through, so try again).

one thing that could make ORM loads much faster would be if you knew
the objects would not need to be flushed() at a later point, and you
disabled history tracking on those instances. this would prevent the
need to create a copy of the object's attributes at load time.

while theres no option available for that right now its something that
could be added. that would probably give you a 20-40% speed boost at
least if not more.

Lele Gaifax

unread,

Dec 27, 2006, 3:55:25 PM12/27/06

to sqlal...@googlegroups.com

Michael Bayer wrote:
>
> the ORM is going to be slower in all cases since there is the overhead
> of creating new object instances and populating them, as well as
> initializing their attribute instrumentation and also a copy of their
> attributes for the purposes of tracking changes when you issue a
> flush() statement.

Couldn't an approach like dbrow (http://opensource.theopalgroup.com/)
bring some benefit in this area? It surely overlaps here and there with
SA, but expecially thanks to the C impl (there's also a pure Python one)
it's pretty fast in delivering lots of instances of the same class.

ciao, lele.

Michael Bayer

unread,

Dec 27, 2006, 5:13:25 PM12/27/06

to sqlalchemy

dbrow seems to be exactly what ResultProxy already does (which has no
speed problems).

the answer is in optimizing the "copy" made of each object's state upon
load. step 1 is to make that an optional step, do some speed tests to
insure that it speeds things up when disabled, step 2 is to possibly
make that "copy" only take place when you actually modify the loaded
object, so that the load operation is fast.

Michael Bayer

unread,

Dec 28, 2006, 1:37:48 AM12/28/06

to sqlalchemy

ive committed in r2174 some speed enhancements, not including the
abovementioned change to deferring the on-load "copy" operation (which
is a more involved change), that affords a 20% speed improvement in
straight instance loads and a 25% speed improvement in instances loaded
via eager loaders...mostly due to the removal of an expensive
__getattribute__ call that was involved as well as an AttributeError
being thrown/caught for each instance created.

deferring the on-load "copy" operation which i mentioned earlier it
turns out would probably knock another 12% off the loading time, so its
not as dramatic as I thought originally.

david mugnai

unread,

Dec 28, 2006, 4:55:29 AM12/28/06

to sqlalchemy

Michael Bayer wrote:
[snip]

> actually a lot better than they've been in the past. if your tests are
> useful, I might add them as well (but note that your attachments didnt
> come through, so try again).

I forgot the attachments, sorry. Please find them here:

http://d2.spacespa.it:8000/temp/test-userhandling.py
http://d2.spacespa.it:8000/temp/userhandling1.py
http://d2.spacespa.it:8000/temp/userhandling2.py

> one thing that could make ORM loads much faster would be if you knew
> the objects would not need to be flushed() at a later point, and you
> disabled history tracking on those instances. this would prevent the
> need to create a copy of the object's attributes at load time.

How can I disable the history tracking?

[snip]

Strangely, your last commit doesn't seem to bring any speed
improvements. Last measurements:

user handling with ORM
Starting test
Total time: 76.4469230175

real 1m17.088s
user 0m22.097s
sys 0m0.348s

user handling without ORM
Starting test
Total time: 44.6131839752

real 0m45.185s
user 0m8.561s
sys 0m0.228s

I'd also like to know how to improve the userhandling2.py file. I have
some doubts about the life-cycle of session objects. Is it better to
create and destroy them at each transaction, or to make them long
lived?

thanks in advance
david

Sébastien LELONG

unread,

Dec 28, 2006, 9:06:48 AM12/28/06

to sqlal...@googlegroups.com, Michael Bayer

> one thing that could make ORM loads much faster would be if you knew
> the objects would not need to be flushed() at a later point, and you
> disabled history tracking on those instances. this would prevent the
> need to create a copy of the object's attributes at load time.

This reminds me a functionality I was looking for a few weeks ago. I'd liked
to be able to load read-only objects from a database. Read-only objects were
(are still) useful for me when retrieving data from a "precious" database
(production db), ensuring no update/insert/delete operations would occur.
Another scenario is an app which can actually do these operations, and
another app which musn't. While this can be done using grants, it could be
useful (and easier) if things could be done at the ORM level.

Is there any other way to get those read-only objects, without implementing
this functionality ?

Cheers,

--
Sébastien LELONG
sebastien.lelong[at]sirloon.net

Michael Bayer

unread,

Dec 28, 2006, 11:29:11 AM12/28/06

to sqlalchemy

if its truly an issue of security then grants would be more
appropriate. since anything the ORM does to "prevent" a write
operation can be easily overridden, since its Python. simpliest thing
would be to use a Session that has flush() overidden. or an engine that
overrides execute() to check for INSERT/UPDATE/DELETE statements and
throws an error. not a feature id think is appropriate for the core of
SA though.

Sébastien LELONG

unread,

Dec 28, 2006, 12:12:47 PM12/28/06

to sqlal...@googlegroups.com, Michael Bayer

> [...] simpliest thing

> would be to use a Session that has flush() overidden. or an engine that
> overrides execute() to check for INSERT/UPDATE/DELETE statements and

> throws an error [...]

I tried the ReadOnlySession class which overrides the flush() func. Works like
a charm, this adds a "security net" in my apps as I cannot actually even add
any grants, so this is very useful for me. Thanks.

Moreover, in the 2nd scenario, if app #1 can insert rows and app #2 can't,
this means there must be different db users (per app) with specific grants
for each, which adds complexity and may results in an "evil maintainance"...

Michael Bayer

unread,

Dec 29, 2006, 11:31:53 AM12/29/06

to sqlalchemy

the method that I improved in my checkin is related to the mapper's
construction of a newly loaded instance, and the test case that
improves 20% focuses most of its time loading a list of 2500 items.
the test you have here spends a lot of time doing lots of other things,
such as saving items (about 60% of its time is spent in save operations
rather than loads), creating new sessions, firing off lazy load
callables, etc. so its work is too diffuse to illustrate the one
bottleneck i sped up a bit. if you make your two relations lazy=False,
it knocks a few seconds off the total time.

as far as lifecycle, its dependent on your application's workings. for
a typical web application its easy - a session for each web request.
for other kinds of apps like GUI apps, you have to identify what
constitutes a "session". it may or may not be efficient to have a new
session for each transaction. it depends on how much freshness you
need, what kind of concurrency is present, etc.

there is no option to disable the attribute-copy-on-load operation; you
can make it happen by removing the body of the "commit()" method in
sqlalchemy/orm/attributes.py line 642 and replacing it with just
"pass". but then your session flushes wont work, and neither will your
test program. if you created a test program that just did loads, it
still doesnt speed things up too much, particulary if you have few
attributes on your objects.

Reply all

Reply to author

Forward