The attached test consists of 1000 inserts and 1000 selects; its
purpose is
just to test the overhead of the data mapping system.
The ORM version takes about 160% more time than the plain version in
user-land
time, and 60% in wall clock time:
ORM
Starting test
Total time: 77.8519508839
real 1m18.526s
user 0m22.277s
sys 0m0.392s
PLAIN
Starting test
Total time: 47.6633858681
real 0m48.319s
user 0m8.605s
sys 0m0.216s
I'm using the trunk version of SA and PostgreSQL 8.1.4 (Ubuntu Edgy
package).
Is this difference to be expected? What can be done to reduce it, if
anything?
I attach three files:
- userhandling1.py (the plain version)
- userhandling2.py (the ORM version)
- test-userhandling.py (the test program)
Thanks in advance.
david
we do have some performance tests in the test/ directory which ive used
in the past for profiling and improving these operations, and they are
actually a lot better than they've been in the past. if your tests are
useful, I might add them as well (but note that your attachments didnt
come through, so try again).
one thing that could make ORM loads much faster would be if you knew
the objects would not need to be flushed() at a later point, and you
disabled history tracking on those instances. this would prevent the
need to create a copy of the object's attributes at load time.
while theres no option available for that right now its something that
could be added. that would probably give you a 20-40% speed boost at
least if not more.
Couldn't an approach like dbrow (http://opensource.theopalgroup.com/)
bring some benefit in this area? It surely overlaps here and there with
SA, but expecially thanks to the C impl (there's also a pure Python one)
it's pretty fast in delivering lots of instances of the same class.
ciao, lele.
the answer is in optimizing the "copy" made of each object's state upon
load. step 1 is to make that an optional step, do some speed tests to
insure that it speeds things up when disabled, step 2 is to possibly
make that "copy" only take place when you actually modify the loaded
object, so that the load operation is fast.
deferring the on-load "copy" operation which i mentioned earlier it
turns out would probably knock another 12% off the loading time, so its
not as dramatic as I thought originally.
I forgot the attachments, sorry. Please find them here:
http://d2.spacespa.it:8000/temp/test-userhandling.py
http://d2.spacespa.it:8000/temp/userhandling1.py
http://d2.spacespa.it:8000/temp/userhandling2.py
> one thing that could make ORM loads much faster would be if you knew
> the objects would not need to be flushed() at a later point, and you
> disabled history tracking on those instances. this would prevent the
> need to create a copy of the object's attributes at load time.
How can I disable the history tracking?
[snip]
Strangely, your last commit doesn't seem to bring any speed
improvements. Last measurements:
user handling with ORM
Starting test
Total time: 76.4469230175
real 1m17.088s
user 0m22.097s
sys 0m0.348s
user handling without ORM
Starting test
Total time: 44.6131839752
real 0m45.185s
user 0m8.561s
sys 0m0.228s
I'd also like to know how to improve the userhandling2.py file. I have
some doubts about the life-cycle of session objects. Is it better to
create and destroy them at each transaction, or to make them long
lived?
thanks in advance
david
This reminds me a functionality I was looking for a few weeks ago. I'd liked
to be able to load read-only objects from a database. Read-only objects were
(are still) useful for me when retrieving data from a "precious" database
(production db), ensuring no update/insert/delete operations would occur.
Another scenario is an app which can actually do these operations, and
another app which musn't. While this can be done using grants, it could be
useful (and easier) if things could be done at the ORM level.
Is there any other way to get those read-only objects, without implementing
this functionality ?
Cheers,
--
Sébastien LELONG
sebastien.lelong[at]sirloon.net
I tried the ReadOnlySession class which overrides the flush() func. Works like
a charm, this adds a "security net" in my apps as I cannot actually even add
any grants, so this is very useful for me. Thanks.
Moreover, in the 2nd scenario, if app #1 can insert rows and app #2 can't,
this means there must be different db users (per app) with specific grants
for each, which adds complexity and may results in an "evil maintainance"...
as far as lifecycle, its dependent on your application's workings. for
a typical web application its easy - a session for each web request.
for other kinds of apps like GUI apps, you have to identify what
constitutes a "session". it may or may not be efficient to have a new
session for each transaction. it depends on how much freshness you
need, what kind of concurrency is present, etc.
there is no option to disable the attribute-copy-on-load operation; you
can make it happen by removing the body of the "commit()" method in
sqlalchemy/orm/attributes.py line 642 and replacing it with just
"pass". but then your session flushes wont work, and neither will your
test program. if you created a test program that just did loads, it
still doesnt speed things up too much, particulary if you have few
attributes on your objects.