1244187446.32 model start
1244187446.62 model stop
0.3 seconds just to set up the model! I could live with 0.05 for it, maybe even
0.1, but 0.3 for _each_ GET or POST request is a bit too much, don't you
think?
That is for a not-too-complex model - 17 tables, averaging 8.6 SQLFields
each. On another web2py project it takes 0.38-0.42 seconds each time :(
I tried compiling my app and measuring again:
1244187625.31 model start
1244187625.69 model stop
Not any better. In fact, it's even worse, but since the results vary from run
to run I suspect the performance is just the same.
Massimo, as far as I know you've been working on a new model for some time
already. Is there any hope of having a faster model? I suspect more lazy
evaluation should do the magic, but I haven't done any research yet.
Frankly speaking, when I first discovered that web2py always _executes_ the
model, controller, and view, I thought that it might be a performance hog.
Until I actually ran that check, I assumed it would execute db.py each time it
changed on disk and then just keep the built structures around, probably
pickled. Maybe it is still possible to use that approach to some extent?
Or maybe I am just completely missing the point. Please comment.
--
Sincerely yours
Alexey Nezhdanov
> 2) is_integer is a fast call, but with 1.1k (!) calls ...
Replaced it with my own version:
integer_pat=re.compile('[0-9]+$')
is_integer=lambda x: integer_pat.match(x)
it's about 2.3 times faster. A C version would be even better.
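For context, a minimal sketch of the two variants (the bound-method form is my
own addition; it skips even the lambda call overhead):

```python
import re

# Precompile once at module level; reusing the compiled pattern avoids
# re-parsing the regex on every call.
integer_pat = re.compile('[0-9]+$')
is_integer = lambda x: integer_pat.match(x)

# Binding the compiled pattern's .match directly removes the extra
# lambda frame as well.
is_integer_fast = re.compile('[0-9]+$').match
```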
> 3) sqlhtml_validators is an obvious bug... 8/9 of work is wasted.
> ... I suspect that dictionary building may be moved into module namespace
did that with some tweaks (the same validators get reused). It's a dangerous
approach - it may introduce hidden dataflow - but that could be prevented by
somehow making these validators read-only.
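A minimal sketch of the reuse idea, assuming the validators are effectively
stateless (`get_validators` and `_build_validators` are illustrative
stand-ins, not web2py's real `sqlhtml_validators` code):

```python
# Module-level cache: build each validator set once, then hand out
# the same instances. Only safe if nothing mutates them afterwards.
_validator_cache = {}

def get_validators(field_type):
    try:
        return _validator_cache[field_type]
    except KeyError:
        validators = _validator_cache[field_type] = _build_validators(field_type)
        return validators

def _build_validators(field_type):
    # Stand-in for the real per-type validator construction.
    return ('validator for %s' % field_type,)
```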
Got another 9.5% model-time speedup (0.096s vs 0.106s). Though these figures
get more and more meaningless as the model init time's share of the total
page generation time shrinks. Probably I'll just switch to timing
urllib.urlopen().
...OK, switched. Here are stats. Each line is in the format
model_init_time/urlopen_measured_time - average over 100 calls
These timings are not comparable with my previous emails because I made some
additional arrangements that should make these times higher than usual, but
more stable across tests. Also, the application is compiled this time.
1) no optimisations
0.1219/0.1649
2) skip CREATE TABLE in SQLTable init
0.0865/0.1261
3) + the three steps described above in this mail.
0.0439/0.0855
4) + lazy tables init in model
This one is application-specific. For instance, in my application it causes
4 tables out of 16 to be initialised in the model and 2 more in the
controller. This is for the front page; other pages will have different
figures. Still, here are the timings:
0.0194/0.0622
5) And finally, just for comparison, the timings of (4) again, but this time
with non-compiled source.
0.0208/0.1216
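The lazy-table idea in (4) could be sketched as a proxy that defers building
the real table until the first attribute access (an illustrative sketch, not
the actual patch; `LazyTable` is my own name):

```python
class LazyTable(object):
    """Defer table construction: the factory (e.g. the real SQLTable
    __init__) only runs when the table is first used."""
    def __init__(self, factory):
        self._factory = factory
        self._table = None

    def __getattr__(self, name):
        # Called only for attributes not set in __init__, i.e. real
        # table attributes such as .insert or field names.
        if self._table is None:
            self._table = self._factory()
        return getattr(self._table, name)
```

Pages that never touch a given table then never pay its init cost, which is
why the gain is application-specific.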
Some conclusions:
1) Always compile your app for production. It saves you about 50% CPU.
2) With very little effort it was possible to speed up the web2py app a
further 2.65 times, dropping page generation times from tenths of a second to
hundredths.
I hope at least some of this work will find its way into the mainline. I'm
attaching all the patches that I used to this mail. These patches are against
a quite old web2py - Version 1.61.4 (2009-04-21 10:02:50) (I didn't like how
the newer versions looked, so I stuck with the initially installed version).
> try:
> query=t._create(...)
> finally:
> sql_locker.release()
> Only in this way, any exception (if any) inside t._create(...) can
> show its correct break point, rather than the line "raise e".
yes, and even better would be
with sql_locker:
query=....
just two lines instead of five, with the same breakpoint benefit.
On the other hand, it requires doing
from __future__ import with_statement
and that WILL break Python 2.4 compatibility once again.
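Side by side, the two equivalent forms (with a stand-in `build_query` in
place of `t._create(...)`):

```python
import threading

sql_locker = threading.Lock()

def build_query():
    return "SELECT 1"  # stand-in for t._create(...)

# try/finally: an exception inside build_query() keeps its real
# traceback instead of pointing at a "raise e" line.
sql_locker.acquire()
try:
    query = build_query()
finally:
    sql_locker.release()

# The two-line 'with' form. Python 2.5 needed
# 'from __future__ import with_statement'; 2.6+ has it built in,
# and 2.4 has neither.
with sql_locker:
    query = build_query()
```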
> Because lambda is considered as much slower than built-in functions,
> AFAIK.
--
Sincerely yours
Alexey Nezhdanov
P.S. Let me run some tests now.
Thank you for incorporating these changes. I knew that they couldn't go in
as-is - because I knew I was possibly breaking other people's apps. But these
changes were too effective not to try.
I need to go now. I'll provide a backwards-compatible validators patch when I
return (something like 5 hours from now).
In any case - we have collected almost all the low-hanging fruit. Any further
optimisation will give gains in the 2-5% range, not multiples anymore.
Except lazy tables, which may still give tens of percent. BTW - Massimo, you
said something about an SQLLazyTable class. You should implement SQLLazyField
too then. Or maybe just wait a day or two until I do deeper profiling.
> Please check out the latest trunk.
>
> Massimo
>
P.S. Here is a small fix to is_integer. We forgot about negative numbers.
Also, this version is a tiny bit faster:
is_integer=re.compile('-?[\d]+$').match
First of all, let me remind you about the is_integer bug. It doesn't allow
negative numbers. The correct code is:
is_integer=re.compile('-?[\d]+$').match
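The difference between the two patterns, for the record (the `$` anchor is
needed in both, otherwise `.match` would accept trailing garbage):

```python
import re

# Old pattern: no optional leading minus, so negatives are rejected.
old_is_integer = re.compile(r'[0-9]+$').match
# Fixed pattern: '-?' makes the sign optional.
is_integer = re.compile(r'-?\d+$').match
```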
Now, I've run more timings, more profiling, and more code tweaks.
This time it was a Core2 Duo @ 2.20GHz with 1G of RAM, running Ubuntu 8.04.
Testing strategy: 20 sets of 1000 GET requests to the main page, same
application as before. For each set I get the average GET request time.
For the group of 20 sets I take the minimum average, the global average, and
the maximum average. I also have the average model init time over all 20k
requests (no splitting into sets).
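The per-set reduction is just min/avg/max over the 20 per-set averages; a
sketch of how the numbers below are computed (`summarise` is my own name):

```python
def summarise(set_averages):
    """Given per-set average request times (e.g. 20 sets of 1000
    requests each), return (min, overall average, max)."""
    return (min(set_averages),
            sum(set_averages) / len(set_averages),
            max(set_averages))
```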
1) Upstream sql.py
model 0.01646
min 0.02979
avg 0.03267
max 0.03597
2) Optimised SQLTable.__init__ and SQLField.__init__ (patches
attached, backwards compatible, though you may want to remove my
testing marks)
model 0.01380
min 0.02697
avg 0.03003
max 0.03189
So - about an 8% speedup (by avg time) here.
3) Now lazy tables (patch attached).
Since I expect that you will not find any objections to my patches for
(2), here are the cumulative results, i.e. optimised __init__s + lazy
tables
...
Hmmm. It doesn't improve the results significantly anymore. It looks like
these patches are interchangeable. I've run my test three times; the
results are:
model 0.01009
min 0.02555
avg 0.0297105
max 0.033360
model 0.01012
min 0.02369
avg 0.02983
max 0.03340
model 0.00966
min 0.02415
avg 0.02850
max 0.03208
So it gives some improvement, about 1-5%, but somehow it also makes the
testing results more dispersed. I don't know whether it's worth including.
Massimo, your opinion?
4) I've also tried to further speed up validator generation without losing
backwards compatibility, but the improvement was about 1% and below the
margin of error (even less noticeable than with lazy tables).
Revision 875:
model 0.04167
min 0.05579
avg 0.0559455
max 0.05647
Revision 882:
model 0.01646
min 0.02979
avg 0.03267
max 0.03597
882+optimised inits:
model 0.01380
min 0.02697
avg 0.03003
max 0.03189
882+optimised inits + lazy tables:
model 0.01009
min 0.02555
avg 0.0297105
max 0.033360
model 0.01012
min 0.02369
avg 0.02983
max 0.03340
model 0.00966
min 0.02415
avg 0.02850
max 0.03208
--
Sincerely yours
Alexey Nezhdanov
> Massimo
Will this speedup also have an effect on GAE? AFAIK one uploads no .pyc files
there?
Markus
Long story short - I got too much noise into my results. It seems that trunk
is currently only about 40% faster than stable, not 70%.
As it turns out, it matters a lot whether I have any print statements in my
application. I had a couple in the model, then I commented them out.
Somehow that decreased(!!!) performance by about 1/3.
I won't be doing any more work today. I have to work out a new plan. Probably
we won't hit the 100% barrier :(
I'm sorry guys. So much excitement...