1244187446.32 model start
1244187446.62 model stop
0.3 seconds just to set up the model! I could live with 0.05 for it, maybe even
0.1, but 0.3 for _each_ GET or POST request is a bit too much, don't you
think?
And that is for a not-too-complex model: 17 tables, averaging 8.6 SQLFields
each. On another web2py project it takes 0.38...0.42 seconds each time :(
I tried compiling my app and measuring again:
1244187625.31 model start
1244187625.69 model stop
Not any better. In fact, it's even worse, but since results vary from run to
run I suspect the performance is really just the same.
Massimo, as far as I know you've been working on a new model for some time
already. Is there any hope of a faster model? I suspect more lazy evaluation
should do the magic, but I haven't done any research yet.
Frankly speaking, when I first discovered that web2py always _executes_ the
model, controller, and view, I thought it might be a performance hog. Until I
actually ran this check, I assumed it would execute db.py each time it changed
on disk and then just keep the built structures around, probably pickled.
Maybe it is still possible to use that approach to some extent?
Or maybe I am completely missing the point. Please comment.
--
Sincerely yours
Alexey Nezhdanov
> 2) is_integer is a fast call, but with 1.1k (!) calls ...
Replaced it with my own version:
import re
integer_pat=re.compile('[0-9]+$')
is_integer=lambda x: integer_pat.match(x)
It's about 2.3 times faster. A C version would be even better.
> 3) sqlhtml_validators is an obvious bug... 8/9 of work is wasted.
> ... I suspect that dictionary building may be moved into module namespace
Did that, with some tweaks (the same validator objects get reused). It's a
dangerous approach, as it may introduce hidden data flow, but that could be
prevented by somehow making these validators read-only.
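A minimal sketch of the idea (names here are hypothetical, not web2py's real sqlhtml_validators layout): construct each validator once at module level and hand out the shared instance.

```python
import re

# Hypothetical sketch: build validators once and reuse them instead of
# rebuilding them for every field on every request. The cache makes the
# validators shared state, hence the note about keeping them read-only.
_validator_cache = {}

def get_validator(field_type):
    """Return a shared validator callable for field_type, built only once."""
    if field_type not in _validator_cache:
        if field_type == 'integer':
            _validator_cache[field_type] = re.compile(r'-?\d+$').match
        else:
            _validator_cache[field_type] = lambda x: x  # accept anything
    return _validator_cache[field_type]
```

The reuse is what saves the time; the risk is that any caller mutating a cached validator silently affects every other table that shares it.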
Got another 9.5% model-time speedup (0.096s vs 0.106s). Though these figures
get more and more meaningless as model init time's share of total page
generation time shrinks. Probably I'll just switch to timing
urllib.urlopen().
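Such a measurement could look roughly like this (the request count is a placeholder, and the fetch callable is injected so the timer itself stays trivial):

```python
import time

def average_call_time(fetch, n=100):
    """Average wall-clock seconds per call of fetch() over n calls.

    fetch would be something like:
        lambda: urlopen('http://127.0.0.1:8000/app').read()
    """
    start = time.time()
    for _ in range(n):
        fetch()
    return (time.time() - start) / n
```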
...OK, switched. Here are the stats. Each line is in the format
model_init_time/urlopen_measured_time, averaged over 100 calls.
These timings are not comparable with my previous emails because I made some
additional arrangements that make the times higher than usual, but more
stable across tests. Also, the application is compiled this time.
1) no optimisations
0.1219/0.1649
2) skip CREATE TABLE in SQLTable init
0.0865/0.1261
3) + the three steps described above in this mail.
0.0439/0.0855
4) + lazy tables init in model
This one is application-specific. For instance, in my application it causes
4 tables out of 16 to be initialised in the model and 2 more in the
controller. This is for the front page; other pages will have different
figures. Still, here are the timings:
0.0194/0.0622
5) And finally, just for comparison, the timings of (4) again, but this time
with non-compiled source.
0.0208/0.1216
Some conclusions:
1) Always compile your app for production. It saves you about 50% CPU.
2) With very little effort it was possible to speed up the web2py app a
further 2.65 times, dropping page generation times from tenths of a second to
hundredths.
I hope at least some of this work will find its way into the mainline. I'm
attaching all the patches I used to this mail. The patches are against a
quite old web2py - Version 1.61.4 (2009-04-21 10:02:50) (I didn't like how
the newer versions looked, so I stuck with the initially installed version).
> try:
> query=t._create(...)
> finally:
> sql_locker.release()
> Only in this way, any exception (if any) inside t._create(...) can
> show its correct break point, rather than the line "raise e".
Yes, and even better would be
with sql_locker:
    query=....
Just two lines instead of five, with the same breakpoint benefit.
On the other hand, it requires doing
from __future__ import with_statement
and that WILL break Python 2.4 compatibility once again.
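For completeness, a runnable side-by-side of the two forms (threading.Lock standing in for the real sql_locker, and a string assignment standing in for the t._create(...) call):

```python
import threading

sql_locker = threading.Lock()  # stand-in for the real lock

# Python 2.4-compatible form: any exception inside the body keeps its
# original traceback line instead of pointing at a later "raise e".
sql_locker.acquire()
try:
    query = 'CREATE TABLE ...'  # placeholder for t._create(...)
finally:
    sql_locker.release()

# Python 2.5+ form: two lines, same breakpoint benefit, lock always released.
with sql_locker:
    query = 'CREATE TABLE ...'
```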
> Because lambda is considered as much slower than built-in functions,
> AFAIK.
--
Sincerely yours
Alexey Nezhdanov
P.S. Let me run some tests now.
Thank you for incorporating these changes. I knew they couldn't go in as is -
because I knew I was possibly breaking other people's apps. But these changes
were too effective not to try.
I need to go now. I'll provide a backwards-compatible validators patch when I
return (something like 5 hours from now).
In any case, we have collected almost all the low-hanging fruit. Any further
optimisation will give gains in the 2-5% range, not multiples anymore.
Except lazy tables, which may still give tens of percent. BTW - Massimo, you
said something about an SQLLazyTable class. You should implement SQLLazyField
too then. Or maybe just wait a day or two until I do deeper profiling.
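I don't know what Massimo's SQLLazyTable looks like; a minimal sketch of the general idea (all names here hypothetical) would be a wrapper that defers the expensive table construction until the first attribute access:

```python
class LazyTable(object):
    """Hypothetical sketch of an SQLLazyTable-style wrapper: defer the
    expensive table/field construction until the table is first touched."""

    def __init__(self, name, build):
        self._name = name
        self._build = build   # callable constructing the real table
        self._real = None

    def _materialize(self):
        if self._real is None:
            self._real = self._build()
        return self._real

    def __getattr__(self, attr):
        # Only called for attributes not found on the wrapper itself,
        # so any field access triggers materialization exactly once.
        return getattr(self._materialize(), attr)
```

Pages that never touch a table then never pay for its initialization, which matches the per-page figures above.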
> Please check out the latest trunk.
>
> Massimo
>
P.S. Here is a small fix to is_integer. We forgot about negative numbers.
Also, this version is a tiny bit faster:
is_integer=re.compile('-?[\d]+$').match
First of all, let me remind you about the is_integer bug: it doesn't allow
negative numbers. The correct code is:
is_integer=re.compile('-?[\d]+$').match
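A quick sanity check of the pattern with the end anchor and optional sign in place:

```python
import re

# Anchored pattern: optional leading minus, then one or more digits,
# with $ so trailing garbage is rejected.
is_integer = re.compile(r'-?[\d]+$').match
```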
Now, I've done more timings, more profiling, and more code tweaks.
This time it was Core2 Duo @2.20GHz with 1G RAM. Ubuntu 8.04.
Testing strategy: 20 sets of 1000 GET requests to the main page. Same
application as before. For each set I have average GET request time.
For a group of 20 sets I get minimum average, global average and
maximum average. Also I have average model init time for 20k requests
(no splitting into each set).
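The per-group statistics reported below reduce to something like this (each set contributes one average request time):

```python
def group_stats(set_averages):
    """Return (min, avg, max) over the per-set average request times."""
    n = len(set_averages)
    return min(set_averages), sum(set_averages) / n, max(set_averages)
```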
1) Upstream sql.py
model 0.01646
min 0.02979
avg 0.03267
max 0.03597
2) Optimised SQLTable.__init__ and SQLField.__init__ (patches
attached, backwards compatible, though you may want to remove my
testing marks)
model 0.01380
min 0.02697
avg 0.03003
max 0.03189
So - about 8% speedup (by avg time) here.
3) Now lazy tables (patch attached).
Since I expect that you will not find any objections to my patches for
(2), here are cumulative results, i.e. optimised __init__s + lazy
tables
...
Hmmm. It doesn't improve the results significantly anymore. It looks
like these patches largely overlap in what they save. I've run my test
three times; the results are:
model 0.01009
min 0.02555
avg 0.0297105
max 0.033360
model 0.01012
min 0.02369
avg 0.02983
max 0.03340
model 0.00966
min 0.02415
avg 0.02850
max 0.03208
So it gives some improvement, about 1%-5%, but somehow it also makes the
testing results more dispersed. I don't know if it's worth including.
Massimo, your opinion?
4) I also tried to further speed up validator generation without losing
backwards compatibility, but the improvement was about 1% and below the
margin of error (even less noticeable than with lazy tables).
Revision 875:
model 0.04167
min 0.05579
avg 0.0559455
max 0.05647
Revision 882:
model 0.01646
min 0.02979
avg 0.03267
max 0.03597
882+optimised inits:
model 0.01380
min 0.02697
avg 0.03003
max 0.03189
882+optimised inits + lazy tables:
model 0.01009
min 0.02555
avg 0.0297105
max 0.033360
model 0.01012
min 0.02369
avg 0.02983
max 0.03340
model 0.00966
min 0.02415
avg 0.02850
max 0.03208
--
Sincerely yours
Alexey Nezhdanov
> Massimo
Will this speedup also have an effect on GAE? As I understand it, one uploads no .pyc files?
Markus
Long story short - I got too much noise in my results. It seems that trunk is
currently only about 40% faster than stable, not 70%.
As it turns out, it matters a lot whether I have any print statements in my
application. I had a couple in the model, then I commented them out.
Somehow that decreased(!!!) performance by about 1/3.
I won't be doing any more work today. I have to work out a new plan. Probably
we won't hit the 100% barrier :(
I'm sorry guys. So much excitement...
Joe Barnhart wrote:
> Got it. Please do not call it "is_integer". Names are important.
> Having a name that implies one thing but does another is a big no-no.
>
>
How about "is_id"?
--
PhilK
Ubuntu 8.04 installed on ASUS A6Q00Km (Turion 64 1.6GHz, SiS motherboard, 1.5G
RAM).
> > And as Massimo pointed out - we don't really check for an int - we check
> > for a valid id. The function should be renamed, yes.
>
> Right, it's more like is_convertible_to_int. Now that we are more
> clear about what we're looking for, it sounds exactly like
> string.isdigit() to me.
Yes. I thought there should be such a function but failed to find it. Thank
you for pointing this out.
> bear@black:/tmp$ python -m timeit "s=[str(i) for i in range(10000)]"
> "import re" "integer_pat=re.compile('[+-]?[0-9]{1,15}$')"
> "is_integer=integer_pat.match"
> 100 loops, best of 3: 3.64 msec per loop
Hmm? No loop...
> bear@black:/tmp$ python -m timeit "s=[str(i) for i in range(10000)]"
> "import re" "integer_pat=re.compile('[+-]?[0-9]{1,15}$')"
> "is_integer=integer_pat.match" "for i in s:" " is_integer(i)"
> 100 loops, best of 3: 9.87 msec per loop
roughly the same time here.
> bear@black:/tmp$ python -m timeit "s=[str(i) for i in range(10000)]"
> "for i in s:" " i.isdigit()"
> 100 loops, best of 3: 4.63 msec per loop
> Which gives a 6.2x (!) faster result.
I'm sorry, how do you get 6.2x?
9.87/ 4.63 =2.13 here...
And again - you're still doing a lot of excess work. First you populate the
list, then you iterate over it. You don't need to do either. Look at my
scripts attached.
Anyway, my testing results are as follows:
9.84s for the regex-based version (4M calls),
7.70s for a function based on 'isdigit',
7.68s for a lambda based on 'isdigit',
6.24s for a direct 'isdigit' call.
So, Massimo, I think it's worth globally replacing each
is_integer(str(x))
with
str(x).isdigit()
If you still prefer having an is_integer function, then here it is:
is_integer = lambda x: str(x).isdigit()
In this case please rename it to something like is_id.
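So the renamed helper would be something like this (note that str.isdigit() rejects a leading minus, which fits an id check, since ids are non-negative):

```python
# is_id: true when x converts to a string of pure digits, i.e. a valid
# non-negative integer id. Rejects '-1', '12a', '' and so on.
is_id = lambda x: str(x).isdigit()
```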
> > > bear@black:/tmp$ python -m timeit "s=[str(i) for i in range(10000)]"
> > > "import re" "integer_pat=re.compile('[+-]?[0-9]{1,15}$')"
> > > "is_integer=integer_pat.match"
> > > 100 loops, best of 3: 3.64 msec per loop
> >
> > Hmm? No loop...
>
> Timeit loops, it is looping over your code. So, the above means timeit
> ran the code 100 times.
>
> > I'm sorry, how do you get 6.2x?
> > 9.87/ 4.63 =2.13 here...
>
> Subtract the init (3.6msec) time from both. just wanted to avoid the
> extra cost of the import. However, even when doing it 'more right' I'm
> getting a significantly larger number (although unlike my first
> example this does not take into account the varying length of
> strings).
I see. I'm not sure we're allowed to do that (the subtraction), though.
> blinky:~# python -m timeit -s "import re" -s "integer_pat=re.compile
> ('[0-9]+$')" -s "is_integer=integer_pat.match" "is_integer('123')"
> 1000000 loops, best of 3: 0.574 usec per loop
> blinky:~# python -m timeit "'123'.isdigit()"
> 10000000 loops, best of 3: 0.141 usec per loop
>
> The problem of your methodology (timing a total run) is that you're
> averaging out results and polluting the results with load noise from
> the OS and other apps. Timeit (among other neat tricks) counts the
> timing of the *BEST* iterations. If one was slower than the other,
> that means that the other got interrupted somewhere (disk, app,
> multitaskting, etc).
>
> See http://docs.python.org/library/timeit.html , it's quite useful.
I was wondering during my previous testing whether I should use only the
'minimal' time. Thank you for the link; I will read it.
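For reference, the repeat/min idiom from the timeit docs looks like this (the statement and counts are just examples):

```python
import timeit

# timeit.repeat runs the timing loop `repeat` times; per the timeit
# documentation, the minimum of the runs is the least-noisy figure.
runs = timeit.repeat("'123'.isdigit()", repeat=3, number=100000)
best = min(runs)
```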