Performance issue: time to first byte


BlueShadow

Apr 9, 2013, 5:04:58 PM
to web...@googlegroups.com
Hi, I'm trying to improve my page load times, and so far I'm doing pretty well. I decreased the number of requests by merging some CSS and JavaScript files, and I added Expires headers to pretty much every file I serve. But most of the load time is spent waiting for the first byte.
I checked my site with webpagetest.org: it takes 0.747 s until the first byte, and quite a few requests take 500 ms or more before the first byte arrives. Is there any way to find out why it's taking so long, and a way to improve those numbers?
thanks

BlueShadow

Apr 10, 2013, 7:59:27 AM
to web...@googlegroups.com

I played a little with mod_pagespeed, which decreases everything but the TTFB.
Here is my waterfall diagram:
It doesn't look like a waterfall at all, more like a dripping creek :(
Any suggestions on how I can get rid of most of those green bars?

Niphlod

Apr 10, 2013, 8:56:39 AM
to web...@googlegroups.com

Anthony

Apr 10, 2013, 9:23:47 AM
to web...@googlegroups.com
But in that article, TTFB was trivial (particularly relative to download time). In the above waterfall, the TTFB values are large, both in absolute terms and relative to the full request time.

Anthony

Niphlod

Apr 10, 2013, 9:31:35 AM
to web...@googlegroups.com
I'm inclined to think that the TTFB in his case is due to buffering (or an in-between proxy).

To exaggerate: either the user spends 4 seconds downloading the full page (which is transmitted as soon as the web server outputs something), or he spends 2 seconds waiting for the web server to buffer (the TTFB) and then 2 seconds downloading the content "at full speed" (because it has already been buffered).
The user gets the page after 4 seconds either way.

Kevin Bethke

Apr 10, 2013, 9:32:45 AM
to web...@googlegroups.com

That's exactly what I'm saying, Anthony. Those total times are enormous considering that the whole page, including images, is now a little under 300 KB. And I actually don't care whether it's called TTFB or something else; I just want quick response times.


Niphlod

Apr 10, 2013, 9:33:52 AM
to web...@googlegroups.com
Edit: the real problem in that graph is that there's no concurrency. I don't know if it's a quirk of webpagetest.org, but apparently no new connection is opened until the previous one has finished.

Kevin Bethke

Apr 10, 2013, 9:43:24 AM
to web...@googlegroups.com

There are 6 files that start to load at the same time. Or what did you mean?


Niphlod

Apr 10, 2013, 9:49:08 AM
to web...@googlegroups.com
Probably nothing then; just check that they are not sequential. It's probably just how they draw their graphs.
On the TTFB note: did you try timing it without gzip compression turned on, just to check?

Kevin Bethke

Apr 10, 2013, 9:56:19 AM
to web...@googlegroups.com

Before I had mod_deflate I had times around 3.5 s; the TTFBs were about the same, +/- 50 ms.


Ricardo Pedroso

Apr 10, 2013, 10:01:01 AM
to web...@googlegroups.com
As always, without knowing anything about your application it's hard to give precise answers.
But first of all I would run a test on localhost, to have a reference point.

For now let's keep the network variable out.

If you have a shell account in your server do:

   curl -o /dev/null -s -w "Connect: %{time_connect} TTFB: %{time_starttransfer} Total time: %{time_total} \n" http://localhost/

Be sure that curl is hitting your homepage and not some redirect.

On the same host/web2py install you can also create a minimal application with nothing but a controllers/default.py that only has:

def index():
    return 'OK'

and then time it with curl and webpagetest.

Note: when timing with curl, do it 3 or 4 times consecutively against the same URL.
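
If you prefer to script the measurement, a rough pure-Python cross-check could look like this (just a sketch: it assumes Python 3's http.client and that the app answers on 127.0.0.1:80, so adjust host, port and path to your setup; curl's numbers remain the reference):

import time
import http.client

def ttfb(host='127.0.0.1', port=80, path='/'):
    # time until the status line and headers arrive ~ time to first byte
    t0 = time.time()
    conn = http.client.HTTPConnection(host, port, timeout=10)
    conn.request('GET', path)
    resp = conn.getresponse()          # returns once the first bytes are in
    t_first = time.time() - t0
    resp.read()                        # drain the body
    t_total = time.time() - t0
    conn.close()
    return t_first, t_total

for _ in range(4):                     # repeat a few times, as suggested above
    first, total = ttfb()
    print('TTFB: %.0f ms   total: %.0f ms' % (first * 1000, total * 1000))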

Now you have some measurements to go on... also check this thread if you haven't already:




Ricardo

BlueShadow

Apr 10, 2013, 11:30:10 AM
to web...@googlegroups.com
Thanks, Ricardo, for the curl idea. I ran it 5 times; the TTFB is almost the whole total time (the total is only 0 to 1 ms higher). The average is 201 ms, which seems pretty big to me for a localhost request.
When I tried that test application (the welcome app) I got an average of about 80 ms, which also seems pretty slow (localhost, no db requests...).
When I test with webpagetest I get around 300 ms for the TTFB; for each file the green bar (the TTFB for that file) is ~160 ms.

BlueShadow

Apr 10, 2013, 11:41:00 AM
to web...@googlegroups.com
Migrations are set to False and lazy tables to True.
I tried to do something with cache.ram (for another app) which had no effect at all; perhaps I did it wrong.
Consider moving code to (imported) modules # I started on this when I realised I was using request.folder in my sitemap function, which I currently have in a module (which isn't that smart, I know; it's one of the first things I will change)

I don't know how to deal with these two points:
5. Add session.forget for methods which don't use the session object. # when I don't use session.something, can I add session.forget to the beginning of the controller?
6. Enable connection pooling depending on the database. # no idea what to do here


Haven't tried any of these:
compile the app # haven't tried
controller-specific models # I only use two controllers
database indexes # don't know what those are

Anthony

Apr 10, 2013, 12:01:06 PM
to web...@googlegroups.com
Your waterfall showed only static files, which are served before web2py hits any of your application code, so most of those optimizations shouldn't have an impact. Are the high TTFB values only for the static files? Is web2py even serving the static files, or is the web server (e.g., Apache) doing it directly?

Anthony

LightDot

Apr 10, 2013, 12:05:49 PM
to web...@googlegroups.com
First thing I'd do in this case is compile the app, then test again. ;) Don't try other things before compiling.

Next thing I'd do is look into caching, it's usually very much worth it. Depends on your app, though.

What db are you using? How is connection pooling set up now? You can look up what database indexes are on Google; that would be a good thing to learn about, if not for this project then for your next one.
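
For the caching part, a minimal sketch of what that can look like in a web2py controller (the table and cache key names here are made up, not from your app): cache.ram keeps the result in the server process and skips the database on repeated hits until the entry expires.

def index():
    # cache the homepage query for 5 minutes; any hit within that window
    # is served from memory instead of the database
    articles = cache.ram(
        'homepage_articles',
        lambda: db(db.article.is_published == True).select().as_list(),
        time_expire=300,
    )
    return dict(articles=articles)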

Ricardo Pedroso

Apr 10, 2013, 2:10:30 PM
to web...@googlegroups.com


On Wed, Apr 10, 2013 at 4:30 PM, BlueShadow <kevin....@gmail.com> wrote:

> When I tried that test application (the welcome app) I got an average of about 80 ms, which also seems pretty slow (localhost, no db requests...).

Did you create an empty application or did you reuse the welcome application?
Either way, your server seems slow: on my laptop I get 5-6 ms with the minimal app and 40-45 ms with the welcome app.

Ricardo

Ricardo Pedroso

Apr 10, 2013, 2:16:02 PM
to web...@googlegroups.com
On Wed, Apr 10, 2013 at 4:41 PM, BlueShadow <kevin....@gmail.com> wrote:
> Migrations are set to False and lazy tables to True.
> I tried to do something with cache.ram (for another app) which had no effect at all; perhaps I did it wrong.
> Consider moving code to (imported) modules # I started on this when I realised I was using request.folder in my sitemap function, which I currently have in a module

I don't see anything wrong with using request.folder; in a module you would use:

from gluon import current
current.request.folder 
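
For example, a sitemap helper living in modules/ could look roughly like this (file and function names are just placeholders):

# modules/sitemap_utils.py
import os
from gluon import current

def sitemap_path():
    # current.request is the same request object you see in controllers,
    # so nothing needs to be passed around explicitly
    return os.path.join(current.request.folder, 'static', 'sitemap.xml')

Module code is imported once and then reused, which is part of why moving logic out of model files helps on a slow CPU.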

 
> I don't know how to deal with these two points:
> 5. Add session.forget for methods which don't use the session object. # when I don't use session.something, can I add session.forget to the beginning of the controller?

On your homepage you are serving at least 10 images through the fast_download controller; there it's safe to add session.forget() if it isn't there already. It will allow concurrent requests from the same browser.

 
> 6. Enable connection pooling depending on the database. # no idea what to do here

pool_size is an argument to DAL:
db = DAL(...., pool_size=x) 
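
For example (the connection string and pool size are placeholders, and note that pooling is ignored for SQLite):

# keep up to 10 database connections open and reuse them across requests,
# instead of opening a new connection on every request
db = DAL('postgres://user:password@localhost/mydb', pool_size=10)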


 

> Haven't tried any of these:
> compile the app # haven't tried

As LightDot said, do it.


Ricardo

Anthony

Apr 10, 2013, 3:34:33 PM
to web...@googlegroups.com
> On your homepage you are serving at least 10 images through the fast_download controller; there it's safe to add session.forget() if it isn't there already. It will allow concurrent requests from the same browser.

Good point. If those images are being served dynamically from a controller because they were uploaded (as opposed to being static assets), then the session file locking will cause each request to be handled serially (only a problem if using file based sessions -- no locking for db or cookie based sessions). In that case, you can add session.forget(response) to the download function (be sure to pass in "response", otherwise, it won't actually unlock the file and will be no help). This is not necessary with static files, as they are served before the session is connected or the app code is executed.
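
A minimal sketch of what that looks like (assuming a download-style action named fast_download, as mentioned above):

def fast_download():
    # release the session file immediately so parallel image requests from
    # the same browser are not serialized; passing response is required
    session.forget(response)
    return response.download(request, db)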

Anthony

Jonathan Lundell

Apr 10, 2013, 3:39:14 PM
to web...@googlegroups.com
Also (with dynamically served images), if the images are cacheable by the browser, you probably need to set the appropriate cache headers on the response.
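
For instance, continuing the fast_download sketch above, the headers could be set explicitly before returning (the values are illustrative; one month = 2592000 seconds):

import datetime

def fast_download():
    session.forget(response)
    # let browsers keep dynamically served images for a month
    expires = datetime.datetime.utcnow() + datetime.timedelta(days=30)
    response.headers['Expires'] = expires.strftime('%a, %d %b %Y %H:%M:%S GMT')
    response.headers['Cache-Control'] = 'max-age=2592000, public'
    return response.download(request, db)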

BlueShadow

Apr 10, 2013, 3:42:05 PM
to
Thanks for all the suggestions; I'm working through them one after the other.
Compiling made it a lot worse: http://www.webpagetest.org/result/130410_9P_14QQ/
The results now vary a lot, from a total of 4 s to over 11 s. How do I undo the compile? Or what did I do wrong?

BlueShadow

Apr 10, 2013, 3:42:49 PM
to
The images are cached with an Expires header (access plus 1 month), which works quite well.

Ricardo Pedroso

Apr 10, 2013, 5:14:24 PM
to web...@googlegroups.com
On Wed, Apr 10, 2013 at 8:39 PM, BlueShadow <kevin....@gmail.com> wrote:
> Thanks for all the suggestions; I'm working through them one after the other.
> Compiling made it a lot worse:
> http://www.webpagetest.org/result/130410_9P_14QQ/
> The results now vary a lot, from a total of 4 s to over 11 s.

Very weird... I've never seen that happen.

If you want, I'm willing to try to help you with this; just send me an email off-list.
Optimization is a subject I enjoy.


Ricardo

Niphlod

Apr 10, 2013, 5:17:39 PM
to web...@googlegroups.com
Let us know what you find. I just tested my production site, and (although it runs with uwsgi + nginx and is behind SSL) the maximum TTFB doesn't go over 150 ms.

LightDot

Apr 10, 2013, 5:39:58 PM
to web...@googlegroups.com
Well, this is completely unexpected. I really can't think of what would cause the compiled code to run slower than the uncompiled one. Intriguing!!

Anyway, you probably used the admin interface to compile the app? There is a "Remove Compiled" menu entry there, in the same place "Compile" was.




BlueShadow

Apr 10, 2013, 5:44:17 PM
to web...@googlegroups.com
Yeah, I found it. Right now pretty much everything seems strange to me, but I'll keep working on it.

LightDot

Apr 10, 2013, 5:55:36 PM
to web...@googlegroups.com
I tested a couple of my sites to get a feel for what to expect. TTFB from the first run, tested across half of Europe:

1st site: 57ms to 74ms
2nd site: 57ms to 202ms

The first is a corporate-style web page; the second has a lot of content, thumbnails, etc. Both use Apache + mod_wsgi in daemon mode, PostgreSQL, compiled apps, caching, etc.

BlueShadow

Apr 10, 2013, 6:02:14 PM
to web...@googlegroups.com
That's about the time I'm aiming for.

Derek

Apr 11, 2013, 1:10:05 AM
to web...@googlegroups.com
My performance fluctuates depending on the network. I have an internal site that is a basic WSGI site, and I couldn't figure out why I was getting 200 ms times. Then one day I worked late, after everyone else had gone home, and all of a sudden I was getting 10 ms. So it may have nothing to do with your configuration. Try a basic WSGI app and see if that goes faster.
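
A "basic WSGI app" for that comparison can be as small as this (a sketch; point your Apache/mod_wsgi WSGIScriptAlias at it instead of web2py's handler and time it the same way):

# hello_wsgi.py -- the smallest useful WSGI application, used only as a
# baseline to see what the server stack costs with web2py taken out
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello\n']

If even this is slow, the problem is the server or the network, not web2py.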


BlueShadow

Apr 14, 2013, 7:34:39 PM
to

Thanks to the awesome help of Ricardo Pedroso, my performance increased by a factor of 100 (at least).

So what were the problems?

First of all, my vserver's CPU is very slow (600 MHz, 1 core), which is just ridiculous these days; my mobile phone has a faster processor.

Apache + mod_wsgi + web2py is pretty processor intensive, especially if you add mod_pagespeed on top.

 

What did we do (in descending order of performance gain):
enabled a lot of caching
moved code from models into modules
split controllers (from over 20 functions down to fewer than 7 each)
compiled the application
used plain HTML instead of Python helper functions (see the sketch below)
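
Roughly, that last point means preferring ready-made markup over building it with helpers on every request. A tiny illustration (names and URLs are made up, not my actual code):

from gluon.html import A, XML

# the helper version builds a server-side object tree that web2py has to
# serialize to HTML on every single request ...
link_helper = A('Home', _href='/default/index')

# ... while a pre-built string wrapped in XML() (so it is not escaped)
# is passed through to the page as-is
link_html = XML('<a href="/default/index">Home</a>')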

 

The last thing we did was switch from Apache to nginx, which is just awesome.

Just to show the difference, here are some of our test results.

After optimization, still on Apache (ApacheBench):

ab -n 100 -c 2 domain.com

This is ApacheBench, Version 2.3

...
Server Software: Apache/2.2.22

Document Path: /
Document Length: 27429 bytes

Concurrency Level: 2
Time taken for tests: 17.198 seconds
Complete requests: 100
...

Requests per second: 5.81 [#/sec] (mean)
Time per request: 343.961 [ms] (mean)
Time per request: 171.981 [ms] (mean, across all concurrent requests)
Transfer rate: 158.22 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   138  343  81.8    332     639
Waiting:      138  341  82.3    328     638
Total:        138  343  81.8    332     639

Percentage of the requests served within a certain time (ms)
  50%    332
  66%    358
  75%    365
  80%    396
  90%    456
  95%    493
  98%    634
  99%    639
 100%    639 (longest request)

 

With nginx (and yes, the size is smaller because I compressed the PNGs):

 

ab -n 100 -c 2 domain.com

...
Server Software:        nginx/1.1.19
...
Document Length:        25420 bytes

Concurrency Level:      2
Time taken for tests:   10.821 seconds
Complete requests:      100
...

Requests per second:    9.24 [#/sec] (mean)
Time per request:       216.427 [ms] (mean)
Time per request:       108.214 [ms] (mean, across all concurrent requests)
Transfer rate:          233.14 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    25  215  42.9    207     387
Waiting:       25  214  42.6    207     387
Total:         25  215  42.9    207     387

Percentage of the requests served within a certain time (ms)
  50%    207
  66%    210
  75%    217
  80%    265
  90%    272
  95%    278
  98%    286
  99%    387
 100%    387 (longest request)

 

webpagetest.org now gives results that are not that good, but I suspect they depend a lot on their servers' CPU utilization. After trying some other pages that I knew were fast before, I've come to the conclusion that the site isn't reliable at all at the moment. The compilation that seemed to make things worse, which I mentioned earlier, is probably a result of that too.

loadimpact.com tells me that the server can handle 100 simultaneous users before it starts to fail, so that's a pretty good improvement. Unfortunately I don't have loadimpact results from before.

 

I hope this helps some people who are looking for more performance, especially on slow CPUs.

Derek

Apr 15, 2013, 2:55:27 PM
to web...@googlegroups.com
That looks like a nice increase. You might be able to get more than 100 users by using an evented WSGI server. You can launch web2py with 'anyserver.py' using gevent, gunicorn, or mongrel2. If you use gevent, you could try monkey.patch_all(). Also, with gevent, I like to add sleep(0) between database access and serialization, and after every row of serialization. You can also look at adding the gevent backdoor, which is useful if you want to troubleshoot while it's in development.
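
For reference, a sketch of that gevent pattern (render_row and the rows variable are placeholders; launching is typically something along the lines of "python anyserver.py -s gevent -p 8000", but check anyserver.py's help for the exact flags):

from gevent import monkey
monkey.patch_all()          # make sockets (and many DB drivers) cooperative

import gevent

def build_page(rows):
    parts = []
    for row in rows:                   # rows from a previous DB query
        parts.append(render_row(row))  # render_row: hypothetical serializer
        gevent.sleep(0)                # yield to other greenlets between rows
    return ''.join(parts)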