caching questions


Pierre

Apr 25, 2016, 5:38:19 AM
to web2py-users
Hi everyone,

I discovered this in the book:


"The select method has an optional cacheable argument, normally set to False. When cacheable=True the resulting Rows is serializable but The Rows lack update_record and delete_record methods.

If you do not need these methods you can speed up selects a lot by setting the cacheable attribute:


rows = db(query).select(cacheable=True)"

What's the difference (what makes it faster) between a normal select and the select above?

Given the dynamic nature of a database app, is the caching mechanism reserved for static/pseudo-static content?



Anthony

Apr 25, 2016, 8:19:32 AM
to web2py-users
On Monday, April 25, 2016 at 5:38:19 AM UTC-4, Pierre wrote:
Hi everyone,

I discovered this in the book:


"The select method has an optional cacheable argument, normally set to False. When cacheable=True the resulting Rows is serializable but The Rows lack update_record and delete_record methods.

If you do not need these methods you can speed up selects a lot by setting the cacheable attribute:


rows = db(query).select(cacheable=True)"

What's the difference (what makes it faster) between a normal select and the select above?

It just takes less time to construct each Row object because it doesn't contain those extra attributes.
 
Given the dynamic nature of a database app, is the caching mechanism reserved for static/pseudo-static content?

The "cacheable" argument doesn't actually do any caching, it just makes it so the resulting Rows object itself is cacheable (should you choose to cache it). It can still be useful to set cacheable=True for the increased speed, even if you aren't doing any caching.

Anthony 

 

Pierre

Apr 25, 2016, 10:09:38 AM
to web2py-users

What I was wondering is: what type of data in a dynamic app demands caching?
What typically needs caching in such a context?

Anthony

Apr 25, 2016, 11:59:17 AM
to web2py-users
On Monday, April 25, 2016 at 10:09:38 AM UTC-4, Pierre wrote:

What I was wondering is: what type of data in a dynamic app demands caching?
What typically needs caching in such a context?

It depends on the nature of the data. It can be beneficial to cache any data that changes significantly less frequently than the typical interval between requests for those data. You might also want to cache things that are expensive to compute/retrieve, even if the cached version will often be a little stale. For example, if you are displaying a Twitter feed, you might want to cache the feed for 15-30 minutes so you don't have to do a slow HTTP request to a third-party API on every request.
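
To make the feed example concrete, something like this could work (a hedged sketch; the URL and helper name are made up, and this assumes Python 2-era web2py):

import urllib2

def fetch_feed():
    # slow HTTP request to a hypothetical third-party API
    return urllib2.urlopen('https://api.example.com/feed.json').read()

# recompute at most once every 15 minutes; all other requests hit RAM
feed = cache.ram('twitter_feed', fetch_feed, time_expire=900)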

Anthony

Pierre

Apr 25, 2016, 2:37:01 PM
to web2py-users
Suppose I allowed every user to cache some of their individualized selects. Wouldn't that be too expensive in terms of consumed cache.disk/cache.ram?
Perhaps caching is more suitable for shared data?

Anthony

Apr 25, 2016, 3:07:09 PM
to web2py-users
On Monday, April 25, 2016 at 2:37:01 PM UTC-4, Pierre wrote:
Suppose I allowed every user to cache some of their individualized selects. Wouldn't that be too expensive in terms of consumed cache.disk/cache.ram?

Perhaps. Caching highly individualized selects also won't have much benefit, as you are unlikely to serve the cached data many times. Note that you can also periodically clear cached items.
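
For instance (a sketch under assumptions: the per-user key scheme and the "orders" table are hypothetical):

# cache each user's select under a prefixed key
key = 'user_%s_orders' % auth.user_id
rows = cache.ram(key,
                 lambda: db(db.orders.owner == auth.user_id).select(cacheable=True),
                 time_expire=300)

# later, reclaim memory by dropping every per-user entry at once
cache.ram.clear(regex='^user_.*')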

Anthony

Pierre

Apr 26, 2016, 11:24:52 AM
to web2py-users
Anthony, you wrote this some time ago:

>Are you using nginx/uwsgi? If so, I believe cache.ram would not be shared across the different uwsgi worker processes. You might consider switching to the Redis cache.
>Anthony

I know nothing about the Redis cache. Wikipedia says it is NoSQL and web2py is SQL, so is this 'cache business' science fiction?

Richard Vézina

Apr 26, 2016, 1:34:55 PM
to web2py-users
If you cache db().select() results, cache.ram should be all right, as every user/session can operate in isolation without issue... This wouldn't be the case for things that need to be up to date for all users at the same time, global variables for instance; even if they are the result of bad design in the first place, they are sometimes required... Though you may not see a great performance improvement with cache.ram if the same user doesn't call your select often... So, depending on what the results of the select contain, you may be better off setting up the Redis cache for a greater performance improvement and better memory usage (a single cached select instance requires less memory than a cache.ram copy for each user).

If you consider the Redis cache, you have to set a proper expiration time, so you improve performance while keeping the cached data in a reasonably up-to-date state, considering how critical the data is...
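
In code, that might look like this (a sketch assuming web2py's bundled gluon.contrib.redis_cache; the host, port, and 10-minute expiry are illustrative):

from gluon.contrib.redis_utils import RConn
from gluon.contrib.redis_cache import RedisCache

# connect once, in a model file
cache.redis = RedisCache(redis_conn=RConn('localhost', 6379))

# expire after 10 minutes: cheap to serve, yet reasonably fresh
rows = db(query).select(cache=(cache.redis, 600), cacheable=True)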

Richard



Anthony

Apr 26, 2016, 2:23:45 PM
to web...@googlegroups.com
No, it's real science. ;-)

First, web2py is not only SQL. The DAL, on the other hand, does primarily support SQL data stores, though it also supports the Google App Engine datastore as well as MongoDB. In any case, nothing stops you from using any NoSQL data store with web2py (though in some cases, you may not be able to use the DAL with it).

Second, the above has nothing to do with caching -- we can implement any caching mechanism desired. web2py has supported Memcached for a very long time, and Redis support has been available for a while as well.
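
For reference, hooking up Memcached in a model file looks roughly like this (the server address is an assumption for illustration):

from gluon.contrib.memcache import MemcacheClient

memcache_servers = ['127.0.0.1:11211']
cache.memcache = MemcacheClient(request, memcache_servers)

# optionally route all existing cache.ram/cache.disk usage through it
cache.ram = cache.disk = cache.memcache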

Anthony

Pierre

Apr 27, 2016, 7:00:53 AM
to web2py-users
I'm impressed, Anthony...

Well, all of these (Memcached, Redis) seem to require a lot of technical work and probably setting up your own deployment machine. I am not very enthusiastic about that option, since the internet is full of endless technical setup/config issues. Given what's been said, I see only two kinds of pages in my app I could cache: general information and maybe forms (no db().select()), all shared and uniform data. There should be a simple way to achieve such a simple thing whatever the platform (PythonAnywhere, etc.)... Is there one?



"science fiction is so real that reality became science fiction"

Anthony

Apr 27, 2016, 8:04:07 AM
to web2py-users
On Wednesday, April 27, 2016 at 7:00:53 AM UTC-4, Pierre wrote:
I'm impressed, Anthony...

Well, all of these (Memcached, Redis) seem to require a lot of technical work and probably setting up your own deployment machine. I am not very enthusiastic about that option, since the internet is full of endless technical setup/config issues. Given what's been said, I see only two kinds of pages in my app I could cache: general information and maybe forms (no db().select()), all shared and uniform data.

I'm not sure what you mean by caching forms, but you probably don't want to do that (at least if we're talking about web2py forms, which each include a unique hidden _formkey field to protect against CSRF attacks).
 
There should be a simple way to achieve such a simple thing whatever the platform (PythonAnywhere, etc.)... Is there one?

You can just use cache.ram. If running uWSGI with multiple processes, you will have a separate cache for each, but that won't necessarily be a problem (just not as efficient as it could be). You could also try cache.disk and do some testing to see how it impacts performance.
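
For a "general information" page, one simple option might be web2py's cache.action decorator (a hedged sketch; the action name and one-hour expiry are illustrative):

# caches the whole rendered page in cache.ram for up to an hour
@cache.action(time_expire=3600, cache_model=cache.ram,
              session=False, vars=False, public=True)
def about():
    return dict(message=T('General information'))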

More generally, caching is something you do to improve efficiency, which becomes important as you start to have lots of traffic. But if you've got enough traffic where efficiency becomes so important, you should probably be willing (and hopefully able) to put in some extra effort to set up something like Memcached or Redis. Until you hit that point, don't worry about it.

Anthony

Niphlod

Apr 27, 2016, 4:44:14 PM
to web2py-users
On these "philosophical" notes (boring for some, exciting for yours truly), there are a few other pointers, just to show that the concept of "caching" isn't something to discard beforehand... Be aware that on "caching" there are probably thousands of posts and pages of books written on the matter.

As with everything, it's a process that has "steps". Let's spend 10 seconds of silence on the sentence "premature optimization is the root of all evil". And another 10 on "There are only two hard things in Computer Science: cache invalidation and naming things". Let those sink in.

Ready? Let's go.

Step #0 : assessment

Consider an app with a page that shows the records of a table that YOU KNOW (as you are the developer) gets refreshed once a day (e.g. the temperature recorded in LA for the previous day).
Or a page that shows the content of a row that never gets updated (e.g. a blog post).
Given that the single most expensive operation in a traditional webapp is the database (just think of web2py requesting the data, the database reading it from disk, preparing it, sending it over the wire, web2py receiving it), developers should always find a way to spend the least possible time on those steps.
Optimizing queries (and/or database tuning, normalization, etc.). Reducing the number of queries needed to render a page. Requesting just the amount of data needed (paging; see the sketch at the end of this step). These are HUUUUGE topics (again, zillions of posts, books, years of expertise to master, etc.). But, of course, not having to issue a query at all short-circuits all of the above!
Still at step #0: as users come to your app, every request made to those pages triggers the roundtrip to the database, back and forth, always for the same data, over and over.
Granted, 50 reqs/sec certainly won't hurt performance, but once they get to 500, it becomes pretty obvious that "a" short-circuit could save LOTS of processing power.
When you face the problem of scaling to serve more concurrent requests, you either spawn more processes or add servers.
Adding frontend servers is easy: the data is transactionally consistent as long as you have a single database instance. You put a load balancer in front of the frontends (it's relatively inexpensive) and go on.
Scaling databases by adding servers is NEVER easy (again, the interwebs and libraries are full of evidence, and a big part of NoSQL's "shiny" feature set is indeed horizontal scaling, with pros and cons).
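
The paging sketch mentioned above (the table name and page size are hypothetical):

# fetch only the 20 rows needed for page one, not the whole table
page, per_page = 0, 20
rows = db(query).select(orderby=db.mytable.id,
                        limitby=(page * per_page, (page + 1) * per_page))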

Step #1: local caching
Back to your app without cache... wouldn't it be better to avoid calling the db for the same data 500 times per second? Sure. Cache it.

Assuming you cache the database results, web2py still needs to compute the view, but that step is orders of magnitude less expensive than the roundtrip it short-circuits. (Yes, if you cache views you sidestep web2py's rendering too, but let's keep as few variables as possible for the sake of this discussion.)
And there you are, at the first iteration of step #1, using 1MB more RAM to avoid hitting the database.
Cache it for just an hour, do the math on the simple example of 50 req/s, and you have saved 50*60*60 - 1 = 179,999 roundtrips. You can spend those savings on 179,999 roundtrips you actually NEED elsewhere in your app, keeping the same performance at no additional cost.
Whoa! 
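
In code, that first iteration can be as small as this (sketch; the "daily_temp" table is hypothetical, per the LA example above):

rows = db(db.daily_temp.city == 'LA').select(
    cache=(cache.ram, 3600),  # at most one DB roundtrip per hour
    cacheable=True,           # Rows must be serializable to be cached
)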

Step #300
You start caching here and there, and you use 500MB of RAM. You're using cache.ram; everything is super-speedy, no third parties, just web2py features.
Now you need to serve 100 req/s, so you spawn another process... whoopsie... 1GB of RAM. Or another server: 500MB on the first and 500MB on the second... 500 are "clearly" wasted, as they are a copy of the "original" 500.
And the second process (or server) still needs to do roundtrips whenever its local cache doesn't contain your already-cached-in-another-place query.
Also, something else creeps in... as your app grows, you start losing track of what you cached, when you cached it, and for how long it needs to stay cached... a record fetched on the first server at 8:00AM could be updated in the meantime and fetched on the second server (because it isn't in its local cache) at 8:02AM... you're effectively serving different versions from cache!

Step #301
To sidestep both issues, you use Redis or Memcached: they sit outside the web2py processes and consume only 500MB.
For one, two, a zillion processes. And they are a single source of truth. And they are about as speedy as cache.ram (or at least of the same order of magnitude).

Richard Vézina

Apr 28, 2016, 10:28:28 AM
to web2py-users
On Wed, Apr 27, 2016 at 4:44 PM, Niphlod <nip...@gmail.com> wrote:

Step 1 to 300, where is step 2?? Just kidding. Nice read, thanks Simone...

:)

 
Pierre

Apr 29, 2016, 4:58:34 AM
to web2py-users

When you face the problem of scaling to serve more concurrent requests, you either spawn more processes or add servers.
Adding frontend servers is easy: the data is transactionally consistent as long as you have a single database instance. You put a load balancer in front of the frontends (it's relatively inexpensive) and go on.

How am I supposed to spawn more processes? PythonAnywhere has this concept of workers: the more workers, the more you pay. Is spawning more processes related to buying more workers? What kind of hosting service/company is good at dealing with scaling and caching? I don't think PythonAnywhere allows Redis or Memcached...


thanks guys

10 seconds of silence isn't enough... but silence, like 'workers', is expensive...


Niphlod

Apr 29, 2016, 7:28:56 AM
to web2py-users
PythonAnywhere is a PaaS, and one of the simpler ones... start worrying when your site isn't speedy enough.
Fortunately, you won't have to change a single thing in your app except the cache initialization...
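
A sketch of what "only the cache initialization" means in practice (assuming the Redis setup shown earlier in the thread; every existing call site keeps working):

# in a model file: swap the backing store, leave the app code alone
from gluon.contrib.redis_utils import RConn
from gluon.contrib.redis_cache import RedisCache

cache.redis = RedisCache(redis_conn=RConn('localhost', 6379))
cache.ram = cache.disk = cache.redis  # all processes now share one cache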

Pierre

Apr 29, 2016, 8:38:57 AM
to web2py-users
OK.
I'll check if I can afford the single-worker version to start with, and I hope this one knows how to spawn.
Is the web2py book at PythonAnywhere cached? If I didn't miss too many things, it's a good candidate for caching.

Niphlod

Apr 29, 2016, 8:55:21 AM
to web2py-users
PythonAnywhere has a totally different set of limits, but if you choose a PaaS, you don't get to worry about the nitty-gritty details.

You're right, the book is an optimal candidate for caching, and if you look at its code it's actually cached everywhere... although the book isn't a good example here, because it never had a database in the first place.