If you just need to store data for a single user across requests, you can store it in the session (each user has a separate session). But if you need to store data from multiple users and then process it all together, you can store it in the cache: http://web2py.com/books/default/chapter/29/4#cache.

Anthony
That might be just slightly unsafe, don't you think, if a piece of data gets evicted before it's needed? I'd prefer to limit caching to data that can be recreated when it's not found in the cache. An alternative might be a memory-based SQLite database, taking care not to let it leak (a consideration regardless of the implementation).
user_data = cache.ram('user_data', lambda:dict(), time_expire=None)
# add the data from this user, this should also update the cached dict?
user_data[this_user_id] = submitted_data
if len(user_data) == some_pre_defined_number:
# get data out of the dictionary 'user_data' by user Ids
...
# process them and persist results to DB
...
# reset 'user_data' dict by removing it from cache
cache.ram('user_data', None)
user_data = cache.ram('user_data', lambda:dict(), time_expire=None)
# add the data from this user, this should also update the cached dict?
user_data[this_user_id] = submitted_data
cache.ram('user_data', lambda: user_data, time_expire=[some positive value])

The above implies that the 'if' check is made upon every request by individual users, which is somewhat inefficient, but I am not sure whether there is a better way to implement this logic within web2py.
1. When exactly does an object expire in the cache? My understanding is that, as long as the 'time_expire' value of the current request is greater than the gap between the time of the current request and the time at which the requested object was last updated in the cache, the object behaves as if it never expires. This is illustrated by an example in the book, if I understand it correctly. Therefore, the only way I can see for an object to expire is to supply a 'time_expire' value in a request that is smaller than the gap between the time of this request and the time when the requested object was last updated in the cache. In other words, if I do:

something = cache.ram('something', lambda:'blah blah', time_expire=1)

then effectively 'something' never expires in the cache until, after t seconds (with t > make_it_expire), I call something like:

something = cache.ram('something', lambda:'blah blah', time_expire=make_it_expire)
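To make sure I'm reading the semantics right, here is a minimal, self-contained toy model of the lazy-expiry behavior I'm describing (my own sketch, not web2py's actual CacheInRam code; the injectable clock is just there to make the behavior easy to demonstrate):

```python
import time

class RamCacheSketch:
    """Toy model of lazy cache expiry (NOT web2py's actual CacheInRam).

    Staleness is only checked when a key is requested, and it is checked
    against the time_expire passed on *that* call -- so an entry cached
    with time_expire=1 can still be served much later if the later call
    passes a large enough time_expire.
    """

    def __init__(self, clock=time.time):
        self.storage = {}       # key -> (stored_at, value)
        self.clock = clock      # injectable clock, for demonstration

    def __call__(self, key, f, time_expire=300):
        if f is None:           # mimic cache.ram(key, None): clear the entry
            self.storage.pop(key, None)
            return None
        now = self.clock()
        item = self.storage.get(key)
        if item is not None and (time_expire is None
                                 or now - item[0] <= time_expire):
            return item[1]      # still "fresh" from this caller's viewpoint
        value = f()             # absent or stale: recompute and re-store
        self.storage[key] = (now, value)
        return value
```

With this model, a value stored via time_expire=1 is still returned much later if the later call passes a large enough time_expire, which matches my reading above.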
2. A follow-up question: when an object eventually expires in the cache, does that mean it is completely removed from the cache, i.e. equivalent to:

cache.ram(key, None)

or does something else happen?
3. The manual says that setting 'time_expire=0' forces the cached object to be refreshed. Am I correct in understanding that the cached object will immediately be set to whatever is supplied by the second argument of the 'cache.ram' call? That is, 'refreshed' here means 'updated'.
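To make question 3 concrete, in a toy stand-in for cache.ram (again my own sketch, not web2py's code), time_expire=0 would simply bypass any stored value, so the lambda runs and its result immediately replaces the cached one:

```python
import time

_store = {}  # key -> (stored_at, value); a made-up module-level store

def ram(key, f, time_expire=300):
    """Toy stand-in for cache.ram (NOT web2py's implementation)."""
    now = time.time()
    hit = _store.get(key)
    # time_expire=0 never accepts a stored value, forcing a refresh
    if hit is not None and time_expire != 0 and (
            time_expire is None or now - hit[0] <= time_expire):
        return hit[1]
    value = f()
    _store[key] = (now, value)
    return value
```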
In general, the approach David suggests (https://groups.google.com/d/msg/web2py/4cBbis_B1i0/KkKnNwUw8lcJ) is probably preferable, but below are answers to your questions...

user_data = cache.ram('user_data', lambda:dict(), time_expire=None)
# add the data from this user, this should also update the cached dict?
user_data[this_user_id] = submitted_data

The above would not update the dict in the cache -- you'd have to do that explicitly:

cache.ram('user_data', lambda: user_data, time_expire=[some positive value])
Regarding the pros and cons of this approach (in comparison to David's database approach), I wonder what the potential pitfalls/risks of the cache approach are. For example, but not limited to:

1. Is there any consistency issue for the data stored in the web2py cache?

2. Is there any size limit on the data stored in the web2py cache?

3. Is it thread-safe? For instance, if I have two threads A and B (two requests from different users) trying to access the same object (e.g. the 'user_data' dict) stored in the cache at the same time, would that cause any problem? This especially concerns the corner case where A and B bear the very last two pieces of data expected to meet 'some_pre_defined_number'.
1. Is there any consistency issue for the data stored in web2py cache?

Yes, this could be a problem. The nice thing about using a db with transactions is that you can ensure any operations get rolled back if an error occurs. In web2py, each request is wrapped in a transaction, so if there is an error during the request, any db operations during that request are rolled back. The other issue is volatility -- if your server goes down, you lose the contents of RAM, but not what's stored in the db (or written to a file).
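As a concrete (if simplified) illustration of what transactions buy you, using Python's standard sqlite3 with an in-memory database; the table and function names here are made up for illustration, not web2py's DAL:

```python
import sqlite3

# Each request inserts a row inside a transaction, so a failure mid-request
# rolls the insert back and nothing partial persists.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE submissions (user_id INTEGER, payload TEXT)')

def record_submission(user_id, payload):
    # "with conn" opens a transaction: commit on success, rollback on error
    with conn:
        conn.execute('INSERT INTO submissions VALUES (?, ?)',
                     (user_id, payload))

def count_submissions():
    return conn.execute('SELECT COUNT(*) FROM submissions').fetchone()[0]
```

If an exception is raised inside the `with conn:` block, sqlite3 rolls the transaction back and re-raises, which is essentially the guarantee web2py's per-request transaction gives you for db operations.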
3. Is it thread-safe? For instance, if I have two threads A and B (two requests from different users) trying to access the same object (e.g. 'user_data' dict) stored in the cache at the same time, would that cause any problem? This especially concerns the corner case where A and B bear the very last two pieces of data expected to meet 'some_pre_defined_number'.

That's a good point -- your current design would introduce a potential race condition problem (I was originally thinking each request would add separate entries to the cache, not update a single object). Of course, a db could have a similar problem if you were just repeatedly updating a single record (rather than inserting new records each time).
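If it helps, one way to sidestep that race is to make the add-and-check step atomic with an ordinary lock (a sketch only; 'user_data' and the threshold follow the discussion above, while batch_lock and processed_batches are made-up names, and this lives outside web2py's cache):

```python
import threading

SOME_PRE_DEFINED_NUMBER = 3     # stand-in for 'some_pre_defined_number'
user_data = {}                  # shared dict, as in the discussion
batch_lock = threading.Lock()   # serializes the add-and-check step
processed_batches = []          # where "processed" results land in this sketch

def submit(user_id, data):
    """Add one user's data; process and reset exactly once at the threshold."""
    with batch_lock:
        user_data[user_id] = data
        if len(user_data) == SOME_PRE_DEFINED_NUMBER:
            processed_batches.append(dict(user_data))  # snapshot the batch
            user_data.clear()                          # reset for next batch
```

Because the check and the reset happen under the same lock, a late request that arrives while the batch is being finalized simply waits and then starts the next batch; the tradeoff is that the processing itself also runs under the lock, so it should be kept short.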
I've looked deeper into this, and in the web2py doc: http://www.web2py.com/examples/static/epydoc/web2py.gluon.cache.CacheInRam-class.html, it mentions: "This is implemented as global (per process, shared by all threads) dictionary. A mutex-lock mechanism avoid conflicts."

Does this mean that when a request thread is accessing and modifying the content (e.g. a dictionary in my case) of the cache, every other thread is blocked and has to wait until the current request thread finishes with it? If so, it seems to me that the race condition we fear as above should not happen. Please correct me if I get this wrong. Thanks.
Well, because cache.ram returns a reference to the object rather than a copy, I think you're OK in terms of avoiding conflicts when updating the dict (if you're just adding new keys). However, you might have to think about what happens when you hit the required number of entries. What if one request comes in with the final entry, but while that request is still processing (before it has cleared the cache), another request comes in -- what do you do with that new item? It may be workable, but you'll have to think carefully about how it should work.
Thanks for confirming this. The only kind of update would be adding new key-value pairs. Having confirmed that this sort of update will be consistent in the cache, I am now a little worried about the performance of doing so, given the mutex-lock design of the cache.

If I understand this correctly, each thread (from a request) will lock the cache so that all other threads (requests) have to wait. I intend to store multiple dictionaries (say 10) in the cache, and each dictionary will handle the data from a fixed set of users (say 30 of them) for a given period of time. If the cache truly behaves as above, then when one thread is updating the cache, all of the other 10 * 30 - 1 = 299 threads will be blocked and will have to wait. This might drag down the efficiency of the server side.
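If the single shared lock ever did become a bottleneck, one mitigation (purely a sketch with made-up names, keeping the dictionaries outside the shared cache) would be to give each of the 10 dictionaries its own lock, so an update blocks only the ~29 other threads in the same group rather than all 299:

```python
import threading

NUM_GROUPS = 10  # say, 10 dictionaries, per the scenario above

group_data = [dict() for _ in range(NUM_GROUPS)]
group_locks = [threading.Lock() for _ in range(NUM_GROUPS)]

def add(user_id, data):
    g = user_id % NUM_GROUPS      # fixed user -> dictionary assignment
    with group_locks[g]:          # blocks only threads sharing this group
        group_data[g][user_id] = data
```

The modulo assignment here is just one illustrative way to keep each user pinned to the same dictionary; any fixed mapping would do.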