Using memcache to avoid datastore access


Efi Merdler-Kravitz

Aug 5, 2011, 7:43:29 PM
to google-a...@googlegroups.com
Hello everybody,

I'm creating an application that pulls information from multiple sources, stores it, and notifies a client that an update has occurred. The client might decide to fetch the information at any time. In many scenarios my application will do multiple pulls (and thus update the datastore multiple times), but the client will fetch the information only once.

This puts me in a problematic situation: I want to store as much as I can in the cache because the data is replaced rapidly, but I still need to persist it in case it is evicted (reading from the datastore is much faster than fetching the information again).

My questions:
1. Is there a way for me to know whether an item in memcache will be evicted soon? (I know that JCache listeners are not fully supported.)
2. If so, can I know which item?

How would you solve the above problem? Do you think that wrapping memcache with my own mechanism and running it on a backend is a good solution?

Efi

Tim Hoffman

Aug 5, 2011, 8:31:18 PM
to google-a...@googlegroups.com
Basically you can't tell.

Also, your memcache capacity is finite, so stuffing it with data just in case you need it will inevitably evict something else.

Your basic design pattern needs to be:

Check memcache
if it's not there, fetch from the datastore
if you think the data will be re-used, stick it in memcache
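As a rough illustration of the pattern above, here is a read-through cache sketch. Plain dicts stand in for memcache and the datastore (on real App Engine you would use the memcache API and a datastore get by key); all names here are illustrative, not App Engine APIs.

```python
# Read-through cache sketch: check the cache first, fall back to the
# datastore on a miss, and only cache data likely to be re-used.
memcache_sim = {}   # stands in for memcache
datastore_sim = {}  # stands in for the datastore

def get_item(key, cacheable=True):
    """Return the value for key, preferring the cache over the datastore."""
    value = memcache_sim.get(key)
    if value is not None:
        return value  # cache hit
    value = datastore_sim.get(key)  # cache miss: fetch by key
    if value is not None and cacheable:
        memcache_sim[key] = value  # populate the cache for next time
    return value
```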

Various scenarios may lend themselves to preloading the cache (but note the point at the top).
If you can identify a user's requirements specifically, then when they log in
you could fire off a task in the background to pre-load the cache with data you think they will look at shortly, but
you still have to deal with cache misses.

I don't think running your own mechanism on a backend would be particularly useful unless you can guarantee a huge cache hit rate,
as it is a finite resource.

Also think about how you can fetch the data from the datastore more efficiently. Getting data by key
can be very fast. Given that the client will only fetch the data once, and you don't know when they will do it,
keeping the data hot in memcache seems an expensive exercise, especially if you can optimize the
data in the datastore for the client so they can fetch it with a single db.get().

How are you detecting the data change that triggers the notification?

Just my 2c

Rgds

Tim

Ikai Lan (Google)

Aug 5, 2011, 8:38:01 PM
to google-a...@googlegroups.com
Agree with Tim. Memcache evicts items on an LRU (least recently used) basis.

Are you familiar with the concept of a "working set"? The idea is that the majority of your data reads for a given window will go to a small minority of your total dataset. This is why caching in general works so well. Faster caches have less storage and cost more, but because you only ever work with a very small part of your entire dataset most of the time, it doesn't matter. With an LRU-based cache, if something is retrieved, it is *usually* likely that object will be used again in the near future, and you gain the benefits of caching.

It sounds like in your scenario, this is what will happen. If someone comes and requests some data that isn't frequently accessed, you win because you rarely pay that cost of hitting the datastore. If someone comes and requests data that is frequently accessed, again, you win because that data will, in an overwhelming majority of cases, be served from the cache. Where you lose is when your data access is totally random.
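The LRU behavior described above can be sketched in a few lines. This is a minimal illustration of the eviction principle, not memcache's actual implementation (which also reacts to memory pressure across the whole app):

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: when full, drop the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
```

Reading an item moves it to the "recently used" end, so items that are touched often survive eviction while stale ones fall off the other end.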

So I guess my advice is this: build it first, use caching, graph everything and watch cache hits over time. When it's a problem, it's a problem and you can deal with it then, but my intuition tells me that it probably won't be for a while.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/QQMA5FBKJWMJ.


Ikai Lan (Google)

Aug 5, 2011, 8:39:16 PM
to google-a...@googlegroups.com
Clarification for last post: LRU means that if an item has not been used for a long time, it will likely be evicted, whereas an item that was recently used is less likely to be evicted. 

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Efi Merdler-Kravitz

Aug 5, 2011, 9:06:35 PM
to google-a...@googlegroups.com
For some reason I can't see your answers here, Tim and Ikai.

You raised some good points.

Tim, if I use your basic pattern:
Check memcache
if it's not there, fetch from the datastore
if you think the data will be re-used, stick it in memcache

then I don't see any reason to use memcache directly; instead I'll use objectify (or something similar). Makes sense?

Correct me if I'm wrong, but fetching from the datastore is not the problem; writing to it is heavier, so making my queries faster probably will not ease my problems.

Let's say I can relax my assumptions: whenever I update the client, if it doesn't fetch the information within X seconds, then I'm willing to remove the item from the cache and move it to the datastore; if the data was fetched, there is no need to store it at all (it is cached on the client). Is there a way to implement such behavior?

Thanks for your help guys.

Tim Hoffman

Aug 5, 2011, 9:15:41 PM
to google-a...@googlegroups.com
Hi

You are correct, writing is the costly component. Doing a get by key is normally very quick.

> Let's say I can relax my assumptions: whenever I update the client, if it doesn't fetch the information within X seconds, then I'm willing to remove the item from the cache and move it to the datastore; if the data was fetched, there is no need to store it at all (it is cached on the client). Is there a way to implement such behavior?

I don't think this is correct. You would always write to the datastore and then cache. If you stick it in memcache and plan on persisting it in the datastore
later, you may well lose data.

It might help to think about your writes and processing as separate phases. Write raw data, then use a task or prospective search to aggregate and
prepare the data for the client, then notify.
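The write-durable-first point can be sketched as a write-through pattern. Again, plain dicts stand in for memcache and the datastore; the key property is that a cache eviction can never lose data, because the datastore write happens first:

```python
# Write-through sketch: persist to the durable store first, then cache.
memcache_sim = {}   # stands in for memcache (may evict at any time)
datastore_sim = {}  # stands in for the datastore (durable)

def save_item(key, value):
    datastore_sim[key] = value  # durable write happens first
    memcache_sim[key] = value   # the cache is only an optimization

def evict(key):
    memcache_sim.pop(key, None)  # simulate an arbitrary memcache eviction

def load_item(key):
    value = memcache_sim.get(key)
    if value is None:
        value = datastore_sim.get(key)  # miss falls back to the durable copy
    return value
```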

Rgds

T