Invalidation Semantics and Caching Strategy for a Redis Cache


Abhishek Sharma

Apr 14, 2015, 5:48:29 AM
to redi...@googlegroups.com
I just started to explore Redis last week and I am looking to use it as a cache layer to improve my application's overall performance and scalability.

Current Scenario:
I have a monolithic Java application that uses Hibernate as its ORM, with Ehcache enabled as a second-level cache and a MySQL backend.
I currently have some in-process custom caching of information to speed up certain key operations.

Where do I want to get to:
I want to use Redis to build an intermediate cache layer that can run on a separate server, rather than inside the monolithic Java application. I am looking to improve performance and scalability through caching.
I am also looking to disable Ehcache (the second-level Hibernate cache) and just use Redis instead.

My Questions:
1. Most of my data access scenarios don't need very strong consistency between the cache and the DB; I can tolerate eventual consistency (a couple of seconds). I am wondering what would be good caching strategies: read/write-through or cache-aside?
    I do have a few scenarios where strong consistency is a requirement, but only a few.

2. Also, it would be great if I could get some pointers on cache invalidation. This is closely related to the caching strategy I use, but I am wondering if there is a Redis use-case story with a scenario similar to mine. I would like to know what worked and what did not.

3. Based on my explanation, I was hoping some of the experts in this group could also advise me on the DOs and DON'Ts and things I should watch out for.

Thanks so much in advance. 

Dvir Volk

Apr 14, 2015, 6:59:08 AM
to redi...@googlegroups.com
Hi Abhishek,

I don't know Hibernate that well, but I'll try to give a general answer about redis caching:

0. First of all: if you want very simplistic key/blob caching and do not need disk backup of your cache, redis and memcache provide similar performance, and memcache is easier to scale to multiple machines (I'll get to the redis side of this soon). I'm using redis and not memcache in production, but do take a minute to understand why you are choosing one over the other.

1. What redis gives you in the caching area that memcache doesn't:
  • Master/slave and cluster-based replication - although if you're doing automatic LRU eviction this is problematic. 
  • Disk backup of your data so you can reboot the cache without losing data.
  • "Smart" caching beyond key/blob - you can use redis' advanced data structures to create more robust caching, depending on your scenario.

2. Dos and don'ts:
  • Do use a sharding client so you can scale to multiple machines if the cache should grow. Not sure about the state of sharded Java clients, but that's how I do it with Python.
  • Don't use master/slave replication if you need an LRU cache, as access to slaves will not change LRU state.
  • Do limit the upper bound of your cache capacity so if you overflow your cache, you won't get OOM-killed.
  • Use time-based expiration (TTL) for your cached records.
  • Use the above two settings alongside LRU or another eviction policy (redis has a few; volatile-lru is what I'm using for caching) to make sure that even if you exceed your cache's capacity, it will continue to operate normally.
  • Do monitor redis' state using the INFO command in your monitoring system.
  • Don't forget to set very low timeouts on your client so that network problems with your cache won't clog your app.
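
The capacity and eviction settings above go in redis.conf; a minimal sketch (the values are illustrative, tune them for your workload):

```conf
# redis.conf -- illustrative values

# Upper bound on memory, so an overflowing cache won't get OOM-killed
maxmemory 2gb

# Evict least-recently-used keys among those that have a TTL set
maxmemory-policy volatile-lru
```

The client-side timeouts are set on the client, not here; e.g. redis-py takes a `socket_timeout` argument on `redis.Redis()`.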

3. Re caching method:
Redis can't really do read-through caching, as it can't perform operations on your behalf; it's just a key/value store. You can of course abstract it in your code so it looks like a read-through cache, but you'll still need at least one GET request for each cache fetch, and two requests if the cache record does not exist and you need to write it.

Using redis from Python, for example, I have a decorator that does that behind the scenes, e.g.:

@redis_cache(ttl=600)
def loadUser(userId):
    talk_to_sql_database()

which means the method body is only called when the record is not in the cache (note that Python decorators do not work like Java annotations). It looks like a read-through cache to the calling code, but it's basically cache-aside.
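
For reference, here is a minimal sketch of how such a cache-aside decorator can be built. The names (`redis_cache`, `FakeRedis`) and the key scheme are illustrative assumptions, not the actual implementation; it only assumes a client with redis-py style `get()`/`setex()` methods, and the dict-backed stub stands in for `redis.Redis()` so the sketch runs without a server:

```python
import functools
import json

def redis_cache(client, ttl=600, version=1):
    """Cache-aside decorator: try the cache first, fall back to the function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            # Build a key like "loadUser:100:123" from the function name,
            # the arguments, and the version number.
            key = ":".join([fn.__name__, *map(str, args), str(version)])
            cached = client.get(key)
            if cached is not None:
                return json.loads(cached)               # hit: skip the database
            result = fn(*args)                          # miss: call through
            client.setex(key, ttl, json.dumps(result))  # write back with a TTL
            return result
        return wrapper
    return decorator

class FakeRedis:
    """Dict-backed stand-in for redis.Redis(), for illustration only."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl, value):
        self.store[key] = value
```

In real code the client would be a `redis.Redis(...)` instance, and for long or non-string argument lists you would likely hash the arguments into the key.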


4. Re cache invalidation:
The most obvious method, if you need to invalidate all or a large group of your cache keys at once without deleting all your data, is to append a version number to the cache keys and just increment it in your configuration when you want to invalidate. This can be done per cache-key group; e.g., in my Python example above, I can do:

@redis_cache(ttl=600, version=123)
def loadUser(userId):
    talk_to_sql_database()

which would yield cache keys looking like "loadUser:user100:123". Incrementing the version would automatically invalidate all "loadUser" entries, but not other records.
If you have a TTL on all cache records, there is no need for manual "garbage collection" of invalidated records; redis will delete them automatically in the background over time.
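
To make the scheme concrete, the key construction can be sketched as follows (the helper name is hypothetical, the format matches the "loadUser:user100:123" example above):

```python
def versioned_key(fn_name, record_id, version):
    # "loadUser", "user100", 123 -> "loadUser:user100:123"
    return "%s:%s:%s" % (fn_name, record_id, version)

old = versioned_key("loadUser", "user100", 123)  # current cache key
new = versioned_key("loadUser", "user100", 124)  # after bumping the version:
                                                 # a different key, so the old
                                                 # entry is never read again and
                                                 # simply expires via its TTL
```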

heck, that was long! I hope I helped you :)

Dvir



