Shared dicts vs. Redis performance

ish...@umbc.edu

Aug 21, 2014, 3:03:20 PM

to openre...@googlegroups.com

Quick question regarding ngx.shared dicts and Redis.

For caching purposes, would there be significant latency/performance improvements if I were to use, say, a 1 GB nginx shared dict as my first level of caching and then fall back to Redis with lua-resty-redis for everything else, and to store semi-persistent data? Or would performance be approximately the same using Redis listening locally on a Unix socket (though Redis may be separated into its own server eventually once I need load distribution)? I know that logically speaking a shared dict is probably going to have less overhead, but I'm wondering if the difference could be considered significant.

I do know profiling would give me the best answer to this, but I'm just thinking about how to architect some other parts of my application as it eventually becomes bigger and wanted to get some general guidance.

Thanks,

Ian

Yichun Zhang (agentzh)

Aug 21, 2014, 3:19:20 PM

to openresty-en

Hello!

On Thu, Aug 21, 2014 at 12:03 PM, isheff1 wrote:
> For caching purposes, would there be significant latency/performance
> improvements if I were to use, say, a 1 GB nginx shared dict as my first
> level of caching and then fall back to Redis with lua-resty-redis for
> everything else, and to store semi-persistent data?

The benefit of an extra caching layer depends on the cache hit rate of
your app's use pattern.

We use 2 levels of caching (lua-resty-lrcache and ngx.shared.DICT)
before our memcached-like data service based on sockets in production,
for example, because the locality and cache hit rate are good enough.
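
A minimal sketch of such a layered lookup, not the production code described above. It assumes only that each cache level offers `get(key)` and `set(key, value, ttl)` methods, which both lua-resty-lrucache instances and ngx.shared.DICT zones do. The names `layered_get` and `fetch` are hypothetical, as are the zone name and sizes in the wiring comments.

```lua
-- Layered lookup: L1 (per worker) -> L2 (shared by all workers) -> backend.
-- `l1` and `l2` need :get(key) and :set(key, value, ttl).
-- `fetch` is a hypothetical callback that queries the socket-based service
-- (for example through lua-resty-redis) and returns a value or nil.
-- Values must be scalars if `l2` is a shared dict.
local function layered_get(l1, l2, fetch, key, ttl)
    local value = l1:get(key)
    if value ~= nil then
        return value, "l1"
    end

    value = l2:get(key)
    if value ~= nil then
        l1:set(key, value, ttl)   -- promote for later requests in this worker
        return value, "l2"
    end

    value = fetch(key)
    if value ~= nil then
        l2:set(key, value, ttl)   -- make it visible to the other workers
        l1:set(key, value, ttl)
        return value, "backend"
    end
    return nil, "miss"
end

-- Possible wiring inside OpenResty (zone name and sizes are assumptions):
--   local lrucache = require "resty.lrucache"
--   local l1 = assert(lrucache.new(1000))   -- create once per worker
--   local l2 = ngx.shared.my_cache          -- needs `lua_shared_dict my_cache 1g;`
--   local value, source = layered_get(l1, l2, fetch_from_redis, "some_key", 60)
```

The second return value reports which level answered, which makes it easy to log per-level hit rates and check whether each layer earns its keep.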

> Or would performance be
> approximately the same using Redis listening locally on a Unix socket
> (though Redis may be separated into its own server eventually once I need
> load distribution)?

Theoretically, shm outperforms sockets a *lot* due to the lack of
expensive socket-related system calls and almost no context switches.
But in reality there can be more complications with shm, like
serialization/de-serialization overhead and the locking overhead
incurred when many nginx worker processes are busy accessing the same
shm zone at exactly the same time. How large this ngx.shared.DICT
overhead is really depends on your actual data set and use patterns,
and can be very different from app to app.

> I know that logically speaking a shared dict is probably
> going to have less overhead, but I'm wondering if the difference could be
> considered significant.
>

It'll be very hard to make reasonable predictions without knowing the
data set and use patterns well enough. And yeah, computer engineering is
hard :) And that's why profiling and experiments are always
recommended.
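
One simple experiment is a micro-benchmark that times many calls of a lookup function. The sketch below is only an illustration and not a replacement for real profiling. The helper name `bench` and its `clock` parameter are hypothetical. It defaults to `os.clock`, which measures CPU time and so fits in-process lookups, while socket round trips would need a wall clock supplied by the caller.

```lua
-- Time `n` calls of `fn` and return the average seconds per call.
-- `clock` is any function returning a timestamp in seconds.
-- Assumption: os.clock (CPU time) is good enough for shared dict or LRU reads.
-- For calls that wait on a socket, pass a wall clock instead, for example
--   function() ngx.update_time(); return ngx.now() end
-- inside OpenResty.
local function bench(fn, n, clock)
    clock = clock or os.clock
    local start = clock()
    for i = 1, n do
        fn(i)
    end
    return (clock() - start) / n
end

-- Possible use inside a request handler (zone name is an assumption):
--   local dict = ngx.shared.my_cache
--   local per_get = bench(function(i) return dict:get("key" .. (i % 100)) end, 100000)
--   ngx.say("shared dict get: ", per_get * 1e6, " microseconds per call")
```

A loop like this runs in a single worker, so it will not show the lock contention that appears when many workers hit the same zone at once. Load testing the whole server is still needed for that.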

I think the rule of thumb is that ngx.shared.DICT is usually more
efficient than socket services if we are not doing something special
or stupid :)

> I do know profiling would give me the best answer to this, but I'm just
> thinking about how to architect some other parts of my application as it
> eventually becomes bigger and wanted to get some general guidance.
>

See above :)

Regards,
-agentzh

Vladislav Manchev

Aug 21, 2014, 3:19:48 PM

to openre...@googlegroups.com

You can get significantly better performance by using shared dicts but, as always, it depends on your use case.

If you do not wish to persist data in any way or you're willing to sync the shared dicts back to disk yourself (not a good idea in my opinion) then that would be the faster approach.

Not sure if you're aware, but a shared dict can basically only hold numbers and strings (as well as booleans and nil) [1]. If you need to store a Lua table, for example, you'll have to serialize/deserialize it yourself, which will definitely add some overhead (rather significant, I would say).
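
A minimal sketch of that serialize/deserialize step. The helper names `set_table` and `get_table` are hypothetical. `dict` is assumed to be an ngx.shared.DICT zone, and `codec` is assumed to be a module such as `cjson.safe` (bundled with OpenResty) whose `encode` and `decode` return nil plus an error message on failure.

```lua
-- Store a Lua table in a shared dict by serializing it to a string.
-- `dict`  : an ngx.shared.DICT zone (anything with :get and :set works)
-- `codec` : table with encode/decode, e.g. require "cjson.safe"
local function set_table(dict, codec, key, tbl, ttl)
    local str, err = codec.encode(tbl)
    if not str then
        return nil, "encode failed: " .. tostring(err)
    end
    -- A real zone returns ok, err, forcible from :set
    return dict:set(key, str, ttl)
end

local function get_table(dict, codec, key)
    local str = dict:get(key)
    if str == nil then
        return nil, "not found"
    end
    -- Decoding runs on every read; this is the overhead mentioned above
    return codec.decode(str)
end

-- Possible use inside OpenResty (zone name is an assumption):
--   local cjson = require "cjson.safe"
--   set_table(ngx.shared.my_cache, cjson, "some_key", { id = 42, tags = { "a", "b" } }, 60)
--   local tbl, err = get_table(ngx.shared.my_cache, cjson, "some_key")
```

Every read pays for a decode and builds a fresh table, so hot keys may be better served by a per-worker cache of the decoded table.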

In such cases you can use lua-resty-lrucache [2], but persistence is still something you'll have to take care of yourself if you need it.

One more thing to consider is whether you need to share this data between nginx worker processes. Shared dicts help when all workers need to see the same data, whereas lua-resty-lrucache can't share across the OS process boundary.

[1]: https://github.com/openresty/lua-nginx-module/#ngxshareddictset
[2]: https://github.com/openresty/lua-resty-lrucache

Best,
Vladislav

Yichun Zhang (agentzh)

Aug 21, 2014, 3:20:02 PM

to openresty-en

Hello!

On Thu, Aug 21, 2014 at 12:19 PM, Yichun Zhang (agentzh) wrote:
> We use 2 levels of caching (lua-resty-lrcache and ngx.shared.DICT)

Sorry, typo here. It should be lua-resty-lrucache:

https://github.com/openresty/lua-resty-lrucache

Regards,
-agentzh

ish...@umbc.edu

Aug 21, 2014, 4:41:15 PM

to openre...@googlegroups.com

Thanks for the great responses, agentzh and Vladislav.

> We use 2 levels of caching (lua-resty-lrucache and ngx.shared.DICT)
> before our memcached-like data service based on sockets in production,
> for example.

If I might ask, what sort of things do you store in the LRU cache? Do you store all scalar data in shared dicts and Lua tables in the LRU cache?

Is storing even a scalar value like an integer in the LRU cache more performant than putting it in ngx.shared.DICT? Do you use the LRU cache as your very first cache layer, and only store data in a shared dict if it isn't a table and absolutely needs to be shared among all worker processes, so that it won't be recomputed needlessly?

Separate workers aside, is one or the other better at handling very frequent cache lookups and/or high cache hit rates? And are there any other variables that come into play when deciding which of the two to use as storage?

Again, just trying to get some general ideas.

I think I'm likely going to use shared dicts instead of Redis for most of my ephemeral caching, but I'd definitely like to learn as much as I can about all of the different options.

> It'll be very hard to make reasonable predictions without knowing the
> data set and use patterns well enough. And yeah, computer engineering is
> hard :) And that's why profiling and experiments are always
> recommended.

Absolutely. Right now, the load my app gets is low enough that I could get away with using Redis for everything with absolutely no problems, but I expect the load to increase significantly in the future. If I start running into caching performance or memory problems I'll definitely profile my current solution and compare with a few alternatives before coming here.