HSET vs SET memory usage?


AcidZombie24

Oct 8, 2012, 6:00:57 AM10/8/12
to redi...@googlegroups.com
I just posted a question on Stack Overflow, but I thought I should include it here as well. I read both places, so answer wherever you feel like: http://stackoverflow.com/questions/12779372/hset-vs-set-memory-usage

I was reading [this article](http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value-pairs), which mentions that storing 1 million keys in Redis will use 17 GB of memory. However, switching to hashes, chunking keys into buckets of 1k each (ex: `HSET "mediabucket:1155" "1155315" "939"`), allows them to store 1 million keys in 5 GB, which is a pretty large savings.
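For concreteness, the bucketing scheme from the article can be sketched like this (a hypothetical helper, not from the article itself; the bucket size of 1000 and the `mediabucket:` prefix just match the example above):

```python
def bucketed_hset_args(key_id: int, value: str, bucket_size: int = 1000):
    """Map a numeric key to the (hash key, field, value) triple for HSET,
    so that roughly bucket_size entries share one Redis hash."""
    bucket = key_id // bucket_size
    return (f"mediabucket:{bucket}", str(key_id), value)

# Reproduces the example above: key 1155315 falls in bucket 1155.
print(bucketed_hset_args(1155315, "939"))
# -> ('mediabucket:1155', '1155315', '939')
```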

I read [redis memory-optimization](http://redis.io/topics/memory-optimization), but I don't quite understand the difference. It says HGETs are not quite O(1) but close enough, and it mentions more CPU usage when using hashes. I don't understand why there would be more CPU usage (sure, you're trading time for space, but how exactly?). It mentions 'encoding' but not how the encoding works.

It also mentions "string only", but I have no idea what that means. Does it refer to the hash field? I don't see anything about it in [HSET](http://redis.io/commands/hset). What exactly would be encoded, and why would the encoding be more efficient than using a SET?

How is it possible that `HSET "mediabucket:1155" "1155315" "939"` is more efficient than `SET "mediabucket:1155315" "939"`? There is less data in SET (only 1155315 is stored, rather than both 1155 and 1155315). I would personally try using binary keys; however, I don't think that has anything to do with why HSETs are more efficient.

Felix Gallo

Oct 8, 2012, 6:38:13 AM10/8/12
to redi...@googlegroups.com
AcidZombie24 writes:
> [has many questions about the internals of redis]

Generally I find that the best answer to these questions is to read
the source code, which is available on github
(https://github.com/antirez/redis) and is written in tight but
straightforward, flat and unaccented C. Over the course of this post
you ask many broad and vague questions about how things work, and
really the only way to get a sensible and satisfactory answer in
the general case is to immerse yourself in the material.

> How is it possible that `HSET "mediabucket:1155" "1155315" "939"` is more
> efficient than `SET "mediabucket:1155315" "939"`? There is less data in SET
> (only 1155315 is stored, rather than both 1155 and 1155315). I would
> personally try using binary keys; however, I don't think that has anything
> to do with why HSETs are more efficient.

As per the Instagram article you linked, they are radically different
data structures; for small sizes, Redis can be configured to store
dictionary-style data in a compact, space-efficient form rather than in
a full hash table. Check out zipmap.c.

F.


Sripathi Krishnan

Oct 8, 2012, 8:02:56 AM10/8/12
to redi...@googlegroups.com
> How is it possible that `HSET "mediabucket:1155" "1155315" "939"` is more
> efficient than `SET "mediabucket:1155315" "939"`? There is less data in SET
> (only 1155315 is stored, rather than both 1155 and 1155315). I would
> personally try using binary keys; however, I don't think that has anything
> to do with why HSETs are more efficient.

`SET "mediabucket:1155315" "939"` stores the data in the global dictionary. A dictionary entry has several overheads: each key and value is wrapped in a structure called robj, and storing keys in a hash table requires pointers, each of which has an overhead of its own.

For small Hashes, Redis uses a representation called ZipList (ZipMap was used in earlier versions, but is now deprecated).  Think of a ZipList like an array. Key value pairs are flattened and stored consecutively - key1, value1, key2, value2 and so on. There are some more fields for housekeeping, but that is the general idea.

Because keys and values are stored consecutively, you eliminate pointers and other indexing mechanisms. This saves memory. On the other hand, finding an element becomes more costly. Beyond a certain limit, ZipLists are inefficient. In the Instagram article you linked, they increased this limit and decided to trade off CPU for memory.
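To make the trade-off concrete, here is a toy Python sketch of the ziplist idea (an illustration only, not Redis's actual C implementation): fields and values flattened into one consecutive sequence, with lookups done by linear scan instead of hashing.

```python
class TinyZipList:
    """Toy model of a ziplist-encoded hash: no per-entry pointers or
    hash buckets, just one flat sequence [field1, value1, field2, value2, ...]."""

    def __init__(self):
        self.entries = []

    def hset(self, field, value):
        for i in range(0, len(self.entries), 2):
            if self.entries[i] == field:      # existing field: overwrite in place
                self.entries[i + 1] = value
                return
        self.entries.extend([field, value])   # new field: append at the end

    def hget(self, field):
        for i in range(0, len(self.entries), 2):  # O(n) scan -- the CPU cost
            if self.entries[i] == field:
                return self.entries[i + 1]
        return None

z = TinyZipList()
z.hset("1155315", "939")
z.hset("1155316", "940")
print(z.hget("1155315"))
# -> 939
```

The linear scan is why lookups are "not quite O(1)": it stays cheap while the hash is small, which is exactly why Redis only uses this encoding below a configurable size limit.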

More details : 
  1. https://github.com/antirez/redis/blob/unstable/src/ziplist.c
  2. What are the underlying data structures used for Redis - see the second answer; it explains the internals so that you can make CPU/memory trade-offs
  3. Class MemoryCallback in redis-rdb-tools : The inline comments explain how to calculate the memory used by a particular redis data structure. 

--Sri