New expire algorithm: analysis

Salvatore Sanfilippo

unread,

Jun 20, 2019, 5:27:47 AM6/20/19

to redi...@googlegroups.com

Hi all,

I spent several weeks implementing a new way to store Redis keys in
memory, and implementing a better expire algorithm. Unfortunately the
result is a mix of good and bad things, so I'm submitting my work to
the community to understand what to do, if to throw it away, if to
work more towards making it production ready, or if to use a mixed
algorithm like proposed at the end of my analysis.

Here is the link to the analysis document:
https://gist.github.com/antirez/b2eb293819666ee104c7fcad71986eb7

You can find the implementation in the Redis Github repository, under
the branch "new-keyspace".

Feedbacks welcome.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - Redis Labs https://redislabs.com

"If a system is to have conceptual integrity, someone must control the
concepts."
— Fred Brooks, "The Mythical Man-Month", 1975.

Bogdan Storozhuk

unread,

Jun 20, 2019, 11:59:50 AM6/20/19

to Redis DB

Hi Salvatore,

Thank you very much for the description of your algorithm and performance analysis. It was a real joy to read it.
I have a few questions about the usage of a radix tree for data that is already sorted by nature like timestamps:
Of course, I understand the this is the most straightforward solution to use tree structure for such a task, but maybe we can come up with some specialized structure that can employ or knowledge about data distribution.
Maybe it would be easier to use some queue-like structure to maintain the order of keys to be discarded?
An example of such queue I'd suggest some doubly linked list of big fixed size arrays, such structure can help to reduce allocations and improve cache locality. Also, you can compact timestamps within each big chunk by prepending min and max time to chunk header and omitting timestamps for each item within the chunk.
Also, these chunks can be pooled for future reuse to reduce allocations.
Please forgive me if this comment is not relevant due to my lack of knowledge of requirements or other restrictions, that is just my idea I got during the reading of your post.

Bogdan Storozhuk

unread,

Jun 20, 2019, 12:07:59 PM6/20/19

to Redis DB

But of course, such algorithm won't work in case of user user-specified expiration times. :-(

Bogdan Storozhuk

unread,

Jun 20, 2019, 1:21:13 PM6/20/19

to Redis DB

For user-specified expiration times, maybe something like a binary heap or even Heap on Top (https://pdfs.semanticscholar.org/b1cf/c50749203cf2de7fcd13c60dccfa0c988a10.pdf) can be considered?

Geoffrey Hoffman

unread,

Jun 20, 2019, 8:09:29 PM6/20/19

to redi...@googlegroups.com

This may not useful from a non-C-programmer user of Redis since 2010. I had this thought awhile back. You’ve gotten us all thinking in “Redis-like” ways and we aspire to keep Redis codebase and API simple, familiar and elegant as you do.

What if you combined the ideas of sorted set and list - I’ll call it a zsetlist - such that the Redis process looks at the front most value (the 0th item tuple) of a special named (or config specified key), the score would be the timestamp, but each entry tuple would also be items in a list. If the timestamp of the 0th item is elapsed, publish an event for users who have asked about this feature countless times, expire the associated key and pop that value from the zsetlist, leaving the next item for the next expire interval. The config could have added

EXPIRE_ZSETLIST_NAME=my:expires
EXPIRE_PUB_CHANNEL=xchan
EXPIRE_ZSETLIST_INTERVAL=60
EXPIRE_BATCH_SIZE=1000

If the 0th item timestamp is elapsed, Redis would fetch the next EXPIRE_BATCH_SIZE elements of the zsetlist to see if their times are also ready for expiring.

It would be cool to be able to easily ZSLRANGE some portion of the expire zsetlist key and find out the soon expiring keys and when. And I guess it would be helpful to have a configureable solution that doesn’t require a ton of code rewrites to existing Redis data structures and operations.

I think a zsetlist could be useful in completely different ways, too, for my application. Of course, I realize most of this is wild conjecture and hyperbole, and may not even be workable for a variety of reasons, besides that it doesn’t really address your analysis.

Having no real knowledge until now or Redis expiry internals, I’d rather a solution that adds a configurable new data type that supports expiry needs as well as other use cases, than a risky upgrade that has pros and cons and only under the hood changes.

Xingbo Wang

unread,

Jun 21, 2019, 2:54:42 AM6/21/19

to redi...@googlegroups.com

I really enjoyed the read. Thanks for sharing.

For using radix tree store expired time, maybe time wheel could be another option? https://lwn.net/Articles/646950/

Essentially, not all timers have to be handled with the same level of accuracy. Item that expires minutes or hours away could be expired with lower accuracy to reduce the space to store them. However, the inaccuracy expire will also change the existing behavior. Nevertheless, it is another trade off.

Thanks,

Shawn

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
To view this discussion on the web visit https://groups.google.com/d/msgid/redis-db/3BAF6226-17B8-42B8-A594-B50E88376676%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

--

Best Regards,

Xingbo

Salvatore Sanfilippo

unread,

Jun 21, 2019, 4:20:10 AM6/21/19

to redi...@googlegroups.com

Hello, indeed the expire times are user defined and we need to insert
in-the-middle. So actually the radix tree tends to perform quite well
in that case, not sure we can get a big boost from using a different
data structure, especially because ordering *just* for expire time is
not enough, we need to also store the key name in some form. However
what it is possible to do that I didn't try in order to make the Radix
tree more efficient in that case, is to use a radix tree to just index
the expire times, and every single expire time than has a linked list
structure with the keys that will expire in that exact time. But there
are tradeoffs here as well because we need to remove the keys when an
expire is redefined or removed at all, so the key object would need to
hold the pointer to the linked list node. In general I've the feeling
that the radix tree that we are using now is quite optimized already
for time and space (is faster than the hash table we use for many
workloads, for instance), and is more a problem if understanding if
this is the right design, or is better to use the tree just to augment
the old design with more information. I'm going to attempt writing a
branch with such "augmenting" idea in order to compare the two...

On Thu, Jun 20, 2019 at 6:08 PM Bogdan Storozhuk
<storoz...@gmail.com> wrote:
>
> But of course, such algorithm won't work in case of user user-specified expiration times. :-(
>

> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
> To post to this group, send email to redi...@googlegroups.com.
> Visit this group at https://groups.google.com/group/redis-db.

> To view this discussion on the web visit https://groups.google.com/d/msgid/redis-db/30919cd6-f5fd-465b-ac49-a49e3508be25%40googlegroups.com.

> For more options, visit https://groups.google.com/d/optout.

Salvatore Sanfilippo

unread,

Jun 21, 2019, 4:21:51 AM6/21/19

to redi...@googlegroups.com

Yep, currently we are not paying much for the space in the radix tree,
but because the key object is now bigger. Also there are indeed other
tradeoffs in a more complex design.
Maybe the optimizations we need in the design are not much higher
level, but more brutal like using the bit to understand what kind of
key to store, profiling the radix tree in that specific use case and
speedup the implementation, and so forth.

> To view this discussion on the web visit https://groups.google.com/d/msgid/redis-db/CAF5SkhrvpYj1RGROMyZfBBcbfPtYpdH08SR0VbeJECSWMdmmow%40mail.gmail.com.

> For more options, visit https://groups.google.com/d/optout.

--

Thomas Love

unread,

Jun 21, 2019, 8:16:53 AM6/21/19

to Redis DB

I've used a "hashed wheel" approach to schedule timeout in a network server. Have you assessed this kind of structure? Being basically a hash I would guess it would involve less change in performance characteristics compared to using a tree.

http://www.cs.columbia.edu/~nahum/w6998/papers/sosp87-timing-wheels.pdf

It obviously involves a translation from absolute to relative time offsets. If startup cost is a problem then it could perhaps be bootstrapped incrementally with the sampling technique, but I guess in the common case Redis process lifetime is a large multiple of key TTL.

Thomas

Reply all

Reply to author

Forward