Ensuring data consistency with Redis

Bogdan Irimia

Sep 13, 2018, 7:27:40 AM

to redi...@googlegroups.com

Hello, everybody

We are currently using Redis in our product (and we are very happy with
it) to store statistical data computed from numerical parameters
received via HTTP requests.
We keep only processed data in Redis, not every incoming value. For
example, if we receive 5 different values, to know their maximum we keep
in Redis only one value (the current maximum), which is updated, if
necessary, every time a new value is received. In other words, we do
"stream" operations: MIN, MAX, SUM, etc.
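The "stream" style of aggregation described above can be modelled in a few lines. This is a plain in-memory Python sketch, not Redis code; `StreamStats` and its field names are hypothetical. In Redis each aggregate would be its own key, and a read-compare-write such as the MAX update would likely need to run atomically (for example inside a server-side script) to be safe under concurrent writers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamStats:
    """Running aggregates kept instead of the raw samples (hypothetical model)."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> None:
        # MIN/MAX are only overwritten when the new value beats the stored one.
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value
        # SUM and COUNT change on every sample, so they must stay in step.
        self.total += value
        self.count += 1


stats = StreamStats()
for v in [4.0, 9.5, 1.25, 7.0, 3.0]:  # made-up samples
    stats.add(v)
print(stats.minimum, stats.maximum, stats.total, stats.count)
```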

The problem we are trying to solve now is related to the average. To
know the average of the received values, we store in Redis the current
SUM and the current COUNT. When a new value is received, we add it to
SUM and increment COUNT by 1. Then, whenever we need the average, we
read SUM and COUNT and divide one by the other.
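Written out, with $x_n$ the $n$-th received value and $S_n$, $C_n$ the stored SUM and COUNT, the bookkeeping is as follows. The second line uses made-up numbers to show why a half-applied update matters: if SUM is incremented but COUNT is not, the reported average is skewed until the keys are corrected.

```latex
\begin{gathered}
S_n = S_{n-1} + x_n, \qquad C_n = C_{n-1} + 1, \qquad \bar{x}_n = \frac{S_n}{C_n} \\[4pt]
S_4 = 20,\; C_4 = 4,\; x_5 = 10:\quad
\frac{20 + 10}{4 + 1} = 6 \ \text{(both updated)}
\quad\text{vs.}\quad
\frac{20 + 10}{4} = 7.5 \ \text{(only SUM updated)}
\end{gathered}
```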

We need these related operations (incrementing SUM and COUNT) to be
consistent: if the increment of one of them fails (for various
reasons), the other operation should be aborted (or rolled back, if it
has already been executed).
Now... I know there is no rollback support in Redis transactions.

What options do you think we have to ensure that either both operations
succeed or none of them is executed (all or nothing)?

Thank you

Kind regards,
Bogdan Irimia

Itamar Haber

Sep 13, 2018, 7:44:05 AM

to Redis DB

Hello Bogdan,

> if the incrementation of one of them fails (for various reasons)

What reasons?

Anyway, I'd look into writing a ~5-line server-side Lua script (see the `EVAL` command) to maintain this business rule and to reduce network latency.

The script ensures atomicity and lets you handle failures (if any occur, for various reasons :)) inside it.
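A sketch of what such a script might look like, wrapped in Python so it can be sent from a client. Assumptions: the key names are hypothetical, and `client` is any object with a redis-py style `eval(script, numkeys, *keys_and_args)` method. The script validates its argument before writing anything, so a bad sample leaves both keys untouched. Note that Redis runs the whole script without interleaving other commands, but it does not undo a write that already happened if a later `redis.call` raises an error.

```python
# KEYS[1] = SUM key, KEYS[2] = COUNT key, ARGV[1] = the new sample.
ADD_SAMPLE_LUA = """
local v = tonumber(ARGV[1])
if not v then
  return redis.error_reply('sample is not a number')
end
redis.call('INCRBYFLOAT', KEYS[1], ARGV[1])
redis.call('INCR', KEYS[2])
return 1
"""


def add_sample(client, sum_key, count_key, value):
    """Update SUM and COUNT in one EVAL round trip.

    `client` is assumed to expose eval(script, numkeys, *keys_and_args),
    as redis-py's Redis class does.
    """
    return client.eval(ADD_SAMPLE_LUA, 2, sum_key, count_key, value)
```

With redis-py, `client.register_script(ADD_SAMPLE_LUA)` is an alternative that caches the script and invokes it via `EVALSHA`. From OpenResty, lua-resty-redis should accept the same script through `red:eval(script, 2, sum_key, count_key, value)`.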

Cheers,

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+unsubscribe@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

--

Itamar Haber
Technicalist Evangelist

Phone: +972.54.567.9692

Bogdan Irimia

Sep 13, 2018, 7:48:30 AM

to redi...@googlegroups.com

Yes, Redis scripts were one of my ideas too.
Reasons the operations might fail:
- Redis runs out of memory (this has actually happened)
- the connection between the client and Redis drops (the client is OpenResty, i.e. Nginx + Lua)
- Redis crash, server crash/reset, Redis process killed

What level of consistency will the script ensure? What if the Redis process is killed exactly when the script is being run?

Thank you


Itamar Haber

Sep 13, 2018, 8:34:14 AM

to Redis DB

TL;DR go ahead and try it - it should work™

OOM is definitely a concern. But:

- You're not allocating a bazillion bytes, only incrementing existing keys

- From a deployment perspective, Redis's RAM consumption should be monitored and managed. If OOM is not acceptable, scale Redis or do some housecleaning so it won't happen.

- Scripts provide some safety mechanisms to protect against that (this is evolving work)
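The monitoring advice above can be automated with a small check. A minimal sketch, assuming the `INFO memory` reply has already been parsed into a dict (redis-py's `client.info("memory")` returns one, with `used_memory` and `maxmemory` fields); `memory_headroom`, `should_alert` and the 10% threshold are hypothetical choices, not part of Redis.

```python
def memory_headroom(info):
    """Fraction of maxmemory still free, given an INFO memory reply parsed
    into a dict. Returns infinity when maxmemory is 0 (no limit set)."""
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", 0))
    if limit == 0:
        return float("inf")
    return (limit - used) / limit


def should_alert(info, threshold=0.10):
    # Hypothetical policy: alert once less than 10% of maxmemory is left.
    return memory_headroom(info) < threshold


# Made-up readings; a real job would pass client.info("memory").
print(should_alert({"used_memory": 950, "maxmemory": 1000}))  # True
print(should_alert({"used_memory": 500, "maxmemory": 1000}))  # False
```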

A dropped connection is a client's concern. If the operation, i.e., EVAL, was sent and received before the drop, then the Redis server will complete it regardless of the connection's state.

While Redis is running the script, (almost) everything literally stops. In case of a Redis crash, server crash or process killed, if you don't have persistence enabled you'll come back up clean. With persistence, you'll recover to the most recent state you have depending on the config, but the script's effects won't be in it as these are persisted only upon the script's return.

That said, please feel free to explore this subject and report any aberrant behavior.

You're welcome :)

hva...@gmail.com

Sep 13, 2018, 12:28:26 PM

to Redis DB

My suggestion is for a different approach: don't keep the derivatives of the samples, keep the samples and let the clients perform the max/min/median calculations.

I.e., keep the samples in Redis in a List. Add a new sample with RPUSH, delete an old sample with LPOP - both are atomic operations. Clients that want a statistic for the samples fetch the list with "LRANGE key 0 -1" and do the calculations on the samples themselves.
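The client-side half of this approach might look like the following sketch. It assumes the reply of `LRANGE key 0 -1` arrives as a list of bytes (as redis-py returns it without response decoding); `stats_from_samples` is a hypothetical helper. The median is included because it is one statistic that cannot be derived from running MIN/MAX/SUM/COUNT keys.

```python
from statistics import mean, median


def stats_from_samples(raw):
    """Compute statistics client-side from an `LRANGE key 0 -1` reply.

    Redis returns list elements as bytes (or str if the client decodes
    responses), so every sample is parsed to float first.
    """
    samples = [float(s) for s in raw]
    if not samples:
        return None  # empty list: nothing to report
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": mean(samples),
        "median": median(samples),  # needs all samples, not just aggregates
        "count": len(samples),
    }


# Example reply as a client might receive it (made-up samples).
reply = [b"4", b"9.5", b"1.25", b"7", b"3"]
print(stats_from_samples(reply))
```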

The client routines that add or remove a sample from the list can also post sum and count in other keys just like your code does now, and the clients that don't need the most accurate average can read the sum/count keys. But those clients that absolutely, positively must have the most precise average of the samples can fetch the list of samples and calculate it themselves. Also, a script can periodically check the count/sum against the samples and fix incorrect ones.
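The periodic check mentioned above could be sketched like this. It assumes the stored SUM and COUNT and the `LRANGE` reply were fetched in one consistent read (for example inside a single script); otherwise a sample pushed between the reads would look like drift. `reconcile` is a hypothetical helper, and writing the corrected values back is left to the caller.

```python
import math


def reconcile(stored_sum, stored_count, samples):
    """Recompute SUM and COUNT from the raw samples and compare them with the
    values held in the aggregate keys.

    Returns (drifted, true_sum, true_count); when drifted is True the caller
    would write true_sum/true_count back to the two keys.
    """
    values = [float(s) for s in samples]
    true_sum = math.fsum(values)  # accurately rounded sum of the samples
    true_count = len(values)
    drifted = (
        int(stored_count) != true_count
        or not math.isclose(float(stored_sum), true_sum, rel_tol=1e-9, abs_tol=1e-9)
    )
    return drifted, true_sum, true_count


# Made-up state after a torn update: SUM includes the last sample, COUNT does not.
print(reconcile(b"24.75", b"4", [b"4", b"9.5", b"1.25", b"7", b"3"]))
```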

This might not be useful for you, but I wanted to suggest it in case you hadn't considered it.

Bogdan Irimia

Sep 14, 2018, 7:13:01 AM

to redi...@googlegroups.com

Thank you very much for your suggestion. Indeed, keeping the values would offer us the greatest flexibility (and precision), but the amount of data won't fit in an affordable amount of RAM.
For example, we receive messages every minute from various datasources. We need to keep values for up to 3 years, which is about 1.5 million values per parameter. We need to support more than 500 datasources, each with tens of parameters (maybe more than 100, in fact). That requires too much memory, so we decided to keep only processed data, representative of various intervals at different resolutions.
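A back-of-the-envelope check of these numbers. Assumptions: one value per minute, 365-day years, exactly 500 datasources with 100 parameters each, and 8 bytes per value (a bare 64-bit double, before any per-element Redis overhead).

```python
MINUTES_PER_YEAR = 365 * 24 * 60              # 525,600 (ignoring leap days)
values_per_parameter = 3 * MINUTES_PER_YEAR   # one value per minute for 3 years
datasources = 500
parameters_per_source = 100                   # "maybe more than 100"
bytes_per_value = 8                           # raw double, no Redis overhead

total_values = values_per_parameter * datasources * parameters_per_source
raw_bytes = total_values * bytes_per_value

print(f"{values_per_parameter:,} values per parameter")
print(f"{total_values:,} values in total")
print(f"{raw_bytes / 1e9:,.1f} GB of raw doubles")
```

Even this lower bound (roughly 630 GB) is well beyond a typical single Redis instance, which supports the decision to keep only aggregates in memory.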

We do keep the "raw" values in files and we can load them when really needed, but that's a different discussion.
