storing timestamp-value series in redis

Marius M.

Feb 21, 2014, 6:34:11 AM
to redi...@googlegroups.com

I need to store some time series data in Redis. I have Unix timestamps, and with each one I need to associate a value (timestamp <-> value).

I tried a sorted set with the timestamp as the score (so I can do range queries on the timestamps) and the value as the member.

127.0.0.1:6379> ZADD timeserie 1392141527245 10 1392141527275 12 1392141527100 10
(integer) 2
127.0.0.1:6379> zscan timeserie 0
1) "0"
2) 1) "10"
   2) "1392141527245"
   3) "12"
   4) "1392141527275"
127.0.0.1:6379>

But I hit a problem: members must be unique, while my values can be the same for different timestamps. Any idea how to approach this? Is another data type better?
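
To see why ZADD answered (integer) 2 for three score/member pairs, here is a minimal sketch that models a sorted set as a plain Python dict keyed by member. The `zadd` helper is a hypothetical stand-in written for this illustration, not a Redis client call; it assumes the documented ZADD behaviour that re-adding an existing member only updates its score.

```python
def zadd(zset, *score_member_pairs):
    """Toy model of ZADD: returns how many members were newly added."""
    added = 0
    # Arguments arrive as score1, member1, score2, member2, ...
    scores = score_member_pairs[0::2]
    members = score_member_pairs[1::2]
    for score, member in zip(scores, members):
        if member not in zset:
            added += 1
        zset[member] = score  # an existing member only gets a new score
    return added


timeserie = {}
added = zadd(
    timeserie,
    1392141527245, "10",
    1392141527275, "12",
    1392141527100, "10",  # same member "10" again: no new element
)
print(added, len(timeserie))
```

Because the member is the identity of an element, the value cannot serve as the member on its own when values repeat.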

Dvir Volk

Feb 21, 2014, 7:41:30 AM
to redi...@googlegroups.com
There isn't a single type in Redis that tackles this (there was talk long ago about adding something like that), but you can achieve it with a combination of two (assuming your timestamps are sparse):
1. A sorted set that just holds the timestamps that exist in the database. This assumes not every possible timestamp is present. It allows range queries.
2. SETs (or sorted sets if you want secondary indexing) of all records with a given timestamp.

So you first query the sorted set for the range you want, extract the existing timestamps, and do SMEMBERS on each. You can store the keys of the secondary sets as the members of the sorted set.

so you have your sorted set that looks something like
{ k1: ts1, k2: ts2 }
and a lot of secondary sets:
k1: { r1, r2, r3 }
k2: { r4, r5, r6 }

You can even do the query in a single Lua script and save the second round trip.
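
The two-structure layout above could be sketched as follows with a redis-py style client. This is only an illustration under assumptions: the `ts:<timestamp>` key naming and the function names are made up here, and `r` is expected to be something like `redis.Redis(decode_responses=True)`.

```python
def add_record(r, index_key, ts, record):
    """Store `record` under timestamp `ts` and index the timestamp."""
    bucket_key = f"ts:{ts}"              # hypothetical naming scheme for the secondary SET
    r.sadd(bucket_key, record)           # all records sharing this timestamp
    r.zadd(index_key, {bucket_key: ts})  # index: member = bucket key, score = timestamp


def query_range(r, index_key, start, end):
    """Return (timestamp, record) pairs with start <= timestamp <= end."""
    results = []
    # First round trip: which timestamps exist in the range?
    for bucket_key, score in r.zrangebyscore(index_key, start, end, withscores=True):
        # One SMEMBERS per timestamp found; a Lua script could fold these together.
        for record in sorted(r.smembers(bucket_key)):
            results.append((int(score), record))
    return results
```

Note that a plain SET still collapses two identical records stored under the same timestamp; it only solves the case of equal values at different timestamps.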



--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/groups/opt_out.



--
Dvir Volk
Chief Architect, Everything.me

Whit Armstrong

Feb 21, 2014, 10:53:46 AM
to redi...@googlegroups.com
it would be really nice to have a time series specific type.

sorted sets don't really do the job since a time series can have many
repeated values.

ideally, the datatype would be similar to a hash, but with a sortable
unique index, instead of fields.

I've never looked at the internals of redis, but I'm happy to put some
time into working on this if there is sufficient interest.

-Whit

Josiah Carlson

Feb 21, 2014, 10:58:34 AM
to redi...@googlegroups.com
Another option is to use a single ZSET, but append a suffix to each member that carries an event identifier. It's how I do autocomplete: {k1_<id>: ts1, k2_<id>: ts2, ...}. Whether or not you use the identifier for anything else is up to you.

 - Josiah

Jay Johnston

Feb 25, 2014, 10:48:13 AM
to redi...@googlegroups.com
What about appending some random string to the value, to make each unique, then ignoring it when you retrieve it?

Josiah Carlson

Feb 25, 2014, 11:04:05 AM
to redi...@googlegroups.com
You can use the random string, but generating short random strings that don't collide is very difficult unless you keep an explicit record of them (so as not to repeat). Look into "birthday collisions" or the "birthday paradox", which is applicable here and means that if you (for example) need 1 million such throwaway identifiers, you more or less need to use at least a 40-bit random identifier. With Redis, you can just use a shared counter for your id, which offers the same sort of "give me a unique identifier that I paste on the end of my member", but using 20 bits instead of 40. If you need a billion, that's 30 bits instead of 60. If you *can* have a shared counter, it *always* wins over random unless you have an explicit need for random data.

 - Josiah
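
The sizes quoted above can be checked with the usual birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)), for n random ids drawn from N = 2^bits possible values. The sketch below is only an illustration; the function names are made up here, and the counter is assumed to hand out ids 1..n as INCR does.

```python
import math


def random_id_collision_probability(n_ids, bits):
    """Approximate chance that at least two of n_ids random `bits`-bit ids collide."""
    space = 2 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))


def counter_bits(n_ids):
    """Bits needed to write the largest id when a shared counter issues 1..n_ids."""
    return n_ids.bit_length()


for n in (10**6, 10**9):
    bits = counter_bits(n)
    p = random_id_collision_probability(n, 2 * bits)
    print(f"n={n}: counter needs {bits} bits; random ids of {2 * bits} bits collide with p~{p:.2f}")
```

Under this approximation a million 40-bit random ids still collide with a probability of roughly one in three, so twice the counter width is better read as the size below which collisions become near-certain than as a safe size. A shared counter never collides at all.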




Jay Johnston

Feb 25, 2014, 11:53:15 AM
to redi...@googlegroups.com
Great idea about using a shared counter.



Josiah Carlson

Feb 25, 2014, 2:29:06 PM
to redi...@googlegroups.com
That was what I meant by <id> earlier, though I could have been more explicit. :)

 - Josiah

Whit Armstrong

Feb 25, 2014, 3:12:56 PM
to redi...@googlegroups.com
I'm not sure I get your schema.

How would it look if I needed to store values for USDJPY like this:

20140217 101.92
20140218 102.36
20140219 102.31
20140220 102.28
20140221 102.51
20140224 102.51

How would it look with a ZSET using the dates as the 'score', so that
subsets can be selected? (Notice that the last two values repeat.)

-Whit

Josiah Carlson

Feb 25, 2014, 4:13:53 PM
to redi...@googlegroups.com
The Lua script:
-- KEYS[1] = counter key, KEYS[2] = ZSET key, ARGV[1] = score (date), ARGV[2] = value
local id = redis.call('INCR', KEYS[1])
redis.call('ZADD', KEYS[2], ARGV[1], ARGV[2]..'_'..id)

How to call the Lua script:
EVAL <script> 2 <id key> <ZSET key> <date> <value>


The use of a Lua script is to allow for a single network round trip.

 - Josiah
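
For anyone calling this from a client library, here is a rough sketch of how the script might be driven from Python with a redis-py style client. The function names and key names are hypothetical, the `return id` line is an addition for convenience, and `r` is assumed to be something like `redis.Redis()`.

```python
ADD_POINT_LUA = """
local id = redis.call('INCR', KEYS[1])
redis.call('ZADD', KEYS[2], ARGV[1], ARGV[2]..'_'..id)
return id
"""


def add_point(r, id_key, zset_key, score, value):
    """Equivalent of: EVAL <script> 2 <id key> <ZSET key> <date> <value>"""
    return r.eval(ADD_POINT_LUA, 2, id_key, zset_key, score, value)


def read_range(r, zset_key, lo, hi):
    """Return (score, value) pairs in [lo, hi], with the id suffix removed."""
    points = []
    for member, score in r.zrangebyscore(zset_key, lo, hi, withscores=True):
        if isinstance(member, bytes):       # clients without decode_responses return bytes
            member = member.decode()
        value, _id = member.rsplit("_", 1)  # split on the last underscore only
        points.append((int(score), value))
    return points
```

Splitting on the last underscore keeps the original value intact even if the value itself contains an underscore. With the USDJPY rows, `read_range(r, "usdjpy", 20140221, 20140224)` would be expected to return both 102.51 entries, since their members differ by id.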

Whit Armstrong

Feb 25, 2014, 4:23:08 PM
to redi...@googlegroups.com
Pls forgive my ignorance. I haven't used the lua facilities of redis yet.

So, the point is that you use the call:
'id = redis.call('INCR', KEYS[1])'

to get a unique id that you then append to the value of ARGV[2] to
force it to be a unique value within the ZSET?

-Whit



Josiah Carlson

Feb 25, 2014, 5:17:25 PM
to redi...@googlegroups.com
Yes, where ARGV[2] is the "duplicate member" in the ZSET. I appended an underscore and the id because there is no guarantee that your members like "102.51" won't gain extra digits to the right over time, or have them already, but not displayed because you only sent a partial dataset.

 - Josiah

Keith Frost

Feb 26, 2014, 12:33:08 PM
to redi...@googlegroups.com
The shared counter idea is nice. In the past, for time series data in a zset, I have just prepended the timestamp to the values to make the set members, as well as using the timestamp for the score.
Keith Frost
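
A small sketch of the timestamp-prefix idea, again modelling the sorted set as a dict of member to score. The `:` separator and the helper names are assumptions made for this illustration.

```python
def make_member(ts, value):
    """Build a member that is unique as long as the (timestamp, value) pair is."""
    return f"{ts}:{value}"  # ':' separator is an assumption


def parse_member(member):
    """Recover (timestamp, value) from a member built by make_member."""
    ts, value = member.split(":", 1)
    return int(ts), value


zset = {}  # toy stand-in for a ZSET: member -> score
samples = [(20140221, "102.51"), (20140224, "102.51"), (20140224, "102.51")]
for ts, value in samples:
    zset[make_member(ts, value)] = ts

print(len(zset))
```

Equal values at different timestamps now get distinct members, but the third sample repeats an existing (timestamp, value) pair and is absorbed into the same member.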

Josiah Carlson

Feb 26, 2014, 1:10:00 PM
to redi...@googlegroups.com
That works if you can guarantee with reasonable certainty that your composite (timestamp, value) is unique. I've never been in that situation, which is why I stick with the id.

The id-based technique is also useful when confronted with problems where you want to map non-unique values to ids, while having them sorted by the score then by the values. In particular, I use it for prefix/suffix matching - I use the score as a 7-character pre-selection method, then use the member to complete the match and to give me the group of ids that match the prefix/suffix match query (there's some fun bits here: https://github.com/josiahcarlson/rom/blob/master/rom/index.py#L308).

 - Josiah

