Inputs on data structuring

Sriram Girivasan

May 6, 2015, 7:38:52 AM
to redi...@googlegroups.com
Hi

I am planning to maintain information on redis for my use case as follows: 

1. Hash: name - sessionID ("session:1" for example)
                   i. field1 - version
                   ii. field2 - IP Address
                   ....
                   viii. field8 - Received Time

Command ----  hmset session:1 version 0 IPAddress 192.168.1.1 messageType 1 .... recvdTime "Thu Apr 30 18:46:15 IST 2015" 

2. Another Hash: name - Sessions
                           
Command ----  hmset Sessions b039408940983 "session:1" b039408940984 "session:2" ...  



My problem is: I need to delete a session if its 'recvdTime' is older than a threshold (i.e. if the session has been in the database for longer than a specified amount of time).

How do I go about organising the data? 

i. One approach was to set a TTL. But I heard it works only for normal key/value pairs (and not for keys inside hashes or for hashes themselves).

ii. A second approach was to sort the Hash by recvdTime (if the time is stored as a long value) OR use a separate sorted set with recvdTime as the score. But then I need to find the session up to which I have to remove entries from the Hash.

Thanks in advance!

Regards,
G Sriram


Itamar Haber

May 6, 2015, 7:58:56 AM
to redi...@googlegroups.com
There are all kinds of tradeoffs to choose from, some of which are quite relevant to your task. I'll try to address the major ones below, in the order they came up:
* General: Hash field names can be shortened to save significant amounts of RAM
* Consider using a dedicated (String) key - e.g. sessionID:time - for storing the session's last touch. You'll be able to set a TTL on that one normally and use its existence (or lack thereof) to manage access to the session data. This is somewhat similar to your approach #1.
* What is the Sessions Hash used for? I can see that it maps a hash-like id to a human-legible name but why do that? You can use the hash-id as your sessionID for the same result with less time/space complexity.
* The only possible use of Sessions that I can come up with is to have a quick way to retrieve all sessions in the database (instead of doing a SCAN for example). When this type of querying is needed, a better option is to use a regular Set instead.
* SORT (approach #2a), if usable here, won't be efficient in terms of complexity IMO
* Alternatively, Sorted Sets (#2b) are really good for keeping track of time, so another common pattern is to map times to IDs with them and manually "expire" set members (periodically or upon access).
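
The dedicated-key idea in the second bullet could look roughly like the Python sketch below. It assumes a redis-py-style client `r` (anything exposing `set`, `exists`, `delete` and `hgetall` with redis-py's signatures). The helper names and the 30-minute window are hypothetical and not from this thread.

```python
import time

SESSION_TTL_SECONDS = 30 * 60  # hypothetical expiry window


def touch_session(r, sid, ttl=SESSION_TTL_SECONDS):
    """Record the session's last touch in a dedicated String key with a TTL."""
    # Equivalent to: SET <sid>:time <now> EX <ttl>
    # Redis removes the marker key by itself once the TTL elapses.
    r.set(f"{sid}:time", int(time.time()), ex=ttl)


def session_is_live(r, sid):
    """A session counts as live only while its marker key still exists."""
    return r.exists(f"{sid}:time") == 1


def read_session(r, sid):
    """Return the session Hash, or None if its marker key has expired."""
    if not session_is_live(r, sid):
        # Marker is gone: clean up the session Hash lazily, on access.
        r.delete(sid)
        return None
    return r.hgetall(sid)
```

With this layout the Hash itself never needs a TTL. Stale Hashes are removed the next time something tries to read them, so a session that is never read again would linger until some other cleanup runs.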

--

Itamar Haber | Chief Developers Advocate
Redis Watch Newsletter - Curator and Janitor
Redis Labs - Enterprise-Class Redis for Developers

Mobile: +1 (415) 688 2443
Mobile (IL): +972 (54) 567 9692
Email: ita...@redislabs.com
Skype: itamar.haber



Sriram Girivasan

May 8, 2015, 2:16:57 AM
to redi...@googlegroups.com
Thanks a lot for your response.

I'm using the Sessions Hash because I am maintaining another relation between the IP address and the session IDs that are supported by that device, so I don't want to use that long sessionID (be90839090..) everywhere.

Just for example:
hmset Session:1 .. . .  IPAddress 192.168.1.1  .....
sadd 192.168.1.1 Session:1 Session:3 ...

And, say I am using sorted sets with session ID as the member and the recvdTime as the score. 
Is it a costly operation to manually walk this set in order (from the earliest session), check whether each session has expired and then remove its Hash? How does this compare with using a DB and making a query like "Remove session from Sessions where (currentTime - session.recvdTime) > a value"?

Thanks,
G Sriram

Itamar Haber

May 8, 2015, 8:44:40 AM
to redi...@googlegroups.com
Inline

On Fri, May 8, 2015 at 9:16 AM, Sriram Girivasan <sriramg...@gmail.com> wrote:
> Thanks a lot for your response.
>
> I'm using the Sessions Hash because I am maintaining another relation between the IP address and the session IDs that are supported by that device, so I don't want to use that long sessionID (be90839090..) everywhere.
>
> Just for example:
> hmset Session:1 .. . .  IPAddress 192.168.1.1  .....
> sadd 192.168.1.1 Session:1 Session:3 ...


Ok, now it makes sense - thanks for completing the puzzle :)
 
> And, say I am using sorted sets with session ID as the member and the recvdTime as the score.
> Is it a costly operation to manually walk this set in order (from the earliest session), check whether each session has expired and then remove its Hash?
 
If you do the following you get away pretty cheap in terms of computational complexity (at the cost of some space complexity):

1. Add session
O(log(N)): ZADD sessions:times <time> <sid>

2. Get session time
O(1): ZSCORE sessions:times <sid>

3. All inactive sessions
Read: O(log(N)+M): ZRANGEBYSCORE sessions:times -inf <now - expiry_value>
Delete: O(log(N)+M): ZREMRANGEBYSCORE sessions:times -inf <now - expiry_value>

4. "Expire" a session
a. O(log(N)): ZREM sessions:times <sid>
b. O(1) for a small fixed-size Hash (DEL is O(M) in the number of fields): DEL <sid>
c. O(1): SREM <ip> <sid>

The parts of session expiry (4) can be called at different times, depending on your requirements and the data, to build practically any kind of expiry strategy.
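
Steps 1, 3 and 4 above can be combined into a periodic sweep, sketched here in Python. It assumes a redis-py-style client `r` (for example `redis.Redis(decode_responses=True)`) and that each session Hash stores its address under the `IPAddress` field, as in the examples earlier in the thread. The function names and the `max_age` parameter are hypothetical.

```python
import time

TIMES_KEY = "sessions:times"  # Sorted Set: member = session ID, score = time


def add_session(r, sid, ip, fields, now=None):
    """Store the session Hash, the IP -> sessions Set entry and the time index."""
    now = time.time() if now is None else now
    r.hset(sid, mapping=dict(fields, IPAddress=ip))
    r.sadd(ip, sid)
    r.zadd(TIMES_KEY, {sid: now})  # step 1: O(log(N))


def expire_inactive(r, max_age, now=None):
    """Delete every session whose score is older than now - max_age."""
    now = time.time() if now is None else now
    cutoff = now - max_age
    # Step 3 (read): O(log(N)+M), M being the number of expired sessions.
    expired = r.zrangebyscore(TIMES_KEY, "-inf", cutoff)
    for sid in expired:
        # Look up the owning IP before the Hash is deleted.
        ip = r.hget(sid, "IPAddress")
        if ip is not None:
            r.srem(ip, sid)      # step 4c
        r.delete(sid)            # step 4b
        r.zrem(TIMES_KEY, sid)   # step 4a
    return expired
```

The sweep is not atomic as written. If sessions can be touched while it runs, wrapping the per-session calls in a MULTI/EXEC transaction or a Lua script would be one way to close that gap.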

> How does this compare with using a DB and making a query like "Remove session from Sessions where (currentTime - session.recvdTime) > a value"

 
Comparing this to another database, whether SQLish or not, really depends on the other database, how the data is modeled and a gazillion other things. I'd venture a guess that, since Redis is pretty efficient and blazing fast, it will be the clear winner any time, but without actually comparing concrete use cases ("benchmarking") any answer would be essentially meaningless :)