How to get all items with the same prefix


Chen Wang

Aug 21, 2014, 4:12:12 PM
to redi...@googlegroups.com
Folks,
I was originally using a list and calling LRANGE to get all items. But since I also need to remove individual items from the list, which takes much longer, I decided to switch to using keys with a prefix instead: just put all the messages under a prefix:
key           value 
prefix:1      msg1
prefix:2      msg2
...

My question is: what is the most efficient way to get all the messages with the same prefix? Do I need to get all the keys first and then do an MGET? What if there are millions of records? Is there a way to do paging-like retrieval?
Thanks,
Chen

Josiah Carlson

Aug 21, 2014, 4:19:42 PM
to redi...@googlegroups.com
If you are using the latest version of Redis 2.8, look into ZRANGEBYLEX.

Each of the entries in this ZSET should be of the form: <prefix><separator><message>, and scores can all be 0. From there, you just use ZRANGEBYLEX to get entries with a given prefix, then you can extract your messages as necessary.
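The lexicographic-range idea can be sketched in plain Python, using a sorted in-memory list as a stand-in for the ZSET (this models the range query only, not Redis itself; in Redis the equivalent call would use the `[prefix` and `(prefix\xff` bound syntax):

```python
import bisect

# In-memory model of a ZSET where every score is 0: members are kept in
# lexicographic order, so a prefix query becomes two binary searches.
members = sorted([
    "prefix:1|msg1",
    "prefix:2|msg2",
    "other:9|msg9",
])

def range_by_lex_prefix(sorted_members, prefix):
    """Return all members starting with `prefix`, analogous to
    ZRANGEBYLEX key [prefix (prefix\xff."""
    lo = bisect.bisect_left(sorted_members, prefix)
    hi = bisect.bisect_left(sorted_members, prefix + "\xff")
    return sorted_members[lo:hi]

print(range_by_lex_prefix(members, "prefix:"))
# → ['prefix:1|msg1', 'prefix:2|msg2']
```

The separator just has to be a character that never appears in the prefix, so that splitting on it recovers the message cleanly.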

 - Josiah



--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

Chen Wang

Aug 21, 2014, 5:39:07 PM
to redi...@googlegroups.com
Thanks Josiah.
But if I use a ZSET, removal takes much longer than DEL, especially when there are millions of entries in the set, right? So if removal happens for almost every message, and the remaining (unremoved) messages number only in the thousands, then I would still be better off using PREFIX:KEY value pairs and using KEYS + MGET to loop through the leftover messages, right?
Thanks,
Chen

Jan-Erik Rediger

Aug 21, 2014, 5:46:29 PM
to redi...@googlegroups.com
Don't use KEYS. KEYS is slow. KEYS will block your server, especially if
you have millions of keys. KEYS needs to go through each and every key
in your database, no matter whether it actually matches your pattern
or not. That's what makes it slow.

If you really want to iterate the keyspace, use SCAN [1]; it will give
you a few keys on each iteration, but won't block your instance for
long.
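The SCAN pattern is a cursor loop: call the command repeatedly, collecting each batch, until the cursor comes back as 0. A minimal sketch, using a fake in-memory client in place of a real connection (the real call with a Python client would look like `r.scan(cursor, match="prefix:*")`):

```python
# Minimal model of the SCAN cursor loop. `FakeRedis.scan` mimics the real
# command's contract: each call returns (next_cursor, batch_of_keys), and
# a returned cursor of 0 means the iteration is complete.
class FakeRedis:
    def __init__(self, keys, batch=2):
        self._keys = list(keys)
        self._batch = batch

    def scan(self, cursor=0, match=None):
        batch = self._keys[cursor:cursor + self._batch]
        if match:
            prefix = match.rstrip("*")
            batch = [k for k in batch if k.startswith(prefix)]
        next_cursor = cursor + self._batch
        if next_cursor >= len(self._keys):
            next_cursor = 0  # 0 signals the iteration is done
        return next_cursor, batch

def scan_all(client, match=None):
    """The canonical SCAN loop: keep calling until the cursor is 0."""
    cursor, found = 0, []
    while True:
        cursor, batch = client.scan(cursor, match=match)
        found.extend(batch)
        if cursor == 0:
            return found

r = FakeRedis(["prefix:1", "other:1", "prefix:2", "prefix:3"])
print(scan_all(r, match="prefix:*"))
# → ['prefix:1', 'prefix:2', 'prefix:3']
```

Note that MATCH filtering happens per batch, so a call can legitimately return zero keys without the iteration being finished; only the cursor tells you when to stop.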

ZRANGEBYLEX will be slow, too, if you have really a lot of keys in
your set.

Best is to test which approach works better for you.

[1] http://redis.io/commands/scan

Chen Wang

Aug 21, 2014, 7:03:46 PM
to redi...@googlegroups.com, jan...@fnordig.de
On second thought, maybe I could use HSET? 
HSET processingmessages id1 messagejson
HSET processingmessages id2 messagejson
Then I would use HGETALL in one shot, and deleting a single message by id should be O(1) as well.
Will that work?
Thanks,
Chen

Chen Wang

Aug 21, 2014, 7:11:40 PM
to redi...@googlegroups.com, jan...@fnordig.de
It seems that I cannot filter by key pattern within the hash, so HSET might be a problem if I have millions of messages.
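For what it's worth, a hash does keep the two operations that matter here cheap, and Redis offers HSCAN for paging over a large hash instead of pulling everything with HGETALL. A small in-memory sketch of those semantics (a model, not a Redis client):

```python
# In-memory model of the hash idea: HSET and HDEL are O(1) dict
# operations, which is the draw over removing from a list (LREM is O(N)).
# For millions of fields you would page with HSCAN rather than pull
# everything with HGETALL.
processing = {}

def hset(h, field, value):
    h[field] = value           # HSET key field value

def hdel(h, field):
    return h.pop(field, None)  # HDEL key field, O(1)

def hscan(h, cursor=0, count=2):
    """Model of HSCAN: returns (next_cursor, batch of (field, value))."""
    items = sorted(h.items())
    batch = items[cursor:cursor + count]
    next_cursor = cursor + count
    return (0 if next_cursor >= len(items) else next_cursor), batch

hset(processing, "id1", "msg1")
hset(processing, "id2", "msg2")
hset(processing, "id3", "msg3")
hdel(processing, "id2")    # processed message removed in O(1)
print(sorted(processing))  # → ['id1', 'id3']
```

The limitation you ran into is real, though: HSCAN iterates fields of one hash, it does not let you range-query field names the way ZRANGEBYLEX ranges over set members.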

Josiah Carlson

Aug 21, 2014, 8:14:17 PM
to redi...@googlegroups.com
How about instead of us going back and forth a dozen times trying to figure out what you are trying to do, you give us some information about what you are doing.

What goal are you trying to accomplish overall? How much data do you have? How often do you query? How often do you get new data? How often do you delete old data? How much data are you adding/deleting at a time? What types of prefixes are you searching over? Have you grouped the set of operations you need to perform into your own informal API? Can you disclose that API?

 - Josiah

Chen Wang

Aug 21, 2014, 10:14:08 PM
to redi...@googlegroups.com
The case is actually simple: I am using Redis as a recovery queue.
I read data from Kafka, put each message into Redis (what I call the processing queue), and also add an expiring lock to it:

pipeline.set(Helper.getProcessingRedisKey(signal.topic, key),message);

pipeline.setex(Helper.getLockKey(key), 60, Integer.toString(60));

The message is only removed once it has been processed successfully by the downstream thread.

Then I have a daemon process monitoring the processing queue; if any item does not have a lock on it, it means the downstream thread might have crashed, so I need to reprocess it.
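The recovery check described above can be sketched as follows, with plain dicts standing in for the two Redis keyspaces (the key names `processing:` and `lock:` and the function names are hypothetical, not from the original code):

```python
# Sketch of the lock-expiry recovery pattern: each message gets a
# processing entry plus a lock with a TTL (SETEX). The daemon treats
# any processing entry whose lock is gone/expired as crashed work.
processing = {}  # processing key -> message payload
locks = {}       # lock key -> expiry timestamp (models SETEX)

def submit(key, message, ttl=60, now=0.0):
    processing["processing:" + key] = message
    locks["lock:" + key] = now + ttl   # SETEX lock:<key> ttl ...

def ack(key):
    # Downstream worker finished: remove the message for good (DEL).
    processing.pop("processing:" + key, None)

def recover(now):
    """Return messages whose lock expired and that were never acked."""
    stale = []
    for pkey, message in processing.items():
        lkey = "lock:" + pkey.split(":", 1)[1]
        if locks.get(lkey, 0.0) <= now:   # lock missing or expired
            stale.append(message)
    return stale

submit("a", "msg-a", now=0.0)
submit("b", "msg-b", now=0.0)
ack("a")                    # "a" was processed successfully
print(recover(now=120.0))   # → ['msg-b']  (lock expired, never acked)
```

In real Redis the lock expiry is automatic (the SETEX key simply disappears), so the daemon's check reduces to an EXISTS on the lock key for each processing entry it scans.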

Note that a day's data is divided into 10-minute blocks, and each 10 minutes can contain at most 2 million messages. Since one day has 24*6 = 144 blocks (of 10 minutes), I use a different prefix for each 10-minute block. The daemon wakes up periodically, say every hour, and checks all the past 10-minute blocks. For each block:
check the processing queue (SCAN); if any item does not have a lock on it, resubmit it.

So as you can see, in the worst case the processing queue for a single 10-minute block can contain as many as 2 million messages without locks (all downstream processes died). But normally there would only be 1,000 to 10,000. Then again, what if the daemon dies and restarts at the end of the day? That means it will need to scan all of the past 144 blocks. If each block contains 1k messages, that's still 144k messages.

Performance is my main concern. The prefix is this: "processing_Thursday_100:981568d7-4008-4ce9-8606-a55cb3e3a927"
processing_dayofWeek_No.OfBlock:messageId
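Building that key might be sketched like this; the block-numbering scheme (hour*6 + minute//10, giving blocks 0..143) is my guess at the layout, chosen because it reproduces the sample key above for a Thursday-afternoon timestamp:

```python
import datetime

def processing_key(ts, message_id):
    """Build 'processing_<dayOfWeek>_<blockNo>:<messageId>', where the
    block number is the 10-minute block of the day (0..143)."""
    block = ts.hour * 6 + ts.minute // 10
    return "processing_%s_%d:%s" % (ts.strftime("%A"), block, message_id)

ts = datetime.datetime(2014, 8, 21, 16, 45)  # a Thursday, 16:45
print(processing_key(ts, "981568d7-4008-4ce9-8606-a55cb3e3a927"))
# → processing_Thursday_100:981568d7-4008-4ce9-8606-a55cb3e3a927
```

One caveat with day-of-week prefixes: they wrap after seven days, so stale entries must be cleaned out before the same weekday comes around again or they will collide with fresh data.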

Let me know what you think
Chen

Josiah Carlson

Aug 22, 2014, 12:50:27 AM
to redi...@googlegroups.com
I described how you can implement a 1+ queue in Redis that is efficient for what you are doing 2 days ago: https://groups.google.com/forum/#!topic/redis-db/PVVfoh9rnBs

If you have questions about what I described in that reply, feel free to ask. With very slight modifications to what I described, it's not difficult to add support for both the key and message. I can describe how you would do that if you are having problems on your own. You can reply in this thread or the other thread. If you've got different semantics than what that queue can offer, describe where it doesn't fit that queue, and I am sure I can modify the queue to support your requirements.

 - Josiah