Redis delete by pattern is dead slow


Senthil Kumar

Dec 15, 2015, 8:06:05 AM
to Redis DB
Hi all, I'm working on a requirement where I need to delete keys in Redis by passing a pattern.

deleteByPattern works well when the DB size is <= 2 million keys.

After loading 90 million keys (48 GB) into Redis, deleteByPattern is dead slow. If I use an async client (Redisson), after a few seconds the Netty IO channel throws an exception saying the response has been skipped due to a timeout. The same delete operation already shows some performance issues once the number of keys grows to a lakh (100,000).


I'm primarily using the Jedis Java client to execute all Redis commands.

P.S. SET and GET operations work fine.


What could be the cause of the slow deletion?



Regards,
Senthilkumar

Marc Gravell

Dec 15, 2015, 8:49:13 AM
to redi...@googlegroups.com

There is no "delete by pattern" in Redis. To do what you describe involves using KEYS or (preferably) SCAN to iterate the entire keyspace, and lots of DEL commands. KEYS and SCAN *are* inherently slow on large keyspaces (although SCAN at least allows this to be spread over multiple calls). Personally, I would question why you need a scenario that would involve "delete by pattern". I'd be tempted to partition by database and use FLUSHDB instead (although to be fair: database numbers are technically now obsolete, iirc)
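
(For what it's worth, a rough sketch of the SCAN-based route - assuming a reasonably recent redis-cli, keys without spaces, and a purely illustrative pattern - looks something like:

    # --scan streams matching keys using SCAN under the hood; xargs deletes them in batches
    redis-cli --scan --pattern '/user/senthil/dir1/*' | xargs -L 500 redis-cli DEL

but at 90M keys that still means walking the whole keyspace.)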

Marc


Senthil Kumar

Dec 17, 2015, 12:49:08 PM
to Redis DB
Thanks Marc.
Delete by pattern can be achieved with a Lua script. As you mentioned, both SCAN and KEYS iterate the entire keyspace, and on a large keyspace that creates a performance problem.
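
For reference, the script I mean is the commonly circulated one along these lines (delete_by_pattern.lua is just an illustrative name):

    -- delete_by_pattern.lua: delete every key matching ARGV[1]
    -- caveat: KEYS plus DEL run inside one blocking script call, and unpack()
    -- can fail on very large match sets (or when nothing matches at all)
    return redis.call('DEL', unpack(redis.call('KEYS', ARGV[1])))

run with something like:

    $ redis-cli --eval delete_by_pattern.lua , '/user/senthil/dir1*'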


My original requirement is to maintain a tree structure in Redis. Example:

Keys and values would be:

/user/senthil/dir1/a.txt      { 20 MB}
/user/senthil/dir1/c.txt      { 20 MB}
/user/senthil/dir1/b.txt      { 20 MB}
/user/senthil/dir1             { 60 MB}
/user/senthil                    { 60 MB}


Here, deleting dir1 means I have to perform two steps:

1) Delete the /user/senthil/dir1 key.
2) Find all children of the /user/senthil/dir1 directory and delete them.

Any API to fetch keys by the pattern /user/senthil/dir1* would help me delete or unlink them (UNLINK is a new command available in the unstable branch).



Regards,
Senthil

Marc Gravell

Dec 17, 2015, 2:35:11 PM
to redi...@googlegroups.com

> delete by pattern can be achieved with lua script

That may be the most horrifying use of Lua scripting I've ever heard of. That's like... take the 2 easiest ways to block a Redis server, and **combine them**.

IMO the trick here is to explicitly index the sub-keys so you can just list them directly.
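
Something along these lines, with purely illustrative key names:

    # when creating a key under a "directory", also record it in an index set
    SADD idx:/user/senthil/dir1 /user/senthil/dir1/a.txt /user/senthil/dir1/b.txt /user/senthil/dir1/c.txt

    # deleting the directory then becomes: read the index, delete the members, delete the index
    SMEMBERS idx:/user/senthil/dir1
    DEL /user/senthil/dir1/a.txt /user/senthil/dir1/b.txt /user/senthil/dir1/c.txt /user/senthil/dir1
    DEL idx:/user/senthil/dir1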

Itamar Haber

Dec 17, 2015, 3:07:39 PM
to redi...@googlegroups.com
I agree with Marc on the principle of indexing keys in advance, but it may be overkill in this case.

@Senthil - please correct me if I failed to understand, but you are really trying to work with data that's basically paths and their sizes. If that is indeed the case, you can use a Sorted Set to store everything (yeah, all 90M will fit in one Set if you want and will take less memory than KV pairs, but you may want to manage several sets [one for each filesystem/mountpoint/...]). 

    ZADD myfilesystem 0 /user/senthil/dir1/a.txt:20MB 0 /user/senthil/dir1/b.txt:20MB 0 /user/senthil/dir1/c.txt:20MB 0 /user/senthil/dir1:60MB 0 /user/senthil:60MB

If you set things up this way, you can use lexicographical ranges, so to "delete a directory" you just call `ZREMRANGEBYLEX myfilesystem [/user/senthil/dir1 [/user/senthil/dir1\xff`.

Itamar Haber | Chief Developer Advocate
Redis Watch Newsletter | Curator and Janitor
Redis Labs | ~ of Redis

Mobile: +1 (415) 688 2443
Office: +1 (650) 461 4652
Mobile (IL): +972 (54) 567 9692
Office (IL): +972 (3) 720 8515 Ext. 123
Email: ita...@redislabs.com
Twitter: @itamarhaber
Skype: itamar.haber

Blog  |  Twitter  |  LinkedIn


Senthil Kumar

Dec 18, 2015, 1:50:12 AM
to Redis DB
Thanks, Itamar Haber. It looks like a Sorted Set would fit my requirement.

Sorry, I didn't explain my question properly.

Here is my data structure:

Key ==> path 
Value ==> { list of values such as size and the parent path }

Ex:
    /user/senthil/data/cust_data.txt  {"size": "20MB", "parentPath": "/user/senthil/data"}
    /user/senthil/data                {"size": "20MB", "parentPath": "/user/senthil"}
    /user/senthil                     {"size": "20MB", "parentPath": "/user"}
    /user                             {"size": "20MB", "parentPath": "/"}

Parent path is used to query and build the tree structure.

If I switch from key-value pairs to a Sorted Set, the data structure would be:

Adding elements:
 ZADD hdfsfilesystem  0 /user/senthil/data/cust_data.txt:20MB:/user/senthil/data
 ZADD hdfsfilesystem  0 /user/senthil/data:20MB:/user/senthil
 ZADD hdfsfilesystem  0 /user/senthil:20MB:/user
 ZADD hdfsfilesystem  0 /user:20MB:/

How do I update an element? For example, /user/senthil/data/cust_data.txt currently has a size of 20 MB; if I want to change it to 40 MB, how do I do that in a single call?
Also, how do I query the path /user/senthil/data/cust_data.txt?



Regards,
Senthil

Itamar Haber

Dec 18, 2015, 12:05:52 PM
to redi...@googlegroups.com
Inline

On Fri, Dec 18, 2015 at 8:50 AM, Senthil Kumar <senthi...@gmail.com> wrote:
> Thanks, Itamar Haber. It looks like a Sorted Set would fit my requirement.
>
> Sorry, I didn't explain my question properly.
>
> Here is my data structure:
>
> Key ==> path
> Value ==> { list of values such as size and the parent path }


Can we agree that "Value" is some JSON document? Good :)
 
> Ex:
>     /user/senthil/data/cust_data.txt  {"size": "20MB", "parentPath": "/user/senthil/data"}
>     /user/senthil/data                {"size": "20MB", "parentPath": "/user/senthil"}
>     /user/senthil                     {"size": "20MB", "parentPath": "/user"}
>     /user                             {"size": "20MB", "parentPath": "/"}
>
> Parent path is used to query and build the tree structure.

I'm not sure I understand the need for parentPath - you can derive it from the path, IMO - but never mind.
 

> If I switch from key-value pairs to a Sorted Set, the data structure would be:
>
> Adding elements:
>  ZADD hdfsfilesystem  0 /user/senthil/data/cust_data.txt:20MB:/user/senthil/data
>  ZADD hdfsfilesystem  0 /user/senthil/data:20MB:/user/senthil
>  ZADD hdfsfilesystem  0 /user/senthil:20MB:/user
>  ZADD hdfsfilesystem  0 /user:20MB:/


I would use the following mapping for the members: Key, Value ==> Key:Value, ex:

127.0.0.1:6379> ZADD fs 0 '/user/senthil/data/cust_data.txt:{"size" : "20MB" , "parentPath":"/user/senthil/data"}'
(integer) 1
 
> How do I update an element? For example, /user/senthil/data/cust_data.txt currently has a size of 20 MB; if I want to change it to 40 MB, how do I do that in a single call?
>
> Also, how do I query the path /user/senthil/data/cust_data.txt?



If by "query" you mean getting the "Value" for that path - that's doable with:

127.0.0.1:6379> ZRANGEBYLEX fs [/user/senthil/data/cust_data.txt: [/user/senthil/data/cust_data.txt\xff
1) "/user/senthil/data/cust_data.txt:{\"size\" : \"20MB\" , \"parentPath\":\"/user/senthil/data\"}"

However, the first thing you need to note is that set members can't be updated - it just doesn't make sense given the definition of a set. To mimic an update you will need to remove the old member and add the new one so there are at least two calls, i.e.:

127.0.0.1:6379> ZREM fs "/user/senthil/data/cust_data.txt:{\"size\" : \"20MB\" , \"parentPath\":\"/user/senthil/data\"}"
(integer) 1
127.0.0.1:6379> ZADD fs 0 '/user/senthil/data/cust_data.txt:{"size" : "40MB" , "parentPath":"/user/senthil/data"}'
(integer) 1

To make life easier, Lua scripts are really helpful at this point. Here's one that just does the update:

    -- KEYS[1]: the sorted set; ARGV[1]: path, ARGV[2]: field name, ARGV[3]: new value
    local path = ARGV[1]
    local name = ARGV[2]
    local val = ARGV[3]

    -- find the member that starts with "<path>:" via a lexicographical range
    local cur = redis.call('ZRANGEBYLEX', KEYS[1], '[' .. path .. ':', '[' .. path .. '\xff')
    -- strip the "<path>:" prefix and decode the JSON payload
    local cval = cur[1]:sub(path:len() + 2)
    local json = cjson.decode(cval)

    json[name] = val

    -- an "update" is a remove of the old member plus an add of the re-encoded one
    redis.call('ZREM', KEYS[1], cur[1])
    redis.call('ZADD', KEYS[1], 0, path .. ':' .. cjson.encode(json))

You can run it like this:

    $ redis-cli --eval update.lua fs , /user/senthil/data/cust_data.txt size 40MB

If you need additional logic (e.g. summing up the size along the path), you'll need to code that as well. This approach is generic, so you can basically store any list of values under the path. It does carry additional complexity, though: if there are values that you update very frequently, the cost of deserializing and serializing on each update could add up. For numerical values such as size, you'd be better off using a different Sorted Set that keeps just that value as the score for each path. Scores can be updated in a single call, and the "main" Sorted Set can still be used for range lookups.
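
For the size-as-score set, a minimal sketch (the fs:size key name is just a placeholder) would be:

    127.0.0.1:6379> ZADD fs:size 20 /user/senthil/data/cust_data.txt
    (integer) 1
    127.0.0.1:6379> ZADD fs:size 40 /user/senthil/data/cust_data.txt
    (integer) 0
    127.0.0.1:6379> ZSCORE fs:size /user/senthil/data/cust_data.txt
    "40"

The second ZADD is the single-call update - it simply overwrites the existing score (hence the 0 reply, since no new member was added) - and ZINCRBY works too if you prefer to apply a delta.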

Senthil Kumar

Dec 19, 2015, 3:14:22 AM
to Redis DB
Thanks a lot for the detailed answers to all my questions, Itamar Haber. This will help me solve the issue I'm currently facing.

--Senthil

Senthil Kumar

Dec 19, 2015, 4:09:24 AM
to Redis DB

I'll try this with a Sorted Set and evaluate the performance of the pattern search (`ZREMRANGEBYLEX myfilesystem [/user/senthil/dir1 [/user/senthil/dir1\xff`) command.

Thanks,
Senthil