Understanding DeleteRange

81 views
Skip to first unread message

Christopher Zell

unread,
Apr 19, 2023, 2:36:12 AM4/19/23
to rocksdb
Hey folks,

sorry if this is already answered somewhere, I haven't found a good fitting answer yet.

I was searching for a way to delete several keys with a specific prefix. Right now we seek to the prefix and iterate over them to delete each key one by one, which is a performance issue.

I was reading about DeleteRange https://rocksdb.org/blog/2018/11/21/delete-range.html where exactly these use cases are described in the first section, e.g.

> MyRocks is a MySQL fork using RocksDB as its storage engine. Each key’s first four bytes identify the table or index to which that key belongs. Thus dropping a table or index involves deleting all the keys with that prefix.

Based on that I thought this API would be a good fit, but when checking the API https://javadoc.io/static/org.rocksdb/rocksdbjni/6.20.3/org/rocksdb/AbstractWriteBatch.html#deleteRange(byte%5B%5D,%20byte%5B%5D) it seems to expect a start and end key, so really only to delete between a range of keys.

I'm wondering if I only know the prefix, how I can delete with this prefix all keys or is there a different API? Anyone an idea?

Greets
Chris

Adam Retter

unread,
Apr 19, 2023, 5:19:27 AM4/19/23
to Christopher Zell, rocksdb
> Based on that I thought this API would be a good fit, but when checking the API https://javadoc.io/static/org.rocksdb/rocksdbjni/6.20.3/org/rocksdb/AbstractWriteBatch.html#deleteRange(byte%5B%5D,%20byte%5B%5D) it seems to expect a start and end key, so really only to delete between a range of keys.
>
> I'm wondering if I only know the prefix, how I can delete with this prefix all keys or is there a different API? Anyone an idea?

So depending on the design of your keys, your prefix and the range you
wish to delete you may still be able to achieve this with the
DeleteRange function.
The key is remembering that the end value for DeleteRange is
exclusive, and that when seeking for a key, if you provide a key that
does not exist, RocksDB will seek to the next one (in sorted string
order).

So for example, if you have keys like

1229AA
1230AA
1230AB
1230AC
1230AD
1231AA
1231AB

And you wanted to delete everything with the prefix 1230 you could do
(psuedocode):

prefix = "123"
DeleteRange(prefix + "000", prefix + "100")

Is that helpful?

If you are not sure what the startKey should be - you could seek for it first

--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk

Christopher Zell

unread,
Apr 19, 2023, 8:12:33 AM4/19/23
to rocksdb
Hey Adam,

thanks again for your quick and useful response!

Just to clarify, so I get it right. Do the keys need to have all the same length?

Lets say I have:

1229AA
1230A
1230ABCD
1230ACDEF
1230AD
1231AA
1231AB

I guess the order would be also different? Probably like this?

1230A
1229AA
1230AD
1231AA
1231AB
1230ABCD
1230ACDEF

Would the deleteRange still work with the prefix? 

So could I do:

(psuedocode):

prefix = "1230"
DeleteRange(prefix, prefix + 1)

So I guess it would either delete nothing, because there is no key with this, or delete all because all in the range?

What I actually want is to delete all which begin with: 1230, that the result is:

1229AA
1231AA
1231AB

Does this make sense?

Thank you and best regards
Chris 

Mark Rambacher

unread,
Apr 19, 2023, 10:40:20 AM4/19/23
to Christopher Zell, rocksdb
Chris,

I believe your original order is correct.  Why do you suspect "1230A" comes before "1229AA"?

I think your pseudo-code is off a little, but something similar should work:
prefix = "123"
DeleteRange(prefix + "0", prefix + "1")

Your original code would delete keys between "1230" and "12301".

The keys do not need to be the same length, only start with the same initial sequence.

/ Mark


--
You received this message because you are subscribed to the Google Groups "rocksdb" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rocksdb+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rocksdb/84947b89-2e8c-4db0-84d1-25dd88f91737n%40googlegroups.com.

Adam Retter

unread,
Apr 19, 2023, 11:24:44 AM4/19/23
to Christopher Zell, rocksdb
> Just to clarify, so I get it right. Do the keys need to have all the same length?

No.


> Lets say I have:
>
> 1229AA
> 1230A
> 1230ABCD
> 1230ACDEF
> 1230AD
> 1231AA
> 1231AB
>
> I guess the order would be also different? Probably like this?
>
> 1230A
> 1229AA
> 1230AD
> 1231AA
> 1231AB
> 1230ABCD
> 1230ACDEF

No. Unless, you have specified your own Comparator, then
BytewiseComparatorImpl will be used.

So the (sorted) order of your above keys would be:

1229AA
1230A
1230ABCD
1230ACDEF
1230AD
1231AA
1231AB

> Would the deleteRange still work with the prefix?
>
> So could I do:
>
> (psuedocode):
>
> prefix = "1230"
> DeleteRange(prefix, prefix + 1)
>
> So I guess it would either delete nothing, because there is no key with this, or delete all because all in the range?

I "guess" that would delete all the keys starting with "1230". - Why
not give it a try?

Christopher Zell

unread,
Apr 19, 2023, 2:46:41 PM4/19/23
to rocksdb
Hey both,

sorry I somehow had the wrong assumption and thought that the length is taken into account I guess (?), totally forget that it is byte-wise :)

>I "guess" that would delete all the keys starting with "1230". - Why not give it a try?

Yes totally! just wanted to check before I start. I will do some experiments now, thank you both :)

Greets
Chris
Message has been deleted
Message has been deleted
Message has been deleted

Christopher Zell

unread,
Apr 22, 2023, 3:02:21 AM4/22/23
to rocksdb
Hey everyone, 

I experimented now a bit with the #deleteRange and have some observations and follow-up questions.

First, it looks like that OptimisticTransaction doesn't support DeleteRange. It is not part of the Java API. Is it just a limitation of the Java API or is this in general the case?

 I was able to "work around" this via getCommitTimeWriteBatch, which has deleteRange. But then nothing happened. It looks like the changes to the commitTimeWriteBatch are not reflected in the transaction.
I found this comment, which might explain this. Unfortunately, there is no use_only_the_last_commit_time_batch_for_recovery option on the Java API. 

I thought getCommitTimeWriteBatch will return me the underlying WriteBatch. Or the method getWriteBatch which returns WriteBatchWithIndex, but it also doesn't support the deleteRange.

I tried to apply the commitTimeWriteBatch again to the transaction via rebuildFromWriteBatch
But then it fails with:  DeleteRangeCF not implemented

Do I oversee something or is not possible or not supported to use DeleteRange with OptimisticTransaction(DB)?

We are thinking about going back to using WriteBatch or WriteBatchWithIndex in general, instead of transactions, which might be another reason if this is the case.

Happy for any input you have.

Greets
Chris

Adam Retter

unread,
May 3, 2023, 7:36:44 AM5/3/23
to Christopher Zell, rocksdb
> First, it looks like that OptimisticTransaction doesn't support DeleteRange. It is not part of the Java API. Is it just a limitation of the Java API or is this in general the case?

It looks like it is unfortunately not implemented in the C++ API or
Implementation of any of the Transaction classes. It would have to be
implemented there first before it can be exposed via the Java API.

> I was able to "work around" this via getCommitTimeWriteBatch, which has deleteRange. But then nothing happened. It looks like the changes to the commitTimeWriteBatch are not reflected in the transaction.
> I found this comment, which might explain this. Unfortunately, there is no use_only_the_last_commit_time_batch_for_recovery option on the Java API.

I think you should avoid commitTimeWriteBatch entirely (as it bypasses
concurrency control) if you want to use the pre-provided Transaction
methods; otherwise you may as well just roll your own transaction
mechanisms on top of Snapshot and WriteBatchWithIndex.

Christopher Kujawa

unread,
May 3, 2023, 7:43:07 AM5/3/23
to rocksdb
Hey Adam,

thanks for confirming. Yes, we currently looking into whether we could migrate to WriteBatchWithIndex, but still, it also seems not to be supported there either.

See https://github.com/facebook/rocksdb/blob/main/java/src/main/java/org/rocksdb/WriteBatchWithIndex.java#L323-L329 and my previous comment, that we receiving: DeleteRangeCF not implemented when writing the batch.

Greets and thanks for your support,

Chris

Adam Retter

unread,
May 3, 2023, 8:54:53 AM5/3/23
to Christopher Kujawa, rocksdb
> thanks for confirming. Yes, we currently looking into whether we could migrate to WriteBatchWithIndex, but still, it also seems not to be supported there either.
>
> See https://github.com/facebook/rocksdb/blob/main/java/src/main/java/org/rocksdb/WriteBatchWithIndex.java#L323-L329 and my previous comment, that we receiving: DeleteRangeCF not implemented when writing the batch.

We (Evolved Binary) previously contributed a PR for this facility to
the RocksDB project here -
https://github.com/facebook/rocksdb/pull/8260
We would be happy to revive it if it was to be supported by the
RocksDB developers. Without such support we cannot get such changes
merged into RocksDB.
Reply all
Reply to author
Forward
0 new messages