Quick way to count duplicate prefix

Matt

Apr 18, 2023, 7:05:44 AM

to rocksdb

Hey folks, is there a quick way to count the keys that share a prefix? For example, seek to the first position, then to the last position, and do math on the two values to get the count?

Dan Carfas

Apr 18, 2023, 8:33:44 AM

to rocksdb

I asked this in the Speedb hive Discord and got the following answer:

In general, your approach seems fine.
Specifically:
- You need to have a prefix extractor.
- Set the prefix_same_as_start option in the read options given to the iterator used to seek.
- Seek to the prefix, then call Next() until the iterator becomes invalid, counting every position at which the iterator is valid (including the first one reached by the seek).
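
The seek-and-count loop described above can be sketched roughly as follows. This is a plain-Python model, not RocksDB code: a sorted list of byte strings stands in for the keyspace, `bisect_left` stands in for the seek, and the `startswith` check stands in for the iterator becoming invalid once it leaves the prefix. The function name `count_prefix` and the sample keys are made up for illustration.

```python
from bisect import bisect_left


def count_prefix(sorted_keys, prefix):
    """Count keys starting with `prefix` by seeking once, then scanning forward."""
    # "Seek": jump to the first key >= prefix (binary search over the sorted keys).
    i = bisect_left(sorted_keys, prefix)
    count = 0
    # "Valid() / Next()": keep stepping while the current key still has the prefix.
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        count += 1  # the key the seek landed on is counted too
        i += 1
    return count


# Hypothetical sample data, sorted the way an ordered key-value store would keep it.
keys = sorted([b"user:1", b"user:2", b"user:7", b"order:1", b"zeta"])
print(count_prefix(keys, b"user:"))  # prints 3
```

The seek itself is cheap, but the scan still touches every key under the prefix, so the cost grows with the number of matching keys.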

You may find the unit test PrefixValid() in prefix_test.cc useful as it contains code that demonstrates this functionality.

You can find the Speedb hive here and (once you've registered) the link to the thread here if you have more questions or need additional info

Matt

Apr 18, 2023, 6:37:19 PM

to Dan Carfas, rocksdb

Thanks Dan, but I was hoping we could use math to figure out the count (last - first, or something like that). This approach appears to be a sequential count, which isn't super fast.

--
You received this message because you are subscribed to the Google Groups "rocksdb" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rocksdb+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rocksdb/b3930d93-f510-43af-b783-da8c6bf66bcbn%40googlegroups.com.

Dan Carfas

Apr 19, 2023, 3:26:32 AM

to rocksdb

Another comment from the Speedb Hive:
I don't think there's a way to use math to deduce how many keys are in a prefix just from the first and last ones, since there's no way to tell how many are in between, unless you have some more logic regarding the structure of the keys.
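
To illustrate the "more logic regarding the structure of the keys" caveat, here is a small Python sketch under a purely hypothetical key layout: prefix followed by a zero-padded decimal sequence number. The helper `count_from_endpoints` and the `evt:` keys are invented for this example. If the sequence has no gaps and no deletions, last - first + 1 gives the count; with gaps it silently overcounts.

```python
def count_from_endpoints(first_key: bytes, last_key: bytes, prefix: bytes) -> int:
    """Estimate the key count from the first and last key alone.

    Assumption (hypothetical layout): every key is prefix + zero-padded decimal
    sequence number, and no number between first and last is missing.
    """
    first = int(first_key[len(prefix):])
    last = int(last_key[len(prefix):])
    return last - first + 1


prefix = b"evt:"
dense = [b"evt:0003", b"evt:0004", b"evt:0005", b"evt:0006"]  # no gaps
gappy = [b"evt:0003", b"evt:0006"]                            # 0004 and 0005 missing

# Dense case: the estimate (4) matches the real number of keys.
dense_estimate = count_from_endpoints(dense[0], dense[-1], prefix)

# Gappy case: same endpoints, same estimate (4), but only 2 keys exist.
gappy_estimate = count_from_endpoints(gappy[0], gappy[-1], prefix)
```

Both lists have the same first and last key, so nothing derived from the endpoints alone can tell them apart. The shortcut is only safe when the application guarantees a dense sequence.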

Another approach that might help is sst_partitioner_factory. With this experimental feature you can partition the SST files by your desired prefix, which means you would only have to find out how many entries are in that SST file.
