Understanding Prefix Compression for Integer Data Type in WiredTiger

63 views
Skip to first unread message

Shreya Ballijepalli

unread,
Oct 4, 2022, 12:25:44 PM10/4/22
to wiredtiger-users
Hello everyone,

I wanted to understand how row-store key prefix compression works for integer data types in WiredTiger. As part of the source code, there is a packing done for each data type and specifically for integer data types, a variable-sized packing is done based on signed/unsigned integers and the range of the integer value. I wanted to understand this and how it is used in prefix compression. 

Thank you,

Shreya

Keith Bostic

unread,
Oct 5, 2022, 10:31:23 AM10/5/22
to wiredtiger-users
Prefix compression is done separately from packing and unpacking, that is, keys are packed or not, and then when rows are stored on the Btree leaf pages, common key prefix bytes are compressed out as they are written onto the page.

Packed integers are lexicographically sorted, that is, once an integer is packed, it can be sorted lexicographically (to simplify comparisons). I would expect that to lead to more prefix compression in some cases.

Shreya Ballijepalli

unread,
Oct 5, 2022, 12:49:55 PM10/5/22
to wiredtiger-users
Thank you for the reply. I want to understand how packing enables lexicographic comparison for integers.
It would be great if there are any reference documents for this.

Keith Bostic

unread,
Oct 5, 2022, 2:23:59 PM10/5/22
to wiredtiger-users
On Wednesday, October 5, 2022 at 9:49:55 AM UTC-7 Shreya Ballijepalli wrote:
I want to understand how packing enables lexicographic comparison for integers.
It would be great if there are any reference documents for this.

I'm afraid I don't know the details, and I don't know if there is any documentation (I'm not part of the WiredTiger group at MongoDB, so I can't say for sure).

alexande...@mongodb.com

unread,
Oct 23, 2022, 7:09:39 AM10/23/22
to wiredtiger-users
Hi Shreya and Keith,

Firstly, Keith: Thanks for the initial responses here - we really appreciate your ongoing support.

> I want to understand how packing enables lexicographic comparison for integers.
> It would be great if there are any reference documents for this.

Unfortunately, we don't have any text-based documentation of our integer packing implementation. However, the source code has a fairly succinct description of the encoding format that is used by WiredTiger here. It calls out that it is space optimized for small values and works for both signed and unsigned values.

Let me know if you have any questions - I'd be happy to answer, and update the description to be more helpful.

- Alex
Reply all
Reply to author
Forward
0 new messages