getLongSizeInBytes of RoaringBitmap - internal hard-coded values unclear

42 views
Skip to first unread message

Lawan Subba

unread,
Aug 14, 2019, 11:05:27 AM8/14/19
to Roaring Bitmaps
Hi All,

In the RoaringBitmap class, there is a method to find the size of a RoaringBitmap in bytes. This method calls the method getSizeInBytes in the classes: ArrayContainer, BitmapContainer and RunContainer.

--------------------------------------------------------------
0  public long getLongSizeInBytes() {
1    long size = 8;
2    for (int i = 0; i < this.highLowContainer.size(); i++) {
3      final Container c = this.highLowContainer.getContainerAtIndex(i);
4      size += 2 + c.getSizeInBytes();
5    }
6    return size;
7  }

--------------------------------------------------------------
0  public int getSizeInBytes() {
1    return this.cardinality * 2 + 4;
2  }

--------------------------------------------------------------
0  public int getSizeInBytes() {
1    return this.bitmap.length * 8;
2  }

--------------------------------------------------------------
0  public int getSizeInBytes() {
1    return this.nbrruns * 4 + 4;
2  }

According to my reading of the RoaringBitmap papers, the 8 bytes and 2 bytes at lines 1 and 4 of class RoaringBitmap are the sizes of the Cookie header(8 bytes) and the Descriptive header key(2 bytes). Please let me know if this is incorrect.

Furthermore, I am unclear about why 4 bytes are added to the sizes of the ArrayContainer and RunContainer?

Regards,
Lawan Subba

Daniel Lemire

unread,
Aug 14, 2019, 11:59:25 AM8/14/19
to Roaring Bitmaps
I am guessing that you refer to getLongSizeInBytes.

This means tries to find a reasonable estimate for the memory usage. It is a heuristic. If you need to account
for details (+/- 4 bytes here and there), then please do not use this method.

Please refer to the documentation of the method...

   * Estimate of the memory usage of this data structure. This can be expected to be within 1% of
   * the true memory usage in common usage scenarios.
   * If exact measures are needed, we recommend using dedicated libraries
   * such as ehcache-sizeofengine.
   *
   * In adversarial cases, this estimate may be 10x the actual memory usage. For example, if
   * you insert a single random value in a bitmap, then over a 100 bytes may be used by the JVM
   * whereas this function may return an estimate of 32 bytes.
   *
   * The same will be true in the "sparse" scenario where you have a small set of
   * random-looking integers spanning a wide range of values.

- Daniel
Reply all
Reply to author
Forward
0 new messages