Should I use HTreeMap or BTreeMap?


Ezequiel Surijon

Jul 4, 2016, 9:52:16 AM
to MapDB
Hi,

I'm a newbie with MapDB. I'm currently developing a system that requires durable key/value storage for about 250G of records. I'm considering MapDB to provide this functionality, but I have several doubts.

This system is meant to process a large number of transactions concurrently. For this system, a transaction means looking up a key and, if the entry exists, replacing that record with two new records. Unlike many scenarios in which the read/write ratio is on the order of 100:1, this system will do one read, one delete, and two writes per transaction. Also, less than 10% of the DB will fit in memory, so most reads will miss the cache. Given this definition of a transaction, access to the storage will be synchronized outside MapDB, so the system does not require MapDB to be thread safe.
Record size is expected to be a few bytes. The record key is a compound key made of two integers and a date.

Based on above system requirements:

Is MapDB a superset of JDBM3? Should I prefer MapDB or JDBM3, given that, for example, concurrency will be handled by an upper layer? Does MapDB pay any significant penalty for supporting concurrency itself? Is MapDB better supported than JDBM3?

Regarding performance, should I create a compound key and let MapDB handle it, or should I use a byte[] key and compose it in the upper layer?

Regarding durability, what are the steps to ensure each transaction is persisted: enable transactions, map.put(), and then db.commit()? I'm asking because I read that value serialization happens in the background, which I understand to mean that map.put() returns before entries are written to disk.

Due to the nature of the record structure I think BTreeMap is the right choice. Since 250G of records means about 40 levels in a binary tree, I think I should increase maxNodeSize. Do you know a good value for the ratio keys_per_node/levels_qty? As for compression, is it worth enabling if an entry does not exceed 1 KB?
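As a back-of-envelope check on the levels estimate (a sketch; the 2.5 billion record count is an illustrative assumption for "250G of small records"), tree depth is roughly log(n)/log(fanout), so even a modest fanout collapses the ~40 binary levels:

```java
// Rough B-tree depth estimate: a tree over n keys with fanout f has
// about log(n)/log(f) levels. Figures below are illustrative only.
public class TreeDepth {
    static double levels(double n, double fanout) {
        return Math.log(n) / Math.log(fanout);
    }

    public static void main(String[] args) {
        double n = 2.5e9; // assumed record count
        System.out.printf("binary tree:  ~%.0f levels%n", levels(n, 2));   // ~31
        System.out.printf("fanout 32:    ~%.0f levels%n", levels(n, 32));  // ~6
        System.out.printf("fanout 128:   ~%.0f levels%n", levels(n, 128)); // ~4
    }
}
```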

I'm not quite sure how much memory MapDB consumes. As I mentioned before, I will set MapDB not to cache entries in memory, but that refers to the key/value entries. What about the tree index: how much memory does it consume? BerkeleyDB, for example, keeps the whole index in memory to accelerate lookups.

Regarding fragmentation in BTreeMap: aside from storage, does it also affect lookup speed?

Thanks in advance.

Scott Carey

Jul 7, 2016, 1:53:16 PM
to MapDB
MapDB might not be the best fit for such write-intensive workloads. However, I'm sure it can beat BDB.

More comments inline, below:


On Monday, July 4, 2016 at 6:52:16 AM UTC-7, Ezequiel Surijon wrote:
Hi,

I'm a newbie with MapDB. I'm currently developing a system that requires durable key/value storage for about 250G of records. I'm considering MapDB to provide this functionality, but I have several doubts.

This system is meant to process a large number of transactions concurrently. For this system, a transaction means looking up a key and, if the entry exists, replacing that record with two new records. Unlike many scenarios in which the read/write ratio is on the order of 100:1, this system will do one read, one delete, and two writes per transaction. Also, less than 10% of the DB will fit in memory, so most reads will miss the cache. Given this definition of a transaction, access to the storage will be synchronized outside MapDB, so the system does not require MapDB to be thread safe.
Record size is expected to be a few bytes. The record key is a compound key made of two integers and a date.

Based on above system requirements:

Is MapDB a superset of JDBM3? Should I prefer MapDB or JDBM3, given that, for example, concurrency will be handled by an upper layer? Does MapDB pay any significant penalty for supporting concurrency itself? Is MapDB better supported than JDBM3?

MapDB 1.x, as far as I know, should be an improvement over JDBM3. I have not used 3.0 yet. I use it with concurrent reads and a few writes.
 
Regarding performance, should I create a compound key and let MapDB handle it, or should I use a byte[] key and compose it in the upper layer?

It depends.  I do my own serialization at this point, but that does cause more garbage (often unnecessary byte[] objects created on the heap).
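For instance, a compound key of two ints and a date can be packed into a fixed-width big-endian byte[] (a hypothetical stdlib-only helper, not MapDB code; for non-negative field values, unsigned byte-wise comparison of such keys matches numeric field order):

```java
import java.nio.ByteBuffer;

// Hypothetical helper: pack (int, int, date-as-epoch-millis) into a
// fixed-width, big-endian byte[] key. Big-endian layout means that,
// for non-negative values, unsigned lexicographic byte order equals
// numeric order of (a, b, epochMillis).
public class CompoundKey {
    public static byte[] pack(int a, int b, long epochMillis) {
        return ByteBuffer.allocate(16)  // 4 + 4 + 8 bytes
                .putInt(a)
                .putInt(b)
                .putLong(epochMillis)
                .array();
    }

    public static void main(String[] args) {
        byte[] key = pack(42, 7, 1467633136000L);
        System.out.println(key.length); // 16
    }
}
```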
 

Regarding durability, what are the steps to ensure each transaction is persisted: enable transactions, map.put(), and then db.commit()? I'm asking because I read that value serialization happens in the background, which I understand to mean that map.put() returns before entries are written to disk.

You don't need to enable transactions unless you require independent ones that aggregate changes over multiple put/delete etc.   If you only want to ensure durability, transactions disabled is fine -- durability is guaranteed after commit() is called.  In this way you could do three or four put() calls, then call commit() to ensure they are persisted.  If you need those multiple put() calls to be atomic and commit or rollback together, you'll need transactions which are a bit more complicated to use.
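One way the steps fit together in the MapDB 3.x builder API (a sketch with placeholder keys and values; method names are from memory, so treat them as assumptions and check the docs — transactionEnable() is shown because in 3.x it provides the write-ahead log that makes commit() crash-safe):

```java
import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.HTreeMap;
import org.mapdb.Serializer;

DB db = DBMaker.fileDB("records.db")
        .transactionEnable()   // write-ahead log: commit() is atomic and durable
        .make();
HTreeMap<byte[], byte[]> map = db.hashMap("records",
        Serializer.BYTE_ARRAY, Serializer.BYTE_ARRAY).createOrOpen();

map.remove(oldKey);            // the "replace one record by two" transaction
map.put(newKey1, value1);
map.put(newKey2, value2);
db.commit();                   // returns only after the changes are persisted
```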
 

Due to the nature of record structure I think BTreeMap is the right choice. Since 250G of records means about 40 level in a binary tree I think I should increase maxNodeSize. Do you know a good value for this relation keys_per_node/levels_qty? What about compression it's worthy to enable compression if entry does not exceed 1Kb?

I'm not quite sure how much memory MapDb consumes. As I mentioned before I will ser MapDB to not cache entries in memory, however refers to key values. What about the tree index, how much memory does it consumes? For example BerkeleyDB keeps the whole index in memory to accelerate lookup 

Disable the cache.  Use your own weak-reference cache containing your own deserialized objects, if you need to.  (I use Guava's Cache configured for weak values, plus my own serialization; this cache holds what is deserialized from byte[]s.  It performed best in high-load performance tests on a dataset 10x larger than RAM.)
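The idea behind a weak-value cache can be sketched with the stdlib alone (this is a minimal illustration, not Guava's implementation — Guava's Cache with weakValues() additionally handles eviction and loading): entries stay cached only while something else still strongly references the value, so the GC bounds the cache's footprint.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Minimal weak-value cache sketch: values are held via WeakReference,
// so the GC may reclaim them once no strong references remain.
public class WeakValueCache<K, V> {
    private final ConcurrentHashMap<K, WeakReference<V>> map = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        map.put(key, new WeakReference<>(value));
    }

    public V get(K key) {
        WeakReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            map.remove(key); // lazily drop entries whose value was collected
        }
        return value;
    }
}
```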

At least with 1.0.x, if your dataset is significantly larger than RAM, memory-mapped files are slower for some reason; don't use them.  3.0 may have fixed this, but I have not tested it.


Regarding fragmentation in BTreeMap: aside from storage, does it also affect lookup speed?

Yes, it affects storage and will affect lookup speed because it will touch more RAM / file regions on average.

For this sort of workload, HTreeMap is probably going to perform better.  It will fragment less and have more consistent lock characteristics.  On the other hand, if you have too many keys it can run into birthday-paradox issues, since a 32-bit hash will collide more than you want.   Either way, you should be able to switch between the two for testing if you write your code right, as long as you don't rely on ordered traversal of BTreeMap or the unique features of HTreeMap.
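To put a rough number on the birthday-paradox concern (a sketch with illustrative key counts; note that a hash map resolves collisions within a bucket, so collisions cost extra comparisons rather than losing data): the expected number of colliding pairs among n random 32-bit hashes is about n²/2³³.

```java
// Birthday approximation: among n uniformly random 32-bit hashes,
// the expected number of colliding pairs is about n^2 / 2^33.
public class HashCollisions {
    public static double expectedCollidingPairs(double n) {
        return n * n / Math.pow(2, 33);
    }

    public static void main(String[] args) {
        System.out.printf("1e6 keys: ~%.0f collisions%n", expectedCollidingPairs(1e6));
        System.out.printf("1e9 keys: ~%.2e collisions%n", expectedCollidingPairs(1e9));
    }
}
```

At a million keys collisions are negligible; around a billion keys, a meaningful fraction of lookups will hit a shared bucket.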

Thanks in advance.

Ezequiel Surijon

Jul 9, 2016, 10:23:32 PM
to ma...@googlegroups.com

Scott, thanks for your detailed answer.
I was wondering why you suggest MapDB may not be the best choice.

I've been evaluating Apache Lucene, Berkeley DB, and SQLite as well. However, I thought MapDB would be a better choice.
Do you know any other library I should consider? Please keep in mind I need to look up a record before inserting any new data.


Christian MICHON

Jul 11, 2016, 10:30:03 AM
to MapDB
Have a look either at persistit or at h2 mvstore.

Hope this helps

Ezequiel Surijon

Jul 11, 2016, 11:18:38 AM
to ma...@googlegroups.com
Christian, 

It seems that conceptually both libraries are similar to MapDB. Am I right?
Beyond implementation details, do you think either of those libraries has a feature that would make it a better choice?



Regards and thanks,
Ezequiel Surijon.

On Mon, Jul 11, 2016 at 11:30 AM, Christian MICHON <christia...@gmail.com> wrote:
Have a look either at persistit or at h2 mvstore.

Hope this helps

Christian MICHON

Jul 11, 2016, 10:22:24 PM
to MapDB
Conceptually, both are similar to MapDB.

Yet:
- persistit consumes almost no memory but requires configuration for your needs, such as the maximum disk space to be used. It's difficult to change the configuration later, so you need to pay attention to this.
- h2 mvstore stores previous values as well. You can turn this off by configuring how many versions of values should be kept, but I never did. h2 mvstore consumes a lot of memory, as MapDB does.

Good luck

Jan Kotek

Jul 12, 2016, 5:20:00 AM
to ma...@googlegroups.com
Response inlined below.

On Mon, 2016-07-04 at 06:52 -0700, Ezequiel Surijon wrote:
Hi,

I'm a newbie with MapDB. I'm currently developing a system that requires durable key/value storage for about 250G of records. I'm considering MapDB to provide this functionality, but I have several doubts.

This system is meant to process a large number of transactions concurrently. For this system, a transaction means looking up a key and, if the entry exists, replacing that record with two new records. Unlike many scenarios in which the read/write ratio is on the order of 100:1, this system will do one read, one delete, and two writes per transaction. Also, less than 10% of the DB will fit in memory, so most reads will miss the cache. Given this definition of a transaction, access to the storage will be synchronized outside MapDB, so the system does not require MapDB to be thread safe.
Record size is expected to be a few bytes. The record key is a compound key made of two integers and a date.

If the key does not change, use BTreeMap with external values enabled.



Based on above system requirements:

Is MapDB a superset of JDBM3? Should I prefer MapDB or JDBM3, given that, for example, concurrency will be handled by an upper layer? Does MapDB pay any significant penalty for supporting concurrency itself? Is MapDB better supported than JDBM3?

MapDB is JDBM3 rewritten to support concurrent threads. Locks add about 10% overhead in single-threaded use. There is an option to completely disable locks and make MapDB thread-unsafe.


Regarding performance, should I create a compound key and let MapDB handle it, or should I use a byte[] key and compose it in the upper layer?

Use Object[] with a proper serializer. Serializers in MapDB provide the hash code and comparator.
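A sketch of what that can look like in MapDB 3.x (SerializerArrayTuple and the builder method names are from memory, so treat them as assumptions; db is assumed to be an open DB instance, and the payload value is a placeholder):

```java
import org.mapdb.BTreeMap;
import org.mapdb.Serializer;
import org.mapdb.serializer.SerializerArrayTuple;

// Compound key (int, int, date-as-epoch-day) as an Object[] tuple.
// The tuple serializer supplies the comparator, so entries sort by
// field order: first int, then second int, then date.
SerializerArrayTuple keySerializer = new SerializerArrayTuple(
        Serializer.INTEGER, Serializer.INTEGER, Serializer.LONG);

BTreeMap<Object[], byte[]> map = db.treeMap("records")
        .keySerializer(keySerializer)
        .valueSerializer(Serializer.BYTE_ARRAY)
        .createOrOpen();

map.put(new Object[]{42, 7, 17000L}, payload);
```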



Regarding durability, what are the steps to ensure each transaction is persisted: enable transactions, map.put(), and then db.commit()? I'm asking because I read that value serialization happens in the background, which I understand to mean that map.put() returns before entries are written to disk.

MapDB does not have a background writer enabled by default. It is also not yet implemented in 3.0.
Transactions are disabled by default in 3.0.


Due to the nature of the record structure I think BTreeMap is the right choice. Since 250G of records means about 40 levels in a binary tree, I think I should increase maxNodeSize. Do you know a good value for the ratio keys_per_node/levels_qty? As for compression, is it worth enabling if an entry does not exceed 1 KB?

Wrap the value serializer in SerializerCompressionWrapper; that will compress values. Delta compression could be used on keys if you use BTreeMap.

Max node size depends on the size of your data. I think starting with HTreeMap could be better.
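Together, those knobs might look like this (a MapDB 3.x sketch with method names from memory; db is assumed to be an open DB instance, and 64 is an illustrative node size to benchmark against, not a recommendation):

```java
import org.mapdb.BTreeMap;
import org.mapdb.Serializer;
import org.mapdb.serializer.SerializerCompressionWrapper;

BTreeMap<Long, byte[]> map = db.treeMap("records")
        .keySerializer(Serializer.LONG)
        .valueSerializer(new SerializerCompressionWrapper<>(Serializer.BYTE_ARRAY))
        .valuesOutsideNodesEnable()  // external values keep tree nodes small
        .maxNodeSize(64)             // larger fanout => shallower tree
        .createOrOpen();
```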


I'm not quite sure how much memory MapDB consumes. As I mentioned before, I will set MapDB not to cache entries in memory, but that refers to the key/value entries. What about the tree index: how much memory does it consume? BerkeleyDB, for example, keeps the whole index in memory to accelerate lookups.

MapDB itself does not hold entries in memory; it loads them lazily.

You could enable memory-mapped files to allow the OS to cache some pages in memory.
It could also be beneficial to split keys (index) and values into two separate stores.



Regarding fragmentation in BTreeMap: aside from storage, does it also affect lookup speed?

Yes, but BTreeMap fragmentation is only a problem if you delete a lot of data, not if the BTreeMap only grows, which is your case.



Thanks in advance.


Scott Carey

Jul 13, 2016, 6:47:44 PM
to MapDB


On Saturday, July 9, 2016 at 7:23:32 PM UTC-7, Ezequiel Surijon wrote:

Scott, thanks for your detailed answer.
I was wondering why you suggest MapDB may not be the best choice.


I may have misread your use case.  If you are deleting data and have a lot of 'key churn', MapDB will bloat, especially BTreeMap.  Long-lived indexes with a lot of key churn are more delicate.   It is also not the most optimized of the various choices on the write path, if you are very write intensive.
 

I've been evaluating Apache Lucene, Berkeley DB, and SQLite as well. However, I thought MapDB would be a better choice.


Oh, it will kill Berkeley DB almost every time.  The real competitors are h2, or C libraries with wrappers (like LMDB).  If you create and remove a LOT of keys (high-volume writes/overwrites/deletes with small values), then something LSM-tree based might be better.

MapDB is a great place to start, but your use case may require some tuning and tweaking to get it to work well for you.
 

Ezequiel Surijon

Jul 13, 2016, 10:37:56 PM
to ma...@googlegroups.com

Scott. Thanks for your comments.

As happens in many systems, peak load comes in bursts over small time windows. I just have to ensure the system will be capable of performing well in such scenarios.

For example, with a 250 GB record store and a burst of 100k transactions (one read, one delete, and two writes each), I think it will not affect index traversal too much. I think I can afford this kind of index degradation and run maintenance cleanups periodically when the system is idle.
