Hi, I'm a newbie with MapDB. I'm currently developing a system that requires durable key/value storage for about 250G of records. I'm considering MapDB to provide that functionality, but I have several doubts.

This system is meant to process a large amount of transactions concurrently. For this system, a transaction means looking up a key and, if such an entry exists, replacing that record with two new records. Unlike many scenarios where the read/write ratio is on the order of 100:1, in this case the system will have one read, one delete, and two writes per transaction. Also, less than 10% of the DB will fit in memory, so most reads will miss the cache. Because of this definition of a transaction, it will be necessary to synchronize access to the storage outside MapDB, so this system does not require MapDB to be thread safe.

Record size is expected to be a few bytes. The record key is a compound key, made of two integers and a date.

Based on the above system requirements: Is MapDB a superset of JDBM3? Should I prefer MapDB or JDBM3, given that concurrency will be handled by an upper layer? Does MapDB suffer any significant penalty for supporting concurrency itself? Does MapDB have better support than JDBM3?
Regarding performance, should I create a compound key and let MapDB handle it, or would it be better to use a byte[] key and compose it in the upper layer?
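Just to make the byte[] option concrete, here is a stdlib-only sketch of what composing the key in the upper layer could look like (the field layout is my assumption: two ints plus the date as epoch millis, big-endian so byte-wise comparison matches numeric order for non-negative values):

```java
import java.nio.ByteBuffer;

public class CompoundKey {
    // Pack two ints and a date (epoch millis) into a fixed 16-byte key.
    // Big-endian ordering keeps lexicographic byte[] comparison consistent
    // with numeric ordering for non-negative values.
    static byte[] encode(int a, int b, long epochMillis) {
        return ByteBuffer.allocate(16)
                .putInt(a)
                .putInt(b)
                .putLong(epochMillis)
                .array();
    }

    // Recover the two int components (the date follows as a long).
    static int[] decodeInts(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key);
        return new int[] { buf.getInt(), buf.getInt() };
    }
}
```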
Regarding durability, what are the steps to ensure each transaction is persisted: enable transactions, map.put(), and then db.commit()? I'm asking because I read that value serialization happens in the background, which I understand to mean map.put() returns before entries are written to disk.
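To check my own understanding of the durability question, here is a stdlib-only sketch of the general pattern (not MapDB's actual internals): a write only becomes durable once it is explicitly forced to stable storage, which is the role I understand db.commit() to play.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DurableAppend {
    // Append a record and force it to stable storage before returning.
    // force(true) plays the role db.commit() plays in a transactional
    // store: without it, the bytes may still sit in OS page caches
    // when the method returns.
    static void putDurably(FileChannel ch, byte[] record) throws IOException {
        ch.write(ByteBuffer.wrap(record));
        ch.force(true); // flush data and metadata to disk
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of("records.log"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            putDurably(ch, "k1=v1".getBytes());
        }
    }
}
```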
Due to the nature of the record structure, I think BTreeMap is the right choice. Since 250G of records means about 40 levels in a binary tree, I think I should increase maxNodeSize. Do you know a good value for the relation keys_per_node/levels_qty? What about compression: is it worth enabling compression if an entry does not exceed 1 KB?

I'm not quite sure how much memory MapDB consumes. As I mentioned before, I will set MapDB to not cache entries in memory; however, that refers to key values. What about the tree index: how much memory does it consume? For example, BerkeleyDB keeps the whole index in memory to accelerate lookups.
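For what it's worth, here is the back-of-envelope calculation I'm using for the keys_per_node/levels relation: tree height is roughly log base fanout of the entry count (the 4 billion entry count below is a made-up example, since I haven't stated an exact record count):

```java
public class TreeDepth {
    // Rough estimate of tree height: ceil(log_fanout(entries)).
    static int depth(long entries, int fanout) {
        return (int) Math.ceil(Math.log((double) entries) / Math.log(fanout));
    }

    public static void main(String[] args) {
        long entries = 4_000_000_000L;           // hypothetical entry count
        System.out.println(depth(entries, 2));   // binary tree: 32 levels
        System.out.println(depth(entries, 32));  // maxNodeSize 32: 7 levels
        System.out.println(depth(entries, 128)); // maxNodeSize 128: 5 levels
    }
}
```

So even a modest fanout collapses the ~32 binary-tree levels down to a handful of node reads per lookup.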
Regarding fragmentation in BTreeMap: aside from storage, does it also affect lookup speed?
Thanks in advance.
Scott, thanks for your dedicated answer.
I was wondering why you suggest MapDB may not be the best choice.
I've been evaluating Apache Lucene, Berkeley DB, and SQLite as well. However, I thought MapDB would be a better choice.
Do you know any other library I should consider? Please keep in mind I need to look up a record before inserting any new data.
Have a look either at persistit or at h2 mvstore.
Hope this helps
Yet:
- persistit consumes almost no memory, but it requires configuration for your needs, such as the maximum disk space to be used. It's difficult to change the configuration later, so you need to pay attention to this.
- h2 mvstore stores previous values as well. You can turn this off by configuring how many versions of values should be kept, but I never did it. H2 MVStore consumes a lot of memory, like MapDB.
Good luck
Scott, thanks for your comments.
As happens in many systems, peak load comes in bursts over small time windows. I just have to ensure the system will be capable of performing well in such scenarios.
For example, with a 250 GB record store and a burst of 100k transactions (one read, one delete, and two writes each), I don't think it will affect index traversal too much. I think I can afford this kind of index degradation and run maintenance cleanups periodically when the system is idle.
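To make the transaction shape explicit, here is a minimal stdlib sketch of the read/delete/two-writes step against a plain NavigableMap (a stand-in for the real store; all names are hypothetical, and synchronization is assumed to be handled by the caller, as described earlier in the thread):

```java
import java.util.NavigableMap;

public class SplitTransaction {
    // One "transaction": look up a key and, if it exists, replace that
    // record with two new records. The caller is responsible for
    // synchronizing access to the map.
    static boolean split(NavigableMap<String, String> map,
                         String key,
                         String k1, String v1,
                         String k2, String v2) {
        String old = map.remove(key); // one read + one delete
        if (old == null) {
            return false;             // nothing to split
        }
        map.put(k1, v1);              // first write
        map.put(k2, v2);              // second write
        return true;
    }
}
```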