MapDB newbie question.

Venkatesh Venkataramanan

Apr 23, 2016, 3:29:45 PM
to MapDB
Hello: 

I am a newbie to MapDB and have some basic questions on its usage. I have a use case where I want to consume a MapDB file generated offline. I see on the mailing list that the simplest way would be to do something like the following:

DB db = DBMaker.newFileDB(new File("FileGeneratedOffline")).transactionDisable().readOnly().make(); // open the stored file

ConcurrentMap<K, V> map = db.getHashMap(collectionName); // read data from file

V value = map.get(key); // retrieve values
This works great, but I am seeing considerable latencies, especially when "FileGeneratedOffline" is big (in my case it can be 8-10GB).

In some sense, I want to be able to load this file into memory (on or off heap). So I tried something like the following when creating the DB:

db = DBMaker.newFileDB("FileGeneratedOffline").__newMemoryDirectDB().transactionDisable().readOnly().make(); // open the stored file

My assumption was that adding _newMemoryDirectDB() would open the DB file and store the results of fetches from the file in direct memory. I am guessing this is wrong, but I wanted some of the MapDB experts out there to confirm :-).
If that is wrong, is there any way to have MapDB "load" an existing MapDB file into direct memory? I am guessing the only way would be to create a second DB with _newMemoryDirectDB() and copy the contents manually from the original MapDB file at startup. Are there other approaches? I would appreciate feedback from others on what you have done in your use cases.
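
The idea described above (keeping the results of fetches from the file in memory) can be sketched in plain Java as a read-through cache placed in front of the file-backed map. This is a minimal sketch, not a MapDB API: the class name ReadThroughCache and its methods are hypothetical, and it assumes the file is opened read-only, so cached values never go stale. Any java.util.Map can act as the backing store, including the ConcurrentMap returned by db.getHashMap(...).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical read-through cache (not a MapDB API): values fetched from a slower
// backing map, e.g. a read-only file-backed map, are kept in an on-heap ConcurrentHashMap.
final class ReadThroughCache<K, V> {
    private final Map<K, V> backing;                             // slow store, only ever read
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();

    ReadThroughCache(Map<K, V> backing) {
        this.backing = backing;
    }

    // Returns the cached value, fetching it from the backing map on first access.
    // Keys missing from the backing map return null and nothing is cached for them.
    V get(K key) {
        return cache.computeIfAbsent(key, backing::get);
    }

    int cachedEntries() {
        return cache.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> slowStore = new ConcurrentHashMap<>();
        slowStore.put("a", 1);
        ReadThroughCache<String, Integer> cache = new ReadThroughCache<>(slowStore);
        System.out.println(cache.get("a"));        // fetched from slowStore, then cached
        System.out.println(cache.get("missing"));  // null, not cached
        System.out.println(cache.cachedEntries()); // number of keys now held on heap
    }
}
```

Only keys that are actually requested end up on heap, so memory use grows with the working set rather than with the size of the file; the trade-off is that the first lookup of each key still pays the disk cost.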

Apologies if this question has already been asked on the mailing list (my cursory search didn't turn anything up).
Venkatesh

Dmitriy Shabanov

Apr 24, 2016, 4:17:15 AM
to ma...@googlegroups.com
Hey,

On Sat, Apr 23, 2016 at 10:29 PM, Venkatesh Venkataramanan <vven...@gmail.com> wrote:
> This works great but I am seeing considerable latencies especially when the "FileGeneratedOffline" is big (in my case it can be 8-10GB).

Do you mean a delay when opening the DB, or when fetching by key?

--
Dmitriy Shabanov

Jan Kotek

Apr 24, 2016, 4:36:35 AM
to ma...@googlegroups.com

Hi,

 

In MapDB 1.0 and 2.0 there is no official way to load a file into memory.

The __XXX methods are internal; they were exposed by accident.

I updated the documentation, which covers caching for the not-yet-stable 3.0:

http://www.mapdb.org/doc/volume/

 

Jan

--
You received this message because you are subscribed to the Google Groups "MapDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mapdb+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Venkatesh Venkataramanan

Apr 25, 2016, 3:52:29 PM
to MapDB
Hello Dmitriy:

Thanks for getting back to me. The issue is not so much with opening the file; it has more to do with key lookups. My use case requires me to look up this map for a variety of keys (on the order of tens to a hundred lookups) for every request. The lookups are so slow that processing one request can take multiple minutes.
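
To confirm that the time really goes into key lookups rather than into opening the file, one option is to time the lookups on their own once the DB is open. The helper below is a hypothetical sketch (LookupTimer and averageGetNanos are illustrative names, not MapDB APIs) and works on any java.util.Map. A single System.nanoTime() measurement is coarse, so treat the result as a rough indication rather than a benchmark.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a MapDB API): times map.get() over a list of keys so that
// lookup latency can be measured separately from the time spent opening the DB.
final class LookupTimer {
    // Number of non-null results from the last timed run; keeps the lookups observable.
    static volatile int lastHitCount;

    private LookupTimer() {}

    // Returns the average nanoseconds per get() across all keys (0 for an empty list).
    static <K> long averageGetNanos(Map<K, ?> map, List<K> keys) {
        if (keys.isEmpty()) {
            return 0L;
        }
        int hits = 0;
        long start = System.nanoTime();
        for (K key : keys) {
            if (map.get(key) != null) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        lastHitCount = hits;
        return elapsed / keys.size();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>(); // stand-in for db.getHashMap(...)
        map.put("k1", "v1");
        long avg = averageGetNanos(map, Arrays.asList("k1", "k2"));
        System.out.println("avg ns per get: " + avg + ", hits: " + lastHitCount);
    }
}
```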

Venkatesh

Venkatesh Venkataramanan

Apr 25, 2016, 3:53:37 PM
to MapDB
Hello Jan:

Thanks for the pointers. Given there is no "official" way to load file into memory, are there any possible workarounds you or others have done to accommodate such use cases?

Thanks for your help,

Venkatesh

Jan Kotek

Apr 25, 2016, 4:37:26 PM
to ma...@googlegroups.com

3.0 will enter beta this week. If you need this urgently, I would recommend that you start using it.

 

Jan

Venkatesh Venkataramanan

Apr 25, 2016, 5:18:43 PM
to MapDB
Thanks Jan. I will definitely try out 3.0 once it enters beta. If there are any suggestions for older builds, that would be great as well.

Venkatesh

Venkatesh Venkataramanan

Apr 25, 2016, 5:33:29 PM
to MapDB
On a side note, I had another question related to MapDB usage. In an attempt to copy data from the file into memory, I did something like the following at my application startup:

DB db = DBMaker.newFileDB(new File("FileGeneratedOffline")).transactionDisable().readOnly().make(); // open the stored file
DB readOnlyDbInstanceCopy = DBMaker.newHeapDB().make(); // create a new MapDB instance on heap
ConcurrentMap<K, V> dataFromFile = db.getHashMap("myCollectionName");
ConcurrentMap<K, V> dataInMemory = readOnlyDbInstanceCopy.getHashMap("myCollectionNameReadOnly");
for (Map.Entry<K, V> entry : dataFromFile.entrySet()) {
    dataInMemory.put(entry.getKey(), entry.getValue());
}
db.close(); // the file contents have been copied into memory, so close the file DB

My assumption at this point was that the contents of the file had been loaded into a new in-memory MapDB instance, so I could close the file-backed instance. However, subsequent reads from dataInMemory result in a FetchException (indicating that the file is closed?). Is that expected? The only way I have been able to make this work is by not calling close() on the original file DB.

Thanks in advance for your help!

Cheers!
Venkatesh

daedalus

Apr 25, 2016, 9:30:01 PM
to MapDB
Your latency problem is most likely due to the time it takes the disk to respond to each get() request, which can be anywhere from microseconds to milliseconds, rather than the nanoseconds it would take if the data were already in memory. You are right to want to preload the DB into memory before doing reads to eliminate this problem (assuming your DB fits in memory).

On Linux you can use a program called vmtouch to preload the entire DB file into the OS page cache before using it.

e.g. vmtouch -m 20G -t '/datastore/mydb'

-m sets the maximum file size (default 500M; anything larger is ignored). Even if your DB won't fit entirely in memory, vmtouch will load as much of it as it can, so you will still benefit.
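
vmtouch is a Linux tool. A rough, portable approximation of vmtouch -t is to read the DB file sequentially once at startup, which on common operating systems leaves its pages in the OS page cache as a side effect of ordinary buffered reads. This is a hypothetical sketch (PageCacheWarmer is an illustrative name, not part of MapDB or vmtouch); unlike vmtouch -l it locks nothing, so the OS may still evict pages when memory gets tight.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical warm-up helper: reads a file end to end so its pages are likely to be
// in the OS page cache afterwards. Nothing is pinned; pages can still be evicted.
final class PageCacheWarmer {
    private PageCacheWarmer() {}

    // Reads the whole file in 1 MiB chunks and returns the number of bytes read.
    static long warm(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // the data is discarded; only the page-cache side effect matters
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dbFile = Paths.get(args[0]); // e.g. the path of the MapDB file
        System.out.println("read " + warm(dbFile) + " bytes");
    }
}
```

The warm-up would run before the DB is opened; whether the pages stay resident afterwards depends on how much free memory the machine has.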

Make sure you are using the memory-mapped file option in MapDB.

You should see zero disk hits for reads if you do the above, assuming the OS does not page out part of the DB when free memory gets low. You can lock the DB in the OS page cache with the vmtouch -l option; the file will then only be allowed to be paged out once the vmtouch process is killed.

I don't know how you would do this on other OSes, as I only use Linux. I have a 13GB MapDB DB with billions of KV pairs, and it is fast as hell because a get() never hits the disk. You do not need to copy the data into a separate in-memory MapDB, as your DB file is already in memory thanks to vmtouch.

Most DBs fall down when the DB size exceeds available memory. One notable exception is RocksDB, which is tuned for SSDs, but RocksDB requires a native library at the moment, and MapDB 3 might beat it if Jan can work his magic.

Hope this helps.




Jan Kotek

Apr 26, 2016, 1:54:22 AM
to ma...@googlegroups.com

Hi,

 

Would you post a stack trace? The in-memory heap store should be independent of the file once the data are copied.
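
Jan's point can be illustrated without MapDB: once keys and values have been copied eagerly into an independent on-heap map, closing the source should not affect reads from the copy. The sketch below uses a hypothetical ClosableStore as a stand-in for the file-backed DB (it rejects reads after close()) and copies into a java.util.concurrent.ConcurrentHashMap rather than a MapDB heap DB. It only holds when the copied keys and values are plain objects that do not call back into the source; a stack trace would show which call still reaches the closed file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a file-backed store: every read fails once it is closed.
// Not thread-safe; it exists only to illustrate the copy-then-close sequence.
final class ClosableStore<K, V> {
    private final Map<K, V> data = new HashMap<>();
    private boolean closed;

    void put(K key, V value) { checkOpen(); data.put(key, value); }

    V get(K key) { checkOpen(); return data.get(key); }

    Iterable<K> keys() { checkOpen(); return new ArrayList<>(data.keySet()); }

    void close() { closed = true; }

    private void checkOpen() {
        if (closed) {
            throw new IllegalStateException("store is closed");
        }
    }
}

final class EagerCopy {
    private EagerCopy() {}

    // Copies every entry into an on-heap map while the source is still open.
    static <K, V> ConcurrentHashMap<K, V> copyAll(ClosableStore<K, V> source) {
        ConcurrentHashMap<K, V> copy = new ConcurrentHashMap<>();
        for (K key : source.keys()) {
            V value = source.get(key);
            if (value != null) { // ConcurrentHashMap does not accept null values
                copy.put(key, value);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        ClosableStore<String, String> store = new ClosableStore<>();
        store.put("k", "v");
        ConcurrentHashMap<String, String> copy = copyAll(store);
        store.close();                     // the source is gone from here on
        System.out.println(copy.get("k")); // still served from the on-heap copy
    }
}
```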

 

Jan
