Hello Michael,
In the case you just described, DynamicDataStore won't really help
unless you explicitly specify the initial capacity (roughly 2 times
the number of keys). For the same reason, we developed another data
store called IndexedDataStore. We strongly suggest you try this one.
The basic idea of IndexedDataStore is to keep keys in memory and put
data in the I/O cache or on disk. This reduces data movement among
segments caused by hash collisions. The constructor looks like this:
    IndexedDataStore(File homeDir,
                     int batchSize,
                     int numSyncBatches,
                     int indexInitLevel,
                     int indexSegmentFileSizeMB,
                     SegmentFactory indexSegmentFactory,
                     int storeInitLevel,
                     int storeSegmentFileSizeMB,
                     SegmentFactory storeSegmentFactory)
homeDir - the store home directory.
batchSize - the update batch size (e.g. 10000 persists the redo log
every 10000 updates).
numSyncBatches - the number of update batches needed to sync redo logs
with indexes.
indexInitLevel - the linear hashing level for indexes (e.g.
indexInitLevel 11 gives an initial capacity of 2^11 * 64K, which is
2^27, roughly 134 million keys).
indexSegmentFileSizeMB - the index segment file size in MB (e.g. 32).
indexSegmentFactory - the index segment factory (e.g.
MemorySegmentFactory is the best option).
storeInitLevel - the linear hashing level for the real data store (e.g.
storeInitLevel 4 gives a capacity of 2^4 * 64K, which is 2^20,
roughly 1 million data items).
storeSegmentFileSizeMB - the store segment file size in MB (e.g. 256).
storeSegmentFactory - the store segment factory (e.g.
WriteBufferSegmentFactory).
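To make the capacity arithmetic above concrete, here is a minimal sketch in plain Java. It only reproduces the 2^level * 64K formula quoted in the parameter notes. The class and method names (HashCapacity, initialCapacity) are made up for illustration and are not part of the store's API.

```java
// Illustrative helper only; not part of the data store library.
class HashCapacity {
    // 64K slots per unit of hashing level, as quoted in the notes above.
    static final long SLOTS_PER_UNIT = 64L * 1024L;

    // Initial capacity for a linear hashing level: 2^level * 64K.
    static long initialCapacity(int initLevel) {
        return (1L << initLevel) * SLOTS_PER_UNIT;
    }

    public static void main(String[] args) {
        // indexInitLevel 11 -> 2^27
        System.out.println(initialCapacity(11)); // prints 134217728
        // storeInitLevel 4 -> 2^20
        System.out.println(initialCapacity(4));  // prints 1048576
    }
}
```

Each extra level doubles the capacity, so going from level 4 to level 5 takes the store from about 1 million to about 2 million items.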
Given the case you have described, I would suggest constructing the
following IndexedDataStore:
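    new IndexedDataStore(
        Files.createTempDir(),                  // homeDir
        10000,                                  // batchSize
        5,                                      // numSyncBatches
        11, 64, new MemorySegmentFactory(),     // index: initLevel, segment MB, factory
        5, 256, new WriteBufferSegmentFactory() // store: initLevel, segment MB, factory
    )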
The indexInitLevel is critical to write performance. A level of 11
provides enough hash space at the index level.
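If your key count changes, one way to pick an init level is to take the smallest level whose capacity covers the expected number of keys with some headroom. The sketch below is only an assumption on my part: it borrows the "roughly 2 times #keys" rule of thumb mentioned for DynamicDataStore and applies it to the index level. The names (InitLevelChooser, chooseInitLevel) and the 50 million key count are hypothetical.

```java
// Illustrative helper only; not part of the data store library.
class InitLevelChooser {
    // 64K slots per unit of hashing level, as quoted in the parameter notes.
    static final long SLOTS_PER_UNIT = 64L * 1024L;

    // Smallest level such that 2^level * 64K >= expectedKeys * headroom.
    // The headroom factor (e.g. 2.0) is an assumed rule of thumb.
    static int chooseInitLevel(long expectedKeys, double headroom) {
        long target = (long) Math.ceil(expectedKeys * headroom);
        int level = 0;
        // Cap the level so the shift below cannot overflow a long.
        while (level < 40 && ((1L << level) * SLOTS_PER_UNIT) < target) {
            level++;
        }
        return level;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 50 million keys with 2x headroom.
        System.out.println(chooseInitLevel(50_000_000L, 2.0)); // prints 11
    }
}
```

With these assumed numbers the target is 100 million slots; level 10 gives 2^26 (about 67 million), which is too small, so the helper returns 11.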
I look forward to hearing the new performance numbers from you.
Thanks.
-jingwei