initialCapacity


Nick Leong

Jun 1, 2012, 9:39:44 PM
to Krati
Hi all, I am new to Krati and hoping someone can help. I am not sure
whether this is a bug or a mistake on my side, but I am seeing data loss
after restarting the datastore.

Since I am using Krati as an embedded datastore in my library, I have
no idea in advance how big the datastore will get, so I use 1024 as the
initialCapacity param for the IndexedDataStore.

IndexedDataStore(homeDir, initialCapacity, 10000, 5,
                 16, new MemorySegmentFactory(),
                 32, new WriteBufferSegmentFactory());

Then I populated the store with 10,000,000 entries and counted them by
iterating over keyIterator():

long count = 0;
IndexedIterator<byte[]> itr = _store.keyIterator();
while (itr.hasNext()) {
    itr.next();
    count++;
}

The count was correct (10,000,000) before I restarted the
program. However, after closing the datastore gracefully and restarting
the program, the same count comes up short: some entries (around 30 to 40)
are missing. I can reproduce this consistently with the test program
below.

However, if I populate only 10,000 entries with initialCapacity 1024, the
store restarts fine with the correct count.

And if I set initialCapacity to 10,000,000 and populate
10,000,000 entries, it also restarts fine.


As a side note, I noticed that a config.properties file is created the first
time I start a datastore. What happens if these config params change when
the program is restarted? For example: initially I set
initialCapacity to 1024 -> populate 10,000,000 entries -> close the
datastore -> shut down the program -> change initialCapacity to
10,000,000 -> restart the program -> do a count. Now I get a
count of 10,004,000 entries...


I can live with slower performance initially and tune the
parameters for better performance later, but missing data is not
acceptable.

Everything is single-threaded. I modified the KratiDataStore example to make
my test program, shown here:

---------------

import java.io.File;
import java.util.Scanner;

import krati.core.segment.MemorySegmentFactory;
import krati.core.segment.WriteBufferSegmentFactory;
import krati.store.DataStore;
import krati.store.IndexedDataStore;
import krati.util.IndexedIterator;

public class KratiIndexedDataStore {
    private final DataStore<byte[], byte[]> _store;

    public KratiIndexedDataStore(File homeDir, int initialCapacity) throws Exception {
        _store = createDataStore(homeDir, initialCapacity);
    }

    // Same constructor arguments as in the snippet earlier in this post.
    protected DataStore<byte[], byte[]> createDataStore(File homeDir, int initialCapacity) throws Exception {
        return new IndexedDataStore(homeDir, initialCapacity, 10000, 5,
                                    16, new MemorySegmentFactory(),
                                    32, new WriteBufferSegmentFactory());
    }

    protected byte[] createDataForKey(String key) {
        return ("Here is your data for " + key).getBytes();
    }

    // Writes keys "key.0" .. "key.<num-1>" and then syncs the store.
    public void populate(int num) throws Exception {
        for (int i = 0; i < num; i++) {
            String str = "key." + i;
            byte[] key = str.getBytes();
            byte[] value = createDataForKey(str);
            _store.put(key, value);
        }
        _store.sync();
    }

    // Interactive loop: "populate <num>", "count", "close".
    public void menu() throws Exception {
        Scanner scanner = new Scanner(System.in);
        while (true) {
            System.out.print("Command >");
            String input = scanner.nextLine();
            String[] params = input.split(" ");
            if (params[0].equalsIgnoreCase("populate")) {
                try {
                    int num = Integer.parseInt(params[1]);
                    populate(num);
                    System.out.println("populate done");
                } catch (Exception ex) {
                    // Also reached when the argument is missing or not a number.
                    System.out.println("populate - usage: populate num");
                }
            } else if (params[0].equalsIgnoreCase("close")) {
                try {
                    _store.close();
                    break;
                } catch (Exception ex) {
                    System.out.println("close - usage: close");
                }
            } else if (params[0].equalsIgnoreCase("count")) {
                // Count keys by walking the key iterator.
                long count = 0;
                long start = System.currentTimeMillis();
                IndexedIterator<byte[]> itr = _store.keyIterator();
                while (itr.hasNext()) {
                    itr.next();
                    count++;
                }
                System.out.println("Capacity=" + _store.capacity()
                        + ", count=" + count
                        + ", time=" + (System.currentTimeMillis() - start));
            }
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: KratiIndexedDataStore homeDir initialCapacity");
            return;
        }
        try {
            // Parse arguments: homeDir initialCapacity
            File homeDir = new File(args[0]);
            int initialCapacity = Integer.parseInt(args[1]);

            // Create an instance of Krati DataStore
            KratiIndexedDataStore store = new KratiIndexedDataStore(homeDir, initialCapacity);

            store.menu();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Jingwei

Jun 2, 2012, 2:32:31 AM
to Krati
Hi Nick,

The initialCapacity for IndexedDataStore should be given a reasonably
large value, close to the estimated number of keys in your data set.
This way, the automatic rehashing will be much more efficient. Once
you have created an IndexedDataStore with an initialCapacity, you should
NOT modify that value again. If you change that value for an existing
store, the hashing won't work and you will run into all sorts of weird
problems.

If you cannot estimate your data size, you can specify a relatively
large initialCapacity such as 1,000,000 or 5,000,000.
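
One way to follow the "never change it" rule from application code is to record the capacity the store was first created with and reuse that value on every later open. The following is only a minimal sketch under assumptions: the class PinnedCapacity, the file name pinned-capacity.properties and the key name are hypothetical, and this side file is separate from the config.properties that Krati writes itself.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

public class PinnedCapacity {
    // Hypothetical names; not part of Krati and not Krati's config.properties.
    static final String FILE_NAME = "pinned-capacity.properties";
    static final String KEY = "initialCapacity";

    /**
     * Returns the capacity recorded when the store was first created.
     * On first use, records requestedCapacity and returns it.
     */
    public static int resolve(File homeDir, int requestedCapacity) throws IOException {
        File file = new File(homeDir, FILE_NAME);
        Properties props = new Properties();
        if (file.exists()) {
            // Store was created earlier: ignore the requested value.
            try (InputStream in = new FileInputStream(file)) {
                props.load(in);
            }
            return Integer.parseInt(props.getProperty(KEY));
        }
        if (!homeDir.isDirectory() && !homeDir.mkdirs()) {
            throw new IOException("Cannot create " + homeDir);
        }
        // First creation: remember the value for all later opens.
        props.setProperty(KEY, Integer.toString(requestedCapacity));
        try (OutputStream out = new FileOutputStream(file)) {
            props.store(out, "Capacity the store was created with; do not edit");
        }
        return requestedCapacity;
    }
}
```

The value returned by resolve(homeDir, requested) would then be passed as initialCapacity to the IndexedDataStore constructor, so a later change of the requested value has no effect on an existing store.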

If you close the store and reopen it, and the count returned via the
iterator then differs from the expected value, you may have found a bug
related to the iterator. Did you get a smaller or a larger count? Could you
verify key by key?
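
A key-by-key check could look roughly like the sketch below. It assumes the keys were written as "key.<i>" for i from 0 to expected-1, as in the test program earlier in this thread, and that the store's key iterator can be passed in as a plain java.util.Iterator<byte[]>. The class name KeyVerifier is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;

public class KeyVerifier {
    private static final String PREFIX = "key.";

    /** Returns every index i in [0, expected) for which "key.<i>" was not seen. */
    public static List<Integer> missingIndexes(Iterator<byte[]> keys, int expected) {
        BitSet seen = new BitSet(expected);
        while (keys.hasNext()) {
            String key = new String(keys.next(), StandardCharsets.UTF_8);
            if (!key.startsWith(PREFIX)) {
                continue; // not one of the test keys
            }
            try {
                int i = Integer.parseInt(key.substring(PREFIX.length()));
                if (i >= 0 && i < expected) {
                    seen.set(i);
                }
            } catch (NumberFormatException ignored) {
                // malformed suffix: treat as an unrelated key
            }
        }
        // Collect the indexes whose bit was never set.
        List<Integer> missing = new ArrayList<>();
        for (int i = seen.nextClearBit(0); i < expected; i = seen.nextClearBit(i + 1)) {
            missing.add(i);
        }
        return missing;
    }
}
```

Running this once before close and once after reopen, for example with missingIndexes(store.keyIterator(), 10000000), shows exactly which keys the iterator no longer returns.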

Thanks.

Jingwei

Nick Leong

Jun 2, 2012, 3:27:36 AM
to Krati
I got a smaller count. I did verify key by key, and some keys are
missing. It seems kind of random...

The issue can be reproduced every time: run the program I sent
earlier with initialCapacity set to, say, 50000, and enter the following
commands:

Command >populate 5000000
populate done

Command >count
Capacity=262144, count=5000000, time=444

Command >close


*** Restart the program and do -

Command >count
Capacity=524288, count=4999051, time=427


Count is 4,999,051 after restart which is less than 5,000,000.
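
The numbers in the transcript above can be checked directly. The sketch below only restates the arithmetic (how many keys are missing, and that both reported capacities are powers of two, with the value after restart being twice the earlier one). It does not explain the cause, and the class name RestartDelta is hypothetical.

```java
public class RestartDelta {
    /** Number of keys lost between the two counts. */
    public static long missing(long countBefore, long countAfter) {
        return countBefore - countAfter;
    }

    /** Exponent k such that capacity == 2^k; rejects values that are not powers of two. */
    public static int log2(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("not a power of two: " + capacity);
        }
        return Integer.numberOfTrailingZeros(capacity);
    }

    public static void main(String[] args) {
        // Values copied from the transcript in this post.
        System.out.println("missing keys = " + missing(5000000L, 4999051L)); // 949
        System.out.println("capacity before restart = 2^" + log2(262144));   // 2^18
        System.out.println("capacity after restart  = 2^" + log2(524288));   // 2^19
    }
}
```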

Jingwei

Jun 5, 2012, 12:40:21 AM
to Krati
Hi Nick,

I have reproduced the symptoms you described. I am not sure what exactly
causes this. It may be related to hashing reinitialization when
reopening a closed store. I will spend some time on this issue.

Thanks.

Jingwei

Jingwei

Jun 15, 2012, 5:58:28 PM
to kr...@googlegroups.com
Hi Nick,

Krati 0.4.6 fixes the problem of missing keys associated with IndexedDataStore. It was a tricky bug and took me several days to fix. A new test, TestDataStoreOpenClose, was added to check the number of keys across open/close operations.

You can run 'mvn test -Dtest=TestDataStoreOpenClose' to try it out.

Thanks.

Jingwei

realvalkyrie

Jun 18, 2012, 3:03:04 AM
to kr...@googlegroups.com
When will Krati 0.47 be released? :)