why does a database need multiple indices?

shah...@gmail.com

Jun 2, 2012, 11:02:20 PM
to bab...@googlegroups.com
I'm trying to understand why a database might need several indices. 

Given this code:

DatabaseManager dbm = databaseSystem.getDatabaseManager();
Database db = dbm.createDatabase("myDB", 5);
and this code
BabuDBRequestResult<Object> result = db.singleInsert(2, "key".getBytes(), "value".getBytes(), context);
result.get();

Why did I call createDatabase with 5 indices? When an item is inserted, index 2 is specified. What does that really mean?

I am assuming BabuDB is basically a disk-persistent hash map, or something like a B-tree library. How do indices come into play here?

I understand that the internal implementation uses several in-memory trees. As each tree fills up, it is 'sealed' and a new tree is started. While the new tree is being used, old trees, which are no longer being modified, are merged, sorted and written to disk. Does BabuDB 'index' refer to these in-memory trees? If so, why would I specify the index I want to use to insert data?

Are these indices just worker threads? Do I specify the index to be used for singleInsert to make sure the order of inserts is correct? In other words, does having 5 indices mean five mailboxes in the actor model, where each mailbox retains the order of events, but the order is not guaranteed across mailboxes?

Jan Stender

Jun 4, 2012, 4:39:35 AM
to bab...@googlegroups.com
Hi,


I'm trying to understand why a database might need several indices. 

Given this code:

DatabaseManager dbm = databaseSystem.getDatabaseManager();
Database db = dbm.createDatabase("myDB", 5);
and this code
BabuDBRequestResult<Object> result = db.singleInsert(2, "key".getBytes(), "value".getBytes(), context);
result.get();

Why did I call createDatabase with 5 indices? When an item is inserted, index 2 is specified. What does that really mean?

I am assuming BabuDB is basically a disk-persistent hash map, or something like a B-tree library. How do indices come into play here?

It's a design decision of BabuDB that each database may have multiple indices, with databases being named and indices being numbered. Each index can be regarded as an individual persistent hash map.
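
To make the "named database, numbered indices" idea concrete, here is a minimal toy model in plain Java. It is only a sketch and does not use the BabuDB API: `ToyDatabase`, `insert` and `lookup` are hypothetical names, keys are Strings instead of byte arrays, and nothing is persisted. It only shows that the index number selects one of several independent key-value maps inside the same database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT the BabuDB API.
class ToyDatabase {
    private final String name;
    // One independent map per index; the list position is the index number.
    private final List<Map<String, String>> indices = new ArrayList<>();

    ToyDatabase(String name, int numIndices) {
        this.name = name;
        for (int i = 0; i < numIndices; i++) {
            indices.add(new HashMap<>());
        }
    }

    String getName() {
        return name;
    }

    // The index number picks which map receives the entry.
    void insert(int indexId, String key, String value) {
        indices.get(indexId).put(key, value);
    }

    // Returns null if the key is absent from the chosen index.
    String lookup(int indexId, String key) {
        return indices.get(indexId).get(key);
    }

    public static void main(String[] args) {
        ToyDatabase db = new ToyDatabase("myDB", 5);
        db.insert(2, "key", "value");
        db.insert(3, "key", "other");
        System.out.println(db.lookup(2, "key")); // value
        System.out.println(db.lookup(3, "key")); // other
        System.out.println(db.lookup(0, "key")); // null
    }
}
```

The same key can hold different values in index 2 and index 3, and an index that was never written to stays empty.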


I understand that the internal implementation uses several in-memory trees. As each tree fills up, it is 'sealed' and a new tree is started. While the new tree is being used, old trees, which are no longer being modified, are merged, sorted and written to disk. Does BabuDB 'index' refer to these in-memory trees? If so, why would I specify the index I want to use to insert data?

There's a difference between a BabuDB index and the internal index structures used inside of BabuDB. Each BabuDB index has its own _stack_ of in-memory trees, plus its persistent on-disk index. Internal index structures are only relevant for the internal architecture of BabuDB and not directly visible to your application.
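
As a rough illustration of "a stack of in-memory trees plus an on-disk index" for a single index, the sketch below uses plain Java collections. It is not BabuDB's implementation: `ToyIndex`, `seal` and `mergeSealed` are hypothetical names, and a `TreeMap` stands in for the persistent on-disk index. The assumption is that lookups consult the newest tree first and fall back to older trees and finally to the on-disk data.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;

// Toy model only: not BabuDB's internal code.
class ToyIndex {
    // In-memory trees; the head of the deque is the newest (active) tree.
    private final Deque<TreeMap<String, String>> memTrees = new ArrayDeque<>();
    // Stand-in for the persistent on-disk index.
    private final TreeMap<String, String> onDisk = new TreeMap<>();

    ToyIndex() {
        memTrees.push(new TreeMap<>());
    }

    // Writes always go to the active tree.
    void insert(String key, String value) {
        memTrees.peek().put(key, value);
    }

    // Freeze the active tree and start a new one on top of the stack.
    void seal() {
        memTrees.push(new TreeMap<>());
    }

    // Search newest to oldest, then fall back to the on-disk stand-in.
    String lookup(String key) {
        for (TreeMap<String, String> tree : memTrees) {
            String value = tree.get(key);
            if (value != null) {
                return value;
            }
        }
        return onDisk.get(key);
    }

    // Fold all sealed trees into the on-disk stand-in, oldest first,
    // so that newer values overwrite older ones. The active tree stays.
    void mergeSealed() {
        while (memTrees.size() > 1) {
            onDisk.putAll(memTrees.removeLast());
        }
    }

    int memTreeCount() {
        return memTrees.size();
    }
}
```

None of this is visible to the application: from the outside, the index still behaves like a single map.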



Are these indices just worker threads? Do I specify the index to be used for singleInsert to make sure the order of inserts is correct? In other words, does having 5 indices mean five mailboxes in the actor model, where each mailbox retains the order of events, but the order is not guaranteed across mailboxes?

Yes, exactly. Each index is always accessed by the same worker thread, which ensures that all insertions into that index are performed in the order in which they were initiated.
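
The mailbox analogy can be sketched with standard Java executors. This is a toy model, not BabuDB code: `ToyOrderedInserts` is a hypothetical class, and giving every index its own single-threaded executor is a simplifying assumption (the point is only that a given index is always served by the same thread). Requests queued for one index are applied in submission order, while nothing is promised about ordering between different indices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: not BabuDB code.
class ToyOrderedInserts {
    // Assumption for this sketch: one single-threaded worker per index.
    private final ExecutorService[] workers;
    // Records the order in which keys were actually applied, per index.
    private final List<List<String>> applied = new ArrayList<>();

    ToyOrderedInserts(int numIndices) {
        workers = new ExecutorService[numIndices];
        for (int i = 0; i < numIndices; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
            applied.add(Collections.synchronizedList(new ArrayList<>()));
        }
    }

    // Queue an insert and return immediately, like an asynchronous request.
    void insert(int indexId, String key) {
        workers[indexId].execute(() -> applied.get(indexId).add(key));
    }

    // Stop accepting work and wait until every queued insert has run.
    void shutdownAndWait() {
        for (ExecutorService worker : workers) {
            worker.shutdown();
        }
        try {
            for (ExecutorService worker : workers) {
                if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("worker did not finish in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    List<String> appliedOrder(int indexId) {
        return new ArrayList<>(applied.get(indexId));
    }

    public static void main(String[] args) {
        ToyOrderedInserts db = new ToyOrderedInserts(5);
        db.insert(2, "a");
        db.insert(4, "x");
        db.insert(2, "b");
        db.insert(2, "c");
        db.shutdownAndWait();
        System.out.println(db.appliedOrder(2)); // [a, b, c]
        System.out.println(db.appliedOrder(4)); // [x]
    }
}
```

Whether "x" was applied before or after "a" is not determined here, which matches the "order is not guaranteed across mailboxes" reading from the question.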

Best regards,
Jan
