BDB storage engine

Michael Weber

unread,

Aug 21, 2012, 11:37:22 PM8/21/12

to rav...@googlegroups.com

Ok, I have all of the storage actions done for the BDB storage engine, and I think they are functional. I am going through all of the tests in the test bed to try to make sure everything is working and I'm running into a problem. Take a very simple index test like BoostingDuringIndexing.CanGetBoostedValues. It inserts two documents and makes sure we can query the boost order correctly. With the new storage engine, both documents insert correctly, then when the the system goes to index them versus the builtin index Raven/DocumentsByEntityName and the test bed index, we get a deadlock in the IIndexingStorageActions.UpdateIndexingStats (almost every time).

The problem seems to be that both index threads try to update their stats count at the same time, and since the documents for the indexing stats are small, they both hit the same page of the db. And since BDB locks based on pages, we get a deadlock. Now, when we get a deadlock, we are supposed to abort the transaction and retry. I'm not sure exactly how to do that, and it sort of seems like IsWriteConflict is supposed to handle that.

If I handle IsWriteConflict and check to see if we had a deadlock and abort the transaction then we do move forward, but of course the indexing stats are not updated. And when the test goes to run the query, we get zero results, even though in the database it says that the index has been indexed up to the correct etag. And checking the lucene index with Luke shows that all documents are present.

I'm not exactly sure how to get past this problem. Attached is the log file for the unit test run.

BTW, I did at least do some bulk insert testing from the freedb database loader (5 minute run), this is with no performance work at all

ESENT

folk/f00d1312 142,561 00:04:59.9769034

total inputs: 142584

Done in 00:05:00.0250306

BDB

folk/af0ae30e 109,753 00:04:59.9160742

total inputs: 109776

Done in 00:05:00.0048137

log.txt

Michael Weber

unread,

Aug 22, 2012, 12:04:36 AM8/22/12

to rav...@googlegroups.com

Actually, now that I think about it in order to retry after a deadlock, we would just do the retries in TransactionalStorage.ExecuteBatch. That is assuming all actions queued by that batch message are idempotent. And that may work for future deadlocks we encounter, but it doesn't work for this scenario (without changes) since the DeadlockException will never reach ExecuteBatch since it's caught to produce "Failed to index documents for index......".

And in general for the deadlock retry to work from ExecuteBatch we need to make sure that no batched action ever catches our exception.

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 3:21:56 AM8/22/12

to rav...@googlegroups.com

Michael

You cannot re-execute the action, those are NOT guaranteed to be idempotent.

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 3:28:07 AM8/22/12

to rav...@googlegroups.com

Nice numbers for the perf, let us start with that.

We rely quite heavily on the fact that we can update separate rows on the indexing stats.

This is the key for being able to parellelize indexing.

Can you change the page size for that data?

Can you maybe force it to be on another page?

Michael Weber

unread,

Aug 22, 2012, 8:02:24 AM8/22/12

to rav...@googlegroups.com

Well I'm not sure how to fix the deadlock problem in general then... These are the most common deadlocks bu of course deadlocks are always possible anywhere and our only recourse is to abort the transaction and retry.

Michael Weber

unread,

Aug 22, 2012, 8:05:24 AM8/22/12

to rav...@googlegroups.com

I've tried that on the data leaf page but it still tries to hit the same page. Maybe because of a higher btree node. I can try inserting additional records to force the keys on different pages and see if that works.

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 8:38:47 AM8/22/12

to rav...@googlegroups.com

What about not updating the record?

But create a new one?

So you always look at the latest record, and you clean them out every once in a while?

Michael Weber

unread,

Aug 22, 2012, 8:59:47 AM8/22/12

to rav...@googlegroups.com

Yeah, that could work also, but are there ever two threads updating the same index? If that's true then any of these could be a problem.

I am going to look into using a BDB queue for the index table:

"Queue if your application requires high degrees of concurrency. Queue provides record-level locking (as opposed to the page-level locking that the other access methods use), and this can result in significantly faster throughput for highly concurrent applications."

Louis Haußknecht

unread,

Aug 22, 2012, 9:24:42 AM8/22/12

to rav...@googlegroups.com

Great to see this moving forward! However what about licensing?

http://en.wikipedia.org/wiki/Berkeley_DB states BDB needs to be licensed by oracle if used in non open-source products.

If one is developing a closed-source product with RavenDB and using BDB as storage engine, would you need to license both?

clayton collie

unread,

Aug 22, 2012, 9:27:02 AM8/22/12

to rav...@googlegroups.com

Would that not be an issue for HR, not individual developers ?

Michael Weber

unread,

Aug 22, 2012, 9:57:59 AM8/22/12

to rav...@googlegroups.com

I don't know much about the licenseing, but I found this from an oracle licenseing faq:

Berkeley DB is available under dual license:

* Public license that requires that software that uses the Berkeley DB code be free/open source software; and

* Closed source license for non-open source software.

If your code is not redistributed, no license is required (free for in-house use).

On Wednesday, August 22, 2012 9:24:42 AM UTC-4, Louis Haußknecht wrote:

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 11:18:18 AM8/22/12

to rav...@googlegroups.com

The same index, no.

But different indexes, all the time.

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 11:18:57 AM8/22/12

to rav...@googlegroups.com

Which pretty much solves the problem for us, yes.

Michael Weber

unread,

Aug 22, 2012, 11:50:45 AM8/22/12

to rav...@googlegroups.com

Ok -- that actually makes it easier, then what about having a separate file for each index? That would prevent deadlocks versus writes.

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 12:05:31 PM8/22/12

to rav...@googlegroups.com

Sure, that can work, but would it still be consistent across all files?

Michael Weber

unread,

Aug 22, 2012, 12:26:26 PM8/22/12

to rav...@googlegroups.com

Will what be consistent?

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 12:27:02 PM8/22/12

to rav...@googlegroups.com

Transactionally consistent between the different files?

Michael Weber

unread,

Aug 22, 2012, 12:37:04 PM8/22/12

to rav...@googlegroups.com

Ah, yes BDB is consistent across files. When you get a lock its for a file + page.

And I'm not actually talking about a physical file. It will be the virtual databases in the physical file. Like in http://mikecodespot.blogspot.com/2012/08/how-to-create-index.html

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 1:21:02 PM8/22/12

to rav...@googlegroups.com

Oh, okay

Michael Weber

unread,

Aug 22, 2012, 1:56:07 PM8/22/12

to rav...@googlegroups.com

Ok -- I've done that and they each index's stats are certainly on a different page now, and I don't seem to get a lock between different indexes now.

Now I'm getting another write/ write deadlock between what my guess is updating the stats and updating the etag/last time. Do this happen on different threads?

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 2:08:14 PM8/22/12

to rav...@googlegroups.com

You also need to make sure that reduce updates and map updates happens on different places.

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 2:08:37 PM8/22/12

to rav...@googlegroups.com

We do update the stats for the same index in parallel for map & reduce stats.

Michael Weber

unread,

Aug 22, 2012, 2:29:17 PM8/22/12

to rav...@googlegroups.com

Ok that combined with the fact you cannot delete sections of a database file means we will need one physical file per index and 3 sections in each file: the map stats, the reduce stats and the touch counts.

Then I can just delete the physical file when we delete an index. And it should keep everything separate.

Do the new mapped results, scheduled reduction and reduction results work similarly with respect to threading versus index name? Because I'm already starting to see problems from those tests as well.

Michael Weber

unread,

Aug 22, 2012, 3:48:17 PM8/22/12

to rav...@googlegroups.com

I've made some progress with this and am starting to like the way it's turning out, but I'm still getting two writes (one to stats, one to etag/timestamp) on separate thread. From my logging (5,0,6 is the internal record IDs for the map, reduce and touch tables)

2012-08-22 15:44:50.7529 11 Update(pagesByTitle2:5,0,6)

2012-08-22 15:44:50.7529 8 UpdateStats(pagesByTitle2:5,0,6)

Thread 11 tries to update the etag/timestamp while thread 8 is updating the stats (both for map index), thus, we are getting a deadlock on record #5

Michael Weber

unread,

Aug 22, 2012, 4:16:53 PM8/22/12

to rav...@googlegroups.com

Oh -- I think it's the MaxNumberOfParallelIndexTasks. I guess there can be more than one thread updating the stats. I'm not sure how to fix the deadlocks assuming that there is more than one thread updating the stats at the same time.

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 6:52:55 PM8/22/12

to rav...@googlegroups.com

For maps and reduces at the same level for the index, you don't have concurrency.

But scheduled reduction can have concurrent writes. Note the way that we do this in Esent, we always write, then we do a cleanup.

Oren Eini (Ayende Rahien)

unread,

Aug 22, 2012, 6:53:54 PM8/22/12

to rav...@googlegroups.com

What is the TransactionStorage method that get called here?

Michael Weber

unread,

Aug 22, 2012, 7:34:38 PM8/22/12

to rav...@googlegroups.com

Thread #1

2012-08-22 16:21:22.4747 19 Update(pagesByTitle2:5,0,6)

2012-08-22 16:21:22.4747 19 at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)

at System.Environment.get_StackTrace()

at Raven.Storage.Bdb.Tables.IndexingStatsTable.Update(Txn txn, String name, Guid etag, DateTime timestamp) in c:\dev\ravendb\Raven.Database\Storage\Bdb\Tables\IndexingStatsTable.cs:line 101

at Raven.Storage.Bdb.StorageActions.DocumentStorageActions.UpdateLastIndexed(String index, Guid etag, DateTime timestamp) in c:\dev\ravendb\Raven.Database\Storage\Bdb\StorageActions\Indexing.cs:line 43

at Raven.Database.Indexing.IndexingExecuter.<>c__DisplayClass19.<ExecuteIndexingWork>b__a(IStorageActionsAccessor actions) in c:\dev\ravendb\Raven.Database\Indexing\IndexingExecuter.cs:line 139

at Raven.Storage.Bdb.TransactionalStorage.ExecuteBatch(Action`1 action) in c:\dev\ravendb\Raven.Database\Storage\Bdb\TransactionalStorage.cs:line 129

at Raven.Storage.Bdb.TransactionalStorage.Batch(Action`1 action) in c:\dev\ravendb\Raven.Database\Storage\Bdb\TransactionalStorage.cs:line 100

at Raven.Database.Indexing.IndexingExecuter.ExecuteIndexingWork(IList`1 indexesToWorkOn) in c:\dev\ravendb\Raven.Database\Indexing\IndexingExecuter.cs:line 135

at Raven.Database.Indexing.AbstractIndexingExecuter.ExecuteIndexing() in c:\dev\ravendb\Raven.Database\Indexing\AbstractIndexingExecuter.cs:line 174

at Raven.Database.Indexing.AbstractIndexingExecuter.Execute() in c:\dev\ravendb\Raven.Database\Indexing\AbstractIndexingExecuter.cs:line 42

at System.Threading.Tasks.Task.InnerInvoke()

at System.Threading.Tasks.Task.Execute()

at System.Threading.Tasks.Task.ExecutionContextCallback(Object obj)

at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)

at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)

at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)

at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)

at System.Threading.Tasks.ThreadPoolTaskScheduler.LongRunningThreadWork(Object obj)

at System.Threading.ThreadHelper.ThreadStart_Context(Object state)

at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)

at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)

at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)

at System.Threading.ThreadHelper.ThreadStart(Object obj)

Thread #2

2012-08-22 16:21:22.4747 8 Update(pagesByTitle2:5,0,6)

2012-08-22 16:21:22.4877 8 at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)