How do I diagnose cause of failover in a replicated RavenDB setup?

425 views
Skip to first unread message

Arnold Zokas

unread,
Jul 10, 2012, 5:54:09 AM7/10/12
to rav...@googlegroups.com
I have created a data migration tool that reads data from a legacy SQL database, transforms it into something a lot more sensible and then stores it in RavenDB.

When I try to write 22500 1KB objects in a single Raven session, I get the following error:

System.InvalidOperationException: Could not replicate POST operation to secondary node, failover behavior is: AllowReadsFromSecondaries

The message suggests Master Raven instance encountered a fatal error and migration tool failed over to Slave Slave Raven instance and tried to write to it. What I don't know is why.

Background information:

 - .NET 4.0
 - RavenDB #2025-Unstable (can't use stable build because of dependency on Json.NET)
 - Using RavenDB Replication bundle (replication is configured correctly)
 - Write operations are wrapped in a TransactionScope (there are writes happening elsewhere in the migration tool using a separate Raven session)
 - No errors in Windows Event Log

Log on Master Raven instance contains the following error:

    System.Threading.ThreadAbortException: Thread was being aborted.
       at Microsoft.Isam.Esent.Interop.Implementation.NativeMethods.JetSeek(IntPtr sesid, IntPtr tableid, UInt32 grbit)
       at Microsoft.Isam.Esent.Interop.Implementation.JetApi.JetSeek(JET_SESID sesid, JET_TABLEID tableid, SeekGrbit grbit) in C:\Work\ravendb\SharedLibs\Sources\managedesent-61618\EsentInterop\JetApi.cs:line 3172
       at Microsoft.Isam.Esent.Interop.Api.TrySeek(JET_SESID sesid, JET_TABLEID tableid, SeekGrbit grbit) in C:\Work\ravendb\SharedLibs\Sources\managedesent-61618\EsentInterop\MoveHelpers.cs:line 127
       at Raven.Storage.Esent.StorageActions.DocumentStorageActions.EnsureDocumentIsNotCreatedInAnotherTransaction(String key, Guid txId) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\StorageActions\Util.cs:line 77
       at Raven.Storage.Esent.StorageActions.DocumentStorageActions.AddDocument(String key, Nullable`1 etag, RavenJObject data, RavenJObject metadata) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\StorageActions\Documents.cs:line 296
       at Raven.Database.DocumentDatabase.<>c__DisplayClass37.<Put>b__30(IStorageActionsAccessor actions) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs:line 480
       at Raven.Storage.Esent.TransactionalStorage.Batch(Action`1 action) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\TransactionalStorage.cs:line 328
       at Raven.Database.DocumentDatabase.Put(String key, Nullable`1 etag, RavenJObject document, RavenJObject metadata, TransactionInformation transactionInformation) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs:line 468
       at Raven.Database.DocumentDatabase.<>c__DisplayClass67.<Commit>b__64(DocumentInTransactionData doc) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs:line 654
       at Raven.Storage.Esent.StorageActions.DocumentStorageActions.CompleteTransaction(Guid txId, Action`1 perDocumentModified) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\StorageActions\General.cs:line 272
       at Raven.Database.DocumentDatabase.<>c__DisplayClass67.<Commit>b__63(IStorageActionsAccessor actions) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs:line 658
       at Raven.Storage.Esent.TransactionalStorage.ExecuteBatch(Action`1 action) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\TransactionalStorage.cs:line 374
       at Raven.Storage.Esent.TransactionalStorage.Batch(Action`1 action) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\TransactionalStorage.cs:line 334
       at Raven.Database.DocumentDatabase.Commit(Guid txId) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs:line 668
       at Raven.Database.Server.Responders.TransactionCommit.Respond(IHttpContext context) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\Server\Responders\TransactionCommit.cs:line 28
       at Raven.Database.Server.HttpServer.DispatchRequest(IHttpContext ctx) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\Server\HttpServer.cs:line 582
       at Raven.Database.Server.HttpServer.HandleActualRequest(IHttpContext ctx) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\Server\HttpServer.cs:line 347

Is there an upper limit on number of writes per session when using transactions with replication?
Could this, perhaps, be a result of using an unstable build?

Oren Eini (Ayende Rahien)

unread,
Jul 10, 2012, 6:16:47 AM7/10/12
to rav...@googlegroups.com
inline

On Tue, Jul 10, 2012 at 12:54 PM, Arnold Zokas <arnold...@gmail.com> wrote:
I have created a data migration tool that reads data from a legacy SQL database, transforms it into something a lot more sensible and then stores it in RavenDB.

When I try to write 22500 1KB objects in a single Raven session, I get the following error:

System.InvalidOperationException: Could not replicate POST operation to secondary node, failover behavior is: AllowReadsFromSecondaries


That is expected if the master is not available.
 
The message suggests Master Raven instance encountered a fatal error and migration tool failed over to Slave Slave Raven instance and tried to write to it. What I don't know is why.

Background information:

Log on Master Raven instance contains the following error:

    System.Threading.ThreadAbortException: Thread was being aborted.

That is an interesting error. It usually only happen if something else is in the process killing our threads, or during rude process shutdown.
 
Is there an upper limit on number of writes per session when using transactions with replication?

No, internally they are batched.
 
Could this, perhaps, be a result of using an unstable build?

That is possible, but I don't think so.
Can you reproduce this?

Arnold Zokas

unread,
Jul 10, 2012, 8:48:01 AM7/10/12
to rav...@googlegroups.com
Yes, this is reproducible. Don't think this is caused by replication, though.
Upgraded to build #2032 (both server and client), temporarily disabled replication and deleted data folder on the server.

Now I'm getting this error on the client:

System.Net.WebException: The operation has timed out
   at System.Net.HttpWebRequest.GetResponse()
   at Raven.Client.Connection.HttpJsonRequest.ReadJsonInternal(Func`1 getResponse) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Lightweight\Connection\HttpJsonRequest.cs:line 295
   at Raven.Client.Connection.HttpJsonRequest.ReadResponseJson() in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Lightweight\Connection\HttpJsonRequest.cs:line 214
   at Raven.Client.Connection.ServerClient.<>c__DisplayClass60.<Commit>b__5f(String u) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Lightweight\Connection\ServerClient.cs:line 905
   at Raven.Client.Connection.ReplicationInformer.TryOperation[T](Func`2 operation, String operationUrl, Boolean avoidThrowing, T& result) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Lightweight\Connection\ReplicationInformer.cs:line 571
   at Raven.Client.Connection.ReplicationInformer.ExecuteWithReplication[T](String method, String primaryUrl, Int32 currentRequest, Int32 currentReadStripingBase, Func`2 operation) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Lightweight\Connection\ReplicationInformer.cs:line 537
   at Raven.Client.Connection.ServerClient.ExecuteWithReplication[T](String method, Func`2 operation) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Lightweight\Connection\ServerClient.cs:line 165
   at Raven.Client.Document.DocumentSession.Commit(Guid txId) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Lightweight\Document\DocumentSession.cs:line 500
   at Raven.Client.Document.PromotableRavenClientEnlistment.SinglePhaseCommit(SinglePhaseEnlistment singlePhaseEnlistment) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Lightweight\Document\PromotableRavenClientEnlistment.cs:line 60
   at System.Transactions.DurableEnlistmentCommitting.EnterState(InternalEnlistment enlistment)
   at System.Transactions.CommittableTransaction.Commit()
   at System.Transactions.TransactionScope.InternalDispose()
   at System.Transactions.TransactionScope.Dispose()

And corresponding error on the server:

System.Threading.ThreadAbortException: Thread was being aborted.
   at Microsoft.Isam.Esent.Interop.Implementation.NativeMethods.JetSeek(IntPtr sesid, IntPtr tableid, UInt32 grbit)
   at Microsoft.Isam.Esent.Interop.Implementation.JetApi.JetSeek(JET_SESID sesid, JET_TABLEID tableid, SeekGrbit grbit) in C:\Work\ravendb\SharedLibs\Sources\managedesent-61618\EsentInterop\JetApi.cs:line 3172
   at Microsoft.Isam.Esent.Interop.Api.TrySeek(JET_SESID sesid, JET_TABLEID tableid, SeekGrbit grbit) in C:\Work\ravendb\SharedLibs\Sources\managedesent-61618\EsentInterop\MoveHelpers.cs:line 127
   at Raven.Storage.Esent.StorageActions.DocumentStorageActions.EnsureDocumentIsNotCreatedInAnotherTransaction(String key, Guid txId) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\StorageActions\Util.cs:line 77
   at Raven.Storage.Esent.StorageActions.DocumentStorageActions.AddDocument(String key, Nullable`1 etag, RavenJObject data, RavenJObject metadata) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\StorageActions\Documents.cs:line 303
   at Raven.Database.DocumentDatabase.<>c__DisplayClass3a.<Put>b__33(IStorageActionsAccessor actions) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs:line 488
   at Raven.Storage.Esent.TransactionalStorage.Batch(Action`1 action) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\TransactionalStorage.cs:line 328
   at Raven.Database.DocumentDatabase.Put(String key, Nullable`1 etag, RavenJObject document, RavenJObject metadata, TransactionInformation transactionInformation) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs:line 476
   at Raven.Database.DocumentDatabase.<>c__DisplayClass6b.<Commit>b__68(DocumentInTransactionData doc) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs:line 679
   at Raven.Storage.Esent.StorageActions.DocumentStorageActions.CompleteTransaction(Guid txId, Action`1 perDocumentModified) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\StorageActions\General.cs:line 272
   at Raven.Database.DocumentDatabase.<>c__DisplayClass6b.<Commit>b__67(IStorageActionsAccessor actions) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs:line 683
   at Raven.Storage.Esent.TransactionalStorage.ExecuteBatch(Action`1 action) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\TransactionalStorage.cs:line 374
   at Raven.Storage.Esent.TransactionalStorage.Batch(Action`1 action) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Storage.Esent\TransactionalStorage.cs:line 334
   at Raven.Database.DocumentDatabase.Commit(Guid txId) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs:line 693
   at Raven.Database.Server.Responders.TransactionCommit.Respond(IHttpContext context) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\Server\Responders\TransactionCommit.cs:line 28
   at Raven.Database.Server.HttpServer.DispatchRequest(IHttpContext ctx) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\Server\HttpServer.cs:line 651
   at Raven.Database.Server.HttpServer.HandleActualRequest(IHttpContext ctx) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\Server\HttpServer.cs:line 420

On Tuesday, 10 July 2012 11:16:47 UTC+1, Oren Eini wrote:
inline

Oren Eini (Ayende Rahien)

unread,
Jul 10, 2012, 8:49:45 AM7/10/12
to rav...@googlegroups.com
How do you reproduce this?

Arnold Zokas

unread,
Jul 10, 2012, 10:15:44 AM7/10/12
to rav...@googlegroups.com
Currently, I am reproducing this by running my custom data migration tool.
Having run more tests, I am certain this is a transaction-related issue rather than an issue with replication:
  • Replication seems to work fine when transaction code is removed
  • Transactions continue to fail (during commit stage) even when replication is disabled. The error messages are always the same.
I have also tried upgrading to newer unstable builds, but:
  • #2026, #2027 - throw exception when trying to create a new document via Raven Studio (missing license for text editor component)
  • #2028 - 2032 - fail to detect replication bundle
I will try to reproduce this in a test project with build #2025 and push it to GitHub. Will post a reply to this as soon as it is done.


On Tuesday, 10 July 2012 13:49:45 UTC+1, Oren Eini wrote:
How do you reproduce this?

Oren Eini (Ayende Rahien)

unread,
Jul 10, 2012, 10:16:59 AM7/10/12
to rav...@googlegroups.com
Thanks!

Arnold Zokas

unread,
Jul 10, 2012, 11:53:14 AM7/10/12
to rav...@googlegroups.com
Tried to replicate this behaviour in a standalone project, but with no success.

I've done some more testing with my migration tool and this error manifests itself only when the number of documents to be committed crosses a certain threshold:
  • Works fine with 10000 documents
  • Starts failing after about 12000 documents
My short-term solution is to disable transactions on initial data migration and enable them for incremental data migration.
It would be good to find out what the problem is though.

Is there any information I can provide (other than a repro project) to help diagnose this?

On Tuesday, 10 July 2012 15:16:59 UTC+1, Oren Eini wrote:
Thanks!

Oren Eini (Ayende Rahien)

unread,
Jul 10, 2012, 1:39:04 PM7/10/12
to rav...@googlegroups.com
Okay, let me see.
Is it possible that you are actually saving all of them in a single transaction?

Arnold Zokas

unread,
Jul 10, 2012, 6:37:29 PM7/10/12
to rav...@googlegroups.com
Yes, single transaction. Is there an upper limit on number of writes per transaction?


On Tuesday, 10 July 2012 18:39:04 UTC+1, Oren Eini wrote:
Okay, let me see.
Is it possible that you are actually saving all of them in a single transaction?

Oren Eini (Ayende Rahien)

unread,
Jul 11, 2012, 12:32:05 AM7/11/12
to rav...@googlegroups.com
There is, sort of. 
It is the max version store, which determine how many uncommitted changes you can have at one time.
I would really like to have a repro of this, because while I understand the overall error condition, the actual error stack that you provide make no sense.

Oren Eini (Ayende Rahien)

unread,
Jul 11, 2012, 12:33:37 AM7/11/12
to rav...@googlegroups.com
Also, note that if you can't reproduce this locally on your own, it is probably because the transaction hasn't been promoted to full DTC tx.

Below is a class that can force this, and here is how:

using (var scope = new TransactionScope(TransactionScopeOption.RequiresNew))
{
// Promote the transaction

System.Transactions.Transaction.Current.EnlistDurable(DummyEnlistmentNotification.Id, new DummyEnlistmentNotification(), EnlistmentOptions.None);


public class DummyEnlistmentNotification : IEnlistmentNotification
{
public static readonly Guid Id = Guid.NewGuid();

   public bool WasCommitted { get; set; }
public void Prepare(PreparingEnlistment preparingEnlistment)
{
preparingEnlistment.Prepared();
}

public void Commit(Enlistment enlistment)
{
   WasCommitted = true;
enlistment.Done();
}

public void Rollback(Enlistment enlistment)
{
enlistment.Done();
}

public void InDoubt(Enlistment enlistment)
{
enlistment.Done();

Arnold Zokas

unread,
Jul 11, 2012, 6:04:30 AM7/11/12
to rav...@googlegroups.com
Good news. Have a repro for you - check out https://github.com/ArnoldZokas/RavenSandbox
Please bear in mind the repro is vastly simplified :-) 

By process of elimination I have worked out this behaviour is triggered by having nested transaction scopes such as:

using(var tx = new TransactionScope(TransactionScopeOption.RequiresNew))
{
using(var tx = new TransactionScope(TransactionScopeOption.Suppress))
{
     // get data out of sql db
}

          // create raven session, write data, save changes
          // create another raven session, write more data, save changes
}


On Wednesday, 11 July 2012 05:33:37 UTC+1, Oren Eini wrote:
Also, note that if you can't reproduce this locally on your own, it is probably because the transaction hasn't been promoted to full DTC tx.

Below is a class that can force this, and here is how:

using (var scope = new TransactionScope(TransactionScopeOption.RequiresNew))
{
// Promote the transaction

System.Transactions.Transaction.Current.EnlistDurable(DummyEnlistmentNotification.Id, new DummyEnlistmentNotification(), EnlistmentOptions.None);


public class DummyEnlistmentNotification : IEnlistmentNotification
{
public static readonly Guid Id = Guid.NewGuid();

   public bool WasCommitted { get; set; }
public void Prepare(PreparingEnlistment preparingEnlistment)
{
preparingEnlistment.Prepared();
}

public void Commit(Enlistment enlistment)
{
   WasCommitted = true;
enlistment.Done();
}

public void Rollback(Enlistment enlistment)
{
enlistment.Done();
}

public void InDoubt(Enlistment enlistment)
{
enlistment.Done();
}
}

On Wed, Jul 11, 2012 at 7:32 AM, Oren Eini (Ayende Rahien) <aye...@ayende.com> wrote:
There is, sort of. 
It is the max version store, which determine how many uncommitted changes you can have at one time.
I would really like to have a repro of this, because while I understand the overall error condition, the actual error stack that you provide make no sense.

Oren Eini (Ayende Rahien)

unread,
Jul 13, 2012, 9:22:31 AM7/13/12
to rav...@googlegroups.com
Okay, I haven't been able to reproduce the crash, but I was able to see that it takes to long to commit very big transactions.
Please test with the next build of RavenDB

Oren Eini (Ayende Rahien)

unread,
Jul 15, 2012, 4:04:38 AM7/15/12
to rav...@googlegroups.com
Update:
There is no real way to make this work faster as long as we want to maintain transactional consistency.
I wasn't able to reproduce the crash.
Please note that saving 20,000 docs in a single DTC transaction is not an expected usage of the feature. Please reduce (by an order of magnitude or so) the number of documents you modify in a single DTC transaction.

Arnold Zokas

unread,
Jul 15, 2012, 1:15:56 PM7/15/12
to rav...@googlegroups.com
Thanks for investigating. I have modified the migration app not to use transactions for initial migration run. Incremental migration usually deal with less than a 100 changed documents.
Reply all
Reply to author
Forward
0 new messages