Hi,
I have a situation where two properties together uniquely identify my saga data. In a single-threaded environment, my saga mapping works great (both versions described below). The production environment, however, is highly concurrent, and there my saga mapping falls apart: the log fills with SQL Server deadlocks and I don't understand why.
Here’s a sample of what fills the logs:
NHibernate.Util.ADOExceptionReporter - System.Data.SqlClient.SqlException (0x80131904): Transaction (Process ID 70) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
NHibernate.Event.Default.AbstractFlushingEventListener - Could not synchronize database state with session
NHibernate.Exceptions.GenericADOException: could not execute batch command.[SQL: SQL not available] ---> System.Data.SqlClient.SqlException: Transaction (Process ID 70) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
NServiceBus.Unicast.Transport.TransportReceiver - Failed to process message
NHibernate.Exceptions.GenericADOException: could not execute batch command.[SQL: SQL not available] ---> System.Data.SqlClient.SqlException: Transaction (Process ID 70) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
I started with a custom saga finder (IFindSagas) that pulls the two properties from the message and queries the table. Again, this worked great single-threaded, but in a multithreaded environment it does little more than deadlock (on almost every message).
public class MySagaFinder :
    IFindSagas&lt;MySagaData&gt;.Using&lt;MyMessage1&gt;
{
    public NHibernateStorageContext StorageContext { get; set; }

    public MySagaData FindBy(MyMessage1 message)
    {
        return StorageContext.Session.QueryOver&lt;MySagaData&gt;()
            .Where(x =>
                x.Id1 == message.AlmostUniqueId1
                && x.Id2 == message.AlmostUniqueId2)
            .SingleOrDefault();
    }
}
I then removed the saga finder above and replaced it with this ConfigureHowToFindSaga mapping:
mapper.ConfigureMapping&lt;MyMessage1&gt;(
        message => $"{message.AlmostUniqueId1}_{message.AlmostUniqueId2}")
    .ToSaga(sagaData => sagaData.UniqueBecauseId1AndId2AreUnique);
It helped, but only a little. I am still seeing a tremendous number of deadlocks; rather than almost 100%, roughly 70% of the messages trying to reach their saga data now deadlock.
I suspect I can improve some of this with indexes and similar tuning. However, I tried the following mapping and it didn't seem to help either:
this.Property(x => x.UniqueBecauseId1AndId2AreUnique,
m =>
{
m.Unique(true);
m.NotNullable(true);
m.Index("idx_UniqueIdAndConversationId");
m.UniqueKey("key_UniqueIdAndConversationId");
});
My first question: why would this mapping cause a saga operating in a highly concurrent environment to deadlock so severely that almost no messages can reach their saga data? Second, are there any tips or tricks for troubleshooting and resolving issues like this?
Thanks,
Mark
Hi Ramon,
I am using a simplified sample to facilitate the discussion but this is what the saga data looks like:
public class MySagaData : ContainSagaData
{
    [Unique]
    public virtual string UniqueBecauseId1AndId2AreUnique { get; set; }

    public virtual Guid AlmostUniqueId1 { get; set; }
    public virtual string AlmostUniqueId2 { get; set; }

    public virtual IDictionary&lt;Guid, TrackingObject&gt; TrackingObjects { get; set; } = new Dictionary&lt;Guid, TrackingObject&gt;();
}
I have also used SQL Profiler and found the locking/deadlock problem to be related to the IDictionary table in my saga data. The mapping I am now using is pasted below, and I think it is part of the problem. It creates the saga table and the tracking-object table exactly as designed, but perhaps the indexes are not quite optimal?
this.Map(sagaData => sagaData.TrackingObjects,
collection =>
{
collection.Table("SagaDataTrackingObjects");
collection.OptimisticLock(true);
},
collectionElementRelation => collectionElementRelation.Component(m =>
{
m.Property(x => x.OrderStarted);
m.Property(x => x.NumberCollected);
m.Property(x => x.NumberProcessed);
m.Property(x => x.NumberDelivered);
m.Property(x => x.OrderCompleted);
}));
Also (or alternatively?), I think what is happening is that multiple threads are trying to insert into the dictionary collection (which should be OK), but it seems to take a lock on the whole table instead of just a single row.
I found http://docs.particular.net/nservicebus/sagas/concurrency and https://ayende.com/blog/3946/nhibernate-mapping-concurrency, but I'm not clear on how to update the saga finder's FindBy function so that it locks only the row it needs to update in the dictionary (sub-table).
When I try to follow the pattern at the second link, I end up with something like this in my saga finder's FindBy(message) method:
using (StorageContext.Session.BeginTransaction())
{
var id = $"{message.AlmostUniqueId1}_{message.AlmostUniqueId2}";
return StorageContext.Session.Get<TrackingObject>(id, LockMode.Read);
}
I am assuming Get&lt;&gt; locks fewer records? (Not really sure; just trying to follow the pattern.) But I don't know what to do next, because this doesn't even build: I need to return the saga data here, not a single element from the collection.
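If it helps clarify what I'm after, here is a sketch of what I imagine the finder should look like instead: locking the saga row itself via QueryOver's Lock().Upgrade (which I understand emits an UPDLOCK hint on SQL Server) while still returning MySagaData as the finder contract requires. This is untested guesswork on my part:

```csharp
public MySagaData FindBy(MyMessage1 message)
{
    var id = $"{message.AlmostUniqueId1}_{message.AlmostUniqueId2}";

    // Sketch/assumption: take an UPGRADE (UPDLOCK) lock on the saga row
    // itself so concurrent finders serialize on that one row, while still
    // returning the saga root as the finder contract requires.
    return StorageContext.Session.QueryOver<MySagaData>()
        .Where(x => x.UniqueBecauseId1AndId2AreUnique == id)
        .Lock().Upgrade
        .SingleOrDefault();
}
```

Is that the right direction, or does a row lock on the saga root not help with the collection table at all?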
Do you have any tips, samples, or references for dealing with saga data that contains a dictionary (sub-table), and for lightening the locks on that collection table in a concurrent scenario?
Thanks again for your help.
Mark
The saga initiates several sub-workflows during the course of its processing. It uses tracking objects because it doesn't know how many data-acquisition sub-workflows it will have to initiate until it has initiated them all; the data collected during the process determines when it's done. So it uses a tracking object to record that it started a sub-workflow, to track its progress, and eventually its completion. When all tracking objects are complete, the saga knows it's complete. An additional complication is that each sub-workflow is essentially identical except for the data, so the last update handler has to check whether all sub-workflow progress is complete, and only if every tracking object is complete can the saga be completed.
For example, saga1 might start 3 sub-workflows. The saga records tracking info and updates its tracking/persistence periodically during this processing. In this example, sub-workflow 1 completes in a short period of time. Sub-workflow 3 might take longer and sub-workflow 2 might take a much longer time. Saga2 might have 20 sub-workflows and they also complete in different orders.
At each final message update, the saga checks to see if there are any non-complete tracking objects and if all are complete it marks the saga complete. So essentially, the tracking objects are just a collection to help know how many sub workflows the saga started and track their progress.
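In code, the completion check at the end of each final update handler amounts to something like this (simplified; this assumes CompletedDateTime is a nullable DateTime and uses the standard NServiceBus MarkAsComplete call):

```csharp
// Simplified completion check from the last update handler:
// a tracking object counts as finished once CompletedDateTime is set.
if (Data.TrackingObjects.All(t => t.CompletedDateTime.HasValue))
{
    MarkAsComplete(); // every sub-workflow has reported completion
}
```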
Here is a mocked example of the Saga data:
Saga table (MySagaData):

| Id | Originator | OriginalMessageId | UniqueId | TrackingInfo_Descriptive |
|----|------------|-------------------|----------|--------------------------|
| 5A88EF5E-2F4A-4583-80E2-A65500B0F42F | &lt;endpoint&gt; | d8270063-ca45-4516-8a3f-a65500b0f178 | 23F8DD65-4C57-4F00-8A1A-24CDBC97B3AF | Saga1 Info |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | &lt;endpoint&gt; | d2d2a56f-3eb3-4e18-bbb5-a65500b0edff | 90F292E4-3D95-493C-A699-ABF2DAF02326 | Saga2 Info |
| 5B680084-4D91-412C-A1CA-A65500B0F556 | &lt;endpoint&gt; | 3b2feae2-9d9d-4321-8724-a65500b0f264 | F7E5EDB2-DE16-494F-B6A4-0019D8EFCAF0 | Saga3 Info |

Tracking-object table (one row per sub-workflow; blanks mean the step hasn't happened yet):

| trackingsagadata_key | AlmostUniqueId1 | AlmostUniqueId2 | StartedDateTime | Progress1DateTime | Progress2DateTime | CompletedDateTime |
|----------------------|-----------------|-----------------|-----------------|-------------------|-------------------|-------------------|
| 5A88EF5E-2F4A-4583-80E2-A65500B0F42F | ABC | 123 | 8/1/2016 8:41 | 8/1/2016 9:25 | 8/3/2016 9:56 | 8/4/2016 9:59 |
| 5A88EF5E-2F4A-4583-80E2-A65500B0F42F | DEF | 456 | 8/2/2016 8:43 | 8/2/2016 9:27 | 8/4/2016 9:56 | 8/4/2016 9:57 |
| 5A88EF5E-2F4A-4583-80E2-A65500B0F42F | HIJ | 789 | 8/3/2016 8:45 | 8/3/2016 9:29 | | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | XYZ | 439 | 8/4/2016 8:47 | | | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | URC | 654 | 8/5/2016 8:49 | 8/5/2016 9:33 | | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | POI | 878 | 8/6/2016 8:51 | 8/6/2016 9:35 | 8/4/2016 9:56 | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | LKI | 414 | 8/7/2016 8:53 | 8/7/2016 9:37 | | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | QWE | 555 | 8/8/2016 8:55 | | | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | CST | 777 | 8/9/2016 8:57 | | | |
I changed the saga data definition from Dictionary to List and I think that helped, but it did not completely solve the deadlock problem. Instead of almost everything deadlocking, some messages now process successfully, but every run still deadlocks regularly.
public class MySagaData : ContainSagaData
{
    [Unique]
    public virtual string UniqueBecauseId1AndId2AreUnique { get; set; }

    public virtual string AlmostUniqueId1 { get; set; }
    public virtual string AlmostUniqueId2 { get; set; }

    public virtual IList&lt;TrackingObject&gt; TrackingObjects { get; set; } = new List&lt;TrackingObject&gt;();
}
(from the ClassMapping object)
this.List(sagaData => sagaData.TrackingObjects,
collection =>
{
collection.Table("TrackingObjectCollection");
collection.OptimisticLock(true);
},
collectionElementRelation =>
{
collectionElementRelation.Component(m =>
{
m.Property(x => x.AlmostUniqueId1);
m.Property(x => x.AlmostUniqueId2);
m.Property(x => x.StartedDateTime);
m.Property(x => x.Progress1DateTime);
m.Property(x => x.Progress2DateTime);
m.Property(x => x.CompletedDateTime);
});
});
The SQL Server key-lock deadlocks are definitely caused by two or more processes attempting to INSERT INTO the TrackingObject collection table and locking on trackingsagadata_key (see the mocked data sample above).
I also want to mention that in the two places in the code where items are inserted into the collection, the inserts happen in a foreach over data returned from another source (again data-driven, so there could be a few items to insert or a lot).
You mentioned that LockMode.Upgrade or Read doesn’t have anything to do with TrackingObjects because it is in a different table. What does affect the locking mode for this table?
Does each sagaData.TrackingObjects.Add(data) call take a lock, or do the inserts already run in batch mode? (If not, is there a way to insert items into a collection like this in a batch, so that it takes one lock, does the inserts, and then releases it, rather than taking a separate lock for each item?)
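The closest thing I have found so far is NHibernate's ADO.NET batching setting. As I understand it, this only groups the generated INSERTs into fewer round trips rather than changing which locks SQL Server takes, but I may be wrong about that:

```csharp
// Assumption on my part: adonet.batch_size batches the collection INSERTs
// into fewer round trips; I don't believe it changes the locks taken.
var cfg = new NHibernate.Cfg.Configuration();
cfg.SetProperty(NHibernate.Cfg.Environment.BatchSize, "25"); // adonet.batch_size
```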
Mark