Saga NHibernatePersistence & SQL Deadlocks?


Mark Harris

Jul 28, 2016, 7:46:15 PM
to Particular Software
Helpful information to include
Product name: NServiceBus Saga (IFindSaga and Mapping)
Version: 5.x
Stacktrace:
Description:

Hi,

 

I have a situation where I have two properties that together uniquely identify my saga data. In a single-threaded environment, my saga mapping works great (both versions that I will describe below). However, the production environment is highly concurrent, and there my saga mapping falls apart and degenerates into a log full of SQL Server deadlocks, and I don't understand why.

 

Here’s a sample of what fills the logs:

 

NHibernate.Util.ADOExceptionReporter - System.Data.SqlClient.SqlException (0x80131904): Transaction (Process ID 70) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

 

NHibernate.Event.Default.AbstractFlushingEventListener - Could not synchronize database state with session

NHibernate.Exceptions.GenericADOException: could not execute batch command.[SQL: SQL not available] ---> System.Data.SqlClient.SqlException: Transaction (Process ID 70) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

 

NServiceBus.Unicast.Transport.TransportReceiver - Failed to process message

NHibernate.Exceptions.GenericADOException: could not execute batch command.[SQL: SQL not available] ---> System.Data.SqlClient.SqlException: Transaction (Process ID 70) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

 

 

I started with a custom IFindSagas that allowed me to pull the two properties from the message and query the table. Again, this worked great in a single-threaded environment, but in a multithreaded environment this does little more than deadlock (almost every message).

 

    public class MySagaFinder :
        IFindSagas<MySagaData>.Using<MyMessage1>
    {
        public NHibernateStorageContext StorageContext { get; set; }

        public MySagaData FindBy(MyMessage1 message)
        {
            return StorageContext.Session.QueryOver<MySagaData>()
                .Where(x =>
                    x.Id1 == message.AlmostUniqueId1
                    && x.Id2 == message.AlmostUniqueId2)
                .SingleOrDefault();
        }
    }

 

               

I then removed the IFindSagas above and replaced it with this ConfigureHowToFindSaga mapping:

      

    mapper.ConfigureMapping<MyMessage1>(
            message => $"{message.AlmostUniqueId1}_{message.AlmostUniqueId2}")
        .ToSaga(sagaData => sagaData.UniqueBecauseId1AndId2AreUnique);

 

It helped, but only a little. I am still seeing a tremendous number of deadlocks, but rather than almost 100% it's more like 70% of the messages trying to reach their saga data that cause deadlocks.
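For context, here is roughly how the composite correlation property gets populated when the saga starts (a simplified sketch; the handler shape is assumed from the NServiceBus 5.x saga API, and names follow the samples above):

```csharp
// Sketch: the mapped correlation property is set on the saga data in the
// handler that starts the saga, using the same "{Id1}_{Id2}" format that
// the ConfigureMapping expression produces, so lookups can match it.
public void Handle(MyMessage1 message)
{
    Data.UniqueBecauseId1AndId2AreUnique =
        $"{message.AlmostUniqueId1}_{message.AlmostUniqueId2}";

    // ... start the workflow
}
```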

 

 

I suspect that I can improve some of this by applying indexes and other optimizations. However, I tried the following mapping and it didn't seem to help either:

 

    this.Property(x => x.UniqueBecauseId1AndId2AreUnique,
        m =>
        {
            m.Unique(true);
            m.NotNullable(true);
            m.Index("idx_UniqueIdAndConversationId");
            m.UniqueKey("key_UniqueIdAndConversationId");
        });

 

 

I think my first question is: why would this mapping cause a saga operating in a highly concurrent environment to deadlock so severely (so severely that almost no messages can reach their saga data)? Secondly, are there any tips or tricks to troubleshoot and resolve issues like this?

 

Thanks,

Mark

ramon...@particular.net

Jul 29, 2016, 4:18:11 AM
to Particular Software

Hi Mark,

Deadlocks usually occur when there is no index or when there is no strict update route. The missing-index case is easily solved: just add the correct indexes, and the database can use an index seek to jump straight to the record and read all the data. If you have a complex type, you need correct indexes on all related tables that are part of the aggregate.

By recording a SQL Profiler session it's easy to see which operations might require an index, and SQL Profiler can also be configured to store details about the deadlocks to help diagnose issues.

What often happens is that a lock is put on the row, and when data is updated the relevant indexes need to be updated too, so you not only lock the row but *also* the indexes. This is very often a cause of deadlocks, so make sure you only have indexes that are actually in use, and delete indexes that are not required.

The absolute worst thing that can happen is when SQL Server decides to escalate from a row lock to a table lock, which can happen when there is no index; that will also very quickly result in deadlocks.


SQL Profiler is your friend! Please read, for example, the following article:

http://sqlmag.com/database-performance-tuning/gathering-deadlock-information-deadlock-graph


Let us know if that helps to find what is causing the deadlocks.


Also, can you share your full saga type?



Regards,
Ramon

Mark Harris

Jul 29, 2016, 1:54:59 PM
to Particular Software

Hi Ramon,

 

I am using a simplified sample to facilitate the discussion but this is what the saga data looks like:

 

public class MySagaData : ContainSagaData
{
    [Unique]
    public virtual string UniqueBecauseId1AndId2AreUnique { get; set; }

    public virtual Guid AlmostUniqueId1 { get; set; }

    public virtual string AlmostUniqueId2 { get; set; }

    public virtual IDictionary<Guid, TrackingObject> TrackingObjects { get; set; } = new Dictionary<Guid, TrackingObject>();
}

 

 

I have also used SQL Profiler and found the locking/deadlock problem to be related to the IDictionary table in my saga data. The mapping I am now using is pasted below, and I think this is part of the problem. It creates the saga table and tracking object table exactly as designed, but perhaps the indexes are not quite optimal?

 

    this.Map(sagaData => sagaData.TrackingObjects,
        collection =>
        {
            collection.Table("SagaDataTrackingObjects");
            collection.OptimisticLock(true);
        },
        collectionElementRelation => collectionElementRelation.Component(m =>
        {
            m.Property(x => x.OrderStarted);
            m.Property(x => x.NumberCollected);
            m.Property(x => x.NumberProcessed);
            m.Property(x => x.NumberDelivered);
            m.Property(x => x.OrderCompleted);
        }));

 

Also (alternatively?), I think what is happening is that multiple threads are trying to insert into the Dictionary collection (which should be OK), but it seems to be taking a lock on the whole table instead of just a single row.

 

I found http://docs.particular.net/nservicebus/sagas/concurrency and https://ayende.com/blog/3946/nhibernate-mapping-concurrency but I'm not clear on how to update the saga finder's FindBy function so that it locks only the row it needs to update in the Dictionary (sub-table).

 

When I try to follow the pattern at the second link, I have something like this in my IFindSagas FindBy(message) function:

 

using (StorageContext.Session.BeginTransaction())
{
    var id = $"{message.AlmostUniqueId1}_{message.AlmostUniqueId2}";

    return StorageContext.Session.Get<TrackingObject>(id, LockMode.Read);
}

 

I am assuming Get<> locks fewer records? (Not really sure; I'm just trying to follow the pattern.) But I don't know what to do next, because this doesn't even build: I need to return a SagaData here, not a single element from the collection.

 

Do you have any tips, samples, or references for dealing with saga data that contains a dictionary (sub-table), and for lightening the locks on the collection table in a concurrent scenario?

 

Thanks again for your help.

Mark

ramon...@particular.net

Jul 29, 2016, 4:00:46 PM
to Particular Software

Hi Mark,


Thanks for sharing!


I wouldn't use LockMode.Read but LockMode.Upgrade, as you are going to update the entity and you know in advance that later in the same transaction you will do an update on it. This will result in a row lock on the aggregate root, which is what you want to synchronize updates on the same entity.

LockMode.Upgrade doesn't have anything to do with the TrackingObjects collection, as that is a different table. I'll have to see what schema is created in the database. However, because a Dictionary contains unique items, NHibernate will always need to fetch all records before it can add a new object. That can be a pretty expensive read operation. Instead, I would optimize the collection for adds if you are sure that no duplicates will ever be inserted. You can do that by using a regular List instead of a Dictionary.

If you then ever need to read a single item, you would just do TrackingObjects.Where(x => x.Id == id);

Doing that will result in a lean method to add an object without needing to read all rows from the database first.


If you have lots of contention on a single saga instance, then this will help INSERT performance, as it does not need to SELECT all dictionary items first. In that case you can also change the LockMode to Read, as the entity updates then no longer need to be synchronized.


Is there a specific reason that you are storing those tracking objects in your saga? A saga is also a consistency boundary. If there is no need to store these as a single entity, then you might be better off storing them on their own.


Can you share how many such tracking objects can be stored in a single saga, and maybe what the saga's responsibility is?


I'm not sure, but do all these tracking object properties need to be stored in your saga? Maybe you can store most of them separately from your saga and only keep the data that is relevant for the saga itself.

Last but not least, NHibernate is not always able to create all indexes, and it can also create lots of additional indexes that you never use. If you have a high-volume environment with lots of contention, then I would not let NHibernate auto-update the schema.


Regards,
Ramon

Mark Harris

Aug 1, 2016, 12:15:01 PM
to Particular Software

The saga initiates several sub-workflows during the course of its processing. The reason it uses tracking objects is that it doesn't know how many data acquisition sub-workflows it will have to initiate until it has initiated them all; the data collected during the process determines when it's done. So it uses a tracking object to record that it started a sub-workflow, track its progress, and then eventually record its completion. When all tracking objects are complete, the saga knows it's complete. An additional complication is that each sub-workflow is essentially identical except for the data, so the last update handler basically has to check whether all sub-workflow progress is complete, and if every tracking object is complete the saga can then be completed.

 

For example, saga 1 might start 3 sub-workflows. The saga records tracking info and updates its tracking/persistence periodically during this processing. In this example, sub-workflow 1 completes in a short period of time, sub-workflow 3 might take longer, and sub-workflow 2 might take much longer. Saga 2 might have 20 sub-workflows, and they also complete in different orders.

 

At each final message update, the saga checks to see if there are any incomplete tracking objects, and if all are complete it marks the saga complete. So essentially, the tracking objects are just a collection that lets the saga know how many sub-workflows it started and track their progress.

 

Here is a mocked example of the Saga data:

 

| Id | Originator | OriginalMessageId | UniqueId | TrackingInfo_Descriptive |
| --- | --- | --- | --- | --- |
| 5A88EF5E-2F4A-4583-80E2-A65500B0F42F | <endpoint> | d8270063-ca45-4516-8a3f-a65500b0f178 | 23F8DD65-4C57-4F00-8A1A-24CDBC97B3AF | Saga1 Info |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | <endpoint> | d2d2a56f-3eb3-4e18-bbb5-a65500b0edff | 90F292E4-3D95-493C-A699-ABF2DAF02326 | Saga2 Info |
| 5B680084-4D91-412C-A1CA-A65500B0F556 | <endpoint> | 3b2feae2-9d9d-4321-8724-a65500b0f264 | F7E5EDB2-DE16-494F-B6A4-0019D8EFCAF0 | Saga3 Info |

 

 

| trackingsagadata_key | AlmostUniqueId1 | AlmostUniqueId2 | StartedDateTime | Progress1DateTime | Progress2DateTime | CompletedDateTime |
| --- | --- | --- | --- | --- | --- | --- |
| 5A88EF5E-2F4A-4583-80E2-A65500B0F42F | ABC | 123 | 8/1/2016 8:41 | 8/1/2016 9:25 | 8/3/2016 9:56 | 8/4/2016 9:59 |
| 5A88EF5E-2F4A-4583-80E2-A65500B0F42F | DEF | 456 | 8/2/2016 8:43 | 8/2/2016 9:27 | 8/4/2016 9:56 | 8/4/2016 9:57 |
| 5A88EF5E-2F4A-4583-80E2-A65500B0F42F | HIJ | 789 | 8/3/2016 8:45 | 8/3/2016 9:29 | | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | XYZ | 439 | 8/4/2016 8:47 | | | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | URC | 654 | 8/5/2016 8:49 | 8/5/2016 9:33 | | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | POI | 878 | 8/6/2016 8:51 | 8/6/2016 9:35 | 8/4/2016 9:56 | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | LKI | 414 | 8/7/2016 8:53 | 8/7/2016 9:37 | | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | QWE | 555 | 8/8/2016 8:55 | | | |
| 9F68A125-A2F4-42A2-A67A-A65500B0F531 | CST | 777 | 8/9/2016 8:57 | | | |

 

 

 

 

I changed the saga data definition from Dictionary to List, and I think that helped, but it did not completely solve the deadlock problem. Instead of almost everything deadlocking, it now processes some things successfully, but there is still regular deadlocking on every attempted run.

 

 

public class MySagaData : ContainSagaData
{
    [Unique]
    public virtual string UniqueBecauseId1AndId2AreUnique { get; set; }

    public virtual string AlmostUniqueId1 { get; set; }

    public virtual string AlmostUniqueId2 { get; set; }

    public virtual IList<TrackingObject> TrackingObjects { get; set; } = new List<TrackingObject>();
}

 

 

(from the ClassMapping object)

 

    this.List(sagaData => sagaData.TrackingObjects,
        collection =>
        {
            collection.Table("TrackingObjectCollection");
            collection.OptimisticLock(true);
        },
        collectionElementRelation =>
        {
            collectionElementRelation.Component(m =>
            {
                m.Property(x => x.AlmostUniqueId1);
                m.Property(x => x.AlmostUniqueId2);
                m.Property(x => x.StartedDateTime);
                m.Property(x => x.Progress1DateTime);
                m.Property(x => x.Progress2DateTime);
                m.Property(x => x.CompletedDateTime);
            });
        });

 

 

 

The SQL Server key lock deadlocks are definitely occurring because two or more processes are attempting to INSERT INTO the TrackingObject collection and locking on the trackingsagadata_key (from the mocked data sample above).

 

I also want to mention that in the two places in the code where items are inserted into the collection, it happens in a foreach over data returned from another location (again, data driven; there could be a few or a lot of items in the list to be inserted).

 

You mentioned that LockMode.Upgrade or Read doesn't have anything to do with TrackingObjects because it is a different table. What does affect the locking mode for this table?

 

Does each sagaData.TrackingObjects.Add(data) call take a lock, or does this already operate as a batch-mode insert? (If not, is there a way to insert items into a collection like this in batch mode, so that it takes one lock, does the inserts, and then releases it, rather than taking a separate lock for each?)


Mark

ramon...@particular.net

Aug 1, 2016, 12:42:26 PM
to Particular Software

Hi Mark,

Quick reply to some questions. I'll look into it in more depth tomorrow.

If you insert multiple tracking objects during the same transaction, then SQL Server might decide to escalate to a page or table lock, depending on the kind of data that is inserted and how that affects updating the indexes. Your tracking objects' composite keys don't seem to result in incremental values, which means random inserts into the index, and that is likely to escalate a lock.

Regarding the lock mode Upgrade: an ORM manages an aggregate, and with NHibernate the idea is that the root of the aggregate controls synchronization and version control for the whole aggregate. Meaning, if you do a SELECT with an upgrade lock, then any other consumer trying to do the same will be synchronized, provided it also uses an upgrade lock. However, if you know that child records added to the aggregate will never conflict, then you are synchronizing for no reason.


Select 'Upgrade' if it's likely that there will be concurrency issues because there is modify contention on a specific aggregate. If this is unlikely, then don't. This is a decision that can be different for each message type.


You are right that when you add multiple items to a collection, NHibernate will not immediately send these to the database (unless they require a key that is generated by the database). When the session is triggered to save its changes, NHibernate will batch its SQL (multiple SQL statements in a single round trip), but only if it is configured to do so. Google for NHibernate batch size.
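As an illustration, batching is controlled by NHibernate's `adonet.batch_size` setting (a sketch only; how you reach the Configuration object depends on how your endpoint wires up NHibernate persistence):

```csharp
// Sketch: enable ADO.NET batching so NHibernate sends multiple INSERT/UPDATE
// statements to SQL Server in a single round trip when the session flushes.
var cfg = new NHibernate.Cfg.Configuration();
cfg.SetProperty(NHibernate.Cfg.Environment.BatchSize, "25"); // i.e. "adonet.batch_size"
```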


Hope this helps for now. I'll look at your schema in depth tomorrow.


Regards,
Ramon

ramon...@particular.net

Aug 2, 2016, 8:01:18 AM
to particula...@googlegroups.com

Hi Mark,

I've inspected the schema created by NHibernate. I didn't specify any custom mapping in my test, but the 'TrackingObjects' table only has a single non-unique index, for the foreign key to the aggregate root, which is MySagaData.

That shouldn't cause issues for INSERTs into the tracking object table, as these cannot conflict. However, the index still requires updating.

I think it's wise to inspect the SQL Profiler events and check which indexes and/or tables result in conflicts.

You can also experiment with the following query to retrieve your saga data:

    var instance = storageContext.Session
        .QueryOver<MySagaData>()
        .Lock().Upgrade
        .Where(d => d.UniqueBecauseId1AndId2AreUnique == id)
        .SingleOrDefault();


Mind the lock statement. That should result in a SELECT with an update and row lock and should synchronize writes. 

This potentially has a performance impact, so it's best to measure what works best for you: pessimistic locking to avoid deadlocks, or optimistic locking for more performance with an occasional deadlock that is retried automatically by NServiceBus's retry logic.
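To put the query above in context, a finder along these lines could wrap it (a sketch only; the exact IFindSagas signature follows the NServiceBus 5.x API, and the type and property names come from the earlier posts in this thread):

```csharp
// Sketch: a saga finder that loads the saga row with a pessimistic
// (upgrade/row) lock so concurrent handlers for the same saga instance
// are serialized at the database. Assumes the types shown earlier in
// this thread (MySagaData, MyMessage1, NHibernateStorageContext).
public class MySagaFinder : IFindSagas<MySagaData>.Using<MyMessage1>
{
    public NHibernateStorageContext StorageContext { get; set; }

    public MySagaData FindBy(MyMessage1 message)
    {
        var id = $"{message.AlmostUniqueId1}_{message.AlmostUniqueId2}";

        return StorageContext.Session
            .QueryOver<MySagaData>()
            .Lock().Upgrade // SELECT ... with an update/row lock
            .Where(d => d.UniqueBecauseId1AndId2AreUnique == id)
            .SingleOrDefault();
    }
}
```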


Regards,
Ramon

PS: I'm going on holiday, so it could take a while before I respond.
