Deadlock

51 views
Skip to first unread message

Matthias Krings

unread,
Jul 15, 2015, 5:42:29 AM7/15/15
to masstrans...@googlegroups.com
I found a scenario in which MT (2.9) causes a deadlock, which it shouldn't as far as I can see. I boiled it down to the code below, which resembles more or less my original code. In the real world, there are more queues and processes involved, I use RabbitMQ, and instead of Thread.Sleep() some real work is done. Unfortunately, I was not able to come up with something easier.

private static DateTime _start;
 
private static void Main(string[] args)
{
	const int count = 40;
	_start = DateTime.Now;
	var bus = ServiceBusFactory.New(sbc => { BasicBusConfiguration(sbc, "res"); });
	var pb = 0;
	var pc = 0;
	var pd = 0;
	var pe = 0;
	var hb = 0;
	var he = 0;
 
	bus.SubscribeHandler<MessageD>(
		msg =>
			bus.MessageContext<MessageD>()
				.Respond(new MessageE(), context => { Log("Publish E: "ref pe); }));
 
	bus.SubscribeHandler<MessageC>(
		msg =>
		{
			Log("Publish D: "ref pd);
			bus.PublishRequest(new MessageD(), x => x.Handle<MessageE>(y =>
			{
				Log("Handle E: "ref he);
				Thread.Sleep(200);
			}));
		});
 
	bus.SubscribeHandler<MessageA>(msg =>
	{
		bus.MessageContext<MessageA>().Respond(new MessageB(), context =>
		{
			for (var i = 0; i < count; i++)
			{
				Thread.Sleep(10);
				Log("Publish C: "ref pc);
				bus.Publish(new MessageC());
			}
			Log("Publish B: "ref pb);
		});
	});
 
 
	bus.PublishRequest(new MessageA(),
		x => x.Handle<MessageB>(y => { Log("Handle B: "ref hb); }));
	var f = 0;
	Log("Finished!"ref f);
	Console.ReadLine();
}
 
private static void Log(string text, ref int count)
{
	Console.WriteLine(DateTime.Now.Subtract(_start).Ticks/10000 + " " + text + count++);
}
 
private static void BasicBusConfiguration(ServiceBusConfigurator sbc, string queue)
{
	var uriString = "loopback://localhost/" + queue;
	sbc.ReceiveFrom(uriString);
}
 
[Serializable]
public class MessageBase : CorrelatedBy<Guid>
{
	public MessageBase()
	{
		CorrelationId = Guid.NewGuid();
	}
 
	public Guid CorrelationId { getset; }
}
 
[Serializable]
public class MessageD : MessageBase
{
}
 
[Serializable]
public class MessageC : MessageBase
{
}
 
[Serializable]
public class MessageB : MessageBase
{
}
 
[Serializable]
public class MessageA : MessageBase
{
}
 
[Serializable]
public class MessageE : MessageBase
{
}

The code can be read from bottom to top:
  • MessageA is published, and it waits for MessageB
  • The handler of MessageA sends out 40 MessageC with 10ms pause in between, before sending out MessageB.
  • The handler of MessageC sends out Message D, and waits for Message E. Upon receive of Message E, it waits for 200 ms.
  • The handler of Message D sends out Message E right away.
I can understand that the handler of E blocks somehow incoming messages, which is the reason why it takes some time until MessageB makes it to the original sender. At least, this is the case when the constant "count" is set to 30 instead of 40. With 30, on my computer the output looks like this:

1297 Publish C: 0
1307 Publish D: 0
1317 Publish C: 1
1322 Publish D: 1
1329 Publish C: 2
1335 Publish E: 0
1358 Publish C: 3
1359 Publish E: 1
1362 Publish D: 2
1368 Publish D: 3
1369 Handle E: 0
1369 Handle E: 1
1370 Publish C: 4
1373 Publish E: 2
1374 Publish D: 4
1381 Publish C: 5
1392 Publish C: 6
1403 Publish C: 7
1414 Publish C: 8
1425 Publish C: 9
1436 Publish C: 10
1448 Publish C: 11
1459 Publish C: 12
1470 Publish C: 13
1481 Publish C: 14
1492 Publish C: 15
1503 Publish C: 16
1514 Publish C: 17
1525 Publish C: 18
1536 Publish C: 19
1546 Publish C: 20
1557 Publish C: 21
1568 Publish C: 22
1572 Handle E: 2
1573 Publish E: 3
1574 Publish E: 4
1574 Publish D: 5
1575 Publish D: 6
1575 Publish D: 7
1579 Publish C: 23
1590 Publish C: 24
1601 Publish C: 25
1612 Publish C: 26
1623 Publish C: 27
1635 Publish C: 28
1646 Publish C: 29
1646 Publish B: 0
1648 Publish D: 8
1773 Publish D: 9
1774 Publish D: 10
2769 Publish D: 11
3769 Publish D: 12
4770 Publish D: 13
5769 Publish D: 14
6769 Publish D: 15
7771 Publish D: 16
8775 Publish D: 17
9769 Publish D: 18
10770 Publish D: 19
11770 Publish D: 20
12770 Publish D: 21
13772 Publish D: 22
14770 Handle E: 3
14970 Handle E: 4
14971 Publish E: 5
14972 Publish D: 23
15172 Publish E: 6
15173 Publish E: 7
15174 Publish D: 24
15174 Publish D: 25
15770 Publish D: 26
16770 Publish D: 27
17770 Publish D: 28
18770 Publish D: 29
19771 Handle B: 0
19772 Finished!0
19773 Publish E: 8
19775 Publish E: 9
19776 Publish E: 10
19779 Publish E: 11
19780 Publish E: 12
19782 Publish E: 13
19783 Publish E: 14
19785 Publish E: 15
19786 Publish E: 16
19787 Publish E: 17
19788 Publish E: 18
19790 Publish E: 19
19791 Publish E: 20
19792 Publish E: 21
19794 Publish E: 22
19797 Handle E: 5
20000 Publish E: 23
20000 Handle E: 6
20001 Handle E: 7
20201 Publish E: 24
20202 Publish E: 25
20203 Publish E: 26
20203 Publish E: 27
20204 Publish E: 28
20205 Handle E: 8
20205 Publish E: 29
20207 Handle E: 10
20210 Handle E: 11
20206 Handle E: 9
20408 Handle E: 12
20409 Handle E: 13
20412 Handle E: 14
20413 Handle E: 15
20415 Handle E: 16
20416 Handle E: 17
20417 Handle E: 18
20417 Handle E: 19
20610 Handle E: 20
20634 Handle E: 21
20635 Handle E: 22
20636 Handle E: 23
20636 Handle E: 24
20637 Handle E: 25
20639 Handle E: 28
20640 Handle E: 29
20638 Handle E: 27
20638 Handle E: 26

It takes nearly 20 seconds until the original answer MessageB makes it back to the first caller. I understand why it is a bit longer, I do not understand why it is that long, though. Processing all 30 MessageE at 200ms would maximal take 6 seconds, even disregarding paralellism.

Anyway, it becomes much worse when I try the same with 40 messages. In that case the code ends in a deadlock, the output is like this:

1310 Publish C: 0
1322 Publish D: 0
1327 Publish C: 1
1353 Publish C: 2
1359 Publish D: 1
1362 Publish D: 2
1365 Publish C: 3
1369 Publish D: 3
1372 Publish E: 0
1375 Publish E: 1
1376 Publish C: 4
1379 Publish E: 2
1381 Handle E: 0
1381 Publish E: 3
1382 Handle E: 1
1382 Publish D: 4
1387 Publish C: 5
1398 Publish C: 6
1408 Publish C: 7
1419 Publish C: 8
1429 Publish C: 9
1440 Publish C: 10
1451 Publish C: 11
1463 Publish C: 12
1473 Publish C: 13
1484 Publish C: 14
1495 Publish C: 15
1506 Publish C: 16
1517 Publish C: 17
1528 Publish C: 18
1539 Publish C: 19
1550 Publish C: 20
1561 Publish C: 21
1573 Publish C: 22
1584 Publish C: 23
1584 Handle E: 2
1585 Handle E: 3
1586 Publish D: 5
1587 Publish E: 4
1588 Publish D: 6
1595 Publish C: 24
1606 Publish C: 25
1617 Publish C: 26
1628 Publish C: 27
1639 Publish C: 28
1650 Publish C: 29
1661 Publish C: 30
1672 Publish C: 31
1683 Publish C: 32
1694 Publish C: 33
1705 Publish C: 34
1716 Publish C: 35
1727 Publish C: 36
1738 Publish C: 37
1749 Publish C: 38
1760 Publish C: 39
1760 Publish B: 0
1763 Publish D: 7
1767 Publish D: 8
1785 Publish D: 9
1786 Publish D: 10
1786 Publish D: 11
1787 Publish D: 12
2768 Publish D: 13
3768 Publish D: 14
4768 Publish D: 15
5767 Publish D: 16
6774 Publish D: 17
7768 Publish D: 18
8767 Publish D: 19
9769 Publish D: 20
10768 Publish D: 21
11768 Publish D: 22
12770 Publish D: 23
13769 Handle E: 4
13970 Publish E: 5
13970 Publish E: 6
13971 Publish D: 24
13971 Publish D: 25
14768 Publish D: 26
15769 Publish D: 27
16767 Publish D: 28
17770 Publish D: 29
18768 Publish D: 30
19769 Publish D: 31
20770 Publish D: 32
21769 Publish D: 33
22769 Publish D: 34
23769 Publish D: 35
24770 Publish D: 36


Here the output stops, all Threads are in a WaitOne(), but I do not see why. I see that at some point all available threads are waiting in the Sleep(200). But later, one of these threads shoud end. Thus, this one will become free again. Now we either process another MessageE, and will again wait no longer than 200ms. Or we process MessageD, and thus will publish another MessageE. After publishing E, again a thread will be free.

What am I missing here?

Thanks a lot!

Matthias Krings

unread,
Aug 18, 2015, 4:54:24 AM8/18/15
to masstransit-discuss
May I push this one again? Is there anybody who can reproduce my problem, i.e. experiences a deadlock when executing the code? Does the deadlock disappear when lowering the constant "count" in the first line? If so, is this behavior expected or an error?
Reply all
Reply to author
Forward
0 new messages