Self hosting with multiple endpoints but we only get one TimeoutManager

Steve James

Sep 13, 2017, 9:46:17 AM
to Particular Software
Hi, could you please confirm if the below behaviour is expected.

When we self host and start multiple endpoints (using SqlTransport), whenever we send a Delayed message only the first endpoint that was started polls the TimeoutEntity table.
The Endpoint column in TimeoutEntity is always populated with that endpoint's name.

We also get this intermittent exception ".. Immediate Retry is going to retry message '8ba04960-a8ab-4d80-8e8b-0c70d027d180' because of an exception: System.Exception: timeout '29dcb273-e656-4871-bdae-9900d883e7ec' was concurrently processed."


Should we be setting an endpoint to be a master (and if so how)?

In a scaled out scenario what stops multiple servers processing the same Timeout messages?



Helpful information to include
Product name: NServiceBus, NServiceBus.NHibernate, NHibernate, NServiceBus.SqlServer
Version: NServiceBus 6.1, NServiceBus.NHibernate 7.1.2, NHibernate 4.0.4.4000, NServiceBus.SqlServer 3.0.1
Stacktrace:
Description:

Szymon Pobiega

Sep 14, 2017, 2:07:39 AM
to particula...@googlegroups.com
Hi

Do you mean multiple endpoints (different logical name) or different instances of the same endpoint for scaling out? The Endpoint column in the TimeoutEntity table is the name of the endpoint that is supposed to receive the delayed message.

Regarding the intermittent exception, it happens in all scale-out scenarios. The reason is that multiple instances of the same endpoint query for due timeouts and it might happen that more than one instance actually loads a given timeout entity. Then both of them send (to themselves) a timeout dispatch message. The handling of this dispatch message looks like this:
 1. Load the TimeoutEntity from the database
 2. Dispatch the timeout message to the destination
 3. Try to remove the TimeoutEntity. If that fails, throw an exception and retry.
The bottom line is that only one instance manages to delete the TimeoutEntity, which ensures that exactly-once semantics are preserved and no duplicates are introduced.
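
The three steps above can be sketched as a small in-memory model. This is only an illustration of the idea, not NServiceBus code: `TimeoutStore`, `complete_dispatch`, and the `Sales` destination are hypothetical names, and a dictionary guarded by a lock stands in for the TimeoutEntity table. The sketch also assumes the send shares a transaction with the delete, so a failed delete discards the send.

```python
import threading


class ConcurrentlyProcessed(Exception):
    """Another instance removed the timeout first."""


class TimeoutStore:
    """Hypothetical in-memory stand-in for the TimeoutEntity table."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def add(self, timeout_id, destination):
        with self._lock:
            self._rows[timeout_id] = destination

    def load(self, timeout_id):
        # Step 1: read the row without removing it.
        with self._lock:
            return self._rows.get(timeout_id)

    def try_remove(self, timeout_id):
        # Step 3: atomic delete, True for exactly one caller per id.
        with self._lock:
            if timeout_id not in self._rows:
                return False
            del self._rows[timeout_id]
            return True


def complete_dispatch(store, timeout_id, destination, sent):
    # Step 2: stage the send. Assumption: the send shares a transaction with
    # the delete, so it only becomes visible if the delete succeeds.
    staged = (destination, timeout_id)
    if not store.try_remove(timeout_id):
        raise ConcurrentlyProcessed(
            f"timeout '{timeout_id}' was concurrently processed")
    sent.append(staged)


store = TimeoutStore()
store.add("t1", "Sales")
sent = []

# Both instances load the same row before either one deletes it.
loaded_by_a = store.load("t1")
loaded_by_b = store.load("t1")

complete_dispatch(store, "t1", loaded_by_a, sent)       # instance A wins
try:
    complete_dispatch(store, "t1", loaded_by_b, sent)   # instance B loses
except ConcurrentlyProcessed as error:
    loser_error = str(error)
```

In this model the losing instance raises the same kind of "concurrently processed" error seen in the log, and the message is sent only once.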

That said, if you upgrade to the latest version of the SQL Server transport you will get native delayed delivery handling within the transport. This native handling is much more robust and requires a lot less database traffic.

Szymon

--
You received this message because you are subscribed to the Google Groups "Particular Software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to particularsoftware+unsub...@googlegroups.com.
To post to this group, send email to particularsoftware@googlegroups.com.
Visit this group at https://groups.google.com/group/particularsoftware.
To view this discussion on the web visit https://groups.google.com/d/msgid/particularsoftware/b40c74ee-9296-4c60-9e0c-bf552ee972e4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Steve James

Sep 14, 2017, 4:11:12 AM
to Particular Software
Hi Szymon,
We have different endpoints with different logical names, currently only running in one self hosting service.
When we run a SQL trace we can only see SQL statements selecting from the TimeoutEntity table for the first endpoint we start, e.g.
"exec sp_executesql N'SELECT this_.Id as y0_, this_.Time as y1_ FROM TimeoutEntity this_ WHERE this_.Endpoint = @p0 and (this_.Time > @p1 and this_.Time <= @p2) ORDER BY this_.Time asc',N'@p0 nvarchar(4000),@p1 datetime,@p2 datetime',@p0=N'FirstEndpointName',@p1='2007-09-14 07:14:59',@p2='2017-09-14 07:18:00'"

 

  

Szymon Pobiega

Sep 14, 2017, 4:17:28 AM
to particula...@googlegroups.com
Hi

Can you post the config code of all your endpoints? I am interested in the persistence and container setup in particular.

Szymon

Steve James

Sep 14, 2017, 5:47:23 AM
to particula...@googlegroups.com
Hi, I've sent details in a private response.
Ah, they appear to have been blocked. 
Cheers 

Szymon Pobiega

Sep 14, 2017, 8:48:31 AM
to particula...@googlegroups.com
Hi

The reason why you see this behavior is that all your endpoints share the same container. Although NServiceBus with each release relies on the container less and less, it still registers some of the components in the container. Most of them are singletons and some of them require certain properties passed via constructor.

In your case each endpoint registers a timeout poller object as a singleton, each with its own endpoint name. Depending on the container implementation, either the first or the last registered poller is actually present in the container. The other registrations are ignored. As a result, all the endpoints use the same poller object. This is probably only the tip of the iceberg because NServiceBus has more components registered like this, e.g. the message dispatcher.

The solution is to stop sharing the container instance and only share the container configuration (ContainerBuilder in the case of Autofac). This way all your containers will be configured the same way (like you want) but each endpoint will get its own instance.
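
To make the difference concrete, here is a toy model of a shared container versus one container per endpoint. It is a sketch under assumptions, not Autofac or NServiceBus code: `Container`, `TimeoutPoller`, `build_container`, and `SecondEndpointName` are hypothetical, and the toy container keeps the first singleton registration to match the behavior seen in the trace (a real container may keep the last one instead).

```python
class TimeoutPoller:
    """Hypothetical poller that remembers which endpoint it polls for."""

    def __init__(self, endpoint_name):
        self.endpoint_name = endpoint_name


class Container:
    """Toy container: one singleton per key, first registration wins."""

    def __init__(self):
        self._singletons = {}

    def register_singleton(self, key, instance):
        # Later registrations for the same key are silently ignored.
        self._singletons.setdefault(key, instance)

    def resolve(self, key):
        return self._singletons[key]


def build_container():
    """Shared configuration: every container built here is set up the same way."""
    container = Container()
    container.register_singleton("clock", "utc")  # example application registration
    return container


def start_endpoints(names, container_for):
    """Each endpoint registers its own poller, then resolves the one it will use."""
    containers = {}
    for name in names:
        container = container_for(name)
        container.register_singleton(TimeoutPoller, TimeoutPoller(name))
        containers[name] = container
    return {name: c.resolve(TimeoutPoller).endpoint_name
            for name, c in containers.items()}


names = ["FirstEndpointName", "SecondEndpointName"]

# Problem: one container instance shared by all endpoints.
shared = build_container()
shared_result = start_endpoints(names, lambda _name: shared)

# Fix: same configuration function, but a fresh container per endpoint.
separate_result = start_endpoints(names, lambda _name: build_container())
```

With the shared instance both endpoints resolve the poller for `FirstEndpointName`. With one container per endpoint, each resolves its own poller while still getting the same application registrations.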

Cheers,
Szymon

Steve James

Sep 14, 2017, 8:50:46 AM
to Particular Software
Further investigation has led us to believe that the issue is occurring because we create one Autofac container and use the same container for all endpoints. If we comment out the call to endpointConfiguration.UseContainer then we get multiple selects against the TimeoutEntity table.

Steve James

Sep 14, 2017, 8:52:09 AM
to Particular Software
Snap!
We will implement your suggestion. Thanks for your help.