Monitoring your NServiceBus-related system

Yifat Shani

Jun 19, 2017, 2:21:05 AM

to Particular Software

Hi everyone,

We'd like to share information about the work we've been doing and invite you to share your experience.

We all know that monitoring distributed systems is challenging. Some of you use third-party monitoring tools for infrastructure and others have built, or are looking to build, a home-grown solution.
So we've been working on enhancing the Particular Service Platform to close this gap and provide a means of monitoring your NServiceBus-related system more easily.

The initial offering will focus on identifying some key metrics for assessing the health of a system and then presenting these metrics to you in a manner that's easy to visualize and consume.

In the weeks ahead we will share more information about our monitoring philosophy and how we are looking to ease the pain of implementing it. So follow our blog to get notified of updates.
In the meantime, we want to hear about your needs and problems. If you've built your own monitoring solution or if you're considering building one, what does it look like?

This is your chance. Let us know below!

The team, in Particular

Yifat Shani

Jun 19, 2017, 4:25:08 AM

to Particular Software

Join us for our first live webinar on the monitoring theme, Wednesday, June 28 at 12:00 EDT (17:00 BST).

William Brander and Sean Farmar will show the metrics you should consider when monitoring microservices.

Reserve your spot here.

Mike Minutillo

Jul 11, 2017, 3:08:06 AM

to particula...@googlegroups.com

Hi everyone,

We hope you enjoyed the webinar. If you missed it, the recording is available here.

Everyone's system is different, and we'd like to hear about yours.

* How many endpoints are in your system?
* Are you using monitoring tools to keep an eye on your system? If so, which ones?
* Do you use different tools to monitor your infrastructure and applications?

By answering these short questions about your environment you can help us to improve the monitoring experience of the Particular Service Platform.

We look forward to hearing from you.

The team, in Particular

Tadley Cyclist

Jul 11, 2017, 9:33:29 AM

to Particular Software

Hi,

We have 6 endpoints in our application - one of which hooks into our own "Health Monitor" tool and simply consumes events raised by ServiceControl.

So for the most part we use ServiceControl to monitor both the "up time" for the endpoints and also any failed messages. Our own "Health Monitor" logs events to a central database which our service desk have access to, but also sends an email to the operations team.

In addition to the above, we use a tool called PRTG to monitor many of the server parameters (disk space, memory usage, and so on). It can also look at the dead letter queues (we use MSMQ as a transport) to ensure there isn't some sort of transport-level issue.

On the last point, one of our endpoints is configured to be send-only, which means we can't use the Heartbeat plug-in to monitor it. I'm curious to know why this should be the case. We had a scenario recently where MSMQ on the servers appeared to be up and running but messages could not be transferred between servers, so the messages ended up in the DLQ of the machine with the send-only endpoint. As it stands, I can't see a way around this without rolling our own monitor to look at the outgoing queues and raise the alarm if a message has sat there for longer than is typically required to deliver it.

Ian Jones

Distribution Technology

Mike Minutillo

Jul 12, 2017, 10:06:20 PM

to Particular Software

Hi Ian,

Thanks for your response.

I'm interested in your Health Monitor app. ServiceControl raises 5 different types of event (Heartbeat Started/Stopped, Custom Check Succeeded/Failed, and Failed Message). Do you subscribe to all of them? Do you store all of them in the database? Does each one trigger an email?

The Heartbeat plugin can be used to monitor Send-Only endpoints in NServiceBus 5 and 6. It turns out we fixed this a while ago and missed changing the documentation! We have fixed the documentation now. This didn't work before because the code that controlled when to start and stop sending heartbeats was tied into the code for starting and stopping the receive pipeline.

This will help you to know when the endpoint is not running but it won't help you to detect messages in the DLQ. You still need to monitor the DLQ with PRTG as well.

Regards,

Mike Minutillo

Particular Software

Tadley Cyclist

Jul 13, 2017, 6:44:41 AM

to Particular Software

Hi Mike,

With regard to the Health Monitor, in life before NServiceBus, we implemented a lot of batch jobs (it's now my life's work to try and kill them!) whose status somebody had to check manually each day. So we built a single logging store which our batch jobs plus our online applications could write to. It has a simple web front-end which means the Ops guys can see at a glance whether any batch jobs failed and so on. Think of it like a poor man's ServicePulse! I'd love to use ServicePulse in the office, but as you know, it's possible to see the content of messages through the UI, which would mean potentially sensitive client data could be viewed by people who shouldn't be able to see it. I'd also like to use NServiceBus to handle the messaging between the batch jobs, etc. and our Health Monitor - one step at a time though!

In terms of what it actually subscribes to - currently it's just Failed Message and Heartbeat Stopped. These were deemed to be the two things we really wanted visibility of, so as well as logging the events, the Health Monitor emails the Ops team.
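As a rough illustration of that behaviour (not the actual Health Monitor code), the sketch below logs and emails only the two subscribed event types and ignores everything else. The event names, the `HealthMonitor` class, and the in-memory `log` and `emails` lists are all hypothetical stand-ins; a real subscriber would handle the ServiceControl events through an NServiceBus endpoint and write to a real database and mail server.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical event names taken from the wording in this thread, not the
# actual ServiceControl contract types.
SUBSCRIBED_EVENTS = {"FailedMessage", "HeartbeatStopped"}


@dataclass
class HealthMonitor:
    log: list = field(default_factory=list)     # stand-in for the central database
    emails: list = field(default_factory=list)  # stand-in for an SMTP client

    def handle(self, event_type, endpoint, detail=""):
        """Log and email a subscribed event; return False for anything else."""
        if event_type not in SUBSCRIBED_EVENTS:
            return False
        timestamp = datetime.now(timezone.utc).isoformat()
        self.log.append((timestamp, event_type, endpoint, detail))
        self.emails.append(f"[{event_type}] {endpoint}: {detail}".rstrip())
        return True
```

Calling `handle("FailedMessage", "Billing", "handler threw")` records one log row and one email, while `handle("HeartbeatStarted", "Billing")` is dropped because that event type is not in the subscribed set.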

I discovered that the Heartbeat Plug-in can be used with Send-Only endpoints a couple of days ago - Mauro Servienti pointed out that it was the documentation which was wrong and promptly fixed it! So adding this to the web-application which hosts the endpoint is now on my "to do" list.

The only "hole" in our monitoring that still concerns me is around MSMQ rather than NServiceBus itself. We've had a couple of cases where messages have not been delivered despite everything appearing to be well with the machines concerned. Monitoring of the DLQ highlighted this, but to me this feels like a last resort - I want visibility of communication issues earlier. The only way I can think of doing this would be to build a little service that monitors the outgoing queues and flags any messages that have been sat there longer than a specified time.
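The core of such a service could be as small as the sketch below, which flags messages that have waited longer than a threshold. It assumes something else has already listed the outgoing queue contents; reading MSMQ outgoing queues is not shown, and `QueuedMessage` and `find_stale` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class QueuedMessage:
    message_id: str
    destination: str
    enqueued_at: datetime  # timezone-aware time the message entered the queue


def find_stale(messages, max_age, now=None):
    """Return messages that have waited in an outgoing queue longer than max_age."""
    now = now or datetime.now(timezone.utc)
    return [m for m in messages if now - m.enqueued_at > max_age]
```

A scheduled job could call `find_stale(messages, timedelta(minutes=5))` and raise an alert whenever the result is non-empty. The threshold would need tuning to the normal delivery time between the servers involved.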

All the best,

Ian.

Justin Alexander

Jul 13, 2017, 3:26:33 PM

to Particular Software

Tad/Mike,

Thanks for what you've shared in this thread. It's been of interest to me as I'm actively working to increase the maturity of the operational standards/procedures my team applies to our NSB-based systems.

I wanted to quickly add that I have seen some other members of the community suggest using CustomChecks as a way to monitor things like outgoing queues and/or the DLQ within ServicePulse. I haven't gotten around to experimenting with this yet, so I can't offer a concrete code example, but hopefully that will change in the near future. :)