VerneMQ metrics

Anoop tk

Aug 12, 2020, 3:34:05 AM
to vernemq-users
Hello Users,

Your valuable comments and suggestions are much appreciated!

We are planning to migrate from our traditional MQTT broker to a new one, and are considering VerneMQ as one of the preferred choices. We found that VerneMQ provides the MQTT features we need and would like to set up a trial system for a demo. We would like some clarification on assessing the metrics, for which our objectives are the following.

a. Custom scripts for health and error reporting (based on the metrics exposed).

b. Analyze and fix discrepancies.

c. Evaluate data loss and recover the service.

We have identified the following metrics for detecting data loss (the case where a subscriber no longer receives a message due to an error in the system):

1. counter.queue_message_in > counter.queue_message_out

2. counter.mqtt_invalid_msg_size_error > 0

3. counter.queue_message_expired > 0

4. counter.queue_message_drop > 0
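
To make the four checks above concrete, here is a rough sketch of how a custom script might evaluate them. It assumes the metrics arrive as plain text lines of the form `counter.name = value`; the helper names `parse_metrics` and `loss_indicators` and the sample values are made up for illustration. Note that check 1 can also hold simply because messages are still waiting in a queue, so on its own it is only a hint.

```python
import re

# Assumed line format (not confirmed in this thread): "counter.queue_message_in = 42"
_METRIC_LINE = re.compile(r"^\s*(counter|gauge)\.(\w+)\s*=\s*(-?\d+)\s*$")


def parse_metrics(text):
    """Parse 'type.name = value' lines into a dict keyed by 'type.name'."""
    metrics = {}
    for line in text.splitlines():
        match = _METRIC_LINE.match(line)
        if match:
            key = f"{match.group(1)}.{match.group(2)}"
            metrics[key] = int(match.group(3))
    return metrics


def loss_indicators(metrics):
    """Return a description of every check (1 to 4) that currently holds."""
    def get(name):
        # Treat a missing metric as 0 so a partial snapshot does not crash the script.
        return metrics.get(f"counter.{name}", 0)

    findings = []
    if get("queue_message_in") > get("queue_message_out"):
        findings.append("queue_message_in > queue_message_out")
    for name in ("mqtt_invalid_msg_size_error",
                 "queue_message_expired",
                 "queue_message_drop"):
        if get(name) > 0:
            findings.append(f"{name} > 0")
    return findings


# Hypothetical sample snapshot, for illustration only.
sample = """\
counter.queue_message_in = 100
counter.queue_message_out = 97
counter.mqtt_invalid_msg_size_error = 0
counter.queue_message_expired = 1
counter.queue_message_drop = 2
"""

print(loss_indicators(parse_metrics(sample)))
```

With the sample above, checks 1, 3 and 4 are reported and check 2 is not.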

Is the above list complete? Does VerneMQ offer any other way to identify data loss (other than loss due to connectivity problems or an offline subscriber)? We would be glad to hear expert opinions on this.

Please note: by data loss we mean a message that the broker received but that VerneMQ never delivered to a subscriber, neither on time nor later, because the system skipped it.

André Fatton

Aug 20, 2020, 12:02:41 PM
to vernemq-users
Hi Anoop,
great to see you evaluating VerneMQ! :)
I guess your most important measure here is the "message drop" counter, queue_message_drop in your list (or the corresponding event as it shows up in the logs). It means that your consumers can't keep up, and VerneMQ will drop messages on the floor, on the outgoing end.
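
As a rough illustration of watching that counter: a check for "> 0" stays true forever once a single message has been dropped, so a monitoring script would probably compare two snapshots instead. The sketch below assumes the counter is cumulative and starts again from zero after a broker restart; `drop_delta` and the snapshot dicts are hypothetical names, not part of VerneMQ.

```python
def drop_delta(previous, current, name="counter.queue_message_drop"):
    """Return how many messages were dropped between two metric snapshots.

    Both snapshots are plain dicts mapping metric names to integer values.
    Assumption: the counter only grows, except when the broker restarts.
    """
    before = previous.get(name, 0)
    after = current.get(name, 0)
    if after < before:
        # The counter went backwards, so assume it was reset;
        # everything counted since the reset is new.
        return after
    return after - before


# Hypothetical snapshots taken one polling interval apart.
earlier = {"counter.queue_message_drop": 5}
later = {"counter.queue_message_drop": 8}
print(drop_delta(earlier, later))  # 3
```

A non-zero delta in an interval is then the signal to alert on, rather than the absolute value.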

I can't draw the full picture of how to set things up in a production environment here. But let me add that it's probably useful not to think of your messaging infrastructure as an unbounded architecture. VerneMQ's internal components (queues, etc.) are unbounded, but this does not mean that you don't have to think about your target load. You always build a system for a specific load, and one of the goals here is obviously that this is a load (and a system architecture) that doesn't lose messages. This is testable.

Cheers,
André