Ack/deliver rates up to twice the publish rate: what's going on here?


david....@sprinklr.com

Jul 25, 2016, 2:59:44 PM
to rabbitmq-users

We're seeing something pretty weird with a fanout exchange and two queues. Each queue shows a higher deliver count than publish count. The ack and deliver rates track each other closely, while publish is almost always lower, at times only half the deliver rate. The queue's publish number tracks the publish (in) number on the exchange. It almost looks as if we're seeing the combined ack/deliver rate of BOTH queues bound to the exchange. We've checked both the web UI and the REST API, and the numbers are the same. These queues feed an analytics aggregator, and the resulting counts coming back from the database look right so far, so at first glance the ack/deliver stats in Rabbit just look wrong; we have no evidence that messages are somehow multiplying in the queues. The queues are durable, and we're using acking consumers with the Java client. Rabbit 3.6.1, Erlang 18.3, running on Amazon Linux in EC2.

Michael Klishin

Jul 25, 2016, 8:02:15 PM
to rabbitm...@googlegroups.com
A message can be routed to more than one queue.
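A toy Java model of the fanout arithmetic may help pin down which counters should match. The class and field names are hypothetical (not RabbitMQ internals or the management API's schema), and the model assumes every message reaches every bound queue and is consumed exactly once. Under those assumptions each queue's own publish and deliver counts are equal, and only deliveries summed across both queues reach twice the exchange's publish (in) count.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of fanout routing counters. Class and field names are
// hypothetical; this is not RabbitMQ's implementation or its stats schema.
final class FanoutAccounting {
    // Counters for one queue bound to the fanout exchange.
    static final class QueueStats {
        long published; // copies routed into this queue
        long delivered; // copies handed to this queue's consumer
    }

    static final class Result {
        long exchangePublishIn; // counted once per message published to the exchange
        final List<QueueStats> queues = new ArrayList<>();
    }

    // Publishes `messages` messages to a fanout exchange with `boundQueues`
    // bound queues, assuming each queue's consumer receives every copy once.
    static Result simulate(int boundQueues, int messages) {
        Result r = new Result();
        for (int i = 0; i < boundQueues; i++) {
            r.queues.add(new QueueStats());
        }
        for (int m = 0; m < messages; m++) {
            r.exchangePublishIn++;      // one publish into the exchange
            for (QueueStats q : r.queues) {
                q.published++;          // fanout routes a copy to every bound queue
                q.delivered++;          // and that queue delivers its copy once
            }
        }
        return r;
    }

    // Sum of deliveries over all bound queues.
    static long totalDelivered(Result r) {
        long sum = 0;
        for (QueueStats q : r.queues) {
            sum += q.delivered;
        }
        return sum;
    }

    public static void main(String[] args) {
        Result r = simulate(2, 1000);
        QueueStats a = r.queues.get(0);
        System.out.println("exchange publish (in):   " + r.exchangePublishIn);
        System.out.println("queue A publish/deliver: " + a.published + "/" + a.delivered);
        System.out.println("deliver, both queues:    " + totalDelivered(r));
    }
}
```

Running `main` prints 1000 for the exchange, 1000/1000 for queue A and 2000 summed across both queues: per queue the counts balance, and only an aggregate over both queues shows a 2x gap.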



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

michael....@sprinklr.com

Jul 25, 2016, 8:25:08 PM
to rabbitmq-users
Hmm, I don't follow how that applies. 

We're looking at one queue. Whether or not a message was also routed to a different queue (and yes, it definitely was!) simply couldn't affect the number of deliveries out of the queue in question, nor could it lower the number of messages published into that queue. Not as far as I understand.

Here's an image that might help, in case there's some confusion over what we're describing:

[Image not preserved]

Thank you for the answer, sorry that I can't follow the reasoning. Not sure if I'm a step behind your answer or if something wasn't fully clear in the question!

Michael Klishin

Jul 25, 2016, 9:00:46 PM
to rabbitm...@googlegroups.com
Messages can be re-published (not automatically) or re-delivered (when manual
acknowledgements are used, see tutorial 2 on rabbitmq.com).
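Here is a sketch of how redeliveries inflate the deliver count relative to publish, using a toy in-memory queue rather than a broker. All names are hypothetical; in the real Java client the consumer side would correspond to `channel.basicNack(deliveryTag, false, true)` to requeue and `envelope.isRedeliver()` to spot a repeat delivery. The model assumes a requeued message goes back to the head of the queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of one queue with manually acknowledging consumers. Names are
// hypothetical; a real consumer would use the Java client's basicAck/basicNack.
final class RedeliveryModel {
    static final class Counters {
        long published;
        long delivered;   // every delivery, including repeats
        long redelivered; // deliveries of a message that had been delivered before
        long acked;
    }

    private static final class Msg {
        final int id;
        boolean deliveredBefore;
        Msg(int id) { this.id = id; }
    }

    // The consumer rejects with requeue every message whose id is a multiple of
    // `requeueEvery` on its first delivery, then acks it when it comes back.
    static Counters run(int messages, int requeueEvery) {
        Counters c = new Counters();
        Deque<Msg> queue = new ArrayDeque<>();
        for (int i = 0; i < messages; i++) {
            queue.addLast(new Msg(i));
            c.published++;
        }
        while (!queue.isEmpty()) {
            Msg m = queue.pollFirst();
            c.delivered++;
            if (m.deliveredBefore) {
                c.redelivered++;
                c.acked++;
            } else if (m.id % requeueEvery == 0) {
                m.deliveredBefore = true;
                queue.addFirst(m); // requeued at the head, its original position
            } else {
                c.acked++;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        Counters c = run(10, 2);
        System.out.printf("publish=%d deliver=%d redeliver=%d ack=%d%n",
                c.published, c.delivered, c.redelivered, c.acked);
    }
}
```

With 10 messages and every second one requeued once, this prints `publish=10 deliver=15 redeliver=5 ack=10`: deliver exceeds publish by exactly the redeliver count, while ack still equals publish.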

Queue length limits and TTL can also lead to a difference between
ingress (published) and egress (deliveries) rates because what's discarded
isn't going to be delivered.
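A similarly hedged sketch of the discard case: a toy queue with a length limit that drops the oldest message (RabbitMQ's default overflow behaviour for `x-max-length`) and a TTL checked when a message reaches the head, standing in for `x-message-ttl`. It is not the broker's implementation; it only shows that every discarded message widens the gap in one direction, with deliver below publish.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a queue with a length limit and a message TTL. Names are
// hypothetical; the broker-side equivalents are the x-max-length and
// x-message-ttl queue arguments.
final class DiscardModel {
    static final class Result {
        long published;
        long delivered;
        long discarded;
    }

    // Publishes one message per tick (ticks 0..messages-1) while no consumer is
    // attached, then a consumer attaches at `drainAtTick` and drains the queue.
    // maxLength: when exceeded, the oldest message is dropped from the head.
    // ttlTicks: a message older than this when it reaches the head is discarded.
    static Result run(int messages, int maxLength, long ttlTicks, long drainAtTick) {
        Result r = new Result();
        Deque<Long> queue = new ArrayDeque<>(); // publish tick of each queued message
        for (long t = 0; t < messages; t++) {
            queue.addLast(t);
            r.published++;
            if (queue.size() > maxLength) {
                queue.pollFirst(); // length limit: drop from the head
                r.discarded++;
            }
        }
        while (!queue.isEmpty()) {
            long publishedAt = queue.pollFirst();
            if (drainAtTick - publishedAt > ttlTicks) {
                r.discarded++; // expired before it could be delivered
            } else {
                r.delivered++;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Result byLength = run(10, 5, Long.MAX_VALUE, 10);
        Result byTtl = run(10, Integer.MAX_VALUE, 3, 10);
        System.out.println("length limit: publish=" + byLength.published + " deliver=" + byLength.delivered);
        System.out.println("ttl:          publish=" + byTtl.published + " deliver=" + byTtl.delivered);
    }
}
```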

The queue implementation is such that routed messages are enqueued
concurrently with deliveries (in a way that doesn't break the FIFO ordering
queues generally follow), so the rates reported to the management plugin won't
always match at any particular point in time (stats emission interval).
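To illustrate the sampling point, here is a toy model (hypothetical names, one array slot per stats interval) with bursty publishing and a consumer that takes at most a fixed number of messages per interval. It is sequential rather than concurrent, so it only approximates the behaviour described. Between bursts the deliver count exceeds publish, yet the totals over the whole window are equal, which is why a gap that persists across many intervals points to something other than timing.

```java
// Toy model of per-interval stats sampling; names are hypothetical.
final class SamplingModel {
    // publishes[i]: messages routed into the queue during interval i.
    // The consumer takes at most `consumerCapacity` messages per interval.
    // Returns {publishPerInterval, deliverPerInterval} over `intervals` intervals.
    static long[][] run(int[] publishes, int consumerCapacity, int intervals) {
        long[] pub = new long[intervals];
        long[] del = new long[intervals];
        long depth = 0; // messages waiting in the queue
        for (int t = 0; t < intervals; t++) {
            long in = t < publishes.length ? publishes[t] : 0;
            depth += in;
            pub[t] = in;
            long out = Math.min(depth, consumerCapacity);
            depth -= out;
            del[t] = out;
        }
        return new long[][] {pub, del};
    }

    static long sum(long[] xs) {
        long s = 0;
        for (long x : xs) {
            s += x;
        }
        return s;
    }

    public static void main(String[] args) {
        long[][] r = run(new int[] {10, 0, 10, 0}, 5, 4);
        for (int t = 0; t < 4; t++) {
            System.out.println("interval " + t + ": publish=" + r[0][t] + " deliver=" + r[1][t]);
        }
        System.out.println("totals: publish=" + sum(r[0]) + " deliver=" + sum(r[1]));
    }
}
```

In intervals 1 and 3 this reports `publish=0 deliver=5`, but both totals come to 20.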

There is at least one more factor I can think of but given that on the chart the
difference is fairly consistent over a period of time, it's unlikely to matter.

michael....@sprinklr.com

Jul 26, 2016, 2:28:18 PM
to rabbitmq-users
Thanks again Michael. I do think, however, you have our situation backwards.

The problem is that egress is exceeding ingress, not the other way around. Given that "all subsequently published messages will be confirmed or nack'd once", ack should never exceed publish, so either this is a bug violating that principle or published messages are being underreported. Note that there are no reported redeliveries and we don't have any republish workflows set up. Our publish workflow is very simple: we publish to a fanout exchange bound to the queue in question.
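The invariant argued here can be written as a check over two snapshots of one queue's cumulative counters: every ack in the window must belong to a message that was either already in the queue or published during the window. The `Snapshot` type is hypothetical, and mapping its fields to the management API's per-queue `messages`, `message_stats.publish` and `message_stats.ack` values is an assumption of this sketch, as is that all consumers use manual acks (as described in this thread). Discards and requeues can only lower or delay acks, so they cannot make the check fail.

```java
// Sanity check over two snapshots of one queue's cumulative counters.
// The Snapshot type is hypothetical.
final class QueueCounterCheck {
    static final class Snapshot {
        final long messages; // queue depth at snapshot time (ready + unacked)
        final long publish;  // cumulative messages published into the queue
        final long ack;      // cumulative acknowledgements from consumers

        Snapshot(long messages, long publish, long ack) {
            this.messages = messages;
            this.publish = publish;
            this.ack = ack;
        }
    }

    // Every message acked between `before` and `after` must have been in the
    // queue at `before` or published during the window. Returns how many acks
    // cannot be accounted for that way (0 when the counters are consistent).
    static long unexplainedAcks(Snapshot before, Snapshot after) {
        long acked = after.ack - before.ack;
        long available = before.messages + (after.publish - before.publish);
        return Math.max(0, acked - available);
    }

    public static void main(String[] args) {
        Snapshot t0 = new Snapshot(0, 1000, 1000);
        Snapshot consistent = new Snapshot(0, 2000, 2000);
        Snapshot suspicious = new Snapshot(0, 1500, 2000);
        System.out.println(unexplainedAcks(t0, consistent)); // 0
        System.out.println(unexplainedAcks(t0, suspicious)); // 500
    }
}
```

A nonzero result means the counters themselves disagree, which is the situation this post attributes to the stats collector rather than to the queue.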

We have very good reason to believe this is a bug or limitation of our version's statistics collector: on further review, we found that the next minor release after ours includes a rewrite of the stats collector (https://github.com/rabbitmq/rabbitmq-management/issues/41), so we'll upgrade when we get a chance.

Thanks again for your help. In fact, it looks like you wrote the shiny new stats collector we'll be upgrading to, so thank you for having fixed our issue for us (if all goes well)!

Michael Klishin

Jul 27, 2016, 3:27:29 AM
to rabbitm...@googlegroups.com
We've already mentioned messages being routed to multiple queues plus redeliveries. Those
cover the difference in nearly every case I can remember.

Are priority queues used?

michael....@sprinklr.com

Jul 27, 2016, 2:41:45 PM
to rabbitmq-users
Hmm, I'll look more into what multiple queues mean for publish counts. I'm still not following, but I've clarified enough times that it's clearly something I'm missing. We are pushing messages into multiple queues; I just don't yet understand how that applies, so that's an area where I need to improve my knowledge, even beyond this issue.

Yes, it is using priority queues. I think most of our messages currently fall into one of two priorities, but I know for a fact that we have it enabled.

In terms of how we "know" it's a reporting issue, if you are interested:

We have a queue (we'll call it C) whose messages, except in rare circumstances, always end up published to our fanout exchange. That is the only publish workflow we have set up for the queues with the strange reporting (we'll call them A and B). Queue C is on a different RabbitMQ server, for various reasons, and it is not showing this strange behavior. We think the other server is being bogged down by the number of things it has to track (lots of consumers, channels, and connections).

So here's the smoking gun: Queue C's ack rate tracks almost perfectly with Queue A's ack rate (and B's as well, but no picture provided).

[Image not preserved]

We therefore think the publish rate is being underreported, per the details in the GitHub issue we linked earlier: when there are lots of entities to track (we have over 1000 consumers, 1300 channels, ~120 connections, 9 queues, and 13 exchanges), the stats collector gets overwhelmed and "intentionally drops stats". In support of the theory that our stats collector is overwhelmed, it at times reports data a few minutes behind real time on our system. Moreover, this strange reporting started after we added 320 channels. Essentially, we think the stats being dropped are the publish counts, and that the ack/deliver counts are accurate.
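The dropped-stats theory can be illustrated with a toy collector that buffers events in a bounded `ArrayBlockingQueue` and silently loses whatever `offer` cannot fit. This is a hypothetical model, not how the 3.6 stats collector is built; in particular it assumes ack events happen to reach the buffer ahead of publish events in each interval, which is exactly the condition under which publish, and only publish, gets undercounted.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical model of an overloaded stats collector; not RabbitMQ's code.
final class LossyCollectorModel {
    enum Kind { PUBLISH, ACK }

    static final class Totals {
        long publish;
        long ack;
    }

    // Each interval: `ackEvents` ACK events and then `publishEvents` PUBLISH
    // events are offered to a bounded buffer; offer() returns false when the
    // buffer is full and that event is lost. The collector then processes at
    // most `drainPerInterval` buffered events.
    static Totals run(int intervals, int ackEvents, int publishEvents,
                      int bufferCapacity, int drainPerInterval) {
        ArrayBlockingQueue<Kind> buffer = new ArrayBlockingQueue<>(bufferCapacity);
        Totals reported = new Totals();
        for (int t = 0; t < intervals; t++) {
            for (int i = 0; i < ackEvents; i++) {
                buffer.offer(Kind.ACK);
            }
            for (int i = 0; i < publishEvents; i++) {
                buffer.offer(Kind.PUBLISH);
            }
            for (int i = 0; i < drainPerInterval; i++) {
                Kind k = buffer.poll();
                if (k == null) {
                    break; // buffer empty
                }
                if (k == Kind.PUBLISH) {
                    reported.publish++;
                } else {
                    reported.ack++;
                }
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        Totals overloaded = run(10, 100, 100, 150, 150);
        Totals roomy = run(10, 100, 100, 1000, 1000);
        System.out.println("overloaded: publish=" + overloaded.publish + " ack=" + overloaded.ack);
        System.out.println("roomy:      publish=" + roomy.publish + " ack=" + roomy.ack);
    }
}
```

With a buffer of 150 and 200 events per interval, the overloaded run reports 1000 acks against 500 publishes, a 2x gap like the one described in the original post; with a buffer large enough for every event, both totals are 1000.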