RabbitMQ transactions issues on server restarts


stm...@gmail.com

Nov 15, 2018, 2:28:35 AM
to rabbitmq-users
Hello,

We recently performed a RabbitMQ reliability test to verify whether it can guarantee zero message loss.
The results show that if the broker is restarted while messages are being produced and consumed, we have an issue with messages during transactions.

Testing environment:
AWS

Hardware: 
| Broker            | m5.large   | 2vCPU / 8GiB | 
| Producer/Consumer | t3.medium | 2vCPU / 2GiB | 

Software:
OS: CentOS Linux release 7.5.1804
Erlang: 21.1.1
Broker: RabbitMQ 3.7.8
Client: JmsTools

The results and the project, with all the data required to reproduce the issue, can be found on GitHub: https://github.com/veaceslavdoina/messages-brokers-testing

Is there any configuration setting we could change from its default that would let us solve the issue?

Thank you!

Slava.

Luke Bakken

Nov 15, 2018, 5:13:29 AM
to rabbitmq-users
Hello,

I appreciate you taking the time to provide a way to reproduce your findings. Could you please describe, in detail, what the test suite does? I get a general feeling for it based on the results table but it's not clear what "Produced with roll-back", "Consumed with roll-back", etc really mean in terms of publishing and consuming with transactions.

Thanks,
Luke

Michael Klishin

Nov 15, 2018, 7:41:10 AM
to rabbitm...@googlegroups.com
Without having any specifics of what the test does, one obvious thing to keep in mind would be that when RabbitMQ is restarted
the publishers are going to run into exceptions that they are expected to handle.

Publisher confirms also must be used [1] for reliability.
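
To make the idea behind publisher confirms a bit more concrete, here is a toy model in plain Python. It is only a sketch and does not use the RabbitMQ client API: `ToyBroker`, `ConfirmingPublisher`, `BrokerDown` and all of their methods are hypothetical names made up for illustration. It shows the bookkeeping only: keep every message until the broker has confirmed it, and republish whatever is still unconfirmed after a failure.

```python
class BrokerDown(Exception):
    """Raised by the toy broker when it is 'restarting'."""


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker; not a RabbitMQ API."""

    def __init__(self, fail_on_publish_numbers=()):
        self.stored = []                      # ids the broker has accepted
        self._calls = 0
        self._fail_on = set(fail_on_publish_numbers)

    def publish(self, msg_id):
        """Store the message and return True as the 'confirm', or fail."""
        self._calls += 1
        if self._calls in self._fail_on:
            raise BrokerDown(f"broker restarted during publish #{self._calls}")
        self.stored.append(msg_id)
        return True


class ConfirmingPublisher:
    """Keeps each message until the broker has confirmed it."""

    def __init__(self, broker):
        self.broker = broker
        self.unconfirmed = []                 # sent but not yet confirmed

    def send(self, msg_id):
        self.unconfirmed.append(msg_id)
        self.flush()

    def flush(self):
        """Try to (re)publish everything still unconfirmed, oldest first."""
        while self.unconfirmed:
            msg_id = self.unconfirmed[0]
            try:
                self.broker.publish(msg_id)
            except BrokerDown:
                return False                  # keep it; retry on a later flush
            self.unconfirmed.pop(0)           # confirmed: safe to forget
        return True


# The second publish attempt hits a simulated broker restart.
broker = ToyBroker(fail_on_publish_numbers={2})
publisher = ConfirmingPublisher(broker)
for msg_id in ["m1", "m2", "m3"]:
    publisher.send(msg_id)
publisher.flush()
print(broker.stored, publisher.unconfirmed)
```

In this toy model the failure always happens before the message is stored. With a real broker the connection can also drop after the message was stored but before the confirm reached the publisher, so republishing can produce duplicates that consumers need to tolerate.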



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

stm...@gmail.com

Nov 15, 2018, 7:58:10 AM
to rabbitmq-users
Luke,

You can find some information about what exactly the JmsTools suite does in a similar ActiveMQ discussion posted by the suite's author: http://activemq.2283324.n4.nabble.com/ActiveMQ-and-Artemis-reliability-Messages-lost-tp4744881p4745438.html

Just a copy/paste:
----
The tool simulates an application, so it ignores whatever is in ActiveMQ/Artemis. So will a real application. If a message is lost to the 
tool it is also lost to the real application and then it is lost.

All sent messages are stamped with unique ids. The log analyzer verifies that all sent messages are received. That is better than using counters. One 
duplicate and one lost message will even out with a counter, but will be detected with unique ids.

The tool does handle disconnects and reconnects in the same way as a real application and the errors are logged. There are some places where there are
race conditions that are not detected by the log analyzer. One would need to read the tool's detailed logs and investigate, possibly also in the broker
logs. However, with very few exceptions it is just as stable (or more) as a real JMS application would be. If it detects an issue a real application 
would also have an issue.
----
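
The point about unique ids versus counters can be shown with a few lines of Python. This is only an illustration with made-up ids, not code from JmsTools: one message is delivered twice and another never arrives, so the totals match even though something went wrong.

```python
from collections import Counter

sent_ids = ["a", "b", "c", "d"]
# "b" was delivered twice and "d" never arrived.
received_ids = ["a", "b", "b", "c"]

# A plain counter comparison sees 4 sent and 4 received.
counts_match = len(sent_ids) == len(received_ids)

# Comparing by id exposes both problems.
received = Counter(received_ids)
lost = sorted(set(sent_ids) - set(received))
duplicates = sorted(i for i, n in received.items() if n > 1)

print(counts_match, lost, duplicates)  # True ['d'] ['b']
```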

Produced with roll-back - the total number of produced messages, including ones which were produced and then rolled-back by the producer.


Consumed with roll-back - the total number of consumed messages, including ones which were consumed and then rolled-back by the consumer.


A lost message is, well, gone. Sent but not received. It may be truly lost, it may have been moved to a dead letter queue or the id property may have been lost. Without the id property the message becomes an alien.


A duplicate message has been delivered and (more importantly) committed by the consumer at least twice.


In doubt messages are messages that may have been committed or rolled back or that may be stuck in prepared state. With normal transactions this state is used when the tool has tried to commit but failed. If the failure is due to a broker crash or network failure it is not clear if the problem happened before or after the message was committed on the server. For two phase transactions there are a wide range of errors that leave messages in doubt. In all cases in doubt transactions require more analysis.

A ghost message has been sent, rolled back, but was delivered anyway. For explicit rollbacks (using the rollback option) that is very serious as it means that transactions are broken. There may be rollbacks caused by lost connections or other technical issues where a message has been committed but where the tool assumed it had been rolled back. Normally those messages should be listed as in doubt, but they may end up here.


An alien message is a message without the properties set by the producer. In a correctness test where the consumer is used with the producer and with the id property set, there should be none. For a performance test where messages are generated without the id property they are normal.


An undead message has never been sent, but yet it is not alien. Most likely a log file from another test has been imported, but it is also possible that a message got through without being logged when a producer was killed. That may be correct, or it may be problematic. If there are unexplained undead messages present, keep testing and find out why.
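
As a rough illustration of how these categories relate to each other, here is a simplified classifier in Python. It is a sketch based only on the definitions above, not the JmsTools log analyzer: the function name, its parameters and the sample ids are all hypothetical, and in-doubt messages are left out for brevity.

```python
from collections import Counter


def classify(committed_ids, rolled_back_ids, consumed_ids):
    """Classify messages by id.

    committed_ids:   ids the producer sent and committed
    rolled_back_ids: ids the producer sent and then rolled back
    consumed_ids:    ids committed by the consumer; None stands for a
                     message that arrived without the id property
    """
    committed = set(committed_ids)
    rolled_back = set(rolled_back_ids)
    consumed = Counter(i for i in consumed_ids if i is not None)

    return {
        # Committed by the producer but never consumed.
        "lost": sorted(committed - set(consumed)),
        # Committed by the consumer at least twice.
        "duplicate": sorted(i for i, n in consumed.items() if n > 1),
        # Rolled back by the producer but delivered anyway.
        "ghost": sorted(i for i in consumed if i in rolled_back),
        # Delivered without the id property.
        "alien": sum(1 for i in consumed_ids if i is None),
        # Has an id, but the producer log knows nothing about it.
        "undead": sorted(
            i for i in consumed if i not in committed and i not in rolled_back
        ),
    }


result = classify(
    committed_ids=["m1", "m2", "m3"],
    rolled_back_ids=["m4"],
    consumed_ids=["m1", "m2", "m2", "m4", None, "m9"],
)
print(result)
```

With this sample input, "m3" is lost, "m2" is a duplicate, "m4" is a ghost, "m9" is undead, and there is one alien.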


Thank you!

Slava.

Luke Bakken

Nov 15, 2018, 10:08:11 AM
to rabbitmq-users
Hi Slava,

Thanks for pointing out that message thread.

Since the JmsTools application uses the JMS protocol and the RabbitMQ JMS client (link), it requires that the JMS plugin is enabled in RabbitMQ (link). However, I don't see that plugin being enabled in your project, so I'm a bit puzzled as to how it's working.

The message thread brings up several good points, one of which is that to begin investigating this issue we would first have to eliminate the JmsTools application as the source of the "ghost messages". Before we go in that direction, I would like to ask whether you are planning on using JMS and transactions in your application. The recommendation of the RabbitMQ team is to use AMQP and mechanisms other than transactions to guarantee reliability (publisher confirms, consumer acknowledgements, mirroring, etc).

Thanks -
Luke

stm...@gmail.com

Nov 19, 2018, 1:56:39 AM
to rabbitmq-users
Hello Luke,

Thank you for the observation! The JMS plugin is now enabled during RabbitMQ installation.
I ran one more test and got similar results (https://github.com/veaceslavdoina/messages-brokers-testing/blob/master/RESULTS.md#test-rabbitmq-378-standalone-20181117-173201), but now we also see "In doubt" messages, since JmsTools started to log messages of that type.

In doubt messages are messages that may have been committed or rolled back or that may be stuck in a prepared state. With normal transactions, this state is used when the tool has tried to commit but failed. If the failure is due to a broker crash or network failure it is not clear if the problem happened before or after the message was committed on the server. For two-phase transactions there are a wide range of errors that leave messages in doubt. In all cases in doubt transactions require more analysis.
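
A small simulation may help show why a failed commit is ambiguous from the client's side. This is a toy model in Python, not JMS or RabbitMQ code: `CrashingBroker`, `ConnectionLost`, `try_commit` and the `crash_point` values are hypothetical names used only for illustration. The client gets the same exception whether the broker died just before or just after applying the commit, so "in doubt" is the only state it can honestly record.

```python
class ConnectionLost(Exception):
    """The client lost its connection while waiting for the commit reply."""


class CrashingBroker:
    """Hypothetical broker that can die before or after applying a commit."""

    def __init__(self, crash_point=None):
        # None, "before_commit" or "after_commit"
        self.crash_point = crash_point
        self.committed = []

    def commit(self, msg_id):
        if self.crash_point == "before_commit":
            raise ConnectionLost()
        self.committed.append(msg_id)
        if self.crash_point == "after_commit":
            # The commit was applied, but the reply never reaches the client.
            raise ConnectionLost()


def try_commit(broker, msg_id):
    """Return the state the client can record for msg_id."""
    try:
        broker.commit(msg_id)
    except ConnectionLost:
        return "in doubt"
    return "committed"


for crash_point in (None, "before_commit", "after_commit"):
    broker = CrashingBroker(crash_point)
    print(crash_point, try_commit(broker, "m1"), broker.committed)
```

The last two runs report "in doubt" in both cases, although the broker's `committed` list is empty in one and contains "m1" in the other.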

Yes, we are planning to use JMS with transactions, which is why we started testing our current queue system and looking around for an alternative.

About JmsTools: yes, we may assume that it has issues of its own. In the case of ActiveMQ, as I remember, we noticed lost messages using PerfHarness from IBM. JmsTools just shows this in a more suitable way, as it is dedicated to this type of test.

Could you please advise which tools we could use to perform such tests?


Thank you!

Slava.