How to reopen/recreate a channel after the crash in Java?

Alexandr Porunov

Oct 9, 2016, 1:58:09 PM

to rabbitmq-users

Hello,

I noticed that my channel gets stuck when the node on which the channel was opened crashes.

Situation: I have 3 nodes (node1, node2, node3).

Node1 holds the master queue, and node2 and node3 hold the slave queues. The channel is opened on node1. When I shut down node1, my application gets stuck. I want to configure my application to reconnect to another node, recreate the channel, and continue to work normally.

Here is my code:

ConnectionFactory connectionFactory = new ConnectionFactory();
connectionFactory.setUsername("admin");
connectionFactory.setPassword("adminpass");
connectionFactory.setRequestedHeartbeat(60);
connectionFactory.setAutomaticRecoveryEnabled(true);
connectionFactory.setTopologyRecoveryEnabled(true);
connectionFactory.setNetworkRecoveryInterval(5000);

Address[] addrArr = new Address[]{
    new Address("192.168.0.77", 5672),
    new Address("192.168.0.78", 5672),
    new Address("192.168.0.79", 5672)};

ExecutorService es = Executors.newFixedThreadPool(20);
conn = connectionFactory.newConnection(es, addrArr);
Channel ch = conn.createChannel();
ch.queueDeclare(QUEUE_NAME, true, false, false, null);
ch.addConfirmListener(new RabbitMQConfirmListener(unconfirmedMessagesMap));
ch.confirmSelect();

try {
    for (int i = 0; i < 100000; i++) {
        ch.basicPublish("", QUEUE_NAME,
                MessageProperties.PERSISTENT_BASIC,
                task.getBody());
    }
} catch (Exception e) {
    System.out.print(e);
    try {
        AMQP.Basic.RecoverOk recoverOk = ch.basicRecover();
    } catch (IOException e1) {
        LOG.error("CHANNEL RECOVER ERROR!");
        e1.printStackTrace();
    }
}

I start my application and it begins publishing messages. Then I shut down node1 and my application hangs forever. I don't see any errors, so it isn't throwing any exceptions. How should I handle this situation? How can I recreate the channel? A new master is promoted successfully, but the application cannot continue working.

Sincerely,

Alexandr

Michael Klishin

Oct 9, 2016, 2:08:04 PM

to rabbitm...@googlegroups.com

With automatic connection recovery you are not supposed to do that after *connection* recovery. After a channel exception, just open a new channel.

Closed channels immediately throw an exception once you attempt to use them but connection loss can take a while to detect: TCP is built on the idea of waiting and retries.

See http://rabbitmq.com/heartbeat.html.

Lastly, and I believe this is mentioned in the docs, the Java client does not attempt to enqueue outgoing publishes internally while the connection is down. It's your application's responsibility to decide how to keep them around and whether they should be redelivered. I believe Spring AMQP offers a few options in this area.
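
As a rough illustration of "keeping them around" on the application side, here is a minimal sketch. The class `UnconfirmedOutbox` and its method names are hypothetical and are not part of the RabbitMQ Java client. The assumption is that the application calls `track` with the value of `Channel.getNextPublishSeqNo()` right before each `basicPublish`, and calls `ack`/`nack` from its `ConfirmListener`. The sketch uses only the JDK, so the broker wiring itself is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical helper (not part of the RabbitMQ Java client): remembers
// published message bodies until the broker confirms them.
class UnconfirmedOutbox {
    // Publish sequence number -> message body. The map is sorted so that a
    // confirm with multiple=true can clear every entry up to that number.
    private final ConcurrentNavigableMap<Long, byte[]> pending =
            new ConcurrentSkipListMap<>();

    // Intended to be called just before basicPublish.
    void track(long seqNo, byte[] body) {
        pending.put(seqNo, body);
    }

    // Intended to be called from ConfirmListener.handleAck(deliveryTag, multiple).
    void ack(long seqNo, boolean multiple) {
        if (multiple) {
            pending.headMap(seqNo, true).clear();
        } else {
            pending.remove(seqNo);
        }
    }

    // Intended to be called from ConfirmListener.handleNack(deliveryTag, multiple).
    // Removes the rejected entries and returns their bodies so the caller
    // can decide whether to publish them again.
    List<byte[]> nack(long seqNo, boolean multiple) {
        List<byte[]> toRetry = new ArrayList<>();
        if (multiple) {
            for (Long key : pending.headMap(seqNo, true).keySet()) {
                byte[] body = pending.remove(key);
                if (body != null) {
                    toRetry.add(body);
                }
            }
        } else {
            byte[] body = pending.remove(seqNo);
            if (body != null) {
                toRetry.add(body);
            }
        }
        return toRetry;
    }

    // Removes and returns everything that was never confirmed, in publish
    // order, e.g. to republish after the connection has been re-established.
    List<byte[]> drain() {
        List<byte[]> all = new ArrayList<>();
        Map.Entry<Long, byte[]> entry;
        while ((entry = pending.pollFirstEntry()) != null) {
            all.add(entry.getValue());
        }
        return all;
    }

    int size() {
        return pending.size();
    }
}
```

Whether the drained messages should actually be republished (and whether duplicates are acceptable to consumers) remains an application-level decision.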


Alexandr Porunov

Oct 9, 2016, 2:46:42 PM

to rabbitmq-users

But it doesn't throw any exceptions, and reconnection doesn't work either.

ch.basicPublish("", QUEUE_NAME,
        MessageProperties.PERSISTENT_BASIC,
        task.getBody());

It blocks and waits indefinitely (I waited 20 minutes and nothing changed).

Do you know why this might happen?

Alexandr Porunov

Oct 9, 2016, 3:07:29 PM

to rabbitmq-users

It throws an exception only if I manually bring the failed node back up, and after that it reconnects to a new node. Why does this happen? Why can't it reconnect to the new node right after the failure?

Michael Klishin

Oct 9, 2016, 4:51:50 PM

to rabbitm...@googlegroups.com

The client does not know or care whether the node is the same or not, as long as the vhost exists and the credentials and permissions are correct.

This sounds like a resource alarm being in effect, so no publishes get through. However, recent releases of the server will drop blocked connections as soon as a heartbeat timeout happens, and the client will do the same thing.

So there really has to be something funky going on for no exception to be observed when a heartbeat timeout occurs. And from there the connection recovery logic is really unsophisticated and trivial to reason about.

See server logs and do a network capture with Wireshark for possible clues.
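
While collecting those clues, one way to make a silent stall visible in the application's own logs is to run each publish under a time limit. The following is only a sketch under stated assumptions: `PublishWatchdog` and `runWithTimeout` are hypothetical names, nothing here comes from the RabbitMQ Java client, and the `Callable` passed in is assumed to wrap the `basicPublish` call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs an action on a single worker thread and reports
// whether it finished within the given time.
class PublishWatchdog {
    // A single daemon thread, so all guarded actions run on the same thread
    // and a stuck action cannot keep the JVM from exiting.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "publish-watchdog");
        t.setDaemon(true);
        return t;
    });

    // Returns true if the action completed in time, false if it was still
    // running when the timeout expired (or the caller was interrupted).
    // On timeout the worker is interrupted; a call blocked in non-interruptible
    // I/O may ignore that, in which case later actions queue up behind it.
    boolean runWithTimeout(Callable<Void> action, long timeout, TimeUnit unit) {
        Future<Void> future = worker.submit(action);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The action itself failed; surface the original cause.
            throw new IllegalStateException("guarded action failed", e.getCause());
        }
    }

    void shutdown() {
        worker.shutdownNow();
    }
}
```

A `false` result does not fix anything by itself; it only gives the application a point at which to log the stall, record the time for correlation with the server logs, and decide whether to close the connection.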
