[Akka 2.5.x][Remoting] - Recovering from quarantined nodes

gutzeit

May 17, 2018, 5:26:07 AM
to Akka User List
Dear list members,

My company runs a large-scale deployment (hundreds of JVMs) based on Akka, spread across regions globally. Some of the services communicate using Akka Remoting (TCP, not Artery).
As is common with global cloud deployments, we see occasional disconnections between regions, ranging from severe packet loss to total disconnection. We expect things to be shaky during a network disruption, but we also expect everything to return to normal once the storm passes.

Observing the logs we see many instances of the following:

Tried to associate with unreachable remote address [akka.tcp://systemName b...@192.168.236.12:2558]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]

AssociationError [akka.tcp://com-company-...@192.168.222.36:2558] -> [akka.tcp://syste...@192.168.236.11:2558]: Error [Invalid address: akka.tcp://M systemName@192.168.236.11:2558] [ akka.remote.InvalidAssociation: Invalid address: akka.tcp://systemName@192.168.236.11:2558 Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted. ]

According to the Akka Remoting documentation, these errors mean that the two remote actor systems in question will never be able to communicate with each other again unless the systems are restarted.
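
For context, here is a rough sketch of the classic remoting settings that appear to sit behind these log lines. The setting names come from the akka-remote reference configuration. The watch-failure-detector value is purely illustrative, and the idea that tuning it suits a cross-region deployment is an assumption, not something confirmed in this thread.

```hocon
akka.remote {
  # How long an address stays gated after a failed association attempt.
  # 5 s corresponds to the "gated for 5000 ms" in the first log line.
  retry-gate-closed-for = 5 s

  # Failure detector used by remote death watch. When it marks a watched
  # remote system as unreachable, that system is quarantined.
  watch-failure-detector {
    # Illustrative value only (an assumption): a longer pause tolerates
    # longer network hiccups before quarantine is triggered, at the cost
    # of slower detection of nodes that really are dead.
    acceptable-heartbeat-pause = 30 s
  }
}
```

Gating is temporary and clears after the configured interval. Quarantine is tied to the unique ID of the actor system instance, which is why only a restart of the quarantined system lifts it.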

What is the proper, expected way to recover from these situations? It does not seem logical to me that I need to restart all nodes of the system every time a network disconnection occurs. What am I missing here?

Thanks in advance for your replies.

Regards,
Dima Gutzeit

Konrad “ktoso” Malawski

May 17, 2018, 5:52:33 AM
to akka...@googlegroups.com, gutzeit
Could I ask you to move the question to discuss.akka.io?
Thanks!

-- 
Cheers,
Konrad 'ktoso' Malawski
--
*****************************************************************************************************
** New discussion forum: https://discuss.akka.io/ replacing akka-user google-group soon.
** This group will soon be put into read-only mode, and replaced by discuss.akka.io
** More details: https://akka.io/blog/news/2018/03/13/discuss.akka.io-announced
*****************************************************************************************************

gutzeit

May 17, 2018, 5:57:42 AM
to Akka User List
Done, thank you.


