All,
I am facing a very confusing issue between two systems, and I have come to a conclusion that maybe someone could confirm (or refute).
I am running Akka 2.5.3 in Java; the context is a CI build running integration tests.
I have two systems: A (the QA system) and B (the CI system). B reaches A through its DNS name, while B identifies itself by IP address.
The flow is: A is started, then B tries to resolve an actor on A through ActorSelection. When A and B are running on the same host, everything works.
The issue is when A and B are on different hosts. A is behind a firewall with only the relevant port opened. I am not 100% sure of the firewall settings on host B.
On the first run everything is fine and ActorSelection returns the ActorRef, but on the second run ActorSelection times out and I get the exception below on host A, while no exception is visible on host B.
What I am thinking is that when B is restarted, A treats it as a "reconnection" rather than a first "connection" and might follow a different code path, even though the system's internal ID is different.
In this "reconnection" use case, it looks like there might be a connection attempt A -> B while in the first "connection" only B -> A happens.
This would explain why everything works when both systems run on the same host. If my thinking is correct, I would have to investigate the firewall rules on host B, which are beyond my control, so I would like this confirmation before going further.
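One way I could check the A -> B direction independently of Akka is a plain TCP probe run from host A while B is up. This is only a minimal sketch, assuming Java 8 or later: the class name ReverseConnectProbe and the method canConnect are hypothetical names of mine, and the default host and port are simply the address of B as it appears in the log below.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Akka: checks raw TCP reachability only.
final class ReverseConnectProbe {

    /** Returns true if a TCP connection to host:port is established within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, unresolvable hosts and connect timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are B's address as seen in the log; override on the command line.
        String host = args.length > 0 ? args[0] : "172.17.0.3";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2557;
        boolean ok = canConnect(host, port, 5000);
        System.out.println("TCP connect to " + host + ":" + port
                + (ok ? " succeeded" : " failed"));
    }
}
```

A failure here would only show that host A cannot open a TCP connection to that address and port; it would not by itself confirm which path Akka takes on a "reconnection".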
[DEBUG] [2017-07-19 17:22:16,317] [17:22:16.317UTC] [SystemA] [akka.tcp://SystemA@hostA:2550/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FSystemB%40172.17.0.3%3A2557-3/endpointWriter] [SystemA-akka.remote.default-remote-dispatcher-90] [] akka.remote.EndpointWriter - AssociationError [akka.tcp://SystemA@hostA:2550] -> [akka.tcp://Sys...@172.17.0.3:2557]: Error [Association failed with [akka.tcp://Sys...@172.17.0.3:2557]] [
Caused by: java.util.concurrent.TimeoutException: No response from remote for outbound association. Associate timed out after [15000 ms].
at akka.remote.transport.ProtocolStateActor$$anonfun$2.applyOrElse(AkkaProtocolTransport.scala:366)
at akka.remote.transport.ProtocolStateActor$$anonfun$2.applyOrElse(AkkaProtocolTransport.scala:340)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
at akka.actor.FSM.processEvent(FSM.scala:663)
at akka.actor.FSM.processEvent$(FSM.scala:660)
at akka.remote.transport.ProtocolStateActor.processEvent(AkkaProtocolTransport.scala:285)
at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:657)
at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:629)
at akka.actor.Actor.aroundReceive(Actor.scala:513)
at akka.actor.Actor.aroundReceive$(Actor.scala:511)
at akka.remote.transport.ProtocolStateActor.aroundReceive(AkkaProtocolTransport.scala:285)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527)
at akka.actor.ActorCell.invoke(ActorCell.scala:496)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
]
[INFO ] [2017-07-19 17:22:16,318] [17:22:16.317UTC] [SystemA] [akka.tcp://SystemA@hostA:2550/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FSystemB%40172.17.0.3%3A2557-4] [SystemA-akka.remote.default-remote-dispatcher-75] [] akka.remote.transport.ProtocolStateActor - No response from remote for outbound association. Associate timed out after [15000 ms].
[WARN ] [2017-07-19 17:22:16,318] [17:22:16.318UTC] [SystemA] [akka.tcp://SystemA@hostA:2550/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FSystemB%40172.17.0.3%3A2557-3] [SystemA-akka.remote.default-remote-dispatcher-90] [] akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://Sys...@172.17.0.3:2557] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://Sys...@172.17.0.3:2557]] Caused by: [No response from remote for outbound association. Associate timed out after [15000 ms].]
[DEBUG] [2017-07-19 17:22:16,318] [17:22:16.318UTC] [SystemA] [akka://SystemA/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FSystemB%40172.17.0.3%3A2557-4] [SystemA-akka.remote.default-remote-dispatcher-75] [] akka.remote.transport.ProtocolStateActor - stopped
Any help or experience would be hugely appreciated.
Thanks!
Olivier