Two TCP connections between two ActorSystems


jie tang

Nov 25, 2015, 12:56:07 PM
to Akka User List
Hi,
Thanks in advance.

I have three servers. Two of them are websocket servers; the third is a business server. They communicate through Akka Remoting.
The two websocket servers listen for websocket connections. When a websocket server accepts a connection, it creates a ClientActor for it, and the ClientActor sends a ConnectionOpened message via ActorSelection to the single MasterActor on the business server. The MasterActor creates a BusinessActor, which replies with a ConnectionRegistered message to the sender of the ConnectionOpened message. The ClientActor now has the ActorRef of the BusinessActor and watches it. When the BusinessActor dies (for example, because the business server restarts), the corresponding ClientActor closes the websocket connection and stops itself. The websocket client app may retry the websocket connection after a while.

Today we restarted the business server to upgrade it, and something strange happened.
After the business server came back up, websocket server A had one TCP connection to the business server, but websocket server B had two. When websocket server B accepted a websocket connection, it created a ClientActor. The new ClientActor sent a ConnectionOpened message to the business server but never received a ConnectionRegistered message.


I enabled Akka debug logging on the business server and the websocket servers.

The business server's log shows that it sent the ConnectionRegistered message back:
2015-11-25 23:27:05,091;DEBUG;EndpointWriter;(businessserver-akka.actor.default-dispatcher-17);received local message RemoteMessage: [ActorSelectionMessage(ConnectionOpened [device=[eId=7290]],Vector(user, master),false)] to [Actor[akka://businessserver/]]<+[akka://businessserver/] from [Actor[akka.tcp://websocket-act...@10.162.209.21:2553/user/websocket_actor_supervisor/did:7290#-1266191974]()]
2015-11-25 23:27:05,102;DEBUG;EndpointWriter;(businessserver-akka.actor.default-dispatcher-4);sending message RemoteMessage: [com.ainemo.protocol.websocket.ConnectionRegistered@4be4c643] to [Actor[akka.tcp://websocket-act...@10.162.209.21:2553/user/websocket_actor_supervisor/did:7290#-1266191974]]<+[akka.tcp://websocket-...@10.162.209.21:2553/user/websocket_actor_supervisor/did:7290] from [Actor[akka://businessserver/user/master/$fj#-1775937842]]

But websocket server B's Akka log does not show that it received the ConnectionRegistered message:
2015-11-25 23:27:05,090;DEBUG;EndpointWriter;(websocket-actors-akka.actor.default-dispatcher-30);sending message RemoteMessage: [ActorSelectionMessage(ConnectionOpened [device=[eId=7290]]],Vector(user, autoserver),false)] to [Actor[akka.tcp://businessserve...@10.170.187.126:2554/]]<+[akka.tcp:/businessserve@10.170.187.126:2554/] from [Actor[akka://websocket-actors/user/websocket_actor_supervisor/did:7290#-1266191974]]


Everything is fine for websocket server A, though. There was only one TCP connection between it and the business server, and it received ConnectionRegistered messages.


What is wrong with websocket server B? Is it OK to have two TCP connections between websocket server B and the business server? What should I do to avoid it?


The netstat output:
[root@dev-sig-server ~]# netstat -anp | grep 2554
tcp        0      0 10.170.187.126:2554         0.0.0.0:*                                 LISTEN                             4119/java
tcp        0      0 10.170.187.126:2554         10.162.209.21:32833         ESTABLISHED                4119/java
tcp        0      0 10.170.187.126:2554         10.162.198.161:33326       ESTABLISHED               4119/java
tcp        0      0 10.170.187.126:2554         10.162.209.21:32843         ESTABLISHED                 4119/java
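To spot a peer with more than one connection in output like the above, the ESTABLISHED lines can be counted per remote host. This is only a minimal sketch in plain Java; the class and method names are hypothetical, and it assumes the column layout of the `netstat -anp` output shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper, not part of Akka.
final class NetstatConnectionCounter {

    // Returns remote IP -> number of ESTABLISHED connections to the given local port.
    static Map<String, Integer> countEstablished(List<String> netstatLines, int localPort) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : netstatLines) {
            // Assumed columns: proto, recv-q, send-q, local address, foreign address, state, pid/program
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 6 || !cols[0].startsWith("tcp")) continue;
            if (!"ESTABLISHED".equals(cols[5])) continue;
            if (!cols[3].endsWith(":" + localPort)) continue;
            String foreign = cols[4];
            int colon = foreign.lastIndexOf(':');
            if (colon <= 0) continue;
            // Strip the ephemeral port so connections are grouped by remote host.
            counts.merge(foreign.substring(0, colon), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "tcp 0 0 10.170.187.126:2554 0.0.0.0:* LISTEN 4119/java",
            "tcp 0 0 10.170.187.126:2554 10.162.209.21:32833 ESTABLISHED 4119/java",
            "tcp 0 0 10.170.187.126:2554 10.162.198.161:33326 ESTABLISHED 4119/java",
            "tcp 0 0 10.170.187.126:2554 10.162.209.21:32843 ESTABLISHED 4119/java");
        countEstablished(lines, 2554).forEach((host, n) ->
            System.out.println(host + " -> " + n + (n > 1 ? " (more than one connection)" : "")));
    }
}
```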


We use akka-remote_2.10-2.3.12.jar

The business server's application.conf:
akka {
    actor {
        provider = "akka.remote.RemoteActorRefProvider"
    }
    remote {
         log-received-messages = on
         log-sent-messages = on

         enabled-transports = ["akka.remote.netty.tcp"]

         netty.tcp {
            hostname = 10.170.187.126
            port = 2554
         }
    }

    loggers = ["akka.event.slf4j.Slf4jLogger"]

    loglevel = "DEBUG"
}
The websocket server's application.conf is similar.

Akka Team

Nov 26, 2015, 5:46:18 AM
to Akka User List
Hi,



> But everything is ok for websocket server A. There was only a tcp connection between it and businessserver. And it received ConnectionRegistered messages.
>
> What's wrong with websocket server B? Is it ok for two tcp connections between websocket server B and business server? What I should do to avoid it?

It is normal to have two TCP connections between two ActorSystems. By default, an ActorSystem tries to reuse incoming connections for outgoing messages. However, when the two systems open connections concurrently (for example, you restart one of them while the other is trying to connect to it, so that once it comes up both try to connect), two such connections may stay alive. In theory the two systems could reconcile and agree to close one of them, but that is not simple to do safely in practice, so we have not bothered so far.

You must also be prepared for temporary message loss between reconnects, since Akka does not guarantee delivery and does not buffer messages forever. See http://doc.akka.io/docs/akka/2.4.0/scala/remoting.html#Lifecycle_and_Failure_Recovery_Model for details.

-Endre
 





--
Akka Team
Typesafe - Reactive apps on the JVM
Blog: letitcrash.com
Twitter: @akkateam

jie tang

Nov 26, 2015, 7:15:34 AM
to akka...@googlegroups.com
Thanks for your reply.

But both TCP connections were initiated by websocket server A. Is that valid?
It also seemed that websocket server A could never receive ConnectionRegistered messages after the business server restarted. I waited for 10 minutes and in the end had to restart websocket server A.

Akka Team

Nov 26, 2015, 7:17:05 AM
to Akka User List
Hi,

On Thu, Nov 26, 2015 at 1:13 PM, jie tang <crybi...@gmail.com> wrote:
> Thanks for your reply.
>
> But the two tcp connections were both initiated by websocket server A. Is that valid?

No, *that* is not valid. It might happen for a short transient period, but not for long. This is likely a bug. Can you reproduce this reliably?

-Endre

jie tang

Nov 26, 2015, 7:31:41 AM
to akka...@googlegroups.com
Very often, but not always.

A typical flow:
1. A websocket client connects to websocket server A.
2. Websocket server A creates a WebsocketActor for the client. The WebsocketActor sends a ConnectionOpened message to the business server via ActorSelection.
3. If the WebsocketActor does not receive the ConnectionRegistered message within 10 seconds, it stops itself and closes the websocket connection to the client.
4. The websocket client may then repeat step 1 immediately.

Thousands of websocket clients connect to websocket server A, so when the business server restarts, websocket server A may send hundreds of ConnectionOpened messages via ActorSelection at the same time. Could there be a race condition?
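The timeout in step 3 can be modeled roughly as below. This is a plain-Java sketch, not Akka code: the class and method names are hypothetical, and the current time is passed in explicitly so that the logic stays deterministic. In a real actor the check would be driven by a scheduled message.

```java
// Hypothetical model of step 3 of the flow above; not Akka code.
final class RegistrationHandshake {
    // The 10 seconds from step 3.
    static final long TIMEOUT_MILLIS = 10_000L;

    private final long openedSentAtMillis;
    private boolean registered;

    // Created at the moment ConnectionOpened is sent.
    RegistrationHandshake(long openedSentAtMillis) {
        this.openedSentAtMillis = openedSentAtMillis;
    }

    // Called when ConnectionRegistered arrives.
    void onConnectionRegistered() {
        registered = true;
    }

    // True when the actor should stop itself and close the websocket connection.
    boolean shouldGiveUp(long nowMillis) {
        return !registered && nowMillis - openedSentAtMillis >= TIMEOUT_MILLIS;
    }

    public static void main(String[] args) {
        RegistrationHandshake handshake = new RegistrationHandshake(0L);
        System.out.println(handshake.shouldGiveUp(5_000L));
        System.out.println(handshake.shouldGiveUp(10_000L));
    }
}
```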

Akka Team

Nov 26, 2015, 8:06:28 AM
to Akka User List
Hi,

Can you try the settings
  akka.remote.transport-failure-detector.heartbeat-interval = 1 s
  akka.remote.transport-failure-detector.acceptable-heartbeat-pause = 3 s

and see if the race still happens? This seems to be related to a test failure that we get from time to time but so far have not been able to reproduce reliably. If the problem becomes less frequent with the above settings, there is a good chance that what you see is the same thing we looked at before.

-Endre

jie tang

Nov 26, 2015, 11:32:04 AM
to akka...@googlegroups.com
Thanks.

I tried your settings.

  akka.remote.transport-failure-detector.heartbeat-interval = 1 s
  akka.remote.transport-failure-detector.acceptable-heartbeat-pause = 3 s
It seems that the problem now occurs less frequently.

Can I use the following workaround?
A single actor (call it STATE) on the websocket server watches an actor on the business server. When STATE receives a Terminated message, it starts sending ping messages to the business server periodically via ActorSelection until it receives a response. When a websocket client connects, the websocket server asks STATE whether it may send a ConnectionOpened message to the business server. If the business server is unreachable, the websocket server closes the websocket connection and the client retries later.
Does this approach avoid the possible race condition?
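The gating logic of the STATE actor described above could look roughly like this. It is a plain-Java model, not Akka code: the class and method names are hypothetical, and starting in the unreachable state until the first ping response is an assumption. It only models when ConnectionOpened may be sent; it cannot show whether the race in the remoting layer is avoided.

```java
// Plain-Java model of the proposed STATE actor's gating logic; not Akka code.
// Names are hypothetical; only Terminated, ping and ConnectionOpened come from the post.
final class BusinessServerGate {
    enum Status { REACHABLE, UNREACHABLE }

    // Assumption: treat the business server as unreachable until it has answered once.
    private Status status = Status.UNREACHABLE;

    // The watched actor on the business server terminated.
    void onTerminated() {
        status = Status.UNREACHABLE;
    }

    // Periodic timer tick; returns true when a ping should be sent now.
    boolean shouldPingOnTick() {
        return status == Status.UNREACHABLE;
    }

    // The business server answered a ping.
    void onPingResponse() {
        status = Status.REACHABLE;
    }

    // Asked before a ConnectionOpened message is sent for a new websocket client.
    boolean maySendConnectionOpened() {
        return status == Status.REACHABLE;
    }

    public static void main(String[] args) {
        BusinessServerGate gate = new BusinessServerGate();
        gate.onPingResponse();
        System.out.println(gate.maySendConnectionOpened());
        gate.onTerminated();
        System.out.println(gate.maySendConnectionOpened());
        System.out.println(gate.shouldPingOnTick());
    }
}
```

In a real actor, the ping response would presumably also be the point at which STATE starts watching the responding actor again, since the restarted business server hosts a new actor incarnation.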