Remote actor woes

Linbin Chang

Nov 19, 2013, 8:56:48 AM
to akka...@googlegroups.com
I've been using Akka in a fairly large project and it has been doing fine thus far except for one thing: remote actors.

The way I am using remote actors might be a little unorthodox. I did find some recommendations on this mailing list before basing my communication on it, but I am now paying the price for not really vetting the design before basing my project on it.

I use remote actors in a client-server setting (over WAN, no less), which means unreliable networks and frequent disconnections.  What I found is that remoting tries so hard to reconnect to its peer that this approach is almost unusable in practice.

My setup is as follows:

1. From my client, I am attaching to a remote actor on the server (which belongs to the server's actor system).
2. The client sends an ActorRef over to the server so the server can push messages back to the client.
3. Server-side Akka remote config:

    use-passive-connections = on
    retry-gate-closed-for = 0 s
    retry-window = 1 s
    maximum-retries-in-window = 1
    gate-invalid-addresses-for = 1 s
    quarantine-systems-for = 1 s

My problems are the following:

1. Why doesn't remoting honor the setting "use-passive-connections" no matter what?  When the client initiates the connection, everything is fine.  But once the client disconnects, all hell breaks loose.  The server will try to connect back to the client, which was unexpected because the server sits behind a firewall, and the admin raises quite an eyebrow when he sees his server trying to connect to an IP and port outside of his network.

2. After retrying, which does not really stop after the configured "maximum-retries-in-window" attempts, the remote actor enters a quarantine period that I have no idea how to turn off or make as short as possible.  I tried to make it really short, like 1 second, and it works most of the time, but once in a while it does not.

3. When it fails to work, that client IP and port combination is gone unless I restart the server (very unacceptable) or the client changes to a different port (restarting the client application leaves a very bad taste in my customer's mouth when I was telling him how reliable his future system is going to be before the project starts).

4. The setting "maximum-retries-in-window" is not honored on either the client or the server.  It might retry 5 or 10 times before it gives up, or it might retry forever.

So, is it possible to do the following:

1. Some way for "use-passive-connections = true" on the client side to mean "actively-reconnect = false" on the server side as well, i.e., the server does not actively reconnect back to the client.

2. Some way to turn off the so-called quarantine on remote actors and just terminate or send some message to actors on both ends that the connection has failed.

3. No retrying, just tell the actor the connection has failed and be done with it.  The actor can handle the reconnecting itself.

I know I am asking a lot but I need to have the problems resolved pretty soon.  Any help will be greatly appreciated!!

Linbin Chang

Nov 19, 2013, 9:01:18 AM
to akka...@googlegroups.com
BTW, I am using Scala 2.10.3 and Akka 2.2.3 on JDK 7.

Endre Sándor Varga

Nov 19, 2013, 9:54:56 AM
to akka...@googlegroups.com, Linbin Chang
Hi Linbin,

>
> 1. Why doesn't remoting honor the setting "use-passive-connections" no
> matter what? When the client initiates the connection, everything is fine.
> But once the client disconnects, all hell breaks loose. The server will
> try to connect back to the client, which was unexpected because the server
> sits behind a firewall, and the admin raises quite an eyebrow when he sees
> his server trying to connect to an IP and port outside of his network.

Akka remoting is not a client-server architecture but a p2p architecture. "use-passive-connections" does not prevent any node from trying to reconnect to another one. The problem is that any ActorRef that is exposed to a system can be a target for sends from that system.

An example:

1. System S1 connects to system S2
2. Actor A on S1 sends message to actor B on S2
3. Actor B receives the message, and *has the sender reference, A*
- This is the important step. At this point, for the ActorRef A to be truly transparent, system S2 has to be able to connect to S1. Therefore, when it loses its connection to S1, it tries to reconnect, since it attempts to maintain connectivity to ActorRef A.

This is obviously not a client-server architecture, but the semantics of ActorRefs require this. If you want to design a pure client-server system you should use the Akka IO module which exposes traditional networking for you -- the tradeoff is that you lose any conveniences of ActorRefs, but you have full control over connection lifecycles and directions.

>
> 2. After retrying, which does not really stop after the configured
> "maximum-retries-in-window" attempts, the remote actor enters a quarantine
> period that I have no idea how to turn off or make as short as
> possible. I tried to make it really short, like 1 second, and it works most
> of the time, but once in a while it does not.

There is no quarantine period when the retry limit is reached (that would be the gate functionality). What happens is that after the limit is reached, all outbound buffers that accumulated during the retries for that endpoint are dropped. After this has happened, no outbound connection is attempted unless you send another message to that address.

Example
1. System tries to send M1, M2, M3
2. M1 is sent
3. loss of connectivity
4. M2, M3 sitting in buffer while remoting retries
5. retry limit reached, M2, M3 are dropped
6. ... silence ...
7. M4 is sent to the same address, connection is attempted again
8. connection succeeds
9. M4 is delivered
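
The sequence above can be sketched as a toy model in Scala. This is not Akka's implementation: `Endpoint`, its fields and its methods are hypothetical names made up for illustration, and the only behavior modeled is that messages buffered during retries are dropped once the retry limit is reached.

```scala
import scala.collection.immutable.Queue

/** Toy model of one outbound endpoint (hypothetical, not Akka's code). */
final case class Endpoint(
    connected: Boolean,
    buffer: Queue[String] = Queue.empty,
    delivered: Vector[String] = Vector.empty,
    dropped: Vector[String] = Vector.empty) {

  /** While connected a message is delivered; otherwise it waits in the buffer. */
  def send(msg: String): Endpoint =
    if (connected) copy(delivered = delivered :+ msg)
    else copy(buffer = buffer.enqueue(msg))

  /** Loss of connectivity: later sends are buffered while retries run. */
  def connectionLost: Endpoint = copy(connected = false)

  /** Retry limit reached: everything buffered during the retries is dropped. */
  def retryLimitReached: Endpoint =
    copy(buffer = Queue.empty, dropped = dropped ++ buffer)

  /** A new connection attempt succeeds: whatever is still buffered is flushed. */
  def connectionEstablished: Endpoint =
    copy(connected = true, buffer = Queue.empty, delivered = delivered ++ buffer)
}

object BufferDropDemo {
  val result: Endpoint =
    Endpoint(connected = true)
      .send("M1")              // steps 1-2: M1 is sent
      .connectionLost          // step 3
      .send("M2").send("M3")   // step 4: buffered during retries
      .retryLimitReached       // step 5: M2, M3 are dropped
      .send("M4")              // step 7: a new message triggers a new attempt
      .connectionEstablished   // steps 8-9: M4 is delivered
}
```

Under this model `BufferDropDemo.result.delivered` holds M1 and M4, while M2 and M3 end up in `dropped`.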

>
> 3. When it fails to work, that client IP and port combination is gone
> unless I restart the server (very unacceptable) or the client changes to a
> different port (restarting the client application leaves a very bad taste
> in my customer's mouth when I was telling him how reliable his future
> system is going to be before the project starts).

Can you reproduce this? Do you use DeathWatch on remote actors? What you describe (that a restart is needed) should only happen when the system is quarantined (and we test this heavily, although bugs can happen), but that requires at least DeathWatch or remote-deployed actors.

> 4. The setting "maximum-retries-in-window" is not honored on either the
> client or the server. It might retry 5 or 10 times before it gives up, or
> it might retry forever.

This setting is related to the time window as well. You have the following settings:

retry-window = 1 s
maximum-retries-in-window = 1

This means that 1 retry per second is considered fine! Since a single connection attempt may need several seconds to fail, retries rarely fall within the same 1-second window, so this setting is probably not strict enough.
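
To illustrate why such a limit is rarely exceeded, here is a simplified Scala model of a retry counter with a time window. It rests on an assumption about how a windowed counter behaves in general (the count starts over once the window has elapsed); it is not a copy of Akka's code, and `RetryWindowModel` and `firstExceeded` are hypothetical names.

```scala
import scala.annotation.tailrec

object RetryWindowModel {
  /**
   * Returns the 1-based index of the first failure that exceeds
   * `maxRetries` within one window, or None if the limit is never hit.
   * `failuresMs` are the times (in ms) at which connection attempts fail.
   */
  def firstExceeded(failuresMs: List[Long], windowMs: Long, maxRetries: Int): Option[Int] = {
    @tailrec
    def loop(rest: List[Long], index: Int, windowStart: Long, count: Int): Option[Int] =
      rest match {
        case Nil => None
        case t :: tail =>
          // Start a fresh window on the first failure or once the old one has elapsed.
          val (start, n) =
            if (count == 0 || t - windowStart > windowMs) (t, 1)
            else (windowStart, count + 1)
          if (n > maxRetries) Some(index) else loop(tail, index + 1, start, n)
      }
    loop(failuresMs, 1, 0L, 0)
  }
}
```

With failures 3 seconds apart, `RetryWindowModel.firstExceeded(List(0L, 3000L, 6000L, 9000L), 1000L, 1)` yields `None`, i.e. the limit is never hit and retrying continues. For the same failures, a 60-second window with a maximum of 3 retries yields `Some(4)`.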

>
> So, is it possible to do the following:
>
> 1. Some way for "use-passive-connections = true" on the client side to mean
> "actively-reconnect = false" on the server side as well, i.e., the server
> does not actively reconnect back to the client.

Unfortunately that would break all ActorRef semantics as I described above. Remoting should not be used for client-server systems but for inter-server, p2p communication. If you need more specialized functionality, you should use Akka IO.

>
> 2. Some way to turn off the so-called quarantine on remote actors and just
> terminate or send some message to actors on both ends that the connection
> has failed.

Quarantining only happens between nodes when:
- System messages have been lost and cannot be recovered. In this case the two systems can never communicate reliably again because they are in a completely inconsistent state, so one of the systems has to be restarted. This is a typical STONITH (Shoot The Other Node In The Head) like situation (although it does not kill the other system, just isolates it).
- Remote DeathWatch has been triggered by the failure detector. In this case all watchers are notified that the remote actors on the other system are dead. The other system now needs to be quarantined, since no actor is allowed to be visible after the Terminated message has been sent for it. In other words, once a system is declared dead, it remains dead (from the viewpoint of the node that quarantines it).

>
> 3. No retrying, just tell the actor the connection has failed and be done
> with it. The actor can handle the reconnecting itself.

The retry gate is one way to do that. If you set it to 10 s, then the remoting system will do the following:

1. Messages flow between S1 and S2
2. S1 loses connection to S2, and has configured gate with 10s
3. S1 marks the address of S2 as gated, and drops all outbound buffers
4. All messages destined to S2 are discarded, until
5.a S2 connects successfully to S1, proving that it is alive, and lifting the gate
5.b 10 seconds elapse, S2 is now ungated. S1 will not try to connect unless a new message is sent to an actor on S2
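
As a sketch, the 10-second gate from the steps above would correspond to a configuration fragment like the following. It assumes the setting lives under `akka.remote`, next to the other keys quoted from the original post; the file it goes into (for example `application.conf`) is also an assumption.

```hocon
akka {
  remote {
    # Gate a failed address for 10 s instead of retrying immediately;
    # messages sent to a gated address are discarded.
    retry-gate-closed-for = 10 s
  }
}
```

With a non-zero gate, the actor that talks to S2 decides when to try again by sending a new message after the gate has lifted.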

>
> I know I am asking a lot but I need to have the problems resolved pretty
> soon. Any help will be greatly appreciated!!

Thanks for the feedback! I hope my answer will be somewhat helpful.

--
Endre Varga
Software Engineer
Typesafe - Reactive Apps on the JVM
twitter: drewhk

ahjohannessen

Nov 19, 2013, 5:54:38 PM
to akka...@googlegroups.com
Perhaps using spray.io would make your problem easier to solve :)

Linbin Chang

Nov 19, 2013, 10:11:33 PM
to akka...@googlegroups.com
Thanks for the response.  I knew I was using remote actors wrong when I started having lots of issues whenever a network disconnection occurred.  No matter, I guess I will have to learn Akka IO now.


--
     Read the docs: http://akka.io/docs/
     Check the FAQ: http://akka.io/faq/
     Search the archives: https://groups.google.com/group/akka-user
---You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+unsubscribe@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/groups/opt_out.

Endre Sándor Varga

Nov 20, 2013, 5:49:26 AM
to akka...@googlegroups.com, Linbin Chang
On 2013.11.20 at 04:11:33, Linbin Chang <linbi...@gmail.com> wrote:

> Thanks for the response. I knew I was using remote actors wrong when I
> started having lots of issues whenever a network disconnection occurred. No
> matter, I guess I will have to learn Akka IO now.

I would not say you were completely wrong. Remoting can be confusing, which is something I really want to fix.

Happy hAkking with Akka IO!