Cluster unreachable and a lot of cluster connections


Johannes Berg

Dec 9, 2014, 9:29:52 AM
to akka...@googlegroups.com
Hi! I'm doing some load tests on our system and running into problems where some of my nodes are marked as unreachable even though the processes are up. I see a node flip from reachable to unreachable and back a few times before it stays unreachable, logging that the connection is gated for 5000 ms, and then it stays silent.

Looking at the connections made to one of the seed nodes, I see several hundred connections from the other nodes, except from the failing ones. Is this normal? There are several hundred just between two nodes. When are connections formed between cluster nodes, and when are they torn down?

Also, is there some limit on how many connections a node with default settings will accept?

We have auto-down-unreachable-after = 10s set in our config; does this mean that if a node is busy and doesn't respond within 10 seconds it becomes unreachable?

Is there any reason why it would stay unreachable and not retry joining the cluster?

We are using Akka 2.3.6 and use cluster-aware routers quite heavily, with a lot of remote messages going around.

Can anyone shed some light on this, or point me at some documentation about these things?

Björn Antonsson

Dec 11, 2014, 4:22:41 AM
to akka...@googlegroups.com
Hi Johannes,

On 9 December 2014 at 15:29:53, Johannes Berg (jber...@gmail.com) wrote:

Hi! I'm doing some load tests in our system and getting problems that some of my nodes are marked as unreachable even though the processes are up. I'm seeing it going a few times from reachable to unreachable and back a few times before staying unreachable saying connection gated for 5000ms and staying silently that way.

Looking at the connections made to one of the seed nodes I see that I have several hundreds of connections from other nodes except the failing ones. Is this normal? There are several (hundreds) just between two nodes. When are connections formed between cluster nodes and when are they taken down?


Several hundred connections between two nodes seems very wrong. There should only be one connection between two nodes that communicate over Akka remoting or are part of a cluster. How many nodes do you have in your cluster?

If you are using cluster-aware routers then there should be one connection between the router node and the routee nodes (it can be the same connection that is used for the cluster communication).

The connections between the nodes don't get torn down; they stay open and are reused for all remoting communication between the nodes.

Also is there some limit on how many connections a node with default settings will accept?

We have auto-down-unreachable-after = 10s set in our config, does this mean if the node is busy and doesn't respond in 10 seconds it becomes unreachable?

Is there any reason why it would stay unreachable and not re-try to join the cluster?


The auto-down setting is actually just what it says: if the node is considered unreachable for 10 seconds, it will be moved to DOWN and won't be able to come back into the cluster. The different states of the cluster and the settings are explained in the documentation.
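For reference, this is roughly where that setting lives, as a minimal config sketch (the 10s value just mirrors the config quoted above, not a recommendation):

akka.cluster {
  # Automatically mark unreachable nodes as DOWN after this duration.
  # Convenient for tests, but a busy or partitioned node that gets downed
  # this way cannot come back into the cluster without a restart.
  auto-down-unreachable-after = 10s
}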


If you are having problems with nodes becoming unreachable then you could check if you are doing one of these things:
1) sending too large blobs as messages, which effectively block the heartbeats going over the same connection
2) having long GC pauses that trigger the failure detector since nodes don't reply to heartbeats

B/

We are using Akka 2.3.6 and using cluster aware routers quite much with a lot of remote messages going around.

Anyone that can shed some light on this or that can point me at some documentation about these things?

-- 
Björn Antonsson
Typesafe – Reactive Apps on the JVM
twitter: @bantonsson

Johannes Berg

Jan 21, 2015, 4:29:26 AM
to akka...@googlegroups.com
Many connections seem to be formed in the case where a node has been marked down due to unreachability even though it's still alive and tries to connect back into the cluster. The removed node prints:

"Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted."

It doesn't seem to close the old connections properly even though it continuously opens new ones.

Anyway, that's a separate issue that I'm not too concerned about right now. I've realized I don't want to use automatic downing; instead I would like to allow nodes to go unreachable and come back to reachable even if it takes quite some time, and to manually stop the process and down the node in case of an actual crash.

Consequently I've put

auto-down-unreachable-after = off

in the config. Now I have the problem that nodes are still removed; this is from the leader node's log:

08:50:14.087UTC INFO [system-akka.actor.default-dispatcher-4] Cluster(akka://system) - Cluster Node [akka.tcp://system@ip1:port1] - Leader is removing unreachable node [akka.tcp://system@ip2:port2]

I can understand that my node is marked unreachable because it's under heavy load, but I don't understand what could cause it to be removed. I'm not doing any manual downing and have auto-down set to off; what else could trigger the removal?

Using the akka-cluster script I can see that the node has most other nodes marked as unreachable (including the leader) and that it has a different leader than the other nodes.

My test system consists of 8 nodes.

About the unreachability: I'm not having long GC pauses and not sending large blobs, but I'm sending very many smaller messages as fast as I can. If I just hammer it fast enough it will end up unreachable, which I can accept, but I need to get it back to reachable.

Endre Varga

Jan 21, 2015, 4:31:02 AM
to akka...@googlegroups.com
Hi Johannes,

We just released 2.3.9 with important bugfixes. I recommend updating and seeing if the problem still persists.

-Endre

Johannes Berg

Jan 21, 2015, 10:47:14 AM
to akka...@googlegroups.com
Upgrading to 2.3.9 does indeed seem to solve my problem. At least I haven't experienced the problems again yet.

Now I'm curious what the fixes were. Is there a change summary between versions somewhere, or a list of which bugs have been fixed in which versions?

Viktor Klang

Jan 21, 2015, 10:51:54 AM
to Akka User List
Cheers,

Endre Varga

Jan 21, 2015, 10:53:06 AM
to akka...@googlegroups.com
Hi Johannes,


The tickets cross-reference the PRs, too, so you can look at the code changes. The issue that probably hit you is https://github.com/akka/akka/issues/16623, which manifested as system message delivery errors on some systems but was actually caused by accidentally duplicated internal actors (a regression).

-Endre

On Wed, Jan 21, 2015 at 4:47 PM, Johannes Berg <jber...@gmail.com> wrote:

Johannes Berg

Jan 22, 2015, 5:45:24 AM
to akka...@googlegroups.com
Okay, I increased the load further and now I see the same problem again. It seems to have just gotten a bit better in that it doesn't happen as fast, but with enough load it happens.

To reiterate: I have Akka 2.3.9 on all 8 nodes and auto-down-unreachable-after = off on all nodes, and I don't do any manual downing anywhere. Still, the leader log prints this:

2015-01-22 10:35:37 +0000 - [INFO] - from Cluster(akka://system) in system-akka.actor.default-dispatcher-2
Cluster Node [akka.tcp://system@ip1:port1] - Leader is removing unreachable node [akka.tcp://system@ip2:port2]

and the node(s) under load is(are) removed from the cluster (quarantined). How is this possible?

Viktor Klang

Jan 22, 2015, 8:56:09 AM
to Akka User List
Endre, could it be due to pending-to-send system message overflow?
Cheers,

Endre Varga

Jan 22, 2015, 9:03:15 AM
to akka...@googlegroups.com
Without detailed logs I cannot say. If there were a system message buffer overflow it would cry loudly in the logs. Also, it says that an unreachable node is being removed, so there should be events happening before the unreachability. This might be something else entirely. The full config would also help, but a reproducer would be best.

-Endre

Patrik Nordwall

Jan 22, 2015, 9:05:52 AM
to akka...@googlegroups.com
If it's quarantined it will be removed from the cluster. Please include the log entry that says that it is quarantined, if any.

/Patrik

Johannes Berg

Jan 22, 2015, 9:41:49 AM
to akka...@googlegroups.com
Thanks for the tip on what to look for; my logs are huge so it's a bit of a jungle. Anyway, I found this:

10:34:23.701UTC ERROR[system-akka.actor.default-dispatcher-2] Remoting - Association to [akka.tcp://system@ip2:port2] with UID [-1637388952] irrecoverably failed. Quarantining address.
akka.remote.ResendBufferCapacityReachedException: Resend buffer capacity of [1000] has been reached.
    at akka.remote.AckedSendBuffer.buffer(AckedDelivery.scala:121) ~[akka-remote_2.11-2.3.9.jar:na]
    at akka.remote.ReliableDeliverySupervisor.akka$remote$ReliableDeliverySupervisor$$tryBuffer(Endpoint.scala:388) ~[akka-remote_2.11-2.3.9.jar:na]
    at akka.remote.ReliableDeliverySupervisor.akka$remote$ReliableDeliverySupervisor$$handleSend(Endpoint.scala:372) ~[akka-remote_2.11-2.3.9.jar:na]
    at akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:279) ~[akka-remote_2.11-2.3.9.jar:na]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465) ~[akka-actor_2.11-2.3.9.jar:na]
    at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:188) ~[akka-remote_2.11-2.3.9.jar:na]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) ~[akka-actor_2.11-2.3.9.jar:na]
    at akka.actor.ActorCell.invoke(ActorCell.scala:487) ~[akka-actor_2.11-2.3.9.jar:na]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) ~[akka-actor_2.11-2.3.9.jar:na]
    at akka.dispatch.Mailbox.run(Mailbox.scala:221) ~[akka-actor_2.11-2.3.9.jar:na]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231) ~[akka-actor_2.11-2.3.9.jar:na]
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) ~[scala-library-2.11.5.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) ~[scala-library-2.11.5.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) ~[scala-library-2.11.5.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.11.5.jar:na]
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.11.5.jar:na]

Here ip2 and port2 are the same as in my previous post, and this happened on a node (ip3:port3) which was also under high load.

After this the ip2:port2 node started to print:
10:34:24.234UTC WARN [system-akka.actor.default-dispatcher-2] Remoting - Tried to associate with unreachable remote address [akka.tcp://system@ip3:port3]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.

On ip3:port3 I also later see:
10:34:25.180UTC ERROR[system-akka.actor.default-dispatcher-2] Remoting - Association to [akka.tcp://system@ip2:port2] with UID [-1637388952] irrecoverably failed. Quarantining address.
java.lang.IllegalStateException: Error encountered while processing system message acknowledgement buffer: [3 {0, 1, 2, 3}] ack: ACK[2114, {}]
    at akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:287) ~[akka-remote_2.11-2.3.9.jar:na]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465) ~[akka-actor_2.11-2.3.9.jar:na]
    at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:188) ~[akka-remote_2.11-2.3.9.jar:na]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) ~[akka-actor_2.11-2.3.9.jar:na]
    at akka.actor.ActorCell.invoke(ActorCell.scala:487) ~[akka-actor_2.11-2.3.9.jar:na]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) ~[akka-actor_2.11-2.3.9.jar:na]
    at akka.dispatch.Mailbox.run(Mailbox.scala:221) ~[akka-actor_2.11-2.3.9.jar:na]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231) ~[akka-actor_2.11-2.3.9.jar:na]
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) ~[scala-library-2.11.5.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) ~[scala-library-2.11.5.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) ~[scala-library-2.11.5.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.11.5.jar:na]
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.11.5.jar:na]
Caused by: java.lang.IllegalArgumentException: Highest SEQ so far was 3 but cumulative ACK is 2114
    at akka.remote.AckedSendBuffer.acknowledge(AckedDelivery.scala:103) ~[akka-remote_2.11-2.3.9.jar:na]
    at akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:283) ~[akka-remote_2.11-2.3.9.jar:na]
    ... 12 common frames omitted


Maybe this explains something? What should I do about this?

Patrik Nordwall

Jan 22, 2015, 10:32:33 AM
to akka...@googlegroups.com
You can try to increase akka.remote.system-message-buffer-size config setting. Default is 1000.
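For reference, a minimal sketch of where that setting goes (the 10000 value is just an example of raising it, not a recommendation):

akka.remote {
  # Capacity of the resend buffer used for reliable system message delivery
  # (watch/unwatch, DeathWatchNotification, remote deployment supervision).
  # Default is 1000; if a burst of such messages exceeds it before they are
  # acknowledged, the association is quarantined.
  system-message-buffer-size = 10000
}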
/Patrik

Patrik Nordwall
Typesafe Reactive apps on the JVM
Twitter: @patriknw

Johannes Berg

Jan 22, 2015, 10:53:49 AM
to akka...@googlegroups.com

I will try that, but it seems that will only help up to a certain point, and when I push the load further it will hit the limit again.

I hit this within a minute of putting on the load, which is a bit annoying. I'm fine with the node becoming unreachable as long as I can get it back to reachable when it has crunched through the load. Will it still buffer up system messages even though it's unreachable? At what rate are system messages typically sent?

As it is now it's easy to take down the system before you have any chance of spinning up new nodes.

Johannes Berg

Jan 22, 2015, 11:00:33 AM
to akka...@googlegroups.com
Also, I've made the failure detection a bit less sensitive, since nodes went unreachable very quickly before:

akka.cluster.failure-detector {
  threshold = 16.0
  acceptable-heartbeat-pause = 6s
  min-std-deviation = 200 ms
  expected-response-after = 9 s
}

Does this have any impact on how much space I need to reserve in the system message resend buffer? (Related to the question of whether system messages are buffered up while a node is unreachable.)

Endre Varga

Jan 22, 2015, 11:31:01 AM
to akka...@googlegroups.com
Hi Johannes,

On Thu, Jan 22, 2015 at 4:53 PM, Johannes Berg <jber...@gmail.com> wrote:

I will try that but it seems that will only help to a certain point and when I push the load further it will hit it again.

There is no system message traffic between two Akka systems by default, to have a system send system messages to another you either need to use remote deployment or deathwatch on remote actors. Which one are you using? What is the scenario?

The main issue is that, whatever the rate of system message delivery, we *cannot* backpressure remote deployment or how many watched remote actors die. For any delivery buffer there is a large enough "mass actor extinction event" that will just fill it up. You can increase the buffer size, though, up to the point where you expect the burst maximum to be (for example if you know the peak number of remote-watched actors and the peak rate at which they die).
 

I hit this within a minute after I put on the load which is a bit annoying to me. I'm fine with it becoming unreachable as long as I can get it back to reachable when it has crunched through the load.

That means a higher buffer size. If there is no sysmsg buffer size that can absorb your load then you have to rethink your remote deployment/watch strategy (whichever feature you use).
 
Will it still buffer up system messages even though it's unreachable?

After quarantine there is no system message delivery; everything is dropped. There is no recovery from quarantine, that is its purpose. If any system message is lost between two systems (and here they are dropped because the buffer is full) then the two systems are in an undefined state, especially with remote deployment, so they quarantine each other.
 
At what rate are system messages typically sent?

They are sent at the rate you are remote deploying or watching actors remotely, or at the rate at which remote-watched actors die. On the wire it depends, and user messages share the same TCP connection as the system messages, which might also reduce the available throughput.

You can tune the dispatcher of remoting by adding more threads to it, and you might also increase the Netty thread pool: http://doc.akka.io/docs/akka/2.3.9/general/configuration.html#akka-remote

You might want to set the system-message-ack-piggyback-timeout setting to a lower value, like 100ms.
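A sketch of what that tuning could look like; the keys are from the akka-remote reference configuration, but the values here are illustrative assumptions only:

akka.remote {
  # Send a standalone ACK for system messages sooner instead of waiting
  # to piggyback it on other outgoing traffic (default 0.3 s).
  system-message-ack-piggyback-timeout = 100 ms

  # More threads for the dispatcher used by remoting internals.
  default-remote-dispatcher {
    fork-join-executor {
      parallelism-min = 4
      parallelism-max = 8
    }
  }

  # Larger Netty worker pools for the TCP transport.
  netty.tcp {
    server-socket-worker-pool.pool-size-max = 4
    client-socket-worker-pool.pool-size-max = 4
  }
}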
 
-Endre

Johannes Berg

Jan 23, 2015, 2:39:17 AM
to akka...@googlegroups.com
Thanks for the answers, this really explains a lot. I will go back to my abyss and rethink some things. See below some answers/comments.


On Thursday, January 22, 2015 at 6:31:01 PM UTC+2, drewhk wrote:
Hi Johannes,

On Thu, Jan 22, 2015 at 4:53 PM, Johannes Berg <jber...@gmail.com> wrote:

I will try that but it seems that will only help to a certain point and when I push the load further it will hit it again.

There is no system message traffic between two Akka systems by default, to have a system send system messages to another you either need to use remote deployment or deathwatch on remote actors. Which one are you using? What is the scenario?

I do use death watch on remote actors, and the number of death watches I have is linear in the load I put on the system, so I guess that explains the increasing number of system messages with load.
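For illustration, a pattern like the following (a hypothetical sketch, not the actual code from this thread) generates one Watch and, when the worker stops, one DeathWatchNotification system message per request, so the system message rate tracks the request rate:

import akka.actor.{Actor, ActorRef, Terminated}

// Hypothetical front-end actor: for every request it is handed an ActorRef
// living on another node and watches it until the work is done.
class FrontEnd extends Actor {
  def receive = {
    case ("request", worker: ActorRef) =>
      context.watch(worker)  // Watch goes out as a system message to the remote node
      worker ! "work"
    case Terminated(worker) =>
      // The DeathWatchNotification behind this Terminated also travelled as a
      // system message; under heavy load these can burst faster than the
      // resend buffer (system-message-buffer-size) is drained.
      println(s"$worker is done")
  }
}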
 

The main issue is that whatever is the rate of system message delivery we *cannot* backpressure remote deployment or how many watched remote actors die. For any delivery buffer there is a large enough "mass actor extinction event" that will just fill it up. You can increase the buffer size though up to that point where you expect that a burst maximum is present (for example you know the peak number of remote watched actors and the peak rate of them dying).

Thinking about these features more closely I can see that they may require acked delivery, but I would have expected something that grows indefinitely until out-of-memory, like unbounded mailboxes. It's not apparent from the documentation about death watch that you need to consider a buffer size if you are watching very many actors that may be created or die at a very fast rate (maybe a note about this could be added to the docs?). From a quick glance at the feature you wouldn't expect it to be limited by anything other than normal actor message sending and receiving. Furthermore, I wouldn't have expected a buffer overflow due to death watching to cause a node to get quarantined and removed from the cluster; instead I would expect some death watches to fail to work correctly. Causing the node to go down in case of a buffer overflow seems a bit dangerous considering DDoS attacks, even though it maybe makes the system behave more consistently.
 
 

I hit this within a minute after I put on the load which is a bit annoying to me. I'm fine with it becoming unreachable as long as I can get it back to reachable when it has crunched through the load.

That means a higher buffer size. If there is no sysmsg buffer size that can absorb your load then you have to rethink your remote deployment/watch strategy (whichever feature you use).

Now that I know what's causing the increasing rate of system messages, I will certainly rethink my death watch strategy and/or limit the load based on the configured buffer size.
 
 
Will it still buffer up system messages even though it's unreachable?

After quarantine there is no system message delivery, everything is dropped. There is no recovery from quarantine that is its purpose. If there is any lost system message between two systems (and here they are dropped due to the buffer being full) then they are in an undefined state, especially with remote deployment, so they quarantine each other. 

After quarantine I understand there's no system message delivery, but when a node is just unreachable the messages are buffered up, right? I think the documentation about death watch and remote deployment should note which buffer may need tweaking and what can happen if that buffer overflows.
 

Caoyuan

Jan 23, 2015, 4:12:33 AM
to akka...@googlegroups.com
In our experience on the spray-socketio project, watching too many remote actors will cause cluster nodes to be quarantined very quickly.

The default heartbeat interval for remote watching is:

akka.remote {
  watch-failure-detector {
    heartbeat-interval = 1 s
    threshold = 10.0
    acceptable-heartbeat-pause = 10 s
    unreachable-nodes-reaper-interval = 1s
    expected-response-after = 3 s
  }
}


i.e., 1 second. Thus, when "I do use deathwatch on remote actors and the amount of deatchwatches I have is linear to the load", the number of heartbeats sent per second could be massive.

You can try increasing the 'heartbeat-interval' to 10, 20, 30, or even 300... seconds to see if it resolves your problem. But remember that 'akka.remote.watch-failure-detector.heartbeat-interval' is a global setting, so the better way may be to write custom heartbeat-based remote death watching.

/Caoyuan

Patrik Nordwall

Jan 23, 2015, 6:06:44 AM
to akka...@googlegroups.com
On Fri, Jan 23, 2015 at 10:12 AM, Caoyuan <dcao...@gmail.com> wrote:
As per our experience on spray-socketio project, too many remote actor watching will cause the cluster quarantined very quickly. 

The default heartbeat interval for remote watching is:

akka.remote {

   watch-failure-detector {

     heartbeat-interval = 1 s

     threshold = 10.0

     acceptable-heartbeat-pause = 10 s

     unreachable-nodes-reaper-interval = 1s

     expected-response-after = 3 s

   }

}


i.e, 1 second. Thus, when "I do use deathwatch on remote actors and the amount of deatchwatches I have is linear to the load", the amount of heartbeats that is sent per seconds could be mass.

No, the number of heartbeat messages per seconds are influenced by how many actors you watch. The heartbeats are for monitoring the connection between two nodes.

Also, note that akka.remote.watch-failure-detector is not used between nodes in the same cluster; there it is akka.cluster.failure-detector.

Johannes Berg

Jan 23, 2015, 6:34:26 AM
to akka...@googlegroups.com
Did you forget a NOT there? Did you mean "No, the number of heartbeat messages per seconds are NOT influenced by how many actors you watch."?

Increasing akka.remote.system-message-buffer-size to 10000 did solve the problem for the load I'm pushing at the system now.
...

Patrik Nordwall

Jan 23, 2015, 7:23:41 AM
to akka...@googlegroups.com
On Fri, Jan 23, 2015 at 12:34 PM, Johannes Berg <jber...@gmail.com> wrote:
Did you forget a NOT there? Did you mean "No, the number of heartbeat messages per seconds are NOT influenced by how many actors you watch."?

Indeed, thanks!




Caoyuan

Jan 23, 2015, 8:09:18 AM
to akka...@googlegroups.com
On Fri, Jan 23, 2015 at 7:06 PM, Patrik Nordwall <patrik....@gmail.com> wrote:
On Fri, Jan 23, 2015 at 10:12 AM, Caoyuan <dcao...@gmail.com> wrote:
As per our experience on spray-socketio project, too many remote actor watching will cause the cluster quarantined very quickly. 

The default heartbeat interval for remote watching is:

akka.remote {

   watch-failure-detector {

     heartbeat-interval = 1 s

     threshold = 10.0

     acceptable-heartbeat-pause = 10 s

     unreachable-nodes-reaper-interval = 1s

     expected-response-after = 3 s

   }

}


i.e, 1 second. Thus, when "I do use deathwatch on remote actors and the amount of deatchwatches I have is linear to the load", the amount of heartbeats that is sent per seconds could be mass.

No, the number of heartbeat messages per seconds are ***NOT*** influenced by how many actors you watch. The heartbeats are for monitoring the connection between two nodes.

I looked at the code again; Patrik is right, the heartbeats are sent between nodes only. I'll reconsider the cause of what we encountered before, to see what happens when too many actors are remote-watched.

Thanks for pointing out my mistake.

Patrik Nordwall

Jan 23, 2015, 8:10:50 AM
to akka...@googlegroups.com
On Fri, Jan 23, 2015 at 2:09 PM, Caoyuan <dcao...@gmail.com> wrote:


On Fri, Jan 23, 2015 at 7:06 PM, Patrik Nordwall <patrik....@gmail.com> wrote:


On Fri, Jan 23, 2015 at 10:12 AM, Caoyuan <dcao...@gmail.com> wrote:
As per our experience on spray-socketio project, too many remote actor watching will cause the cluster quarantined very quickly. 

The default heartbeat interval for remote watching is:

akka.remote {

   watch-failure-detector {

     heartbeat-interval = 1 s

     threshold = 10.0

     acceptable-heartbeat-pause = 10 s

     unreachable-nodes-reaper-interval = 1s

     expected-response-after = 3 s

   }

}


i.e, 1 second. Thus, when "I do use deathwatch on remote actors and the amount of deatchwatches I have is linear to the load", the amount of heartbeats that is sent per seconds could be mass.

No, the number of heartbeat messages per seconds are ***NOT*** influenced by how many actors you watch. The heartbeats are for monitoring the connection between two nodes.

I looked the code again, Patrik is right, the heartbeats are sending between nodes only. I'll reconsider the cause that we encountered before, to see what happened when too many actors are remote watched.

Thanks for the pointing out of my mistake .

no worries, thanks for sharing your observations
/Patrik

Patrik Nordwall

Jan 23, 2015, 8:15:03 AM
to akka...@googlegroups.com
Johannes, I think you have some very good points regarding the documentation. Would you mind creating an issue?

Thanks,
Patrik

Roland Kuhn

Jan 23, 2015, 4:25:59 PM
to akka-user
On 23 Jan 2015, at 08:39, Johannes Berg <jber...@gmail.com> wrote:

Thanks for the answers, this really explains a lot. I will go back to my abyss and rethink some things. See below some answers/comments.

On Thursday, January 22, 2015 at 6:31:01 PM UTC+2, drewhk wrote:
Hi Johannes,

On Thu, Jan 22, 2015 at 4:53 PM, Johannes Berg <jber...@gmail.com> wrote:

I will try that but it seems that will only help to a certain point and when I push the load further it will hit it again.

There is no system message traffic between two Akka systems by default, to have a system send system messages to another you either need to use remote deployment or deathwatch on remote actors. Which one are you using? What is the scenario?

I do use deathwatch on remote actors and the amount of deatchwatches I have is linear to the load I put on the system so that explains increased number of system messages based on load then I guess.
 

The main issue is that whatever is the rate of system message delivery we *cannot* backpressure remote deployment or how many watched remote actors die. For any delivery buffer there is a large enough "mass actor extinction event" that will just fill it up. You can increase the buffer size though up to that point where you expect that a burst maximum is present (for example you know the peak number of remote watched actors and the peak rate of them dying).

Thinking about these features more closely I can see that these things may require acked delivery but I would have expected something that grows indefinately until outofmem like unbounded inboxes. It's not apparent from the documentation about deathwatching that you need to consider some buffer size if you are watching very many actors that may be created or die at a very fast rate (maybe a note about this could be added to the docs?). A quick glance at the feature you don't expect it to be limited by anything else than normal actor message sending and receiving. Furthermore I wouldn't have expected a buffer overflow due to deathwatching would cause a node to get quarantined and removed from the cluster, instead I would expect some deatchwatching to fail to work correctly.

This is a tradeoff that we cannot make in Akka: the core semantics must work under all conditions, with clear failure paths. Dropping a Terminated message can kill your application on a logical level (in the sense of forever waiting for it) and resending manually is not possible, so we cannot possibly drop it. The good thing is that we can fabricate this particular message when there is need—like a failed remote node—and therefore we do that. But once that has been done, the Actor that we said just Terminated must stay dead, it cannot come back (because that could also break your program). This is the reason for the quarantine. Now the only missing piece is that in a distributed system the only reliable measure available for assessing the health of remote nodes is their ability to respond to messages, and if one stops for long enough then we must draw the appropriate conclusions and perform the consequences.

Causing the node to go down in case of a buffer overflow seems a bit dangerous considering ddos attacks even though it maybe makes the system behave more consistently.

There is no “ddos” here, your app is effectively killing itself by overloading the network. This is something that Akka cannot protect you from.

On a final note, Akka weakens or removes consistency in many places in order to achieve scalability and resilience, but the choice of what can be sacrificed is not arbitrary: certain aspects cannot be inconsistent because that would make programming unreasonable (in all senses of the word).

Regards,

Roland



Dr. Roland Kuhn
Akka Tech Lead
Typesafe – Reactive apps on the JVM.
twitter: @rolandkuhn


Johannes Berg

Jan 26, 2015, 2:02:56 AM
to akka...@googlegroups.com
No problem, I've created an issue in regards to the documentation here:
https://github.com/akka/akka/issues/16717
...

Endre Varga

Jan 26, 2015, 2:35:51 AM
to akka...@googlegroups.com
Hi Johannes,

On Fri, Jan 23, 2015 at 8:39 AM, Johannes Berg <jber...@gmail.com> wrote:
Thanks for the answers, this really explains a lot. I will go back to my abyss and rethink some things. See below some answers/comments.

On Thursday, January 22, 2015 at 6:31:01 PM UTC+2, drewhk wrote:
Hi Johannes,

On Thu, Jan 22, 2015 at 4:53 PM, Johannes Berg <jber...@gmail.com> wrote:

I will try that but it seems that will only help to a certain point and when I push the load further it will hit it again.

There is no system message traffic between two Akka systems by default, to have a system send system messages to another you either need to use remote deployment or deathwatch on remote actors. Which one are you using? What is the scenario?

I do use deathwatch on remote actors and the amount of deatchwatches I have is linear to the load I put on the system so that explains increased number of system messages based on load then I guess.
 

The main issue is that whatever is the rate of system message delivery we *cannot* backpressure remote deployment or how many watched remote actors die. For any delivery buffer there is a large enough "mass actor extinction event" that will just fill it up. You can increase the buffer size though up to that point where you expect that a burst maximum is present (for example you know the peak number of remote watched actors and the peak rate of them dying).

Thinking about these features more closely I can see that these things may require acked delivery but I would have expected something that grows indefinately until outofmem like unbounded inboxes.

Just configure it to a very large value and it will behave exactly like an unbounded mailbox, since it will OOME before it hits the limit. Since the buffer is not preallocated, you effectively disable the limit by making it very large.

-Endre

Johannes Berg

Jan 26, 2015, 4:39:41 AM
to akka...@googlegroups.com
Yes, I can see that death watching probably needs to be this way for the general case, but in my particular case I would be fine with relaxed consistency here (in my app logic it's not the end of the world if a death watch fails to trigger), and that would certainly be preferable to losing a node when it's already under heavy load. I imagine in other cases it's very much the opposite.

Due to how we implemented a certain feature using death watches, we unknowingly created a serious weakness that can fairly easily kill nodes through our public entry point. Of course we need to take other measures to protect ourselves against DDoS attacks and limit public entries when overloaded, but under no (or very few exceptional) circumstances do we want a node to be removed from the cluster due to heavy load. It's fine for the system to start rejecting public entries when all nodes are under heavy load, but the nodes should come back to availability when they've crunched through the load, and we want that to happen without interrupting the work they're doing (without restarting the JVM/actor system). I expect the load I can accept on a node to be limited by the available CPU, RAM and network throughput, and I will set the public entry limit based on that. Ultimately, due to bugs, wrongly set entry limits and other failures, nodes can actually hang and need to be restarted, but that can't happen too frequently. That's my goal anyway.

Anyway, a note in the documentation about death watches would have helped us make better decisions, and luckily I found this issue in our load tests and not later in the field.

Regards,
Johannes
...

Caoyuan

Jan 26, 2015, 4:52:10 AM
to akka...@googlegroups.com
On Fri, Jan 23, 2015 at 9:09 PM, Caoyuan <dcao...@gmail.com> wrote:


On Fri, Jan 23, 2015 at 7:06 PM, Patrik Nordwall <patrik....@gmail.com> wrote:


On Fri, Jan 23, 2015 at 10:12 AM, Caoyuan <dcao...@gmail.com> wrote:
As per our experience on spray-socketio project, too many remote actor watching will cause the cluster quarantined very quickly. 

The default heartbeat interval for remote watching is:

akka.remote {

   watch-failure-detector {

     heartbeat-interval = 1 s

     threshold = 10.0

     acceptable-heartbeat-pause = 10 s

     unreachable-nodes-reaper-interval = 1s

     expected-response-after = 3 s

   }

}


i.e, 1 second. Thus, when "I do use deathwatch on remote actors and the amount of deatchwatches I have is linear to the load", the amount of heartbeats that is sent per seconds could be mass.

No, the number of heartbeat messages per seconds are ***NOT*** influenced by how many actors you watch. The heartbeats are for monitoring the connection between two nodes.

I looked the code again, Patrik is right, the heartbeats are sending between nodes only. I'll reconsider the cause that we encountered before, to see what happened when too many actors are remote watched.

I reconsidered the cause we encountered in the case of our spray-socketio service: we tried to monitor each incoming connection remotely, and there were 10k+ connections being disconnected per second, so there were 10k+ DeathWatchNotification system messages sent across nodes, which then delayed the cluster heartbeats.

Zhuchen Wang

Apr 29, 2015, 3:02:09 PM
to akka...@googlegroups.com
Hi Endre,

I didn't understand "There is no system message traffic between two Akka systems by default, to have a system send system messages to another you either need to use remote deployment or deathwatch on remote actors" very well.

Does remote deployment mean creating actors on remote systems?

Let's say 2 ActorSystems create the same actor hierarchy locally from the same code, and actor1 and actor2 in both systems have the same path. In this case, actor1 and actor2 can talk to each other using actorSelection by changing only the Address field.

I wonder, in this case, will any system messages be sent to maintain the connection between actor1 and actor2, and can quarantine happen in this scenario? We assume that there is no remote death watch.

BTW, we are using akka 2.3.9

Thanks,

Zhuchen

Akka Team

May 4, 2015, 5:42:03 AM
to Akka User List
Hi,

On Wed, Apr 29, 2015 at 7:34 PM, Zhuchen Wang <zcx....@gmail.com> wrote:
Hi Endre,

I didn't understand "There is no system message traffic between two Akka systems by default, to have a system send system messages to another you either need to use remote deployment or deathwatch on remote actors" very well.

Does remote deployment means create actors on any remote systems?

 

Let's say if 2 ActorSystems creates same actor hierarchy locally with the same code and actor1 and actor2 in both systems have the same path. In this case, actor1 and actor2 can talk to each other using actorSelection by changing only the Address field.

In this case there is no remote deployment involved. The two systems created actors on their own. The remote deployment scenario would mean that system1 *creates* an actor on system2 and it *supervises* that remote actor. This supervision channel needs system messages.

In general, I don't really recommend remote deployment nowadays, but opinions might differ. But considering that you asked about it, you probably don't use it :)
 

I wonder in this case, will any system messages be sent to keep the connections between actor1 and actor2 and will Quarantine happen in this scenario? We assume that there is no remote death watch.

No, there won't be any. There are two relations that need system messages:
 - supervision, which only exists between remote systems if remote deployment is used, but not if ordinary local actors are created on any system (they always belong to the local supervision hierarchy, system messages are local)
 - watch, which needs system messages and heartbeating between remote systems. If you never watch an actor reference from a remote system, there won't be any remote watch going on.
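To make the distinction concrete, a small sketch (system name, host, port and actor path are made-up assumptions): sending via actorSelection involves no system messages, while watching the resolved remote reference does.

import akka.actor.{Actor, ActorIdentity, Identify, Terminated}

class Peer extends Actor {
  // Look up the same-path actor on the other system: plain user messages,
  // no system message traffic and no quarantine risk from this alone.
  private val selection =
    context.actorSelection("akka.tcp://system@host2:2552/user/actor1")
  selection ! Identify("peer")

  def receive = {
    case ActorIdentity("peer", Some(ref)) =>
      ref ! "hello"       // still only user messages
      context.watch(ref)  // this line adds remote death watch: Watch and later
                          // DeathWatchNotification travel as system messages
                          // and go through the resend buffer
    case Terminated(ref) =>
      println(s"$ref is gone")
  }
}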

-Endre



--
Akka Team
Typesafe - Reactive apps on the JVM
Blog: letitcrash.com
Twitter: @akkateam