2.2.3 Noisy shutdown w/exception


tigerfoot

Nov 25, 2013, 9:06:10 PM
to akka...@googlegroups.com
Hello,

I've got two Akka ActorSystems (using Cluster for test cases); config is below. I know Cluster is way overkill here, but I'm testing the mechanism for a much larger use case.

My test looks like this:

val s1 = Server1()  // initiates an ActorSystem inside the Server1 class

// Establish a second ActorSystem to simulate another remote client on the network sending messages to the first.
val host = s1.context.myHostname  // my local IP address (not localhost)
val sys2 = ActorSystem("foobar", ConfigFactory.parseString(ServerConfigs.svr2Cfg) )
val selection = sys2.actorSelection( s"""akka.tcp://MyCluster@$host:9021/user/awe.server.ServerModule""" )
selection ! ServerEvent( "acme", "hello", None )

// Simulate doing work
Thread.sleep(500)

// Shut things down cleanly
s1.shutdown()
s1.system.awaitTermination()
sys2.shutdown()
sys2.awaitTermination()
assert( s1.system.isTerminated )
assert( sys2.isTerminated )

Now this functions just fine, but I'm getting the noisy exception shown at the bottom of my post. Any idea what's causing it?
I saw this ticket and wondered if it's the same thing: https://www.assembla.com/spaces/akka/tickets/3096#/activity/ticket:

Thanks for any ideas!
Greg

Config:
val svr1Cfg = s"""
  akka {
    loglevel = "ERROR"
    stdout-loglevel = "ERROR"
    loggers = ["akka.event.slf4j.Slf4jLogger"]
    actor {
      provider = "akka.cluster.ClusterActorRefProvider"
    }
    remote {
      enabled-transports = ["akka.remote.netty.tcp"]
      netty.tcp {
        port = 9021
      }
    }
    cluster {
      seed-nodes = [ "akka.tcp://MyCluster@$myHost:9021" ]
      auto-down = on
      log-info = off
    }
  }
"""

val svr2Cfg = s"""
  akka {
    loglevel = "ERROR"
    stdout-loglevel = "ERROR"
    loggers = ["akka.event.slf4j.Slf4jLogger"]
    actor {
      provider = "akka.cluster.ClusterActorRefProvider"
    }
    remote {
      enabled-transports = ["akka.remote.netty.tcp"]
      netty.tcp {
        port = 9022
      }
    }
    cluster {
      seed-nodes = [ "akka.tcp://MyCluster@$myHost:9021" ]
      auto-down = on
      log-info = off
    }
  }
"""

Nov 25, 2013 8:00:59 PM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:115)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:73)
at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:57)
at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
at org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
at org.jboss.netty.channel.Channels.disconnect(Channels.java:781)
at org.jboss.netty.channel.AbstractChannel.disconnect(AbstractChannel.java:211)
at akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:222)
at akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:221)
at scala.util.Success.foreach(Try.scala:205)
at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:204)
at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:204)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Björn Antonsson

Nov 26, 2013, 2:07:02 AM
to akka...@googlegroups.com
Hi Greg,

On Tuesday, 26 November 2013 at 03:06, tigerfoot wrote:

Now this functions just fine, but I'm getting the noisy exception shown at the bottom of my post. Any idea what's causing it?
I saw this ticket and wondered if it's the same thing: https://www.assembla.com/spaces/akka/tickets/3096#/activity/ticket:

Thanks for looking at the tickets. You're right, the exception in your log is the one from that ticket. It's not an error in your application; it's a known issue that just looks bad in the logs.

B/
 

-- 
Björn Antonsson
Typesafe – Reactive Apps on the JVM
twitter: @bantonsson

Robert Metzger

Mar 16, 2015, 5:09:21 PM
to akka...@googlegroups.com
Hi,

Sorry for reviving this very old thread.
Are there any updates or workarounds for this issue?
The bug in the tracker has been marked as "not reproducible", but we're seeing the error quite often (I would guess in 10% of our automated test runs).

Patrik Nordwall

Mar 23, 2015, 4:26:33 AM
to akka...@googlegroups.com
Please open a new issue on GitHub if this can be reproduced with Akka 2.3.9.
Thanks,
Patrik





--

Patrik Nordwall
Typesafe Reactive apps on the JVM
Twitter: @patriknw

Robert Metzger

Mar 24, 2015, 4:02:20 PM
to akka...@googlegroups.com
Hi,
We are currently using Akka 2.3.7; I'm going to upgrade to 2.3.9.

I think I've found a way to work around the issue:
Netty uses java.util.logging by default, while our project uses SLF4J. So I added the following code to route Netty's logging through SLF4J:

InternalLoggerFactory.setDefaultFactory(new Slf4JLoggerFactory)

After that, I added the following entry to our log4j properties:

log4j.logger.org.jboss.netty.channel.DefaultChannelPipeline=ERROR, file

With these two changes, our users won't see the exceptions during shutdown anymore. My initial tests indicate that this resolved the issue.
If the issue persists with this workaround and on Akka 2.3.9, I'll file a bug.
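
For reference, here's a self-contained sketch of the logging redirection above (this assumes the Netty 3.x classes bundled with Akka 2.3.x; verify the import paths against the Netty version on your classpath):

import org.jboss.netty.logging.{ InternalLoggerFactory, Slf4JLoggerFactory }

object NettyLogging {
  // Route Netty's internal logging through SLF4J instead of java.util.logging.
  // Call this before creating the first ActorSystem so the remoting transport
  // picks up the SLF4J-backed factory when it initializes.
  def install(): Unit =
    InternalLoggerFactory.setDefaultFactory(new Slf4JLoggerFactory)
}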


Robert Metzger

Mar 30, 2015, 10:51:58 AM
to akka...@googlegroups.com
A quick follow-up question: I've upgraded Akka from 2.3.7 to 2.3.9 and noticed that failed remote machines are detected much later with 2.3.9 than with 2.3.7. Akka detected failed machines in less than 5 seconds under 2.3.7; under 2.3.9 it took much longer, almost 2 minutes in the example below.
I haven't investigated the issue more closely yet; it may also be caused by our system.

Did anything with respect to failure detection change between the two releases?


When using Flink on YARN, there are actually two systems monitoring the JVMs: Akka and YARN. From the timestamps in the log, one can easily see that the time until a failed JVM is detected is much longer with 2.3.9.

With Akka 2.3.9:
15:07:56,922 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://fl...@130.149.21.2:39280] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
yarn detects failure --> 15:07:58,130 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Container container_1426807853451_0035_01_000002 is completed with diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
15:08:04,163 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://fl...@130.149.21.2:39280] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://fl...@130.149.21.2:39280]].
15:08:14,143 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://fl...@130.149.21.2:39280] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://fl...@130.149.21.2:39280]].
15:08:24,149 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://fl...@130.149.21.2:39280] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://fl...@130.149.21.2:39280]].
15:08:34,138 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://fl...@130.149.21.2:39280] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://fl...@130.149.21.2:39280]].
15:08:44,154 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://fl...@130.149.21.2:39280] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://fl...@130.149.21.2:39280]].
15:08:54,158 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://fl...@130.149.21.2:39280] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://fl...@130.149.21.2:39280]].
15:09:04,146 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://fl...@130.149.21.2:39280] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://fl...@130.149.21.2:39280]].
15:09:14,150 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://fl...@130.149.21.2:39280] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://fl...@130.149.21.2:39280]].
....
15:09:44,165 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://fl...@130.149.21.2:39280] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://fl...@130.149.21.2:39280]].
15:09:54,144 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://fl...@130.149.21.2:39280]
akka sends Terminated --> 15:09:54,154 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://fl...@130.149.21.2:39280. Number of registered task managers 15. Number of available slots 15

Difference between YARN and Akka: 2 minutes.


With Akka 2.3.7:

16:47:23,859 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://fl...@130.149.21.2:45854] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
16:47:26,337 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Container container_1426807853451_0038_01_000005 is completed with diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

16:47:37,786 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://fl...@130.149.21.2:45854]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /130.149.21.2:45854
16:47:37,795 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://fl...@130.149.21.2:45854. Number of registered task managers 10. Number of available slots 10.

Difference between YARN and Akka: 11 seconds.

Endre Varga

Mar 30, 2015, 11:03:12 AM
to akka...@googlegroups.com
Hi Robert

What are your watch failure detector settings? Detection speed depends on them. There was also a bug in earlier remoting versions that published internal AddressTerminated messages when it was not supposed to (remoting does not consider unreachable machines dead; that decision is made by remote DeathWatch or clustering).
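
For reference, those settings live under akka.remote.watch-failure-detector in your configuration; the values below are approximately the Akka 2.3 defaults (check the reference.conf of your exact version):

akka.remote.watch-failure-detector {
  heartbeat-interval = 1 s           # how often watch heartbeats are sent
  threshold = 10                     # phi accrual threshold for flagging a node
  acceptable-heartbeat-pause = 10 s  # tolerated silence before a node is marked unreachable
}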

-Endre

Robert Metzger

Mar 30, 2015, 1:14:15 PM
to akka...@googlegroups.com
Thanks for the quick reply.

This is the setting:
watch-failure-detector {
  heartbeat-interval = 10 s
  acceptable-heartbeat-pause = 100 s
  threshold = 12
}

Akka Team

Apr 7, 2015, 6:05:54 AM
to Akka User List
Hi Robert,



On Mon, Mar 30, 2015 at 7:13 PM, Robert Metzger <metr...@gmail.com> wrote:
Thanks for the quick reply.

This is the setting:
watch-failure-detector {
  heartbeat-interval = 10 s
  acceptable-heartbeat-pause = 100 s
The above settings mean that you accept 100 s of silence from a remote host, so no DeathWatch events will be generated before 100 seconds have passed. That explains why you get late notifications. The pre-2.3.9 scenario "worked" accidentally; now it really works and respects this setting properly.
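
For illustration, values closer to the defaults would bring detection back down to the order of ten seconds; treat the numbers below as a sketch to tune against your network conditions, not a general recommendation:

watch-failure-detector {
  heartbeat-interval = 1 s           # probe every second
  acceptable-heartbeat-pause = 10 s  # tolerate 10 s of silence
  threshold = 12
}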

-Endre



--
Akka Team
Typesafe - Reactive apps on the JVM
Blog: letitcrash.com
Twitter: @akkateam

Robert Metzger

Apr 10, 2015, 8:30:00 AM
to akka...@googlegroups.com
Hi,
Thank you for answering the question.

We'll update our configuration and move to Akka 2.3.9.

