akka.stream.StreamTcpException - Host is down


Thomas Becker

Jan 7, 2018, 12:03:53 PM
to Lagom Framework Users
Hi all,

I'm just getting into the Lagom framework and taking my very first steps. I have two Lagom applications now, one in Scala and one in Java, of which the Java one is the more complex.

I had written the backend of the Scala application, which was derived from the hello world example. It worked fine, and its only service method was fed by a tick source that fetched some data from several other systems and provided it as an Akka source to the service layer. A JavaScript application then reads the websocket stream.

I've now rewritten the backend to run a scheduler that fetches the data instead and pushes it into a Cassandra backend. That's working fine. Next I want to combine a filtered set from the Cassandra database with a "live" tick source from the backend. However, since that refactoring I can't even call the simplest backend service methods.

override final def descriptor = {
  import Service._
  // @formatter:off
  named("home-integrator-lagom")
    .withCalls(
      pathCall("/api/homeData/:interval?from", homeData _),
      pathCall("/api/pastHomeData", pastHomeData _),
      pathCall("/api/hello", hello _)
    )
    .withAutoAcl(true)
  // @formatter:on
}

Even the hello call with its implementation:

override def hello = ServiceCall { _ =>
  Future.successful("hello world")
}

doesn't work anymore and I have no clue why.

Here's a line from the console output of sbt runAll showing that the service's listener is up and running, listening on all interfaces:

[info] Service home-integrator-impl listening for HTTP on 0:0:0:0:0:0:0:0:55976

And here's the error I get when I try to curl any of my registered backend url paths:

2018-01-07T16:48:45.800Z [error] akka.actor.ActorSystemImpl [sourceThread=application-akka.actor.default-dispatcher-2, akkaSource=akka.actor.ActorSystemImpl(application), sourceActorSystem=application, akkaTimestamp=16:48:45.797UTC] - Internal server error, sending 500 response
akka.stream.StreamTcpException: Tcp command [Connect(0.0.0.0:55976,None,List(),Some(10 seconds),true)] failed because of Host is down

The listener is there and working fine:

nc -vz localhost 55976
found 0 associations
found 1 connections:
     1:	flags=82<CONNECTED,PREFERRED>
	outif lo0
	src ::1 port 49780
	dst ::1 port 55976
	rank info not available
	TCP aux info available

Connection to localhost port 55976 [tcp/*] succeeded!

I've no idea what I did to break it, as I spent a while refactoring the backend to persist the data without running the service or the service unit tests (yeah, my fault). I'm kind of stuck at this point, so any hint/help would be appreciated. Bear with me, I'm totally new to Lagom and evaluating it for some projects.

Cheers,
Thomas


Thomas Becker

Jan 7, 2018, 12:10:33 PM
to Lagom Framework Users
I've just noticed that it works fine if I curl the service's port (55976) directly instead of 9000: curl -v http://localhost:55976/api/pastHomeData?from=1115339914

Tim Moore

Jan 10, 2018, 12:45:57 AM
to Thomas Becker, Lagom Framework Users
Hi Thomas,

Can you confirm which version of Lagom you're using?

If it's 1.4, could you try switching to the netty-based service gateway implementation as described in https://www.lagomframework.com/documentation/1.4.x/scala/ServiceLocator.html#Default-gateway-implementation
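For reference, per that documentation page, the switch is a one-line change in your build.sbt:

```scala
// build.sbt -- use the older netty-based service gateway implementation
// instead of the default akka-http one (Lagom 1.4 dev mode):
lagomServiceGatewayImpl in ThisBuild := "netty"
```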

My suspicion is that the problem is related to the fact that the service is advertising its hostname/IP address as 0.0.0.0. As a listening address, this means to listen on all devices, but it's not well defined what will happen when trying to connect to that address. It could be system or network-configuration dependent.

This is a long-standing bug in Lagom (https://github.com/lagom/lagom/issues/166) but we haven't previously seen any problems as serious as what you're experiencing so it hasn't gotten fixed yet.

Best,
Tim


--
You received this message because you are subscribed to the Google Groups "Lagom Framework Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lagom-framework+unsubscribe@googlegroups.com.
To post to this group, send email to lagom-framework@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lagom-framework/7db01ad6-c9f8-46de-bf75-98475bf2d43c%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Tim Moore
Lagom Tech Lead, Lightbend, Inc.

Thomas Becker

Jan 10, 2018, 7:27:34 AM
to Tim Moore, Lagom Framework Users
Hi Tim,

thanks a lot for the reply. I'm using 1.4.0-RC1 and will try switching to the netty-based service gateway today and let you know the results. The weird thing is that nothing (that I know of) changed regarding the service locator configuration. Before switching the service gateway I will also verify which sockets are listening during startup. I rebooted my MacBook in between, which didn't help.

It's indeed weird that Lagom seems to try to connect to 0.0.0.0:55976.

I will keep you updated. Thanks for the help! Much appreciated, and good work on the framework. I like it a lot so far.

Cheers,
Thomas


Thomas Becker

Jan 10, 2018, 3:34:53 PM
to Tim Moore, Lagom Framework Users
Hi again,

just a small update, as I had to work late and didn't have much time to play with my own application. I've run into a different issue with my Java Lagom application: the Lagom services couldn't connect to Cassandra. That issue suddenly disappeared and changed into Kafka LEADER_NOT_AVAILABLE exceptions while Kafka couldn't connect to itself. I last ran the exact same application a couple of weeks ago in Lagom's dev environment without any issues, and no changes have been made since.

After digging a bit deeper I figured out that all the socket listeners were up and running, and I was able to connect to them just fine, but the framework couldn't. Eventually I tried unplugging my network cable (USB-C Ethernet adapter, MacBook) and also turned off wifi. Both apps are working fine now. Plugging the cable in again breaks the app. Using the wifi device in the exact same network works fine, however. I always use the cable connection here, so I'm sure it worked fine with the cable connection before.

Sorry, I'm in kind of a hurry. If I find some time I will dig a bit deeper and summarize some valuable info for you tomorrow, if you like, and will also try to figure out what's wrong with my MacBook's TCP/IP routing. This is all I've got so far, and it might as well be an issue with my MacBook's TCP routing or TCP stack rather than with Lagom.

If it's interesting for you, I can try the old netty-based service locator as well.

Cheers,
Thomas

Tim Moore

Jan 10, 2018, 9:39:30 PM
to Thomas Becker, Lagom Framework Users
Do you have a proxy configured for your wired network? Some people have seen problems with that, and solved them by adding their IP address to their http.nonProxyHosts configuration.
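For example, the property can be passed to the JVM when launching sbt (the 192.168.1.23 below is just a placeholder; substitute the address your wired interface actually gets, per ifconfig):

```shell
# Placeholder address 192.168.1.23 -- substitute your own wired-interface IP.
export SBT_OPTS="-Dhttp.nonProxyHosts=localhost|127.0.0.1|192.168.1.23"
# then start the dev environment as usual:
# sbt runAll
```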

If there is a routing/proxy issue, then switching to the netty service gateway probably won't help.

Ultimately the solution should be to make Lagom use 127.0.0.1 instead of 0.0.0.0 for everything by default.


Cheers,
Tim


Thomas Becker

Jan 11, 2018, 3:35:07 AM
to Tim Moore, Lagom Framework Users
Thanks again for looking into this. There's no proxy configured, and it also affects Kafka and Cassandra, which are both not HTTP. Anyhow, I found out that the problems in the two apps are slightly different, but both can be solved by unplugging the wired network.

The network devices in ifconfig look fine, and the listeners look fine, except that for the Java app there's no Kafka listener, and in Lagom's logs I found:

Kafka Server closed unexpectedly.

For the Scala app the problem is the forward from port 9000 to the service's port by the service locator, as stated above. Here both network listeners are there.

I can work for now (using wifi) and will try to get some progress done. In the evening I will have a closer look, as I'm curious now what's actually going on. But as Kafka doesn't even start in the Java app, it seems more and more to be some macOS network issue and not a Lagom issue. A nasty one, though...


Tim Moore

Jan 11, 2018, 4:45:16 AM
to Thomas Becker, Lagom Framework Users
It's possible that the Kafka problem in your Java app is unrelated. We have gotten some reports of Kafka being unable to start due to corrupted ZooKeeper data (possibly caused by an abrupt termination at some point in the past). The simplest solution is usually to do a clean build, which also blows away all of the Kafka & ZooKeeper data. You can also manually delete it from the "target/lagom-dynamic-projects/lagom-internal-meta-project-kafka/" directory.
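Concretely (paths relative to the project root, assuming the default sbt dev-mode layout):

```shell
# Option 1: a clean build wipes target/, including the embedded
# Kafka/ZooKeeper state:
#   sbt clean
# Option 2: delete just the embedded Kafka/ZooKeeper data:
rm -rf target/lagom-dynamic-projects/lagom-internal-meta-project-kafka/
```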

Best,
Tim


Thomas Becker

Mar 23, 2018, 4:34:12 AM
to Tim Moore, Lagom Framework Users
Hi Tim,

sorry for the long delay. I worked around the problem by either working without network or by not using runAll. I've collected some further insights, and the problem might be related to macOS appending the network's domain to the hostname and Kafka not being able to start up the broker properly.

I've seen the exact same issue on a colleague's MacBook.

Here's what I've found:

- Unplug network, disable wifi, sbt clean, sbt runAll --> everything is working fine
- With network, sbt clean:

I'm getting the following errors in Kafka's controller.log in /target/lagom-dynamic-projects/lagom-internal-meta-project-kafka/target/log4j_output:

[2018-03-23 09:19:58,610] WARN [Controller-0-to-broker-0-send-thread]: Controller 0's connection to broker thomass-mbp.fritz.box:9092 (id: 0 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to thomass-mbp.fritz.box:9092 (id: 0 rack: null) failed.
        at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:68)
        at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:264)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:218)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
[2018-03-23 09:19:58,713] WARN [Controller-0-to-broker-0-send-thread]: Controller 0's connection to broker thomass-mbp.fritz.box:9092 (id: 0 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to thomass-mbp.fritz.box:9092 (id: 0 rack: null) failed.
        at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:68)
        at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:264)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:218)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)

And in Kafka's server.log:

[2018-03-23 09:20:29,902] ERROR [KafkaApi-0] Number of alive brokers '0' does not meet the required replication factor '1' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
[2018-03-23 09:20:29,953] ERROR [KafkaApi-0] Number of alive brokers '0' does not meet the required replication factor '1' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
[2018-03-23 09:20:30,110] ERROR [KafkaApi-0] Number of alive brokers '0' does not meet the required replication factor '1' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)

Both errors repeat for a while, and after a couple of minutes they eventually stop and Kafka seems to be up and running fine. Without network it's just a matter of seconds. However, I then get all sorts of errors in the sbt runAll console, as mentioned earlier. It's a bit difficult to isolate the problems as the behaviour is not always exactly the same. Adding my hostname with the domain to /etc/hosts didn't help. netcat shows that connecting to thomass-mbp.fritz.box:9092 works just fine, and after a while Kafka seems to be able to connect, too. Sometimes the microservices work as expected then; sometimes they don't.

Adding lagomKafkaAddress in ThisBuild := "localhost:9092" to build.sbt also didn't help; Kafka still seems to use the hostname + domain internally.
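One more thing I still want to try, if I'm reading the Lagom docs right (unverified so far): pointing the dev-mode Kafka at a custom server.properties and pinning the advertised listener to the loopback address there. Something like:

```scala
// build.sbt -- sketch, assuming the lagomKafkaPropertiesFile setting from
// the Lagom 1.4 sbt plugin; the file location is just an example.
lagomKafkaPropertiesFile in ThisBuild :=
  Some((baseDirectory in ThisBuild).value / "project" / "kafka-server.properties")
```

with kafka-server.properties containing advertised.listeners=PLAINTEXT://127.0.0.1:9092, so the broker stops advertising the hostname + domain.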

I'd love to provide you with more consistent information, but as the behaviour changes (probably depending on timing), it's tough. If you'd like to have a look yourself, I'm happy to do a screen share.

Cheers,
Thomas

