No log output using cf logs command

John McTeague

unread,

Oct 28, 2014, 12:20:11 PM10/28/14

to vcap...@cloudfoundry.org

I am getting no log output when running cf logs <appname> on a fresh cf-188 deployment and I believe it is related to the following error in the metron log file on the DEA server:

{"timestamp":1414512399.666425228,"process_id":3118,"source":"metron","log_level":"error","message":"can't forward message: loggregator client pool is empty","data":null,"file":"/var/vcap/data/compile/metron_agent/loggregator/src/metron/main.go","line":198,"method":"main.forwardMessagesToDoppler"}

It seems fairly straightforward, there are no loggregators available to handle the message, however I believe my deployment is correct.

I have attached it for reference. Any advice would be welcome.

John

cf.yml

Greg Oehmen

unread,

Oct 28, 2014, 2:06:08 PM10/28/14

to vcap-dev

Hi John:

What version of the CLI are you using (`cf -v`)?

Thanks

Greg

--
You received this message because you are subscribed to the Google Groups "Cloud Foundry Developers" group.
To view this discussion on the web visit https://groups.google.com/a/cloudfoundry.org/d/msgid/vcap-dev/d17d45b3-05b5-4d1a-b50b-1dc8fce242c6%40cloudfoundry.org.

To unsubscribe from this group and stop receiving emails from it, send an email to vcap-dev+u...@cloudfoundry.org.

John Tuley

unread,

Oct 28, 2014, 4:46:03 PM10/28/14

to vcap...@cloudfoundry.org

John,

Can you please verify that your installation's etcd servers are in good shape? The error message you're seeing appears when metron is unable to find entries for the doppler (née loggregator) servers in etcd. (It's also possible that doppler is unable to report itself to etcd for some reason.)

Thanks,

– John Tuley & Georg Apitz

CF LAMB team

John McTeague

unread,

Oct 28, 2014, 6:27:43 PM10/28/14

to vcap...@cloudfoundry.org

Greg, cli is v6.6.0

John, I will check etcd. Is there anything specific in the logs I should look out for?

John Tuley

unread,

Oct 28, 2014, 6:37:47 PM10/28/14

to vcap...@cloudfoundry.org

Nothing in particular. I just want to eliminate the possibility that it's in a bad state, because your manifest looked ok.

--
You received this message because you are subscribed to a topic in the Google Groups "Cloud Foundry Developers" group.
To view this discussion on the web visit https://groups.google.com/a/cloudfoundry.org/d/msgid/vcap-dev/700f3e59-9927-439e-8118-2cf431cf0802%40cloudfoundry.org.

--

– John Tuley

John McTeague

unread,

Oct 29, 2014, 6:29:04 AM10/29/14

to vcap...@cloudfoundry.org

I expanded etcd to 3 nodes, I had 2 previously which is not recommended, but the problem persists with 3.

From the DEA's I can access all etcd nodes. There are only a couple of keys present:

 curl -L http://172.16.3.66:4001/v2/keys/
{"action":"get","node":{"key":"/","dir":true,"nodes":[{"key":"/healthstatus","dir":true,"modifiedIndex":343,"createdIndex":343},{"key":"/hm","dir":true,"modifiedIndex":4,"createdIndex":4}]}}

All three machines appear in the etcd config:

curl -L http://172.16.3.66:7001/v2/admin/machines
[{"name":"etcd_z1-0","state":"leader","clientURL":"http://172.16.3.32:4001","peerURL":"http://172.16.3.32:7001"},{"name":"etcd_z2-0","state":"follower","clientURL":"http://172.16.3.34:4001","peerURL":"http://172.16.3.34:7001"},{"name":"etcd_z2-1","state":"follower","clientURL":"http://172.16.3.66:4001","peerURL":"http://172.16.3.66:7001"}]

The doppler.log file contains:

{"timestamp":1414575176.026412249,"process_id":2231,"source":"doppler","log_level":"error","message":"AppStoreWatcher: Got error while waiting for ETCD events: unexpected end of JSON input","data":null,"file":"/var/vcap/data/compile/doppler/loggregator/src/github.com/cloudfoundry/loggregatorlib/store/app_service_store_watcher.go","line":78,"method":"github.com/cloudfoundry/loggregatorlib/store.(*AppServiceStoreWatcher).Run"}

The etcd nodes are accessible from the doppler servers using curl.

There are alot of heartbeat timeout errors in the etcd_ctl.log file for the leader:

[etcd] Oct 29 10:20:47.315 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:20:47.356 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-0" missed=1 backoff="2s"
[etcd] Oct 29 10:20:54.765 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:21:36.815 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:21:36.855 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-0" missed=1 backoff="2s"
[etcd] Oct 29 10:21:46.315 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:22:41.906 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-0" missed=1 backoff="2s"
[etcd] Oct 29 10:22:42.242 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:22:50.315 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:22:50.356 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-0" missed=1 backoff="2s"
[etcd] Oct 29 10:22:52.616 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=2 backoff="4s"
[etcd] Oct 29 10:22:52.657 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-0" missed=2 backoff="4s"

There are no routing issues that I can see however.

John McTeague

unread,

Oct 29, 2014, 6:52:45 AM10/29/14

to vcap...@cloudfoundry.org

Some further key values from etcd:

curl -L http://172.16.3.32:4001/v2/keys/healthstatus/doppler/z1/loggregator_z1
{"action":"get","node":{"key":"/healthstatus/doppler/z1/loggregator_z1","dir":true,"nodes":[{"key":"/healthstatus/doppler/z1/loggregator_z1/0","value":"172.16.3.52","expiration":"2014-10-29T10:50:00.070898193Z","ttl":8,"modifiedIndex":17763,"createdIndex":6184}],"modifiedIndex":343,"createdIndex":343}}

curl -L http://172.16.3.32:4001/v2/keys/healthstatus/doppler/z2/loggregator_z2
{"action":"get","node":{"key":"/healthstatus/doppler/z2/loggregator_z2","dir":true,"nodes":[{"key":"/healthstatus/doppler/z2/loggregator_z2/0","value":"172.16.3.54","expiration":"2014-10-29T10:50:51.137657592Z","ttl":9,"modifiedIndex":17947,"createdIndex":4319}],"modifiedIndex":416,"createdIndex":416}}

jrodr...@pivotal.io

unread,

Oct 29, 2014, 12:43:03 PM10/29/14

to vcap...@cloudfoundry.org

John,

It looks like the right keys are in etcd, so the next debugging step would be check the runner's copy of /var/vcap/jobs/metron/config/metron.json and make sure that it looks correct; in particular, it needs the correct addresses for etcd.

If that looks good, set the "metron_agent.debug" property to true and redeploy; look for log output that would indicate that it can't reach etcd. (E.g. "ServerAddressList.Run: Unable to recursively find keys with prefix" or "ServerAddressList.Run: Timed out talking to store; will try again soon", or anything else regarding etcd.)

Basically, it looks like metron is unable to get the healthstatus keys out of etcd. Since the requisite keys are there, it seems to be a communication problem in metron.

- John Tuley & Joe Rodriguez