No log output using cf logs command


John McTeague

Oct 28, 2014, 12:20:11 PM
to vcap...@cloudfoundry.org
I am getting no log output when running `cf logs <appname>` on a fresh cf-188 deployment, and I believe it is related to the following error in the metron log file on the DEA server:

{"timestamp":1414512399.666425228,"process_id":3118,"source":"metron","log_level":"error","message":"can't forward message: loggregator client pool is empty","data":null,"file":"/var/vcap/data/compile/metron_agent/loggregator/src/metron/main.go","line":198,"method":"main.forwardMessagesToDoppler"}

It seems fairly straightforward: there are no loggregators available to handle the message. However, I believe my deployment is correct.

I have attached my deployment manifest (cf.yml) for reference. Any advice would be welcome.

John

cf.yml

Greg Oehmen

Oct 28, 2014, 2:06:08 PM
to vcap-dev
Hi John:

What version of the CLI are you using (`cf -v`)?

Thanks

Greg


John Tuley

Oct 28, 2014, 4:46:03 PM
to vcap...@cloudfoundry.org
John,

Can you please verify that your installation's etcd servers are in good shape? The error message you're seeing appears when metron is unable to find entries for the doppler (née loggregator) servers in etcd. (It's also possible that doppler is unable to report itself to etcd for some reason.)
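
As a quick sanity check from any box that can reach etcd (a rough sketch only; substitute one of your etcd node addresses, and note the exact key layout can vary between releases), you could look at cluster membership and at the keys doppler is supposed to keep refreshed:

curl -L http://ETCD_NODE:4001/v2/admin/machines
curl -L "http://ETCD_NODE:4001/v2/keys/healthstatus/doppler?recursive=true"

The second query should return one short-TTL entry per doppler instance; if it comes back empty or with an error, that would explain the empty client pool.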

Thanks,
– John Tuley & Georg Apitz
CF LAMB team

John McTeague

Oct 28, 2014, 6:27:43 PM
to vcap...@cloudfoundry.org
Greg, the CLI is v6.6.0.

John, I will check etcd. Is there anything specific in the logs I should look out for?

John Tuley

Oct 28, 2014, 6:37:47 PM
to vcap...@cloudfoundry.org
Nothing in particular. I just want to eliminate the possibility that it's in a bad state, because your manifest looked ok.

--
– John Tuley

John McTeague

Oct 29, 2014, 6:29:04 AM
to vcap...@cloudfoundry.org
I expanded etcd to 3 nodes (I previously had 2, which is not recommended), but the problem persists with 3.

From the DEAs I can access all etcd nodes. Only a couple of top-level keys are present:

 curl -L http://172.16.3.66:4001/v2/keys/
{"action":"get","node":{"key":"/","dir":true,"nodes":[{"key":"/healthstatus","dir":true,"modifiedIndex":343,"createdIndex":343},{"key":"/hm","dir":true,"modifiedIndex":4,"createdIndex":4}]}}
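
(Note that the top-level listing only shows the directories; to see whether doppler's entries actually exist underneath them, a recursive query along these lines should work with etcd's v2 API:)

curl -L "http://172.16.3.66:4001/v2/keys/?recursive=true"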

All three machines appear in the etcd config:

curl -L http://172.16.3.66:7001/v2/admin/machines
[{"name":"etcd_z1-0","state":"leader","clientURL":"http://172.16.3.32:4001","peerURL":"http://172.16.3.32:7001"},{"name":"etcd_z2-0","state":"follower","clientURL":"http://172.16.3.34:4001","peerURL":"http://172.16.3.34:7001"},{"name":"etcd_z2-1","state":"follower","clientURL":"http://172.16.3.66:4001","peerURL":"http://172.16.3.66:7001"}]

The doppler.log file contains:

{"timestamp":1414575176.026412249,"process_id":2231,"source":"doppler","log_level":"error","message":"AppStoreWatcher: Got error while waiting for ETCD events: unexpected end of JSON input","data":null,"file":"/var/vcap/data/compile/doppler/loggregator/src/github.com/cloudfoundry/loggregatorlib/store/app_service_store_watcher.go","line":78,"method":"github.com/cloudfoundry/loggregatorlib/store.(*AppServiceStoreWatcher).Run"}

The etcd nodes are accessible from the doppler servers using curl.

There are a lot of heartbeat timeout errors in the etcd_ctl.log file for the leader:

[etcd] Oct 29 10:20:47.315 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:20:47.356 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-0" missed=1 backoff="2s"
[etcd] Oct 29 10:20:54.765 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:21:36.815 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:21:36.855 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-0" missed=1 backoff="2s"
[etcd] Oct 29 10:21:46.315 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:22:41.906 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-0" missed=1 backoff="2s"
[etcd] Oct 29 10:22:42.242 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:22:50.315 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=1 backoff="2s"
[etcd] Oct 29 10:22:50.356 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-0" missed=1 backoff="2s"
[etcd] Oct 29 10:22:52.616 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-1" missed=2 backoff="4s"
[etcd] Oct 29 10:22:52.657 INFO      | etcd_z1-0: warning: heartbeat time out peer="etcd_z2-0" missed=2 backoff="4s"


There are no routing issues that I can see, however.
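
If it helps narrow this down, the leader's stats endpoints should report per-peer latency and failure counts (a sketch, assuming the etcd 0.4-era stats API; adjust the address to your leader):

curl -L http://172.16.3.32:4001/v2/stats/leader
curl -L http://172.16.3.32:4001/v2/stats/self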

John McTeague

Oct 29, 2014, 6:52:45 AM
to vcap...@cloudfoundry.org
Some further key values from etcd:

curl -L http://172.16.3.32:4001/v2/keys/healthstatus/doppler/z1/loggregator_z1
{"action":"get","node":{"key":"/healthstatus/doppler/z1/loggregator_z1","dir":true,"nodes":[{"key":"/healthstatus/doppler/z1/loggregator_z1/0","value":"172.16.3.52","expiration":"2014-10-29T10:50:00.070898193Z","ttl":8,"modifiedIndex":17763,"createdIndex":6184}],"modifiedIndex":343,"createdIndex":343}}

curl -L http://172.16.3.32:4001/v2/keys/healthstatus/doppler/z2/loggregator_z2
{"action":"get","node":{"key":"/healthstatus/doppler/z2/loggregator_z2","dir":true,"nodes":[{"key":"/healthstatus/doppler/z2/loggregator_z2/0","value":"172.16.3.54","expiration":"2014-10-29T10:50:51.137657592Z","ttl":9,"modifiedIndex":17947,"createdIndex":4319}],"modifiedIndex":416,"createdIndex":416}}



jrodr...@pivotal.io

Oct 29, 2014, 12:43:03 PM
to vcap...@cloudfoundry.org
John,

It looks like the right keys are in etcd, so the next debugging step would be to check the runner's copy of /var/vcap/jobs/metron/config/metron.json and make sure that it looks correct; in particular, it needs the correct addresses for etcd.

If that looks good, set the "metron_agent.debug" property to true and redeploy; look for log output that would indicate that it can't reach etcd. (E.g. "ServerAddressList.Run: Unable to recursively find keys with prefix" or "ServerAddressList.Run: Timed out talking to store; will try again soon", or anything else regarding etcd.)

Basically, it looks like metron is unable to get the healthstatus keys out of etcd. Since the requisite keys are there, it seems to be a communication problem in metron.
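
For example, something along these lines on the runner VM (sketch only; the exact field names in metron.json vary by cf-release version, so just eyeball the etcd addresses rather than trusting my memory of the schema):

grep -i etcd /var/vcap/jobs/metron/config/metron.json

And in the deployment manifest, the debug property mentioned above would be set roughly like so before redeploying:

properties:
  metron_agent:
    debug: true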

- John Tuley & Joe Rodriguez

John McTeague

Oct 30, 2014, 12:15:52 PM
to vcap...@cloudfoundry.org, jrodr...@pivotal.io
Just as I was about to try the metron_agent.debug option, I started getting logs for my apps.

I plan to debug this further, because I do not like things that suddenly start or stop working.

John McTeague

Nov 11, 2014, 11:19:18 AM
to vcap...@cloudfoundry.org, jrodr...@pivotal.io
I believe my problem may have been as simple as start order: my DEAs started before both the loggregator and traffic controller jobs. When I restart all processes on the DEAs, I can see (via debug logs) that messages are being received and forwarded to the loggregator server.
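
(For reference, the restart was nothing fancier than monit on each DEA VM, roughly:

sudo /var/vcap/bosh/bin/monit restart all
sudo /var/vcap/bosh/bin/monit summary

and waiting until everything reports "running" again.)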

However, my command line still gets no output from the cf logs command. Is this something in the loggregator_trafficcontroller job that I should now focus on?

John Tuley

Nov 11, 2014, 12:06:32 PM
to vcap...@cloudfoundry.org, jrodr...@pivotal.io
Is there any error message from the CLI, or does it show a successful connection and then simply not have any messages?

John McTeague

Nov 11, 2014, 4:39:54 PM
to vcap...@cloudfoundry.org, jrodr...@pivotal.io
It just sits and waits for output. With CF_TRACE on I get

c:\>cf logs sample-app

VERSION: 6.6.2-0c953cf

REQUEST: [2014-11-11T21:16:55Z]
GET /v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18/apps?q=name%3Asample-app&inline-relations-depth=1 HTTP/1.1
Host: api.cf.example.com
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Content-Type: application/json
User-Agent: go-cli 6.6.2-0c953cf / windows

RESPONSE: [2014-11-11T21:16:56Z]
HTTP/1.1 200 OK
Content-Length: 4874
Content-Type: application/json;charset=utf-8
Date: Tue, 11 Nov 2014 21:16:46 GMT
Server: nginx
X-Cf-Requestid: 529a5f7e-f647-47ba-4697-cab35a4042a7
X-Content-Type-Options: nosniff
X-Vcap-Request-Id: c4b2c0e8-f9b2-4296-72df-eee090bc0b93::f3d3f7f2-8b77-4bd0-aed4-cf0da66b215a

{
  "total_results": 1,
  "total_pages": 1,
  "prev_url": null,
  "next_url": null,
  "resources": [
    {
      "metadata": {
        "guid": "70bffffc-6a91-49d1-af30-0aba02b7ab25",
        "url": "/v2/apps/70bffffc-6a91-49d1-af30-0aba02b7ab25",
        "created_at": "2014-11-11T15:31:34+00:00",
        "updated_at": "2014-11-11T21:16:23+00:00"
      },
      "entity": {
        "name": "sample-app",
        "production": false,
        "space_guid": "39a40ce1-ab0b-4de9-b40c-96fae086dd18",
        "stack_guid": "44b9b343-2886-47de-a647-ef0631439f8a",
        "buildpack": null,
        "detected_buildpack": "java-buildpack=v2.4-offline-https://github.com/cloudfoundry/java-buildpack.git#7cdcf1a open-jdk-jre=1.7.0_60 spring-auto-reconfiguration=1.4.0_RELEASE tomcat-access-logging-support=2.2.0_RELEASE tomcat-instance=7.0.54 tomcat-lifecycle-support=2.2.0_REL...",
        "environment_json": {},
        "memory": 1024,
        "instances": 1,
        "disk_quota": 1024,
        "state": "STARTED",
        "version": "b6d6fe4e-abc1-47c6-a46f-e49ddd1dbb9f",
        "command": null,
        "console": false,
        "debug": null,
        "staging_task_id": "e05966224e5e436aa71a82a441b0cb2d",
        "package_state": "STAGED",
        "health_check_timeout": null,
        "staging_failed_reason": null,
        "docker_image": null,
        "package_updated_at": "2014-11-11T15:31:34+00:00",
        "detected_start_command": "JAVA_HOME=$PWD/.java-buildpack/open_jdk_jre JAVA_OPTS=\"-Djava.io.tmpdir=$TMPDIR -XX:OnOutOfMemoryError=$PWD/.java-buildpack/open_jdk_jre/bin/killjava.sh -Xmx768M -Xms768M -XX:MaxPermSize=104857K -XX:PermSize=104857K -Xss1M -Daccess.logging.enabled=false -Dhttp.port=$PORT\" $PWD/.java-buildpack/tomcat/bin/catalina.sh run",
        "space_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18",
        "space": {
          "metadata": {
            "guid": "39a40ce1-ab0b-4de9-b40c-96fae086dd18",
            "url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18",
            "created_at": "2014-11-11T14:49:31+00:00",
            "updated_at": null
          },
          "entity": {
            "name": "testspace",
            "organization_guid": "5595be4e-9cfc-4af0-abd5-7509632ebb3d",
            "space_quota_definition_guid": null,
            "organization_url": "/v2/organizations/5595be4e-9cfc-4af0-abd5-7509632ebb3d",
            "developers_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18/developers",
            "managers_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18/managers",
            "auditors_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18/auditors",
            "apps_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18/apps",
            "routes_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18/routes",
            "domains_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18/domains",
            "service_instances_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18/service_instances",
            "app_events_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18/app_events",
            "events_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18/events",
            "security_groups_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18/security_groups"
          }
        },
        "stack_url": "/v2/stacks/44b9b343-2886-47de-a647-ef0631439f8a",
        "stack": {
          "metadata": {
            "guid": "44b9b343-2886-47de-a647-ef0631439f8a",
            "url": "/v2/stacks/44b9b343-2886-47de-a647-ef0631439f8a",
            "created_at": "2014-11-11T14:16:10+00:00",
            "updated_at": null
          },
          "entity": {
            "name": "lucid64",
            "description": "Ubuntu 10.04 on x86-64"
          }
        },
        "events_url": "/v2/apps/70bffffc-6a91-49d1-af30-0aba02b7ab25/events",
        "service_bindings_url": "/v2/apps/70bffffc-6a91-49d1-af30-0aba02b7ab25/service_bindings",
        "service_bindings": [],
        "routes_url": "/v2/apps/70bffffc-6a91-49d1-af30-0aba02b7ab25/routes",
        "routes": [
          {
            "metadata": {
              "guid": "25c9d7db-6424-4cbb-a6a4-dcc58f71ed7e",
              "url": "/v2/routes/25c9d7db-6424-4cbb-a6a4-dcc58f71ed7e",
              "created_at": "2014-11-11T15:31:17+00:00",
              "updated_at": null
            },
            "entity": {
              "host": "sample-app",
              "domain_guid": "3beb9c92-c9f5-4e4f-9a63-dd9544fcb644",
              "space_guid": "39a40ce1-ab0b-4de9-b40c-96fae086dd18",
              "domain_url": "/v2/domains/3beb9c92-c9f5-4e4f-9a63-dd9544fcb644",
              "space_url": "/v2/spaces/39a40ce1-ab0b-4de9-b40c-96fae086dd18",
              "apps_url": "/v2/routes/25c9d7db-6424-4cbb-a6a4-dcc58f71ed7e/apps"
            }
          }
        ]
      }
    }
  ]
}

WEBSOCKET REQUEST: [2014-11-11T21:16:56Z]
GET /tail/?app=70bffffc-6a91-49d1-af30-0aba02b7ab25 HTTP/1.1
Host: wss://loggregator.cf.example.com:443
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: [HIDDEN]
Origin: http://localhost
Authorization: [PRIVATE DATA HIDDEN]

WEBSOCKET RESPONSE: [2014-11-11T21:17:01Z]
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-Websocket-Accept: ZD2htcrCjR/0llSoxSnS0a0wTjE=

Connected, tailing logs for app sample-app in org testorg / space testspace as admin...

John McTeague

Nov 12, 2014, 9:10:11 AM
to vcap...@cloudfoundry.org, jrodr...@pivotal.io

The following appears in my doppler logs frequently:

{"timestamp":1415800834.813282728,"process_id":1618,"source":"doppler","log_level":"error","message":"AppStoreWatcher: Got error while waiting for ETCD events: store request timed out","data":null,"file":"/var/vcap/data/compile/doppler/loggregator/src/github.com/cloudfoundry/loggregatorlib/store/app_service_store_watcher.go","line":78,"method":"github.com/cloudfoundry/loggregatorlib/store.(*AppServiceStoreWatcher).Run"}

I have no trouble accessing the etcd api from the doppler servers.

       

John Tuley

Nov 12, 2014, 12:56:25 PM
to vcap...@cloudfoundry.org
John,

The DEA logging agent is supposed to put information into etcd (under the '/loggregator/services/' directory) to forward app logs to an external syslog drain, and doppler reads that information to do the forwarding. If you've never bound a syslog drain to an app, that key won't exist, and the AppStoreWatcher will throw those errors. That shouldn't prevent the rest of doppler from working. (By the way, getting that error instead of something else about etcd suggests that the connection is OK but that the key doesn't exist, which is expected if you've never set up a syslog drain for an app.)
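
(For context, a syslog drain is normally created and bound with something like the following; the drain host and port here are hypothetical. If you've never run anything similar, that key legitimately won't exist.)

cf cups my-log-drain -l syslog://logs.example.com:514
cf bind-service sample-app my-log-drain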

Without further information about the specific failures, I don't have any fixes to suggest to you. I do have a suggestion to get more data:
  1. Restart both doppler and loggregator_trafficcontroller with the --debug flag to get more logs. You can either do this by setting debug: true in the properties for those jobs in the manifest, or by SSHing to their VMs, editing the _ctl file to include the flag in the invocation, and using monit to restart the job (a rough sketch of the second approach follows at the end of this message).
  2. Deploy an app that you know emits logs frequently. I recommend this one from the CF Acceptance Tests, which just emits a log message once per second.
  3. Watch the logs on both traffic controllers to make sure that they're receiving the connections from the CLI and opening connections to the doppler servers.
  4. Watch the logs on doppler to see that it's receiving and processing messages from the app (with --debug on, this will be very verbose, with several loglines per received message as each processing stage works on the message) and that it's receiving connections from traffic controller.
Hopefully something sticks out in those logs that will give us more information to diagnose the problem.
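
A rough sketch of the second approach from step 1, on a doppler VM (the job and file names follow the usual BOSH conventions and a typical two-AZ manifest, so double-check them on your boxes; repeat for loggregator_trafficcontroller):

bosh ssh doppler_z1 0
sudo vi /var/vcap/jobs/doppler/bin/doppler_ctl      # add --debug to the doppler invocation
sudo /var/vcap/bosh/bin/monit restart doppler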

– John Tuley.

Neil Aitken

Nov 12, 2014, 2:54:40 PM
to vcap...@cloudfoundry.org
Hi,

I've been working with John McTeague on this issue (in the same environment). After redeploying loggregator_trafficcontroller & doppler with the debug flag, I pushed the ruby sample app you linked from the acceptance tests.

This is what I could see from the debug logs:

The traffic controller instances post a couple of debug messages every 5 seconds (we have 2 router instances, one in each AZ):

{"timestamp":1415821090.113283873,"process_id":3449,"source":"loggregator trafficcontroller","log_level":"debug","message":"Reregistered with router","data":null,"file":"/var/vcap/data/compile/loggregator_trafficcontroller/loggregator/src/github.com/cloudfoundry/loggregatorlib/cfcomponent/registrars/routerregistrar/router_registrar.go","line":120,"method":"github.com/cloudfoundry/loggregatorlib/cfcomponent/registrars/routerregistrar.func·004"}
{"timestamp":1415821090.144928217,"process_id":3449,"source":"loggregator trafficcontroller","log_level":"debug","message":"Reregistered with router","data":null,"file":"/var/vcap/data/compile/loggregator_trafficcontroller/loggregator/src/github.com/cloudfoundry/loggregatorlib/cfcomponent/registrars/routerregistrar/router_registrar.go","line":120,"method":"github.com/cloudfoundry/loggregatorlib/cfcomponent/registrars/routerregistrar.func·004"}


On attaching a tail with cf logs, I can see this picked up by one of the traffic controller instances:

{"timestamp":1415821184.299514294,"process_id":3449,"source":"loggregator trafficcontroller","log_level":"debug","message":"legacy proxy: ServeHTTP entered with request \u0026{GET /tail/?app=27ad2f68-ee54-4dca-ba1f-f864fafcb415 HTTP/1.1 1 1 map[User-Agent:[Go 1.1 package http] Connection:[Upgrade] Origin:[http://localhost] Sec-Websocket-Version:[13] Upgrade:[websocket] X-Cf-Requestid:[c0692334-dd5f-4fc7-40ce-35e77e8809b6] X-Forwarded-For:[10.100.138.13, 172.16.2.31] X-Forwarded-Proto:[https] Authorization:[*******] Sec-Websocket-Key:[******] X-Request-Start:[1415821184671] X-Vcap-Request-Id:[4e8763fb-2747-4d4c-650b-56c59d1b4b13]] 0xb9dd60 0 [] false loggregator.cf2.cloudlab.jpmchase.net:443 map[] map[] \u003cnil\u003e map[] 172.16.3.59:3663 /tail/?app=27ad2f68-ee54-4dca-ba1f-f864fafcb415 \u003cnil\u003e}","data":null,"file":"/var/vcap/data/compile/loggregator_trafficcontroller/loggregator/src/trafficcontroller/legacyproxy/legacy_proxy.go","line":60,"method":"trafficcontroller/legacyproxy.(*Proxy).ServeHTTP"}
{"timestamp":1415821184.300549030,"process_id":3449,"source":"loggregator trafficcontroller","log_level":"debug","message":"doppler proxy: ServeHTTP entered with request \u0026{GET /apps/27ad2f68-ee54-4dca-ba1f-f864fafcb415/stream?app=27ad2f68-ee54-4dca-ba1f-f864fafcb415 HTTP/1.1 1 1 map[User-Agent:[Go 1.1 package http] Connection:[Upgrade] Origin:[http://localhost] Sec-Websocket-Version:[13] Upgrade:[websocket] X-Cf-Requestid:[c0692334-dd5f-4fc7-40ce-35e77e8809b6] X-Forwarded-For:[10.100.138.13, 172.16.2.31] X-Forwarded-Proto:[https] Authorization:[*******] X-Request-Start:[1415821184671] X-Vcap-Request-Id:[4e8763fb-2747-4d4c-650b-56c59d1b4b13]] 0xb9dd60 0 [] false loggregator.cf2.cloudlab.jpmchase.net:443 map[app:[27ad2f68-ee54-4dca-ba1f-f864fafcb415]] map[] \u003cnil\u003e map[] 172.16.3.59:3663 /tail/?app=27ad2f68-ee54-4dca-ba1f-f864fafcb415 \u003cnil\u003e}","data":null,"file":"/var/vcap/data/compile/loggregator_trafficcontroller/loggregator/src/trafficcontroller/dopplerproxy/doppler_proxy.go","line":70,"method":"trafficcontroller/dopplerproxy.(*Proxy).ServeHTTP"}
{"timestamp":1415821184.406283855,"process_id":3449,"source":"loggregator trafficcontroller","log_level":"debug","message":"websocket handler: ServeHTTP entered with request \u0026{GET /apps/27ad2f68-ee54-4dca-ba1f-f864fafcb415/stream?app=27ad2f68-ee54-4dca-ba1f-f864fafcb415 HTTP/1.1 1 1 map[User-Agent:[Go 1.1 package http] Connection:[Upgrade] Origin:[http://localhost] Sec-Websocket-Version:[13] Upgrade:[websocket] X-Cf-Requestid:[c0692334-dd5f-4fc7-40ce-35e77e8809b6] X-Forwarded-For:[10.100.138.13, 172.16.2.31] X-Forwarded-Proto:[https] Authorization:[********] Sec-Websocket-Key:[*******] X-Request-Start:[1415821184671] X-Vcap-Request-Id:[4e8763fb-2747-4d4c-650b-56c59d1b4b13]] 0xb9dd60 0 [] false loggregator.cf2.cloudlab.jpmchase.net:443 map[app:[27ad2f68-ee54-4dca-ba1f-f864fafcb415]] map[] \u003cnil\u003e map[] 172.16.3.59:3663 /tail/?app=27ad2f68-ee54-4dca-ba1f-f864fafcb415 \u003cnil\u003e}","data":null,"file":"/var/vcap/data/compile/loggregator_trafficcontroller/loggregator/src/github.com/cloudfoundry/loggregatorlib/server/handlers/websocket_handler.go","line":23,"method":"github.com/cloudfoundry/loggregatorlib/server/handlers.(*websocketHandler).ServeHTTP"}


There are also some Go goroutine stack traces in the stderr log for traffic controller, but I can't tell if this is just debug output or something to worry about:

goroutine 34 [select]:
net/http.(*persistConn).writeLoop(0xc210068580)
        /usr/local/go/src/pkg/net/http/transport.go:791 +0x271
created by net/http.(*Transport).dialConn
        /usr/local/go/src/pkg/net/http/transport.go:529 +0x61e

goroutine 36 [select]:
net/http.(*persistConn).writeLoop(0xc210068600)
        /usr/local/go/src/pkg/net/http/transport.go:791 +0x271
created by net/http.(*Transport).dialConn
        /usr/local/go/src/pkg/net/http/transport.go:529 +0x61e

goroutine 48 [runnable]:
github.com/cloudfoundry/storeadapter/workerpool.func·001()
        /var/vcap/data/compile/loggregator_trafficcontroller/loggregator/src/github.com/cloudfoundry/storeadapter/workerpool/worker_pool.go:41
        /var/vcap/data/compile/loggregator_trafficcontroller/loggregator/src/github.com/cloudfoundry/storeadapter/workerpool/worker_pool.go:43 +0xc2



...

In the doppler (loggregator) logs, apart from the etcd error that earlier discussion on this thread suggests is a non-issue, the only other thing logged is some kind of health check / heartbeat every 10 seconds:
{"timestamp":1415821609.506625891,"process_id":3135,"source":"doppler","log_level":"debug","message":"Health updates channel pushed true at time 2014-11-12 19:46:49.506600841 +0000 UTC","data":null,"file":"/var/vcap/data/compile/doppler/loggregator/src/doppler/main.go","line":186,"method":"main.func·014"}


Again, on attaching a cf logs tail, there are signs that the doppler instances are detecting this:
{"timestamp":1415821672.153996706,"process_id":3135,"source":"doppler","log_level":"debug","message":"WebsocketServer.ServeHTTP: starting","data":null,"file":"/var/vcap/data/compile/doppler/loggregator/src/doppler/sinkserver/websocketserver/websocket_server.go","line":76,"method":"doppler/sinkserver/websocketserver.(*WebsocketServer).ServeHTTP"}
{"timestamp":1415821672.154967785,"process_id":3135,"source":"doppler","log_level":"debug","message":"WebsocketServer: Requesting a wss sink for app 27ad2f68-ee54-4dca-ba1f-f864fafcb415","data":null,"file":"/var/vcap/data/compile/doppler/loggregator/src/doppler/sinkserver/websocketserver/websocket_server.go","line":148,"method":"doppler/sinkserver/websocketserver.(*WebsocketServer).streamLogs"}
{"timestamp":1415821672.155472040,"process_id":3135,"source":"doppler","log_level":"debug","message":"SinkManager: Sink with identifier 172.16.3.58:52106 requested. Opened it.","data":null,"file":"/var/vcap/data/compile/doppler/loggregator/src/doppler/sinkserver/sinkmanager/sink_manager.go","line":123,"method":"doppler/sinkserver/sinkmanager.(*SinkManager).RegisterSink"}
{"timestamp":1415821672.155981302,"process_id":3135,"source":"doppler","log_level":"debug","message":"Websocket Sink 172.16.3.58:52106: Running for appId [27ad2f68-ee54-4dca-ba1f-f864fafcb415]","data":null,"file":"/var/vcap/data/compile/doppler/loggregator/src/doppler/sinks/websocket/websocket_sink.go","line":52,"method":"doppler/sinks/websocket.(*WebsocketSink).Run"}
{"timestamp":1415821672.156505346,"process_id":3135,"source":"doppler","log_level":"debug","message":"Websocket Sink 172.16.3.58:52106: Waiting for activity","data":null,"file":"/var/vcap/data/compile/doppler/loggregator/src/doppler/sinks/websocket/websocket_sink.go","line":57,"method":"doppler/sinks/websocket.(*WebsocketSink).Run"}



All this, but still no actual log info from cf logs:

ubuntu@cf2-bastion:~$ cf logs ruby-sample
Connected, tailing logs for app ruby-sample in org clouddev / space default as admin...

^Cubuntu@cf2-bastion:~$ cf logs ruby-sample --recent
Connected, dumping recent logs for app ruby-sample in org clouddev / space default as admin...

ubuntu@cf2-bastion:~$ cf logs ruby-sample
Connected, tailing logs for app ruby-sample in org clouddev / space default as admin...



thanks,
Neil

John McTeague

Nov 13, 2014, 9:26:00 AM
to vcap...@cloudfoundry.org
Hi, we have identified the problem.

Dropsonde/metron uses port 3457 for sending messages to loggregator and 3456 for legacy messages. The documentation at http://docs.cloudfoundry.org/deploying/common/security_groups.html only mentions 3456. As soon as we opened 3457 our logs started to stream.
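
For anyone else who hits this: the missing rule amounts to allowing port 3457 (UDP, as far as we can tell, the same as 3456) from the DEA/runner network to the doppler/loggregator VMs. Purely as an illustration, on AWS that would look something like the line below, with a placeholder security group ID and CIDR; our environment is not AWS, so translate it to whatever your IaaS uses:

aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol udp --port 3457 --cidr 172.16.0.0/16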

Can we get those docs updated?

Thanks,
John

Alexander Jackson

Nov 13, 2014, 10:41:24 AM
to vcap...@cloudfoundry.org
Hi John,
   Sorry about those docs being out of date. I created a tracker bug so we don't lose this: https://www.pivotaltracker.com/story/show/82657784

Thanks.
      - Alex.


Johannes Hiemer

Dec 12, 2014, 3:40:24 AM
to vcap...@cloudfoundry.org
Guys,
thanks a lot for this thread. This issue was driving me nuts again, until I googled and found that someone had already run into it. :-)

Doppler is my number one headache at the moment when upgrading. :-)

Alexander Jackson

Dec 12, 2014, 9:37:23 AM
to vcap...@cloudfoundry.org
Johannes,
   Are you having additional issues with doppler other than the port during upgrades?
         - Alex.

Stephen Byers

Dec 16, 2014, 5:51:37 PM
to vcap...@cloudfoundry.org
I had this same issue; it was resolved when I attached the appropriate security groups to the internal CF VMs created by BOSH. Everything had been logging fine; however, today I was running a test where I scaled two services that talk to each other up to 5 instances each and then pushed a little more load through. I noticed that at some point logging via the "cf logs" command stopped: I was still seeing messages from the router when requests were sent to the instances, but no application logs. I deleted the apps and pushed them again, and logging started working again. Are there any known issues with logging when numerous instances emit verbose logs under load?

Thanks

Alexander Jackson

Dec 16, 2014, 8:42:18 PM
to vcap...@cloudfoundry.org
Hi Stephen,
    Currently there are no known issues around the behavior you're describing. Which version of CF are you running? How much load are we talking about (i.e. X log lines / second)?

      - Alex.


Stephen B.

Dec 17, 2014, 10:20:13 AM
to vcap...@cloudfoundry.org
Hi Alex,

  I'm running CF v194. I'm only logging roughly 250-300 lines per second across a couple of containers. I just ran some additional tests, with two clients viewing the logs at the same time as before, and am not seeing the issue right now. Maybe some other testing was impacting me (e.g. network testing). I'll keep an eye out to see if it occurs again. It was a very odd situation, and I was making changes to security groups at the time as well, so I may have stepped on the rules the container needed to send the log information.

Thank you,
Stephen

Stephen B.

Dec 18, 2014, 2:30:18 PM
to vcap...@cloudfoundry.org
Alex and others,

  It looks like application logging has stopped again. I simply continued to run tests with a single-threaded client accessing a couple of services deployed in CF, and now the only logs I get are from the router showing that the request made it to the app. Logging for both services/apps has stopped. I do see that the loggregator instance has logged several of the following errors to stdout, but I'm not sure if that is related:

{"timestamp":1418930260.438480377,"process_id":1564,"source":"doppler","log_level":"error","message":"AppStoreWatcher: Got error while waiting for ETCD events: store request timed out","data":null,"file":"/var/vcap/data/compile/doppler/loggregator/src/github.com/cloudfoundry/loggregatorlib/store/app_service_store_watcher.go","line":78,"method":"github.com/cloudfoundry/loggregatorlib/store.(*AppServiceStoreWatcher).Run"}

  Any other logs or areas I should investigate that would help identify the problem?

Thanks,
Stephen

Stephen B.

Dec 19, 2014, 1:11:50 PM
to vcap...@cloudfoundry.org
In relation to halted app logging, I used bosh to restart etcd, loggregator, and loggregator_trafficcontroller, but that did not fix the problem. However, when I restarted the runner, the app logs started showing up again. I'm still not sure what the root cause is, or whether this is a bug somewhere.
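
(For the record, the restarts were just the usual BOSH job restarts, along the lines of the old-style CLI below, with job names as they appear in our manifest:

bosh restart runner_z1 0

or, equivalently, "monit restart all" on the VM itself.)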

Thanks