Debugging applications which fail to stage

836 views
Skip to first unread message

Troy Astle

unread,
Feb 4, 2014, 1:17:48 PM2/4/14
to vcap...@cloudfoundry.org
Hi All,

One of the most common pain points I am seeing as new developers come to Cloud Foundry is that applications will fail to stage and developers don't know how to debug the failure.

For example:

troytest $ cf push test2 --path . --buildpack https://github.com/cloudfoundry/java-buildpack.git  --command "/app/websocketd --port=$PORT --devconsole" --no-manifest


...


Uploading test2... OK
Preparing to start test2... OK
-----> Downloaded app package (15M)
Initialized empty Git repository in /tmp/buildpacks/java-buildpack/.git/
-----> Java Buildpack source: https://github.com/cloudfoundry/java-buildpack.git#6f502e7b2355c16d45b0d5e2c7e572ea01e6df39
Checking status of app 'test2'...Application failed to stage
troytest $ cf crashlogs test2
No crashed instances found.


There is a good blog post from Cornelia Davis on debugging applications that fail to stage on her blog:

However this very quickly gets to the point of SSHing into Cloud Foundry VMs and warden containers, which is not possible for developers using the PaaS.
Is there a way to surface more information about applications which fail to stage?

Thanks,
Troy.
 

Patrick Mueller

unread,
Feb 4, 2014, 1:59:48 PM2/4/14
to vcap-dev
Did you get anything from:
 
    cf events <app>
    cf logs <app> --recent

(this is using cf v6)

Hopefully there's something in there which gives you a clue.

I think one of the issues I'm seeing in this area is that people aren't being verbose where they probably should: in staging, and at app startup.

The node buildpack spits out a few steps along the way, so you can see some of what it's doing; it also include `npm install` output which kind of hogs stuff up; `cf logs --recent` doesn't show me all the staging log, for instance.

It would be nice if the buildpacks could be a little more verbose about what they were doing.  Or less, in the case of capturing `npm install` output :-)

And make sure you have have some hot stdout/stderr action at your app's startup time, so you can catch if a service you're trying to bind to isn't there, or whatever.


To unsubscribe from this group and stop receiving emails from it, send an email to vcap-dev+u...@cloudfoundry.org.



--
Patrick Mueller
http://muellerware.org

James Bayer

unread,
Feb 4, 2014, 5:34:54 PM2/4/14
to vcap...@cloudfoundry.org
troy, in working on buildpacks recently, i know that this can be frustrating especially before cf-release v155 [1], which came out today and addresses many of these "swallow the staging error" because now staging log output are now delivered incrementally in loggregator as of v155. prior to v155, staging logs were only delivered when staging was finished, which may be too late to get to loggregator if there is an error during staging.

there still is a known race condition in v155 where staging start and app start messages may not be captured before loggregator starts listening. that should be addressed by this story in the runtime backlog that will work with loggregator [2]. so if you get bit by this race condition, then you can still try a "sleep" after the error occurs in the buildpack compile step to give your buildpack compile log output more time to show up in loggregator before exiting.

if staging happens ok, but the app startup is failing because the process dies or something, then patrick is correct that "cf events" should show you the reason. one simple way to do more troubleshooting in this scenario is via providing a custom start command like this:
 while true; do echo hello; sleep 5; done
then you can explore the container contents using the "cf files" command. of course the websocketd console can work too and i've used that to help people, it's certainly a bit higher friction. 

we're working hard on smoothing over all of these rough patches. please keep surfacing all of these issues.

[2] As a CF user, I expect to see all log output from applications regardless of how fast they start or fail in gcf https://www.pivotaltracker.com/story/show/64088856
Thank you,

James Bayer

Cornelia Davis

unread,
Feb 4, 2014, 10:39:00 PM2/4/14
to vcap...@cloudfoundry.org
James,

Are there any caveats to getting your sleep trick to work? Should it work with all buildpacks? Or at least the system buildpacks?  A really cool idea that I think could prove very useful in poking around how things get laid down in the warden container.  Just isn't working for me trying to push a java app.

Thanks,
Cornelia

James Bayer

unread,
Feb 5, 2014, 12:26:49 AM2/5/14
to vcap...@cloudfoundry.org

On Tue, Feb 4, 2014 at 7:39 PM, Cornelia Davis <cda...@gopivotal.com> wrote:

James,

Are there any caveats to getting your sleep trick to work?

you can’t have a route bound to your app and use the sleep command because then the port won’t be open in your container and the health check for the port listening for traffic will fail . i suppose you could have the simplest server possible like netcat, but for whatever reason ‘/bin/nc -l $PORT’ isn’t working for me to pass the health check for whatever reason.

Should it work with all buildpacks? Or at least the system buildpacks?  A really cool idea that I think could prove very useful in poking around how things get laid down in the warden container.  Just isn't working for me trying to push a java app.

this only applies for apps that complete staging where you want to inspect the droplet file system. it isn’t too useful for running a command. for that you’d need something like the websocketd thing. here is what you can do if you don’t want to repush:

cf curl /v2/apps/YOURAPPGUIDHERE -X PUT -d '{ "command":"while true; do    echo hello;    sleep 5; done" }'

{
  "metadata": {
    "guid": "43a91121-7ba8-45ac-b940-4620b0a93cc3",
    "url": "/v2/apps/43a91121-7ba8-45ac-b940-4620b0a93cc3",
    "created_at": "2013-12-12T07:44:31+00:00",
    "updated_at": "2014-02-05T05:21:41+00:00"
  },
  "entity": {
    "name": "websocketd",
    "production": false,
    "space_guid": "67588493-59ec-47dc-8ce1-916fbda278c0",
    "stack_guid": "50688ae5-9bfc-4bf6-a4bf-caadb21a32c6",
    "buildpack": null,
    "detected_buildpack": "Ruby/Rails",
    "environment_json": {

    },
    "memory": 256,
    "instances": 1,
    "disk_quota": 1024,
    "state": "STOPPED",
    "version": "097de1ff-4446-4036-b5a5-c02f128e681b",
    "command": "while true; do    echo hello;    sleep 5; done",
    "console": true,
    "debug": null,
    "staging_task_id": "ce5aad1856d2414eb67133641725dbb3",
    "package_state": "STAGED",
    "health_check_timeout": null,
    "system_env_json": {
      "VCAP_SERVICES": {

      }
    },
    "space_url": "/v2/spaces/67588493-59ec-47dc-8ce1-916fbda278c0",
    "stack_url": "/v2/stacks/50688ae5-9bfc-4bf6-a4bf-caadb21a32c6",
    "service_bindings_url": "/v2/apps/43a91121-7ba8-45ac-b940-4620b0a93cc3/service_bindings",
    "routes_url": "/v2/apps/43a91121-7ba8-45ac-b940-4620b0a93cc3/routes",
    "events_url": "/v2/apps/43a91121-7ba8-45ac-b940-4620b0a93cc3/events"
  }
}

$cf logs websocketd --recent

2014-02-04T21:20:17.59-0800 [API]     OUT Updated app with guid 43a91121-7ba8-45ac-b940-4620b0a93cc3 ({"state"=>"STOPPED"})
2014-02-04T21:21:41.69-0800 [API]     OUT Updated app with guid 43a91121-7ba8-45ac-b940-4620b0a93cc3 ({"command"=>"PRIVATE DATA HIDDEN"})
2014-02-04T21:23:42.04-0800 [API]     OUT Updated app with guid 43a91121-7ba8-45ac-b940-4620b0a93cc3 ({"route"=>"f03c44d4-ab36-438b-a869-d74387d9b5fa"})
2014-02-04T21:23:59.16-0800 [DEA]     OUT Starting app instance (index 0) with guid 43a91121-7ba8-45ac-b940-4620b0a93cc3
2014-02-04T21:24:02.60-0800 [App/0]   OUT hello
2014-02-04T21:24:02.60-0800 [App/0]   ERR
2014-02-04T21:24:07.52-0800 [App/0]   OUT hello
2014-02-04T21:24:12.53-0800 [App/0]   OUT hello
2014-02-04T21:24:17.53-0800 [App/0]   OUT hello

$ cf files websocketd staging_info.yml
Getting files for app websocketd in org jbayer-normal-org / space development as jXX...@gopivotal.com...
OK

---
detected_buildpack: Ruby/Rails
start_command: /app/websocketd-cloudfoundry/websocketd --port=$PORT --devconsole --dir=/app/websocketd-cloudfoundry

James Bayer

unread,
Feb 5, 2014, 2:14:47 AM2/5/14
to vcap...@cloudfoundry.org

cornelia,

sleep can work if you use a script like this [1] called startChildProcesses.sh.

put it in the root of your app and send the start command that you are trying to configure

cf push env-test-james -c './startChildProcesses.sh ./startHello.sh'

where my startHello.sh script looks like this [2]. then you can have 5 minutes to review the container logs or files (or longer if you change the sleep).

$ cf push env-test-james -c './startChildProcesses.sh ./startHello.sh'

then once it starts

$ cf logs env-test-james --recent

2014-02-04T22:48:12.19-0800 [App/0]   ERR [2014-02-05 06:48:12] INFO  WEBrick 1.3.1
2014-02-04T22:48:12.19-0800 [App/0]   ERR [2014-02-05 06:48:12] INFO  ruby 1.9.3 (2013-11-22) [x86_64-linux]
2014-02-04T22:48:12.19-0800 [App/0]   ERR [2014-02-05 06:48:12] INFO  WEBrick::HTTPServer#start: pid=38 port=63005
2014-02-04T22:48:12.19-0800 [App/0]   OUT Process "./startHello.sh" started with pid 36
2014-02-04T22:48:26.17-0800 [App/0]   ERR 108.216.154.88, 10.10.2.181 - - [05/Feb/2014 06:48:26] "GET / HTTP/1.1" 200 44 0.0023
2014-02-04T22:48:26.48-0800 [App/0]   ERR 108.216.154.88, 10.10.2.181 - - [05/Feb/2014 06:48:26] "GET /favicon.ico HTTP/1.1" 404 18 0.0005
2014-02-04T22:48:30.88-0800 [App/0]   ERR 108.216.154.88, 10.10.2.181 - - [05/Feb/2014 06:48:30] "GET / HTTP/1.1" 200 44 0.0004
2014-02-04T22:48:31.36-0800 [App/0]   ERR 108.216.154.88, 10.10.2.181 - - [05/Feb/2014 06:48:31] "GET / HTTP/1.1" 200 44 0.0004

but if you go this far, you might as well do the websocketd approach [3] since you can type interactive commands.

[1]

#!/bin/bash

function startChildProcesses
{
    for cmd in "$@"; do {
      $cmd & pid=$!
      echo "Process \"$cmd\" started with pid $pid";
      PID_LIST+=" $pid";
    } done

    PID_ARRAY=($PID_LIST)
}

startChildProcesses "$@"
echo 'you have 5 minutes to debug this before this process exits.'
sleep 300

[2]

 #!/bin/bash

bundle exec rackup config.ru -p $PORTj

[3] http://www.iamjambay.com/2013/12/send-interactive-commands-to-cloud.html

Ben Hale

unread,
Feb 5, 2014, 3:30:49 AM2/5/14
to vcap...@cloudfoundry.org
Troy,

The Java Buildpack has long given a better error than what you're seeing, however as part of the loggergator integration, that input was being logs (you can see our tracking of this problem, starting 6 November, here).  I've just validated how cf v6 works against v155 and I think you'll be much happier with the results.  Using an intentionally broken buildpack we now see the following:

Starting app java-main-application in org bhale / space development as bhale@gopivotal.com...
OK
-----> Downloaded app package (14M)
-----> Downloaded app buildpack cache (92M)





Initialized empty Git repository in /tmp/buildpacks/java-buildpack/.git/
-----> Java Buildpack source: https://github.com/nebhale/java-buildpack#21c61f57daa9d7012855bf88efdcc9a044f281c7
-----> Downloading Open Jdk JRE 1.8.0_M8 from http://download.pivotal.io.s3.amazonaws.com/openjdk/lucid/x86_64/openjdk-1.8.0_M8.tar.gz
(4.4s)
       
Expanding Open Jdk JRE to .java-buildpack/open_jdk_jre
(1.6s)
-----> Downloading App Dynamics Agent 3.7.14 from http://download.pivotal.io.s3.amazonaws.com/app-dynamics/app-dynamics-3.7.14.zip
(0.0s)
       
Expanding App Dynamics Agent to .java-buildpack/app_dynamics_agent
(0.1s)
-----> Downloading Spring Auto Reconfiguration 0.8.7 from http://download.pivotal.io.s3.amazonaws.com/auto-reconfiguration/auto-reconfiguration-0.8.7.jar
(0.0s)
[Buildpack]                      ERROR Release failed with exception #<RuntimeError: 'permgen' is not a valid memory size>
'permgen' is not a valid memory size
/var/vcap/packages/dea_next/buildpacks/lib/installer.rb:24:in `release_info': Release info failed: (RuntimeError)
    from /var/vcap/packages/dea_next/buildpacks/lib/buildpack.rb:105:in `
release_info'
    from /var/vcap/packages/dea_next/buildpacks/lib/buildpack.rb:140:in `start_command'

   
from /var/vcap/packages/dea_next/buildpacks/lib/buildpack.rb:80:in `save_buildpack_info'
    from /var/vcap/packages/dea_next/buildpacks/lib/buildpack.rb:56:in `
block in stage_application'
    from /var/vcap/packages/dea_next/buildpacks/lib/buildpack.rb:50:in `chdir'

   
from /var/vcap/packages/dea_next/buildpacks/lib/buildpack.rb:50:in `stage_application'
    from /var/vcap/packages/dea_next/buildpacks/bin/run:10:in `<main>'


FAILED
Server error, status code: 400, error code: 170001, message: Staging error: cannot get instances since staging failed


TIP: use '
cf logs java-main-application --recent' for more information

I believe that if you upgrade everything you'll see the same.  I hope that between the default output and judicious use of JBP_LOG_LEVEL, you can diagnose all of the problems you experience.  If you find that you are still having trouble using the provided diagnostics, please let me know so that we can improve the buildpack for you.


-Ben Hale
Cloud Foundry Java Experience

Troy Astle

unread,
Feb 5, 2014, 5:15:00 AM2/5/14
to vcap...@cloudfoundry.org
Patrick, Ben, James and Cornelia
Thanks for all your replies. Sounds like I need to drag myself into the bright shining future that is CF155 and try again.

@Ben and James, thanks for your efforts on making this better over time. I hit some similar issues a few months back and it looks like things have come along in leaps and bounds. 

Thanks,
Troy.
Reply all
Reply to author
Forward
0 new messages