apps won't start: insufficient running resources


yaco...@gmail.com

Oct 5, 2014, 8:21:00 AM
to vcap...@cloudfoundry.org
Hi.

My CF instance (nise-bosh, single server) is acting weird and won't start any apps:

cf restart iserveProxy
Stopping app iserveProxy in org myorg / space myspace as admin...
OK

Starting app iserveProxy in org myorg / space myspace as admin...
FAILED
Server error, status code: 503, error code: 150003, message: One or more instances could not be started because of insufficient running resources.
newCF@cfserver:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          5981       3248       2732          0         28        674
-/+ buffers/cache:       2545       3435
Swap:         5883         12       5871





I clearly have more than enough memory.
The stack trace in cloud_controller_ng.log is:
{"timestamp":1412511536.2771611,"message":"Request failed: 503: {\"code\"=>150003, \"description\"=>\"One or more instances could not be started because of insufficient running resources.\", \"error_code\"=>\"CF-InsufficientRunningResourcesAvailable\", \"backtrace\"=>[\"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/dea/client.rb:126:in `start_instances'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/dea/client.rb:253:in `start_instances_in_range'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/dea/client.rb:23:in `start'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/dea/backend.rb:32:in `start'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/app_observer.rb:58:in `react_to_state_change'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/app_observer.rb:27:in `updated'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/models/runtime/app.rb:532:in `after_commit'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/model/base.rb:1871:in `block in _save'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/database/transactions.rb:186:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/database/transactions.rb:186:in `block in after_transaction_commit'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/database/transactions.rb:186:in `each'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/database/transactions.rb:186:in `after_transaction_commit'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/database/transactions.rb:286:in `remove_transaction'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/database/transactions.rb:149:in `_transaction'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/database/transactions.rb:100:in `block in transaction'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/database/connecting.rb:229:in `block in synchronize'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/connection_pool/threaded.rb:104:in `hold'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/database/connecting.rb:229:in `synchronize'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sequel-4.11.0/lib/sequel/database/transactions.rb:89:in `transaction'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/controllers/base/model_controller.rb:63:in `update'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/controllers/base/base_controller.rb:77:in `dispatch'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/rest_controller/routes.rb:16:in `block in define_route'\", 
\"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:1540:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:1540:in `block in compile!'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `[]'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `block (3 levels) in route!'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:966:in `route_eval'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:950:in `block (2 levels) in route!'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:987:in `block in process_route'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:985:in `catch'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:985:in `process_route'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:948:in `block in route!'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:947:in `each'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:947:in `route!'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:1059:in `block in dispatch!'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `block in invoke'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `catch'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:1056:in `dispatch!'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:882:in `block in call!'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `block in invoke'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `catch'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:1041:in `invoke'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:882:in `call!'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:870:in `call'\", 
\"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-protection-1.5.0/lib/rack/protection/xss_header.rb:18:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-protection-1.5.0/lib/rack/protection/path_traversal.rb:16:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-protection-1.5.0/lib/rack/protection/json_csrf.rb:18:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-protection-1.5.0/lib/rack/protection/base.rb:49:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-protection-1.5.0/lib/rack/protection/base.rb:49:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-protection-1.5.0/lib/rack/protection/frame_options.rb:31:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/head.rb:11:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:175:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/sinatra-1.4.3/lib/sinatra/base.rb:1949:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/builder.rb:138:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/urlmap.rb:65:in `block in call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/urlmap.rb:50:in `each'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/urlmap.rb:50:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/builder.rb:138:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:86:in `block in pre_process'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:84:in `catch'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/thin-1.6.2/lib/thin/connection.rb:84:in `pre_process'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:1037:in `call'\", \"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:1037:in `block in spawn_threadpool'\"]}","log_level":"error","source":"cc.api","data":{"request_guid":"14946c8c-06a4-4416-72c9-01b0f915543e::37b8f84e-c5ff-4cb6-9e98-5d35f11e794c"},"thread_id":45174800,"fiber_id":36036080,"process_id":4741,"file":"/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/sinatra/vcap.rb","lineno":53,"method":"block in registered"}



What can be the problem?

Dieu Cao

Oct 6, 2014, 1:29:32 PM
to vcap...@cloudfoundry.org
This error was added as part of this story [1]


yaco...@gmail.com

Oct 6, 2014, 3:32:20 PM
to vcap...@cloudfoundry.org
Hi.

Thanks for the info, but I don't see why my DEA is not available.
The log on the DEA reads:

{"timestamp":1412511189.6162288,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.find.droplet","data":{"droplet":"798d8af4-abf5-4750-b0cd-771aa4891975","include_stats":true,"states":["RUNNING"],"version":"77d72ffe-bf58-497c-be55-bcafede11e0c"}},"thread_id":9128060,"fiber_id":17495840,"process_id":17581,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412511189.7831948,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.find.droplet","data":{"droplet":"798d8af4-abf5-4750-b0cd-771aa4891975","states":["STARTING","RUNNING"],"version":"77d72ffe-bf58-497c-be55-bcafede11e0c"}},"thread_id":9128060,"fiber_id":17495840,"process_id":17581,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412511189.8334136,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.find.droplet","data":{"droplet":"798d8af4-abf5-4750-b0cd-771aa4891975","include_stats":true,"states":["RUNNING"],"version":"77d72ffe-bf58-497c-be55-bcafede11e0c"}},"thread_id":9128060,"fiber_id":17495840,"process_id":17581,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412511487.8479583,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.stop","data":{"droplet":"cb522722-f2d1-4320-9e5a-eacc9ecb1398"}},"thread_id":9128060,"fiber_id":17495840,"process_id":17581,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412511535.7510302,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.stop","data":{"droplet":"cb522722-f2d1-4320-9e5a-eacc9ecb1398"}},"thread_id":9128060,"fiber_id":17495840,"process_id":17581,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412517999.8242369,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.stop","data":{"droplet":"cb522722-f2d1-4320-9e5a-eacc9ecb1398"}},"thread_id":9128060,"fiber_id":17495840,"process_id":17581,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412518018.774033,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.stop","data":{"droplet":"cb522722-f2d1-4320-9e5a-eacc9ecb1398"}},"thread_id":9128060,"fiber_id":17495840,"process_id":17581,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412577416.8380778,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.stop","data":{"droplet":"cb522722-f2d1-4320-9e5a-eacc9ecb1398"}},"thread_id":9128060,"fiber_id":17495840,"process_id":17581,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412623435.7622225,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.stop","data":{"droplet":"cb522722-f2d1-4320-9e5a-eacc9ecb1398"}},"thread_id":9128060,"fiber_id":17495840,"process_id":17581,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}



But why is this happening? After I restart dea_next and start a few apps, the DEA stops:

{"timestamp":1412623768.9234602,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.stop","data":{"droplet":"798d8af4-abf5-4750-b0cd-771aa4891975"}},"thread_id":16439420,"fiber_id":24806660,"process_id":13843,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412623772.002646,"message":"Saving snapshot took: 0.003s","log_level":"debug","source":"Snapshot","data":{},"thread_id":16439420,"fiber_id":18930820,"process_id":13843,"file":"/var/vcap/packages/dea_next/lib/dea/snapshot.rb","lineno":41,"method":"save"}




On Monday, October 6, 2014 at 20:29:32 UTC+3, Dieu Cao wrote:

yaco...@gmail.com

Oct 6, 2014, 3:49:08 PM
to vcap...@cloudfoundry.org, yaco...@gmail.com
And now, after restarting the DEA again and restarting some apps, I get a "no available stagers" error.
I have enough memory (see below), and the DEA is configured as shown below.

Server error, status code: 400, error code: 170001, message: Staging error: no available stagers
newCF@abiell:~$ free -m

             total       used       free     shared    buffers     cached
Mem:          5981       3609       2371          0         37        819
-/+ buffers/cache:       2752       3228
Swap:         5883         16       5867





This is part of the dea.yml file:


intervals:
  heartbeat: 10
  advertise: 5

resources:
  memory_mb: 8000
  memory_overcommit_factor: 1
  disk_mb: 32000
  disk_overcommit_factor: 1

bind_mounts:
- src_path: /var/vcap/packages/buildpack_cache

staging:
  enabled: true
  max_staging_duration: 900
  memory_limit_mb: 1024
  disk_limit_mb: 4096
  cpu_limit_shares: 512
  disk_inode_limit: 200000
  environment:
    BUILDPACK_CACHE: "/var/vcap/packages/buildpack_cache"





The DEA logs don't contain anything interesting:

{"timestamp":1412624156.8373418,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.find.droplet","data":{"droplet":"798d8af4-abf5-4750-b0cd-771aa4891975","include_stats":true,"states":["RUNNING"],"version":"00688df0-ea77-4a4e-a14b-147fb39e8ffa"}},"thread_id":7678080,"fiber_id":16054200,"process_id":20982,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412624159.0342371,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.find.droplet","data":{"droplet":"798d8af4-abf5-4750-b0cd-771aa4891975","states":["STARTING","RUNNING"],"version":"00688df0-ea77-4a4e-a14b-147fb39e8ffa"}},"thread_id":7678080,"fiber_id":16054200,"process_id":20982,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412624161.0541346,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.find.droplet","data":{"droplet":"798d8af4-abf5-4750-b0cd-771aa4891975","include_stats":true,"states":["RUNNING"],"version":"00688df0-ea77-4a4e-a14b-147fb39e8ffa"}},"thread_id":7678080,"fiber_id":16054200,"process_id":20982,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412624256.62914,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.stop","data":{"droplet":"2fed97ea-8177-4e89-a73b-a84d94043d0f"}},"thread_id":7678080,"fiber_id":16054200,"process_id":20982,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}
{"timestamp":1412624270.8585129,"message":"nats.message.received","log_level":"debug","source":"Dea::Nats","data":{"subject":"dea.stop","data":{"droplet":"2fed97ea-8177-4e89-a73b-a84d94043d0f"}},"thread_id":7678080,"fiber_id":16054200,"process_id":20982,"file":"/var/vcap/packages/dea_next/lib/dea/nats.rb","lineno":148,"method":"handle_incoming_message"}




any ideas? :(

Eric Malm

Oct 7, 2014, 1:32:09 PM
to vcap...@cloudfoundry.org, yaco...@gmail.com
Hi,

Could you please give us some information about the total number of apps you have started on the system and the amount of memory allocated to each one? We see that your DEA is allocated only 8 GB of memory, and the DEA advertises its capacity based on the amount of memory CC allocates to apps, not how much memory is free on the VM. Running `cf apps` in each org and space you have created in your deployment would tell us this information.
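For reference, one rough way to gather that across all orgs and spaces from the CLI is below. It's only a sketch: it assumes you're logged in as admin, the `tail` offsets that skip the cf CLI's header lines may need adjusting for your CLI version, and org or space names containing spaces would need more careful handling:

for org in $(cf orgs | tail -n +4); do
  cf target -o "$org" > /dev/null
  for space in $(cf spaces | tail -n +4); do
    cf target -o "$org" -s "$space" > /dev/null
    echo "=== $org / $space ==="
    cf apps
  done
done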

Thanks,
Eric and Chris, for the CF Runtime team

yaco...@gmail.com

Oct 7, 2014, 2:14:47 PM
to vcap...@cloudfoundry.org, yaco...@gmail.com
Hi Eric & Chris.

I have 3 apps total running with 1GB max memory each.

Can you please elaborate on how the calculation is done by the DEA and why it decides to stop?

On Tuesday, October 7, 2014 at 20:32:09 UTC+3, Eric Malm wrote:

Eric Malm

Oct 8, 2014, 8:56:29 PM
to vcap...@cloudfoundry.org, yaco...@gmail.com
Hi,

That desired state alone should not prevent you from starting or staging new apps. Could you provide some more information about how much memory the DEA thinks it has allocated to app containers? It would be great to see the DEA's instances database, in /var/vcap/data/dea_next/db/instances.json (suitably sanitized to remove credentials and other potentially sensitive information, of course).

It would also be helpful to know how much capacity the DEA is advertising over the NATS message bus. You would be able to find this by subscribing to the 'dea.advertise' and 'staging.advertise' subjects on NATS. A way to do this is to run `bundle exec nats-sub 'dea.advertise' -s nats://(nats-user):(nats-password)@(nats-ip):4222` from some place such as /var/vcap/packages/nats/nats or any other location where the Ruby nats gem is vendored. The user, password, and ip should come from your NATS config (/var/vcap/jobs/nats/config/nats.conf).
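For example, the two subscriptions might look like this (run each in its own terminal, since nats-sub blocks while listening; the user, password, and IP here are placeholders to be replaced with the values from your nats.conf):

cd /var/vcap/packages/nats/nats
bundle exec nats-sub 'dea.advertise' -s nats://NATS_USER:NATS_PASSWORD@NATS_IP:4222
bundle exec nats-sub 'staging.advertise' -s nats://NATS_USER:NATS_PASSWORD@NATS_IP:4222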

The DEA determines its advertised capacity by subtracting the memory it has allocated for its containers from the amount of memory specified in its dea.yml config: https://github.com/cloudfoundry/dea_ng/blob/master/lib/dea/resource_manager.rb#L71-L73
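As a rough sketch of that accounting (not the exact code; see the linked resource_manager.rb), you can approximate the advertised figure on the DEA VM by subtracting the memory reserved for non-crashed instances in the snapshot from memory_mb * overcommit. This assumes jq is available and uses the instances.json layout shown later in this thread:

reserved=$(jq '[.instances[] | select(.state != "CRASHED") | .limits.mem] | add' \
  /var/vcap/data/dea_next/db/instances.json)
memory_mb=8000     # resources.memory_mb from your dea.yml
overcommit=1       # resources.memory_overcommit_factor from your dea.yml
echo "advertised available_memory is roughly $(( memory_mb * overcommit - reserved )) MB"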

Also, what version of CF are you running?

Thanks,
Eric

yaco...@gmail.com

Oct 9, 2014, 7:52:42 AM
to vcap...@cloudfoundry.org, yaco...@gmail.com
Hi Eric.

instances.json content:



As for the DEA advertise messages:


[#1] Received on [dea.advertise] : '{"id":"0-f1fa6b2d1273452ba5f6e6df98158db9","stacks":["lucid64"],"available_memory":64,"available_disk":21760,"app_id_to_count":{"31d9d698-7eb3-4647-8a09-0efb443abd41":3,"029a1269-67fe-46e2-85f7-e1b0b5d34193":3,"1c917344-f098-4738-a572-d342e9185b44":1,"798d8af4-abf5-4750-b0cd-771aa4891975":2,"cb522722-f2d1-4320-9e5a-eacc9ecb1398":1},"placement_properties":{"zone":"default"}}'
[#2] Received on [dea.advertise] : '{"id":"0-f1fa6b2d1273452ba5f6e6df98158db9","stacks":["lucid64"],"available_memory":64,"available_disk":21760,"app_id_to_count":{"31d9d698-7eb3-4647-8a09-0efb443abd41":3,"029a1269-67fe-46e2-85f7-e1b0b5d34193":3,"1c917344-f098-4738-a572-d342e9185b44":1,"798d8af4-abf5-4750-b0cd-771aa4891975":2,"cb522722-f2d1-4320-9e5a-eacc9ecb1398":1},"placement_properties":{"zone":"default"}}'


staging advertise:
[#1] Received on [staging.advertise] : '{"id":"0-f1fa6b2d1273452ba5f6e6df98158db9","stacks":["lucid64"],"available_memory":64,"available_disk":21760}'
[#2] Received on [staging.advertise] : '{"id":"0-f1fa6b2d1273452ba5f6e6df98158db9","stacks":["lucid64"],"available_memory":64,"available_disk":21760}'


Now, after I restarted warden, dea_next, and a Tomcat app in order to have more free physical memory:

[#1] Received on [dea.advertise] : '{"id":"0-2547e23c49e84580b35eadda61ba689b","stacks":["lucid64"],"available_memory":576,"available_disk":20736,"app_id_to_count":{"31d9d698-7eb3-4647-8a09-0efb443abd41":5,"029a1269-67fe-46e2-85f7-e1b0b5d34193":3,"1c917344-f098-4738-a572-d342e9185b44":1,"798d8af4-abf5-4750-b0cd-771aa4891975":1,"94c4ae77-bdb6-4d52-9277-e26c73699d67":1},"placement_properties":{"zone":"default"}}'
[#2] Received on [dea.advertise] : '{"id":"0-2547e23c49e84580b35eadda61ba689b","stacks":["lucid64"],"available_memory":576,"available_disk":20736,"app_id_to_count":{"31d9d698-7eb3-4647-8a09-0efb443abd41":5,"029a1269-67fe-46e2-85f7-e1b0b5d34193":3,"1c917344-f098-4738-a572-d342e9185b44":1,"798d8af4-abf5-4750-b0cd-771aa4891975":1,"94c4ae77-bdb6-4d52-9277-e26c73699d67":1},"placement_properties":{"zone":"default"}}'

username@server:/var/vcap/packages/nats/nats$ free -m

             total       used       free     shared    buffers     cached
Mem:          5981       5823        158          0         57        764
-/+ buffers/cache:       5001        979
Swap:         5883         57       5826


At this point, the DEA refuses to restart existing apps.

Even if I restart all the CF infrastructure processes (to free up some memory), the DEA still advertises too little available memory:
cf restart universalbroker
Stopping app universalbroker in org myorg / space myspace as admin...
OK

Starting app universalbroker in org myorg / space myspace as admin...

FAILED
Server error, status code: 503, error code: 150003, message: One or more instances could not be started because of insufficient running resources.

user@server:/var/vcap/packages/nats/nats$ free -m

             total       used       free     shared    buffers     cached
Mem:          5981       4839       1142          0         70       2256
-/+ buffers/cache:       2511       3469
Swap:         5883         19       5864


Listening on [staging.advertise]
[#1] Received on [staging.advertise] : '{"id":"0-a5f30462cb894d3084c1624d23b2627f","stacks":["lucid64"],"available_memory":576,"available_disk":20736}'
[#2] Received on [staging.advertise] : '{"id":"0-a5f30462cb894d3084c1624d23b2627f","stacks":["lucid64"],"available_memory":576,"available_disk":20736}'


At this time, there are only 4 warden instances running in total, 2 of them 256 MB Java apps, and the instances.json file reads:


Now, when I shut these 2 down, the universalbroker app that previously couldn't start because of insufficient resources comes up, and the DEA publishes:

[#1] Received on [staging.advertise] : '{"id":"0-a5f30462cb894d3084c1624d23b2627f","stacks":["lucid64"],"available_memory":64,"available_disk":21760}'


Now, when I do:

cat /var/vcap/data/dea_next/db/instances.json | grep mem
       "mem": 256,
       "mem": 256,
       "mem": 256,
       "mem": 1024,
       "mem": 1024,
       "mem": 1024,
       "mem": 1024,
       "mem": 1024,
       "mem": 1024,
       "mem": 1024,




Some of these instances are not running, since only 3 apps are running at the moment.
Does the DEA simply take these into account even though they're not currently running?
Also, I see an "overcommit factor" attribute in the DEA config:

resources:
  memory_mb: 8000
  memory_overcommit_factor: 1
  disk_mb: 32000
  disk_overcommit_factor: 1


If I raise it to, say, 10, will it use 80 GB to calculate the available memory it publishes?
Does the DEA treat an app that can use at most 1 GB of memory as if it currently takes up 1 GB? (It's logical to assume it does a provisioning-style calculation, because normally most apps consume less memory than their limits.)

About the version: AFAIK it's 183.


Many thanks in advance (and also thanks for the info)

On Thursday, October 9, 2014 at 03:56:29 UTC+3, Eric Malm wrote:

Eric Malm

Oct 9, 2014, 11:02:36 AM
to vcap...@cloudfoundry.org, yaco...@gmail.com
Hi,

Fantastic, thanks for the detailed diagnostics! I've used jq to extract the relevant state and memory data from the instances.json file you provided:

○ → cat instances.json | jq '.instances[] | {"warden_handle": .warden_handle, "mem":.limits.mem, "state": .state}'
{
  "state": "STOPPING",
  "mem": 256,
  "warden_handle": "18385577t1j"
}
{
  "state": "STOPPING",
  "mem": 256,
  "warden_handle": "1848o8f0aiq"
}
{
  "state": "STOPPING",
  "mem": 256,
  "warden_handle": "1848o8f0air"
}
{
  "state": "STOPPING",
  "mem": 1024,
  "warden_handle": "1848o8f0aj5"
}
{
  "state": "STARTING",
  "mem": 1024,
  "warden_handle": "18493pc1vre"
}
{
  "state": "STOPPING",
  "mem": 1024,
  "warden_handle": "184doi4mphh"
}
{
  "state": "STOPPING",
  "mem": 1024,
  "warden_handle": "184doi4mpi4"
}
{
  "state": "RUNNING",
  "mem": 1024,
  "warden_handle": "184lcukrbls"
}
{
  "state": "RUNNING",
  "mem": 1024,
  "warden_handle": "184lcukrblt"
}
{
  "state": "RUNNING",
  "mem": 1024,
  "warden_handle": "184lcukrblu"
}

This is showing snapshots of your 3 running instances, but also several snapshots corresponding to containers in STOPPING or STARTING states. Together, they account for 7*1024 + 3*256= 7936 MB of memory, so your DEA thinks it has only 64 MB left to allocate (hence the 64 MB in its advertisements). What's probably happening is that the warden containers for those other instances no longer exist, but when the DEA process restarts it fails to clear those out of its instances.json file and then thinks those containers are still present. If you check in /var/vcap/data/warden/depot, I expect you'll see directories only for the containers that are in the RUNNING state. We definitely saw some issues with these wedged container states around v183, and we've introduced a few fixes since then to correct them (primarily https://github.com/cloudfoundry/dea_ng/commit/9d85f4fcb1db3b4277901675bf0f86e076063b72). 
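A quick way to cross-check that, using the paths above, is to compare the container directories warden actually has in its depot with the handles the DEA has snapshotted (a sketch only; it assumes jq is present on the VM):

ls /var/vcap/data/warden/depot
jq -r '.instances[] | "\(.state) \(.warden_handle)"' /var/vcap/data/dea_next/db/instances.json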

If you don't care about keeping those particular app instances, you should be able to stop warden and the DEA, delete the instances.json file, and restart warden and the DEA. All the containers will be gone, but CC and HM9000 will eventually restart them. 
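Roughly, that reset would look like the following. This is a sketch only: I'm assuming the monit job names on your DEA VM are warden and dea_next (check `monit summary`), and as noted above it discards the DEA's record of the existing instances:

monit stop dea_next
monit stop warden
# wait until `monit summary` shows both jobs as stopped
rm /var/vcap/data/dea_next/db/instances.json
monit start warden
monit start dea_next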

Please let me know if that works, or if you encounter other issues.

Thanks,
Eric

yaco...@gmail.com

Oct 9, 2014, 12:24:00 PM
to vcap...@cloudfoundry.org, yaco...@gmail.com
Thanks a lot Eric!

That indeed did the trick; now I can push new apps, and the DEA publishes:

[#1] Received on [staging.advertise] : '{"id":"0-4f3e3a8adcf44878ab8d82eada058cf6","stacks":["lucid64"],"available_memory":3904,"available_disk":26880}'
[#2] Received on [staging.advertise] : '{"id":"0-4f3e3a8adcf44878ab8d82eada058cf6","stacks":["lucid64"],"available_memory":3904,"available_disk":26880}'


Now I pushed some more apps, and I've got:

{
  "warden_handle": "184vt2go7r7",
  "mem": 1024,
  "state": "RUNNING"
}
{
  "warden_handle": "184vt2go7r8",
  "mem": 1024,
  "state": "RUNNING"
}
{
  "warden_handle": "184vt2go7r9",
  "mem": 1024,
  "state": "RUNNING"
}
{
  "warden_handle": "184vt2go7ra",
  "mem": 1024,
  "state": "RUNNING"
}
{
  "warden_handle": "184vt2go7rl",
  "mem": 1024,
  "state": "RUNNING"
}
{
  "warden_handle": "184vt2go7rn",
  "mem": 1024,
  "state": "CRASHED"
}
{
  "warden_handle": "184vt2go7rq",
  "mem": 1024,
  "state": "RUNNING"
}
{
  "warden_handle": "184vt2go7rr",
  "mem": 1024,
  "state": "RUNNING"
}



free -m
             total       used       free     shared    buffers     cached
Mem:          5981       5643        337          0        109       2657
-/+ buffers/cache:       2876       3104


And the DEA advertises:
[#1] Received on [staging.advertise] : '{"id":"0-4f3e3a8adcf44878ab8d82eada058cf6","stacks":["lucid64"],"available_memory":832,"available_disk":23808}'


1) It doesn't look like the DEA gets rid of the CRASHED instance, even though I deleted the app from CF.
2) Does it still count as taking memory?
3) What's this memory overcommit factor? Can I increase it to over-provision apps? Or should I simply raise the memory threshold (which is 8 GB) in dea.yml to something bigger?
4) If I have a simple hello-world type node.js app that consumes:

#0   running   2014-10-09 05:51:12 PM   0.0%   8.8M of 1G   26M of 1G  
#1   running   2014-10-09 06:00:29 PM   0.0%   8.1M of 1G   26M of 1G  


How much memory does it take from the DEA's total memory threshold? 2000 MB or < 20 MB?

5) I guess it takes 2 GB, because I now have 8 apps total in instances.json, each at 1 GB, and when I try to push I get:

Server error, status code: 400, error code: 170001, message: Staging error: no available stagers

When I deleted all the small hello-world Node.js apps, I now get:
[#1] Received on [staging.advertise] : '{"id":"0-4f3e3a8adcf44878ab8d82eada058cf6","stacks":["lucid64"],"available_memory":3904,"available_disk":26880}'

And I can push new apps.

Eric Malm

Oct 10, 2014, 11:51:46 AM
to vcap...@cloudfoundry.org, yaco...@gmail.com
Responses inline!


On Thursday, October 9, 2014 9:24:00 AM UTC-7, yaco...@gmail.com wrote:
1) It doesn't look like the DEA gets rid of the CRASHED instance, even though I deleted the app from CF.
2) Does it still count as taking memory?

Right, the DEA will keep crashed containers around for a period of time, controlled by the 'crash_lifetime_secs' config option, which defaults to 1 hour. The DEA won't count them towards its memory usage, but it still counts them as using disk until the crashed container is reaped: https://github.com/cloudfoundry/dea_ng/blob/bbddaa1fc6c671797b168e377fa6046a3000ff8d/lib/dea/starting/instance.rb#L314-L331 
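If you want to confirm this on your VM, something like the following should work (the dea.yml path is an assumption based on a standard BOSH-deployed DEA job, and jq is assumed to be available; adjust for your nise-bosh layout):

grep crash_lifetime_secs /var/vcap/jobs/dea_next/config/dea.yml
jq '[.instances[] | select(.state == "CRASHED")] | length' /var/vcap/data/dea_next/db/instances.json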
 
 
3) What's this memory overcommit factor? Can I increase it to over-provision apps? Or should I simply raise the memory threshold (which is 8 GB) in dea.yml to something bigger?

The DEA multiplies its memory_mb by that overcommit factor when it does its internal accounting, so you can use it to make the DEA think it has more memory than it does and thus over-provision apps. Changing that factor is equivalent to simply increasing memory_mb to memory_mb * overcommit, but I suppose a non-unity overcommit factor is a convenient reminder that you're lying to the DEAs. :) We don't recommend overcommitting in production systems, but it can be convenient for test environments running on less capacious hardware.
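As a worked example with the numbers from this thread (purely hypothetical, not a recommendation): with memory_mb 8000 and the 7936 MB that was reserved earlier, an overcommit factor of 10 would change the advertised figure like this:

memory_mb=8000
reserved=7936                            # 7*1024 + 3*256 from the earlier instances.json
echo $(( memory_mb * 1 - reserved ))     # overcommit 1  -> 64 MB advertised
echo $(( memory_mb * 10 - reserved ))    # overcommit 10 -> 72064 MB advertised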
 

4) If I have a simple hello-world type node.js app that consumes:

#0   running   2014-10-09 05:51:12 PM   0.0%   8.8M of 1G   26M of 1G  
#1   running   2014-10-09 06:00:29 PM   0.0%   8.1M of 1G   26M of 1G  


How much memory does it take from the DEA's total memory threshold? 2000 MB or < 20 MB?

5) I guess it takes 2 GB

Yep, from the DEA's perspective, those app instances will consume 2 GB of its memory pool.

Best,
Eric

yaco...@gmail.com

Oct 10, 2014, 12:24:22 PM
to vcap...@cloudfoundry.org, yaco...@gmail.com

Yep, from the DEA's perspective, those app instances will consume 2 GB of its memory pool.


It means CF users need to be responsible and specify the memory requirements of their apps very carefully, or they'll badly affect the entire "ecosystem", unless I fine-tune the quotas of each org.

Anyway, thanks a ton! Not only did you solve the problem, but I also learned a lot about CF, which is even better!

Denilson Nastacio

Oct 25, 2014, 1:27:41 PM
to vcap...@cloudfoundry.org, yaco...@gmail.com
For whatever it's worth, we hit issues when we migrated from the CF 171 build to the CF 180 build where, after a restart of a DEA (like you would have during an upgrade), the DEA kept reporting its pre-restart memory situation.
This was pretty consistent across four different environments. The only solution was to:

1. monit stop all
2. wait until jobs were stopped
3. rm -rf /var/vcap/data/warden/depot/*
4. rm -rf /var/vcap/data/dea_next/db/instances.json
5. reboot -f    (yes, OS-level reboot)

I know "monit stop all"  does not perform an orderly drain of all apps, but we had little option at the time in terms of applying orderly draining for all DEAs.

When we migrated from CF 180 to CF 189, the problem went away completely.