Determining Docker Swarm 1.12.1 Utilization


Steve Poe

Jan 9, 2017, 4:00:02 PM
to docker-dev
In earlier versions of Docker Swarm, I could see the total CPUs and RAM and (I think) the utilization. Other than running docker stats on each node, is there a simple way, on a Docker Swarm master, to see the total CPUs and RAM available and the environment's utilization?

sebastiaa...@docker.com

Jan 9, 2017, 6:31:13 PM
to Steve Poe, docker-dev
Hi Steve,

No; unlike "standalone swarm" (https://github.com/docker/swarm), "swarm mode" (i.e., the integrated swarm features introduced in Docker 1.12) currently does not have a stats command that collects stats for all nodes.

There are a couple of reasons for this. First of all, standalone Swarm and Swarm "mode" use different concepts. Standalone Swarm works by communicating directly with each node's remote API. Doing so allows it to control the daemons on those nodes and perform any action you could perform on a conventional, single-node Docker installation, including collecting stats.

While this concept works great, it doesn't scale well: on, say, a 1000-node swarm, the manager has to communicate with, and manage, 1000 daemons.

For that reason (and many others), the Swarm features introduced in Docker 1.12 were redesigned from scratch (through the SwarmKit project). Instead of controlling nodes directly, SwarmKit uses a distributed ("Raft") store that holds the desired and actual state of services. Each worker node runs an agent that is responsible for executing the tasks it gets assigned, and all communication goes through the distributed store.

In that model, there's no direct connection between the manager and each node's remote API, hence no way to get stats. (Well, it would probably _technically_ be possible to have agents collect the stats and store them in the distributed store.)

While we're not there yet, work has started on providing metrics through a dedicated API, using the Prometheus (https://prometheus.io/) standard. An initial implementation (limited to low-level metrics from the daemon itself) will be available in the upcoming 1.13 release as an "experimental" feature, and the intent is to expand on this in the future to provide more targeted metrics per application / service / container.
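
If you want to experiment once 1.13 is available: the daemon metrics are expected to be exposed in the Prometheus format over a plain HTTP endpoint, roughly along these lines (the exact flag and port come from the current proposal and may still change):

     dockerd --experimental --metrics-addr=127.0.0.1:9323
     curl http://127.0.0.1:9323/metrics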

We're looking for feedback on this topic, so feel free to add your use case on the roadmap.

More information can also be found in this presentation: https://m.youtube.com/watch?v=McKORo4ZgVI

Please note that if you're currently running standalone Swarm, nothing should change; you can still run standalone Swarm as before on Docker 1.12 or 1.13. Docker 1.12 and up are fully backward compatible with older versions, and "swarm mode" is an optional feature.

I hope this answers your questions, but I'm happy to provide more information if needed!

cheers,

Sebastiaan van Stijn
"thaJeztah" on GitHub



On 9 Jan 2017, at 22:00, Steve Poe <stev...@gmail.com> wrote:

In earlier versions of Docker Swarm, I could see the total CPUs and RAM and (I think) the utilization. Other than running docker stats on each node, is there a simple way, on a Docker Swarm master, to see the total CPUs and RAM available and the environment's utilization?


Steve Poe

Jan 10, 2017, 1:46:29 AM
to sebastiaa...@docker.com, docker-dev
Hi Sebastiaan,

Thank you for your time and response to my question. I never really thought of Standalone Swarm and Swarm "mode" as using different concepts before. :-)

For what it's worth, I am testing Swarm mode 1.12.1 in a small environment (20 nodes) with a mixture of bare metal and VMs. If the agent on the node determines whether a task should run on that node, I'd like to be able to control that with a constraint (e.g. --cpu-utilization=30). While you cannot pass > or < values (I think), I'd like the agent to enforce a constraint to deploy only if/when CPU utilization is less than 30%.

If I use Standalone Swarm, then I cannot use "docker service create" and other features, right? I love how Swarm mode automatically secures the connections between my nodes. I don't want to go back. :-)

In my environment, I will have several masters that my team will log in to. I want them to be able to see the current usage at the command line (or a way to add the status to the MOTD when they log in), since they have in mind what they want to accomplish. I don't want them to have to stare at graphs, but a calculated CPU/RAM use percentage may help them know if they need to wait before they run their service(s). Or, as a fellow administrator/architect, I may look towards a histogram of utilization for my container environments.

I will look over the roadmap and presentation. If I feel my use case hasn't been shared in the issues, I will add mine.

Best regards,

Steve

Thiago Donizetti Corredor

Jan 10, 2017, 4:36:42 AM
to Steve Poe, sebastiaa...@docker.com, docker-dev
Hi all,

Is anyone using Docker running on the VMware platform?

Thanks

Sebastiaan van Stijn

Jan 10, 2017, 7:09:15 AM
to Steve Poe, docker-dev
Hi Steve,

I'd like to be able to control that with a constraint (e.g. --cpu-utilization=30). While you cannot pass > or < values (I think), I'd like the agent to enforce a constraint to deploy only if/when CPU utilization is less than 30%.

In a way, you can; when creating a service you can specify both a “reservation” and a “limit” for resource utilization. For example:

     docker service create --reserve-cpu=0.5 --limit-cpu=1.0 --name myservice nginx:alpine


This creates a service that will only be deployed on nodes that have at least 0.5 CPU available that is not reserved (which
may not be what’s _actually_ in use at the moment). The tasks of that service are allowed to consume up to 1 CPU each
when available. Basically, you’re telling Docker the minimum requirements for your service (well, the tasks
_backing_ the service) to run, and the amount of resources they are _allowed_ to use.
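
If you want to double-check what a service ended up with, the reservation and limit are stored in the service spec; something like this should print them (a sketch, reusing the “myservice” name from the example above):

     docker service inspect --format '{{ json .Spec.TaskTemplate.Resources }}' myservice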

It’s highly advisable to always set limits on your services (and containers if you’re using “docker run”); by
default, there’s no restriction on the amount of resources containers can consume, so even though they are
protected in what capabilities they have, they can still cause a node to run out of resources.
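
For a plain container, that could look something like this (the values are purely illustrative; the --cpu-period / --cpu-quota pair caps the container at roughly half a CPU, and --memory caps it at 256 MB):

     docker run -d --name web --memory=256m --cpu-period=100000 --cpu-quota=50000 nginx:alpine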




If I use Standalone Swarm, then I cannot use “docker service create” and other features, right?

Correct. Also, standalone Swarm requires you to configure an external k/v store, which is not needed with Swarm “mode”.
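
For reference, a standalone Swarm manager is typically started along these lines, with (for example) Consul as the external k/v store (addresses are placeholders):

     docker run -d -p 4000:4000 swarm manage -H :4000 consul://<consul-ip>:8500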


I love how Swarm mode automatically secures connection between my nodes. I don’t want to go back. :-)


It’s definitely a lot easier to use than standalone Swarm.

Note that while the “control plane” (i.e. the API that’s used between managers and workers in the swarm) is
protected, the “data plane” for overlay networks between nodes is not encrypted by default. If overlay
networking between nodes passes untrusted networks, you can enable encryption using the “--opt encrypted”
option when creating the overlay network.
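
For example (the network name here is just illustrative):

     docker network create --driver overlay --opt encrypted my-encrypted-net

Services can then be attached to it with “--network my-encrypted-net” when they are created.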


If encryption is enabled, Docker uses IPsec tunnels for overlay networks. There _is_ a performance
penalty to this, so only enable it if needed.


I will look over the roadmap and presentation. If I feel my use case hasn’t been shared in the issues, I will add mine.

Thanks, that’s appreciated! Estimating “how” people want to use features is difficult. A description
of actual use cases greatly helps to get the design “right”.


- Sebastiaan

Steve Poe

Jan 10, 2017, 4:20:43 PM
to docker-dev
Hi Sebastiaan,

In a way, you can; when creating a service you can specify both a “reservation” and a “limit” for resource utilization. For example:

     docker service create --reserve-cpu=0.5 --limit-cpu=1.0 --name myservice nginx:alpine

I like that idea. I'd rather not force my team to think about assigning CPU limits unless they have minimums they need to establish. Can I force a default reservation of --reserve-cpu=0.5? If I can, do I need to touch each node in my swarm or just each swarm master?

It’s highly advisable to always set limits on your services (and containers if you’re using “docker run”); by
default, there’s no restriction on the amount of resources containers can consume, so even though they are
protected in what capabilities they have, they can still cause a node to run out of resources.

This is the issue I want to manage before I open the flood gates to use the swarm environment.

Also see https://docs.docker.com/engine/swarm/services/#/reserving-memory-or-number-of-cpus-for-a-service

The link you posted here did not discuss the --limit-cpu parameter. Is it valid in 1.12?


Note that while the “control plane” (i.e. the API that’s used between managers and workers in the swarm) is
protected, the “data plane” for overlay networks between nodes is not encrypted by default. If overlay
networking between nodes passes untrusted networks, you can enable encryption using the “--opt encrypted”
option when creating the overlay network.


If encryption is enabled, Docker uses IPsec tunnels for overlay networks. There _is_ a performance
penalty to this, so only enable it if needed.

I have experimented with the encrypted network. I like it and will have a case to use it where business needs require it. I expected the performance penalty to be the cost of meeting the business requirements. Has anyone documented what the performance penalty percentage is and/or how to minimize the cost (e.g. bonding network interfaces)? Again, I imagine each use case can vary, but running containers inside virtual environments is a common setup.

I will look over the roadmap and presentation. If I feel my use case hasn’t been shared in the issues, I will add mine.

Thanks, that’s appreciated! Estimating “how” people want to use features is difficult. A description
of actual use cases greatly helps to get the design “right”.

In my roadmap for usage, I see my team wanting to know who launched which containers/services, whether by a CI system or by a user who logs in to a master node to initiate a service. For example, if I ran 'docker service create --replicas 17 --name folding jordan0day/folding-at-home', I should be able to run 'docker service inspect folding' to see what person or CI job initiated the request.
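
In the meantime, I suppose I could approximate this with service labels; a rough sketch (the "owner" label is my own convention, not a built-in):

     docker service create --label owner=$(whoami) --replicas 17 --name folding jordan0day/folding-at-home
     docker service inspect --format '{{ index .Spec.Labels "owner" }}' folding

That only works if everyone remembers to set the label, though.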

I realize I am now getting off-topic from the initial question, so I'll consider the thread "closed" after your next response, if you have any feedback.

Thanks again.

Steve

Jérôme Petazzoni

Jan 11, 2017, 12:33:16 PM
to Thiago Donizetti Corredor, Steve Poe, sebastiaa...@docker.com, docker-dev
Hi Thiago,

Yes, many people do!

However, I would advise that you ask again on one of:
- the docker user mailing-list,
- the docker forums,
- StackOverflow,
- the Docker community Slack channels.

You can get more information here:

This mailing list is mostly for people involved in the development of Docker itself (e.g. "I'm trying to add a new command to the Docker CLI so that I can do "docker coffee" and Docker would then caffeinate my container; I tried to modify vendors/philz/coffee.go but I'm getting the following compilation error on ARM platforms: ...").

Thank you!
