docker daemon - single point of failure


Mudit Verma

Mar 18, 2015, 5:22:51 AM
to docke...@googlegroups.com
Hi All, 

We wish to use Docker. However, I have some concerns about the Docker daemon being a single point of failure.

In a simple test, I had Docker run a few containers with simple counter programs, and then I killed the Docker daemon (kill -9). This is what I observed:

1. The Docker daemon comes back up (it uses upstart?).
2. The containers also get killed and restart.
3. The Dockerised counter starts all over again after the restart (it starts afresh).


My doubts in this regard are:

1. An abrupt fault or crash in the daemon is going to kill all the running containers, and if they do not manage (persist) their state, their config/state might get corrupted or they might show faulty behaviour.
2. Why can't we isolate the running containers from the daemon process? Why was this not considered while designing the Docker ecosystem?
3. This might make debugging Docker applications difficult, since the actual fault might lie beyond their scope (in the daemon).
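Doubt 1 hinges on whether a containerised program keeps its state outside its own process. A restart cannot bring back in-memory state, but the application can checkpoint it. A minimal sketch, assuming the counter writes its value to a file on a volume mounted into the container (the file name and helper functions are hypothetical):

```python
import json
import os
import tempfile

# Hypothetical layout: the state file lives on a volume mounted into the
# container, so it survives the container process being killed.


def load_count(path):
    """Return the last checkpointed count, or 0 on first start."""
    try:
        with open(path) as f:
            return json.load(f)["count"]
    except FileNotFoundError:
        return 0


def save_count(path, count):
    """Checkpoint atomically: write a temp file, fsync it, rename it over the old one.

    If the process is killed at any point, the state file holds either the
    previous value or the new one, never a partially written mix.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"count": count}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise


def run_counter(path, steps):
    """Resume from the checkpoint and count `steps` more, checkpointing each step."""
    count = load_count(path)
    for _ in range(steps):
        count += 1
        save_count(path, count)
    return count
```

With a volume mounted at a hypothetical /data, calling run_counter("/data/counter.json", 10) after a restart continues from the last saved value instead of starting afresh.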

Is there any work being done to address this?

Looking forward to your input.

Thanks
Mudit

Michael Crosby

Mar 18, 2015, 2:11:00 PM
to Mudit Verma, docke...@googlegroups.com
Hey,

This is true, and there is work going on to help make this better, but if you are really concerned about having a single point of failure and you are running your containers on a single machine, you are doomed anyway. It does depend on your datastore, but you still need to distribute your app across multiple machines if you want to remove single points of failure from your apps.

Let me know what type of tech you use and I can maybe make some suggestions.
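Distributing an app across machines removes the single point of failure only if clients can move to a surviving replica. A minimal client-side sketch, assuming a hypothetical `request` function and hypothetical host names; a real deployment would also need timeouts, retries with backoff, and health checks:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(replicas: Sequence[str], request: Callable[[str], T]) -> T:
    """Send the request to each replica in order and return the first success.

    The call as a whole fails only if every replica is unreachable, so losing
    one machine (or the Docker daemon on it) does not take the service down.
    """
    errors = []
    for host in replicas:
        try:
            return request(host)
        except OSError as exc:  # connection refused, reset, timed out, ...
            errors.append(f"{host}: {exc}")
    raise ConnectionError("all replicas failed: " + "; ".join(errors))
```

For example, call_with_failover(["host-a", "host-b"], fetch_status) (both hypothetical) returns host-b's answer when host-a is down.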




--

-----------------------------------------
Michael Crosby
@crosbymichael

Greg Olszewski

Mar 18, 2015, 7:03:21 PM
to michael...@gmail.com, docke...@googlegroups.com

On 03/18/2015 11:10 AM, Michael Crosby wrote:
> Hey,
>
> This is true, and there is work going on to help make this better, but if you are really concerned about having a single point of failure and you are running your containers on a single machine, you are doomed anyway. It does depend on your datastore, but you still need to distribute your app across multiple machines if you want to remove single points of failure from your apps.
>
> Let me know what type of tech you use and I can maybe make some suggestions.


Hi,

This seems a rather harsh and opaque analysis. While there are usually SPOFs stemming from running software on a single physical host, generally few of them live in the software itself, and it is prudent to consider new ones before introducing them into one's system.

Could you elaborate on the "work going on to help make this better"?

Kindly,
Greg

James Mills

Mar 18, 2015, 7:07:27 PM
to docker-dev

On Thu, Mar 19, 2015 at 9:03 AM, Greg Olszewski <no...@trap.mtview.ca.us> wrote:
> This seems a rather harsh and opaque analysis. While there are usually SPOFs stemming from running software on a single physical host, generally few of them live in the software itself, and it is prudent to consider new ones before introducing them into one's system.

What I don't understand about all this (and we've seen this topic several times now) is that the SPOF argument can be made about any init system, or even the kernel.

So I don't understand what the underlying issue/fear is here.

cheers

Michael Crosby

Mar 18, 2015, 7:13:06 PM
to Greg Olszewski, michael...@gmail.com, docke...@googlegroups.com
Decoupling the daemon process from the container supervisor, so that the daemon can be upgraded without killing the containers. The problem is reconnecting to the containers' stdio after the daemon goes down, so this would probably be solved by not having the daemon be the direct parent of the containers.
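The idea of not having the daemon as the containers' direct parent, while keeping their stdio reachable after a daemon restart, can be illustrated with a small POSIX-only sketch. Everything here is hypothetical and not how Docker implements it: a short-lived "shim" process starts the workload in its own session with its output going to a log file, prints the workload's PID, and exits, so the workload is re-parented away from the launcher. A restarted daemon then reattaches by reading the log from a saved offset.

```python
import os
import subprocess
import sys

# Hypothetical shim program, run as a short-lived intermediate process.
# It starts the workload in a new session with stdout/stderr pointed at a
# log file, prints the workload's PID, and exits. The workload is then
# orphaned and re-parented (to init or a subreaper), so it is no longer a
# child of whoever launched the shim.
SHIM = (
    "import subprocess, sys\n"
    "log = open(sys.argv[1], 'ab')\n"
    "p = subprocess.Popen(sys.argv[2:], stdin=subprocess.DEVNULL, stdout=log,\n"
    "                     stderr=subprocess.STDOUT, start_new_session=True)\n"
    "print(p.pid, flush=True)\n"
)


def launch_via_shim(cmd, log_path):
    """Start cmd through the shim and return the workload's PID."""
    result = subprocess.run(
        [sys.executable, "-c", SHIM, log_path, *cmd],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def reattach(log_path, offset=0):
    """Return output written since offset, plus the new offset.

    A restarted daemon that persisted the offset can resume where it left off.
    """
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)


def is_alive(pid):
    """True if a process with this PID exists (signal 0 only performs the check)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists but belongs to another user
    return True
```

A fuller design would also have to forward stdin and collect the workload's exit status once the daemon is no longer its parent; this sketch leaves both out.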

Also, Greg, sorry if you thought that was harsh. I thought it was pretty good advice that anyone should give to someone who cares enough to bring up the subject of SPOFs, since hardware and networking still fail, even today, and that has nothing to do with software. You still need to protect against that.

Greg Olszewski

Mar 18, 2015, 7:19:50 PM
to Michael Crosby, michael...@gmail.com, docke...@googlegroups.com
Hi Michael,


On 03/18/2015 04:12 PM, Michael Crosby wrote:
> Decoupling the daemon process from the container supervisor, so that the daemon can be upgraded without killing the containers. The problem is reconnecting to the containers' stdio after the daemon goes down, so this would probably be solved by not having the daemon be the direct parent of the containers.


Great, glad to hear it; I've had concerns along these lines.



> Also, Greg, sorry if you thought that was harsh. I thought it was pretty good advice that anyone should give to someone who cares enough to bring up the subject of SPOFs, since hardware and networking still fail, even today, and that has nothing to do with software. You still need to protect against that.


"Harsh" was only meant in context. I saw a reasonable question that seemed to be brushed off the table in favor of more compelling concerns, which I thought might or might not actually be more compelling. With regard to the "good advice", I generally agree.


Regards,

Mudit Verma

Mar 31, 2015, 8:32:29 AM
to docke...@googlegroups.com, crosby....@gmail.com, michael...@gmail.com
Thanks, everyone, for your valuable comments.

How about splitting the daemon into multiple parts, wherein only a basic, minimal core (just like init) remains the parent of all running containers, and the rest of the functionality moves to other processes?
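The proposed split can be pictured as a tiny init-like core whose only jobs are to start workloads, reap them, and remember their exit codes, so that the rest of the daemon could crash or be upgraded without losing that information. A minimal sketch, assuming Python 3.8+ on a POSIX system; the class and its methods are hypothetical, and the channel through which the rest of the daemon would query the core (for example a Unix socket) is omitted:

```python
import os


class MinimalSupervisor:
    """Hypothetical init-like core: start workloads, reap them, keep exit codes.

    Everything else (API, images, networking) would live in separate
    processes that can restart and then ask this core for status.
    """

    def __init__(self):
        self.running = {}  # pid -> workload name
        self.exited = {}   # workload name -> exit code (negative = killed by signal)

    def start(self, name, argv):
        # posix_spawnp searches PATH for argv[0], like execvp().
        pid = os.posix_spawnp(argv[0], argv, dict(os.environ))
        self.running[pid] = name
        return pid

    def reap(self):
        """Collect every child that has already exited, without blocking."""
        while self.running:
            try:
                pid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                break  # no children left at all
            if pid == 0:
                break  # children exist, but none has exited yet
            name = self.running.pop(pid, None)
            if name is None:
                continue  # not one of ours
            if os.WIFEXITED(status):
                self.exited[name] = os.WEXITSTATUS(status)
            else:
                self.exited[name] = -os.WTERMSIG(status)
```

Calling reap() periodically (or on SIGCHLD) is what keeps exited workloads from lingering as zombies, which is the main duty such a core would share with init.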

Just to understand it better: what exactly does the daemon do? Using pstree, I could see that the daemon has many child processes of its own (the {docker} entries below, shown in blue in my terminal). What are they for? Also, is there any link or documentation that describes the design and architecture of the daemon and its different components?


docker,2806,2806 -d
  |-supervisord,4027,4027 /usr/bin/supervisord
  |   |-cron,4055,4055 -f
  |   `-mysqld,4056,4056 --no-defaults --datadir=/mysql/datadir --tmpdir=/mysql/tmp/ --lc-messages-dir=/usr/share/mysql --character-sets-dir=/usr/share/mysql/charsets --sock=/mysql/mysql.sock --pid-file=/mysql/mysql.pid...
  |       |-{mysqld},4087,4056
  |       |-{mysqld},4088,4056
  |       |-{mysqld},4089,4056
  |       |-{mysqld},4090,4056
  |       |-{mysqld},4091,4056
  |       |-{mysqld},4092,4056
  |       |-{mysqld},4093,4056
  |       |-{mysqld},4094,4056
  |       |-{mysqld},4095,4056
  |       |-{mysqld},4096,4056
  |       |-{mysqld},4098,4056
  |       |-{mysqld},4099,4056
  |       |-{mysqld},4100,4056
  |       |-{mysqld},4101,4056
  |       `-{mysqld},4102,4056
  |-{docker},2816,2806
  |-{docker},2817,2806
  |-{docker},2818,2806
  |-{docker},2819,2806
  |-{docker},2820,2806
  |-{docker},2821,2806
  |-{docker},2822,2806
  |-{docker},3808,2806
  |-{docker},4025,2806
  |-{docker},4030,2806
  `-{docker},4038,2806
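In pstree output, a name in curly braces, such as {docker} or {mysqld}, is a thread of the process above it rather than a separate child process, and the first number after it is the thread ID. So the daemon's own entries here are threads of PID 2806, not extra processes. A minimal Linux-only sketch for listing a process's threads straight from /proc; the function name is hypothetical:

```python
import os


def list_threads(pid):
    """Return sorted (tid, name) pairs for every thread of a process (Linux only).

    Each thread has a directory under /proc/<pid>/task, and its name is in
    that directory's 'comm' file. These are the entries pstree shows in braces.
    """
    task_dir = f"/proc/{pid}/task"
    threads = []
    for entry in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, entry, "comm")) as f:
                threads.append((int(entry), f.read().strip()))
        except FileNotFoundError:
            continue  # the thread exited between listdir() and open()
    return sorted(threads)
```

Run on the host above, list_threads(2806) should list the daemon's main thread plus the {docker} threads shown in the tree.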

Thanks
Mudit

Mudit Verma

Jun 15, 2015, 9:05:46 AM
to docke...@googlegroups.com, michael...@gmail.com, crosby....@gmail.com

Hi All, 

We have put out a proposal for the Docker SPoF and hot-upgrade issues.


Please take a look. 

Thanks
Mudit

