Hi,
I've had these questions and ideas for a long time, and I'd like to
share them and get some feedback. I searched the flannel and kubernetes
issues but didn't find anything really similar to this.
Basically, since flannel or kubernetes can/will be used to manage
containers that have persistent data (typical example: a database), and
this data should be accessed by only one container at a time (to avoid
data corruption), there's a need to avoid starting multiple containers
that access the same data.
To keep things simple and clean, one possible example is a coreos
cluster where all nodes see the same data (ceph rbd, cephfs, glusterfs,
etc.), and flannel starts a database container on node A that mounts
its data directory from the shared storage. If node A then
dies/panics/loses network connectivity, etc., flannel will currently
start the container on another machine without any check to verify that
the previous container is really down. This is a typical split-brain
event.
At the moment there are various ways to avoid this. For example, just
avoid shared storage and tie the container to a single machine. Then,
to get high availability, use the database's replication features with
other containers tied to other nodes as replicas and, as a plus,
automate master/slave elections with various tools (for example
redis-sentinel).
Another solution would be to make the "cluster" manager able to really
isolate the "failed" node hosting the containers before starting a new
one on another node. This is the basic job of "classic" HA clusters:
fencing (the ability to isolate a node).
Now I have various doubts and questions:
*) Does this make sense in a "container" world where containers can
have persistent data (and become stateful), or are there better ways to
handle this?
*) If so, where should it be implemented?
For example, on a coreos cluster there can be two semi-independent
cluster managers: flannel and kubernetes (semi-independent, as
kubernetes can be launched by flannel), with different logic and
requirements that will clash if both implement some sort of fencing.
My initial idea would be to implement a "fencing" service that fences
nodes (in multiple possible ways) when it detects that they are
unreachable, plus an API that cluster managers like flannel and
kubernetes can use to learn the node/minion state before initiating
other operations (like starting a container on another node).
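To make the idea more concrete, here is a minimal Go sketch of what
such a service could look like. All names here (Fencer, FencingService,
SafeToReschedule, etc.) are hypothetical and are NOT the go-fence API;
this just illustrates the flow: on losing contact with a node, the
service tries to fence it, and a scheduler asks the service whether
rescheduling is safe before starting a replacement container.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeState is the answer a cluster manager gets back from the fencing
// service before rescheduling a container elsewhere.
type NodeState int

const (
	NodeAlive  NodeState = iota // node is reachable, do not reschedule
	NodeFenced                  // node was isolated, safe to reschedule
	NodeUnknown                 // fencing failed or still in progress
)

// Fencer isolates a node, e.g. via an IPMI power-off or a managed
// switch port (hypothetical interface, not the go-fence API).
type Fencer interface {
	Fence(node string) error
}

// FencingService tracks node states and exposes the query that a
// scheduler would call before starting a container on another node.
type FencingService struct {
	fencer Fencer
	states map[string]NodeState
}

func NewFencingService(f Fencer) *FencingService {
	return &FencingService{fencer: f, states: make(map[string]NodeState)}
}

// MarkUnreachable is called when the cluster loses contact with a
// node: the service tries to fence it and records the outcome.
func (s *FencingService) MarkUnreachable(node string) {
	if err := s.fencer.Fence(node); err != nil {
		// Fencing failed: we cannot prove the node is down, so any
		// reschedule could cause a split brain. Stay safe.
		s.states[node] = NodeUnknown
		return
	}
	s.states[node] = NodeFenced
}

// SafeToReschedule is the check a cluster manager performs before
// starting a replacement container: only a fenced node is safe.
func (s *FencingService) SafeToReschedule(node string) bool {
	return s.states[node] == NodeFenced
}

// fakeFencer simulates a power-off device for the example.
type fakeFencer struct{ failFor string }

func (f fakeFencer) Fence(node string) error {
	if node == f.failFor {
		return errors.New("fencing device unreachable")
	}
	return nil
}

func main() {
	svc := NewFencingService(fakeFencer{failFor: "node-b"})

	svc.MarkUnreachable("node-a") // fencing succeeds
	fmt.Println("node-a safe:", svc.SafeToReschedule("node-a"))

	svc.MarkUnreachable("node-b") // fencing device fails
	fmt.Println("node-b safe:", svc.SafeToReschedule("node-b"))
}
```

The key property is that a failed fencing attempt leaves the node in an
"unknown" state where rescheduling stays blocked, which is exactly the
split-brain protection that is missing today.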
There may be other and better ideas and solutions, and probably a lot
of problems to address, but for the moment I want to stop here and hear
your thoughts.
FYI, some months ago I started writing a basic fencing library, which
you can find here (https://github.com/go-fence/fence), with the idea of
using it for these needs.
Thanks!
Simone