Container Migration


Wenqi Cao

Jun 8, 2016, 2:19:56 PM
to kubernetes-dev
I am wondering why Kubernetes doesn't provide an approach for online migration of containers. Even the latest proposal only talks about creating a rescheduler, which terminates a pod that is managed by a controller; the controller then creates a replacement pod, which is scheduled by the pod's scheduler. This approach works for stateless applications, since it simply shuts down the old pods and creates new ones. However, what is the solution for stateful applications?
Moreover, what do you think of the following statement?

"Stateful things scale vertically, stateless things scale horizontally." Is this the idea that Googlers are using in designing Kubernetes?


Thanks

David Aronchick

Jun 8, 2016, 2:37:08 PM
to Wenqi Cao, kubernetes-dev
What you are referring to is frequently called "improved stateful application support" (codename: petset).


To be clear, it is a best practice NOT to make any changes, use local disk, or care about the exact size and shape of your container/node. Ideally, you'd mount any persistent storage from an external source (e.g. an NFS store, an external database, etc.). By breaking up your components in this way, you'll likely achieve better utilization and a more fault-tolerant infrastructure.

--
You received this message because you are subscribed to the Google Groups "kubernetes-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-de...@googlegroups.com.
To post to this group, send email to kuberne...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-dev/28fe04c4-06df-49a3-ab68-ab422c183098%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tim Hockin

Jun 8, 2016, 2:50:43 PM
to David Aronchick, Wenqi Cao, kubernetes-dev
There are 2 answers. PetSet is coming, and that will make clustered
apps like ZooKeeper happier. But simple PersistentVolumes also
work well for stateful apps: they decouple the lifetimes of the state
and the containers.

David Oppenheimer

Jun 8, 2016, 4:29:20 PM
to Tim Hockin, David Aronchick, Wenqi Cao, kubernetes-dev
"Container migration" can mean a lot of things -- the app doesn't know it is killed, or the data is transparently migrated (if local) and reattached, or the IP address stays the same when you move to a new node, etc.

AFAIK there is no "true" container migration solution available today, in Kubernetes or any other system, that is a full solution (i.e. totally transparent to the application) analogous to GCE Live Migration. But you can address pieces of it, for example as David and Tim said by using a networked Persistent Volume.
 

Wenqi Cao

Jun 8, 2016, 4:34:28 PM
to kubernetes-dev, tho...@google.com, aron...@google.com, wenq...@gmail.com
Actually, what I am thinking about is how to migrate data stored in memory from one container to another. If everything is stored on disk, simply migrating the container from place A to place B is fine, and transparent at the application level. However, what if data in memory also needs to migrate? How can this be solved?

Thanks

David Oppenheimer

Jun 8, 2016, 5:10:42 PM
to Wenqi Cao, kubernetes-dev, Tim Hockin, David Aronchick
On Wed, Jun 8, 2016 at 1:34 PM, Wenqi Cao <wenq...@gmail.com> wrote:
> Actually, what I am thinking about is how to migrate data stored in memory from one container to another. If everything is stored on disk, simply migrating the container from place A to place B is fine, and transparent at the application level. However, what if data in memory also needs to migrate? How can this be solved?

Yeah, that's exactly the kind of thing I was talking about when I said "there is no 'true' container migration solution available today, in Kubernetes or any other system, that is a full solution (i.e. totally transparent to the application) analogous to GCE Live Migration."

If you do a web search for [container live migration] you can find information about some of the experimental work in this space, e.g. CRIU.


Wenqi Cao

Jun 8, 2016, 5:39:08 PM
to kubernetes-dev, wenq...@gmail.com, tho...@google.com, aron...@google.com
OK, I got it. Thanks for all your answers.

Tim Hockin

Jun 8, 2016, 6:27:18 PM
to David Oppenheimer, Wenqi Cao, kubernetes-dev, David Aronchick
There's an opportunity for "warm" migration, but it requires careful
application design and orchestration. Specifically, you can use a
tmpfs and store all of your "durable" state there (and maybe mmap() it
into your app). An orchestrator (kube does *not* do this) could
bulk-copy that data to the destination. Your app would bounce when it
moves but you could save the important state.

I don't know anyone who is doing that.
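
A minimal Python sketch of the tmpfs idea described above, not something Kubernetes provides: the app keeps its "durable" state in a fixed-size file that it mmap()s, and a stand-in copy_state() plays the part of the external orchestrator's bulk copy. The file names, the 4096-byte size, the single-counter layout, and the use of /dev/shm as the tmpfs mount are all assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: one 8-byte little-endian counter at offset 0.
STATE_SIZE = 4096
COUNTER = struct.Struct("<Q")


def open_state(path):
    """Map a fixed-size state file into memory, creating it if needed."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < STATE_SIZE:
            os.ftruncate(fd, STATE_SIZE)  # new bytes read back as zeros
        return mmap.mmap(fd, STATE_SIZE)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def bump(state):
    """Increment the counter in place and return the new value."""
    (value,) = COUNTER.unpack_from(state, 0)
    COUNTER.pack_into(state, 0, value + 1)
    return value + 1


def copy_state(src_path, dst_path):
    """Stand-in for the orchestrator's bulk copy to the destination node."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read())


def demo(base_dir):
    """Simulate: run on node A, stop, copy the state, restart on node B."""
    src = os.path.join(base_dir, "node-a.state")
    dst = os.path.join(base_dir, "node-b.state")

    state = open_state(src)
    bump(state)
    bump(state)
    state.flush()
    state.close()  # the app "bounces" here

    copy_state(src, dst)

    moved = open_state(dst)  # the restarted app picks up where it left off
    value = bump(moved)
    moved.flush()
    moved.close()
    return value


if __name__ == "__main__":
    # /dev/shm is a tmpfs mount on most Linux systems (assumption); fall back
    # to the default temporary directory elsewhere.
    base = "/dev/shm" if os.path.isdir("/dev/shm") else None
    with tempfile.TemporaryDirectory(dir=base) as workdir:
        print(demo(workdir))  # prints 3
```

The process itself still restarts when it moves; only the bytes in the mapped file survive, which is why the state has to be laid out so that a fresh process can reopen it.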

David Aronchick

Jun 8, 2016, 6:28:52 PM
to Tim Hockin, David Oppenheimer, Wenqi Cao, kubernetes-dev
Can you say more about your requirements? Are you trying to mimic live VM migration (e.g. on Google Compute Engine or with VMware)?

Wenqi Cao

Jun 8, 2016, 6:32:27 PM
to kubernetes-dev, davi...@google.com, wenq...@gmail.com, aron...@google.com
There are some pretty mature techniques and algorithms for live migration, but at the virtual machine level, for example pre-copy and post-copy. Are you planning to merge them into Kubernetes?

Tim Hockin

Jun 8, 2016, 6:58:09 PM
to Wenqi Cao, kubernetes-dev, David Oppenheimer, David Aronchick
On Wed, Jun 8, 2016 at 3:32 PM, Wenqi Cao <wenq...@gmail.com> wrote:
> There are some pretty mature techniques and algorithms for live migration,
> but at the virtual machine level, for example pre-copy and post-copy. Are you
> planning to merge them into Kubernetes?

In short, no. At least, not any time soon. Live migration of
arbitrary running code is a *very* hard problem and not something
we're currently aiming at. It would bring with it a lot of changes to
Kubernetes that would (in my opinion) move in the wrong direction. If
someone can eventually demonstrate a 95% success rate for
live-migrating arbitrary user code, we can talk about the requirements
on the core system.

I'd much rather approach it from base requirements and understand what
problems people are really trying to solve.

Jeremy Ong

Jun 8, 2016, 7:02:35 PM
to Tim Hockin, Wenqi Cao, kubernetes-dev, David Oppenheimer, David Aronchick
You could take a ramdisk approach and change the malloc/calloc/free/new/delete implementation to DMA to the ramdisk instead, using mmap as suggested earlier. The ramdisk can persist after the container restarts, so you'd effectively have a container hot swap with persistent RAM. This is something I've been considering for our own internal deploys, but of course you need to work out compatibility issues if struct/class definitions change or even if the compiler changes (things may not be aligned or padded in memory the same way). Changes to the build's optimization level may affect it too. The approach being considered is to use a ramdisk only for things supported by a reflection/serialization library, which takes care of the compatibility issues.
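
The compatibility concern above can be sketched in Python: instead of leaving raw in-memory structures on the ramdisk, the app writes a serialized snapshot tagged with a schema version, and upgrades older snapshots when it restarts. JSON stands in here for the reflection/serialization library; the field names, the version numbers, and the v1-to-v2 rename are hypothetical.

```python
import json
import os
import tempfile

SCHEMA_VERSION = 2  # hypothetical current layout of the saved state


def save_snapshot(path, state):
    """Write the state plus its schema version, replacing the file atomically."""
    payload = {"version": SCHEMA_VERSION, "state": state}
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    os.replace(tmp_path, path)  # readers never see a half-written snapshot


def upgrade(payload):
    """Bring a snapshot written by an older build up to the current schema."""
    version = payload["version"]
    state = dict(payload["state"])
    if version == 1:
        # Hypothetical change between builds: "user" was renamed to "user_id".
        state["user_id"] = state.pop("user")
        version = 2
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported snapshot version: {version}")
    return state


def load_snapshot(path):
    """Read a snapshot and return its state in the current schema."""
    with open(path, encoding="utf-8") as f:
        return upgrade(json.load(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ramdisk:  # stand-in for the ramdisk
        path = os.path.join(ramdisk, "snapshot.json")
        save_snapshot(path, {"user_id": 7, "unread": 3})
        print(load_snapshot(path))
```

Because the snapshot is decoded field by field rather than reinterpreted as raw memory, a change in struct padding, compiler, or optimization level between the old and new builds does not affect how it is read.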





--
Jeremy Ong
PlexChat CTO

David Aronchick

Jun 8, 2016, 7:23:30 PM
to Tim Hockin, Wenqi Cao, kubernetes-dev, David Oppenheimer

On Wed, Jun 8, 2016 at 3:57 PM, Tim Hockin <tho...@google.com> wrote:
> I'd much rather approach it from base requirements and understand what
> problems people are really trying to solve.

100% agree - Wenqi, can you help us understand what you're trying to do?