Documenting container lifecycles


W. Trevor King

Sep 14, 2015, 11:49:20 PM
to d...@opencontainers.org
The current lifecycle docs are pretty sparse [1]. As part of my
attempt to get consistent file naming [2], I've fleshed them out a bit
to get something like:

A typical lifecycle progresses like this:

1. There is no container or running application
2. A user tells the runtime to create a container
3. The runtime creates the container
4. A user tells the runtime to start an application
5. The runtime executes any pre-start hooks
6. The runtime executes the application
7. The application is running
8. A user tells the runtime to stop
9. The runtime sends a termination signal to the application
10. The application exits
11. The runtime executes any post-stop hooks
12. A user tells the runtime to destroy the container
13. The runtime removes the container

With steps 8 and 9, the user is explicitly stopping the application
(via the runtime), but it's also possible that the application could
exit for other reasons. In that case we skip directly from 7 to 10.
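
The thirteen steps above can be read as a small state machine. The following is a minimal sketch of that reading, not part of the proposal itself; the state names, event names, and the `advance` helper are all assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names; the proposal above does not name states.
    NONE = "none"        # step 1: no container, no application
    CREATED = "created"  # after step 3
    RUNNING = "running"  # step 7
    STOPPED = "stopped"  # after step 11 (post-stop hooks have run)


# Allowed transitions, keyed by (current state, event).
TRANSITIONS = {
    (State.NONE, "create"): State.CREATED,    # steps 2-3
    (State.CREATED, "start"): State.RUNNING,  # steps 4-7
    (State.RUNNING, "stop"): State.STOPPED,   # steps 8-11
    (State.RUNNING, "exit"): State.STOPPED,   # application exits on its own: 7 -> 10
    (State.STOPPED, "destroy"): State.NONE,   # steps 12-13
}


def advance(state, event):
    """Return the next state, or raise if the event is not allowed here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"cannot {event!r} from state {state.value!r}")
```

Writing the transitions as a table makes omissions easy to spot, for example whether a created-but-never-started container may be destroyed.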

## Create

Create the container: file system, namespaces, cgroups,
capabilities. The invoked process continues running for the life of
the container. This is *not* the process configured in config.json;
it's just the runtime holding the container open (e.g. preserving the
PID namespace). On Linux, this is the process whose state is logged
to /run/opencontainer/containers.

## Start (process)

Run the pre-start hooks and launch a process in a container. Can be
invoked several times. This is the process configured in
config.json.

On Linux hosts, some information for this execution may come from
outside the `config.json` and `runtime.json` specifications. See
the Linux-specific notes for details [this is the "additional file
descriptor" stuff that landed in specs#113].

## Stop (process)

Send a termination signal to the application process (can optionally
send other signals to the application process, e.g. a kill signal).
After the process exits, run the post-stop hooks. If the process
does not exit after a suitable time, exit with an error without
running the post-stop hooks.
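
A minimal sketch of the stop behaviour described above, assuming the application is a child process tracked with Python's `subprocess.Popen` and that hooks are plain callables. The `stop` function, its `timeout` default, and the choice of `RuntimeError` are illustrative assumptions, not anything the spec defines.

```python
import subprocess


def stop(proc, post_stop_hooks, timeout=5.0):
    """Send a termination signal; run post-stop hooks only on a timely exit."""
    proc.terminate()  # SIGTERM on POSIX systems
    try:
        status = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The application did not exit in a "suitable time": report an
        # error and skip the post-stop hooks, as the text above describes.
        raise RuntimeError("application did not exit; post-stop hooks skipped")
    for hook in post_stop_hooks:
        hook()
    return status
```

What counts as a "suitable time" is left open by the text, so the timeout is a parameter here rather than a fixed value.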

## Destroy

Remove the container: unmount file systems, remove namespaces, etc.
This is the inverse of create. This should include terminating the
container-process launched during the ‘create’ step.

The lifecycle I'm proposing here is along the lines of the
container.json / application.json proposal I spun off of a comment by
Julz [3]. That hasn't received any push-back on the list, but my
impression from discussions on issues and in meetings is that it's not
universally popular. But I'm less concerned about getting *this*
lifecycle documentation landed in the spec than I am about getting
*any* lifecycle documentation landed in the spec ;).

So how do these notes sound? What do I have to change to get them
landed? Or who wants to pick up an alternative lifecycle and document
that?

Cheers,
Trevor

[1]: https://github.com/opencontainers/specs/blob/dca1dfdd92405129fba7ba0c91cfade2dae1d615/runtime.md#lifecycle
[2]: https://github.com/opencontainers/specs/pull/126
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/kPWu-ZavQjs
Message-ID: <20150826204...@odin.tremily.us>


David Walter

Sep 15, 2015, 12:42:27 AM
to W. Trevor King, d...@opencontainers.org
systemd may not be universally popular, but are its process lifecycle stages inclusive of the goals for containers?

Is it a sensible starting point, for example, would the unit file attributes provide a basis for a discussion?

http://www.freedesktop.org/software/systemd/man/systemd.unit.html

I don't know whether a first cut would need this many properties. Maybe a subset?


To unsubscribe from this group and stop receiving emails from it, send an email to dev+uns...@opencontainers.org.

David Liang

Sep 15, 2015, 6:20:03 AM
to dev
One question about 'start' being invoked several times:
Should a runtime check the process status before calling 'start' to make sure the process is unique?
Or should it just leave it to the process to handle whether multiple `processes` can run at the same time?

On Tuesday, September 15, 2015 at 11:49:20 AM UTC+8, W. Trevor King wrote:
The current lifecycle docs are pretty sparse [1].  As part of my
attempt to get consistent file naming [2], I've fleshed them out a bit
to get something like:
...

Glyn Normington

Sep 15, 2015, 6:55:13 AM
to dev
+1 to fuller lifecycle docs, preferably with a state transition diagram which would make it easier to spot omissions. (I'm happy to draw it once a draft lifecycle doc is agreed.)

I'd prefer to keep these docs self-contained and well-defined, even if they do borrow from systemd terminology.

I like Trevor's starting point below. (But haven't gone over it in detail in case it gets thrown out immediately.)

Mrunal Patel

Sep 15, 2015, 11:28:42 AM
to Glyn Normington, dev
Yes, it makes sense to flesh this out more. I could take a stab at how it works today and we
could refine it from there.
Trevor's lifecycle has a separate create step which we don't have today. IIRC, we discussed
the issues with creating all the namespaces in one of the calls. 

Thanks,
Mrunal


W. Trevor King

Sep 15, 2015, 2:33:19 PM
to David Liang, dev
On Tue, Sep 15, 2015 at 03:20:02AM -0700, David Liang wrote:
> Should a runtime check the process status before calling 'start' to
> make sure the process is unique?

What do you mean by “the process is unique”? For an example of what I
have in mind with multiple start calls starting multiple containers,
see the ‘--id’ argument for ‘start’ in runtime-API specs I've been
working on with Julz [1]. That would let you spawn multiple
containers from the same bundle by leaving off the ‘--id’ argument (so
the runtime generates unique IDs) or by using ‘--id $(uuidgen)’ or
some such to generate your own unique IDs.
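
A minimal sketch of the ID handling described above, assuming the runtime falls back to a random UUID when no ID is supplied. The `resolve_container_id` name and the duplicate check are assumptions for illustration; they are not taken from the linked gist.

```python
import uuid


def resolve_container_id(requested=None, in_use=()):
    """Pick the ID for a new container started from a bundle."""
    if requested is None:
        # No --id given: the runtime generates a unique ID itself.
        return str(uuid.uuid4())
    if requested in in_use:
        # Assumption: a caller-supplied ID that collides is rejected.
        raise ValueError(f"container ID {requested!r} is already in use")
    return requested
```

Calling this twice without an ID yields two different IDs, which is what lets one bundle back several containers.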

Cheers,
Trevor

[1]: https://gist.github.com/wking/1d69118ba8b750f85bfc#start

W. Trevor King

Sep 15, 2015, 2:37:39 PM
to David Walter, d...@opencontainers.org
On Mon, Sep 14, 2015 at 11:42:26PM -0500, David Walter wrote:
> Is it a sensible starting point, for example, would the unit file
> attributes provide a basis for a discussion?
>
> http://www.freedesktop.org/software/systemd/man/systemd.unit.html

There's a lot going on in there ;). Can you take a stab at boiling it
down to a lifecycle chart? I see some references to OnFailure and
such that sound sort of like our post-stop hooks. Although our
post-stop hooks don't currently have a way to tell if the process
failed or not. Maybe we need to add an exit code to state JSON once
the application exits but before cleanup is finished?
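
One way to picture that suggestion: a minimal sketch in which the runtime adds the exit status to the state document before post-stop hooks run, so a hook can tell success from failure. The `exitCode` key and both helper names are hypothetical; nothing like them exists in the current state JSON.

```python
import json


def record_exit(state, exit_code):
    """Return a copy of the state with the application's exit status added."""
    updated = dict(state)
    updated["exitCode"] = exit_code  # hypothetical field name
    return updated


def hook_saw_failure(state_json):
    """What an OnFailure-style post-stop hook might check."""
    state = json.loads(state_json)
    return state.get("exitCode", 0) != 0
```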

Cheers,
Trevor

W. Trevor King

Sep 15, 2015, 2:41:55 PM
to Mrunal Patel, Glyn Normington, dev
On Tue, Sep 15, 2015 at 08:28:39AM -0700, Mrunal Patel wrote:
> I could take a stab at how it works today and we could refine it
> from there.

Great, thanks :).

> Trevor's lifecycle has a separate create step which we don't have
> today.

In my defense, there's currently a separate create step in the docs
[1] ;). But if the specs end up combining container creation with the
initial application launch, folks like me who prefer a separate
creation step can always have the “main application” be a dummy
process that just holds the container open, and then exec in more
processes to do the real work.

Cheers,
Trevor

[1]: https://github.com/opencontainers/specs/blob/v0.1.1/runtime.md#create

David Liang

Sep 15, 2015, 9:29:24 PM
to dev, liang...@gmail.com
Got it, it was clear after reading https://gist.github.com/wking/1d69118ba8b750f85bfc#start.
I had misunderstood the 'start' process.

W. Trevor King

Sep 19, 2015, 5:03:10 PM
to Mrunal Patel, Glyn Normington, dev
On Tue, Sep 15, 2015 at 08:28:39AM -0700, Mrunal Patel wrote:
> I could take a stab at how it works today and we could refine it
> from there.

It's been a few days now, so I thought I'd take a stab at this. If we
have enough lead-time we might be able to get this finalized at next
week's meeting (on Thursday [1]). This write-up is just about Linux
containers, since I don't understand the other systems well enough to
do them justice.

# Typical lifecycle

A typical lifecycle progresses like this:

1. There is no container or running application
2. A user tells the runtime to start a container+application
3. The runtime creates the container
4. The runtime executes any pre-start hooks
5. The runtime executes the application
6. The application is running
7. A user tells the runtime to send a termination signal to the application
8. The runtime sends a termination signal to the application
9. The application exits
10. The runtime executes any post-stop hooks
11. The runtime removes the container

With steps 7 and 8, the user is explicitly stopping the application
(via the runtime), but it's also possible that the application could
exit for other reasons. In that case we skip directly from 6 to 9.

Failure in a pre-start hook or other setup task can cause a jump
straight to 10.
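
The control flow of those eleven steps, including the jump to step 10 on a setup failure, can be sketched as follows. Every argument is a plain callable supplied by the caller, and the exit status of 1 for a failed setup is an assumption of this sketch, not something the write-up specifies.

```python
import sys


def run_lifecycle(create, pre_start_hooks, run_application,
                  post_stop_hooks, remove):
    """Drive one container through the typical lifecycle."""
    status = 1  # assumed exit status when setup fails
    try:
        create()                    # step 3
        for hook in pre_start_hooks:
            hook()                  # step 4
        status = run_application()  # steps 5-9, returns the exit status
    except Exception as error:
        # Failure in a pre-start hook or other setup task: fall through
        # to step 10 without running the application.
        print(f"setup failed: {error}", file=sys.stderr)
    for hook in post_stop_hooks:
        hook()                      # step 10
    remove()                        # step 11
    return status
```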

## Create

Create the container: file system, namespaces, cgroups,
capabilities. The invoked process forks, with one branch that stays
in the host namespace and another that enters the container. The
host process carries out all container setup actions, and continues
running for the life of the container so it can perform teardown
after the container process exits. The container process performs
tasks such as username-lookups [2], and then drops privileges in
preparation for the application start. At this point, the host
process writes the state.json file with the host-side version of the
container-process's PID (the container process may be in a PID
namespace) [3].

[This is where standard streams get a bit tricky, because both the
host and container processes are sharing the same streams.
Untangling this will probably involve logging (instead of stderr
writing) for the host process, although we'll have to figure out
when to cut over from stderr to logs. Probably some time before we
fork, and definitely before we exec the application.]

## Pre-start hooks

The pre-start hooks are executed after container creation by the
host process.

## Start (process)

After the pre-start hooks complete, the host process signals the
container process to execute the runtime. The runtime execs the
process defined in config.json's ‘process’ attribute [4].
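
A minimal sketch of the fork-and-signal handshake described in the create and start sections, under heavy simplification: no namespaces, cgroups, or privilege dropping, and the container process simply exits where a real runtime would exec the application. The pipe used as the "start" signal and the `create`/`start` names are assumptions for illustration, not a description of runC's mechanism.

```python
import json
import os


def create(state_path, container_id):
    """Fork a container process and record its host-side PID."""
    release_read, release_write = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Container process: wait until the host process says "start".
        os.close(release_write)
        os.read(release_read, 1)
        os._exit(0)  # placeholder for exec'ing the configured process
    # Host process: os.fork() returned the host-side PID of the child.
    os.close(release_read)
    with open(state_path, "w") as state_file:
        json.dump({"id": container_id, "pid": pid}, state_file)
    return pid, release_write


def start(release_write):
    """Called by the host process once the pre-start hooks have completed."""
    os.write(release_write, b"x")
    os.close(release_write)
```

Between `create` and `start` the state file already exists, which is the window in which pre-start hooks could read it.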

On Linux hosts, some information for this execution may come from
outside the `config.json` and `runtime.json` specifications. See
the Linux-specific notes for details [this is the "additional file
descriptor" stuff that landed in specs#113, see also [5]].

## Stop (process)

Send a termination signal to the application process (can optionally
send other signals to the application process, e.g. a kill signal).
When the process exits, the host process collects its exit status
to return as its own exit status. If there are any remaining
processes in the container's cgroup (and we only support unified
cgroups [6]), the host process kills and reaps them.

[On IRC on 2015-09-15, Michael said: “if the main process dies in
the container, all other process are killed” and “we actually freeze
first, send the KILL, then unfreeze so we don't have races”. “The
main process” is probably “the container process associated with the
host process that created the cgroup”, to distinguish it from
container processes that have subsequently joined the cgroup. And
KILL seems like a harsh starting point, so it might be “TERM, wait
on a clean exit for $TIMEOUT, if processes are still running,
freeze, KILL, unfreeze, and reap”.]
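
A minimal sketch of that "TERM, wait, freeze, KILL, unfreeze" sequence, assuming a cgroup v1 directory that exposes both `cgroup.procs` and `freezer.state`. The function names and the injectable `kill` parameter are assumptions for illustration; reaping with `waitpid` is left out, and a real implementation would also wait for the freezer to report `FROZEN` before sending KILL.

```python
import os
import signal
import time


def read_pids(cgroup_dir):
    """List the PIDs currently in the cgroup."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as procs:
        return [int(line) for line in procs if line.strip()]


def set_freezer(cgroup_dir, state):
    """Write FROZEN or THAWED to the cgroup's freezer."""
    with open(os.path.join(cgroup_dir, "freezer.state"), "w") as freezer:
        freezer.write(state)


def stop_all(cgroup_dir, timeout=10.0, kill=os.kill):
    """TERM everything, wait for a clean exit, then freeze/KILL/unfreeze."""
    for pid in read_pids(cgroup_dir):
        kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while read_pids(cgroup_dir) and time.monotonic() < deadline:
        time.sleep(0.1)
    survivors = read_pids(cgroup_dir)
    if survivors:
        # Freeze first so nothing can fork while the KILLs are delivered.
        set_freezer(cgroup_dir, "FROZEN")
        for pid in survivors:
            kill(pid, signal.SIGKILL)
        set_freezer(cgroup_dir, "THAWED")
    return survivors
```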

## Post-stop hooks

The post-stop hooks are executed after container creation by the
host process.

[I'm not clear on what state.json looks like for these processes.
Does it still have a PID? The container process is dead by this
point and some of the container (e.g. PID namespaces) won't even
exist anymore. How are post-stop hooks supposed to get information
about the container?]

## Cleanup

The host process removes the container: unmounting file systems,
removing namespaces, etc. This is the inverse of create. The host
process then exits with the application's exit status [Julz has
pointed out that this makes it hard to report on the host-process's
own teardown errors].

# Joining existing containers

Joining an existing container looks just like the usual workflow,
except that the container process joins the target container [7] at
the beginning of step three. It can then, depending on its
configuration, continue to create an additional child cgroup
underneath the one it joined.

When exiting, the reaping logic in the ‘stop’ phase is the same. If
the container process created a child cgroup, all other processes in
that child cgroup are reaped. But no other processes in the joined
cgroup (which the container process did not create) are reaped.

Does that sound close to what we have now in runC? Can anyone suggest
edits or complete rewrites where I got things wrong? Add clarity to
my [bracketed confusion]?

Cheers,
Trevor

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/pnM9vDNJgrg
[2]: https://github.com/opencontainers/specs/pull/191
[3]: https://github.com/opencontainers/specs/blob/v0.1.1/runtime.md#state
[4]: https://github.com/opencontainers/specs/blob/v0.1.1/config.md#process-configuration
[5]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/D-3t4XHOqnU
Message-ID: <CAK4o1WzT7rVv16rAG=EGHfHKMY+kc9HR-k...@mail.gmail.com>
[6]: https://github.com/opencontainers/specs/blob/v0.1.1/runtime-config-linux.md#control-groups
“The Spec does not support split hierarchy.”
[7]: https://github.com/opencontainers/specs/blob/v0.1.1/runtime-config-linux.md#control-groups
cgroupsPath

Jojy Varghese

Sep 22, 2015, 2:27:39 PM
to W. Trevor King, Mrunal Patel, Glyn Normington, dev
Hi Trevor
  Thanks for taking the initiative. I don’t know if there is a formal state diagram. I just started to take a stab at it to understand the details.


It needs more details like:
- Events that can trigger state transitions
- Finer states within the major states.
- Formal names of states.


-jojy



Sebastiaan van Stijn

Sep 22, 2015, 5:30:12 PM
to Jojy Varghese, W. Trevor King, Mrunal Patel, Glyn Normington, dev
For inspiration, this is the state (and events) diagram currently in the Docker documentation;


-- Sebastiaan

Jojy Varghese

Sep 22, 2015, 6:17:14 PM
to Sebastiaan van Stijn, W. Trevor King, Mrunal Patel, Glyn Normington, dev
Great! Thanks Sebastiaan.


-Jojy

Jojy Varghese

Sep 23, 2015, 12:45:33 AM
to Sebastiaan van Stijn, W. Trevor King, Mrunal Patel, Glyn Normington, dev
Got inspired and updated the state diagram. It could use some notes to describe examples of hooks and what they could be doing.

thanks
Jojy


On Sep 22, 2015, at 2:30 PM, Sebastiaan van Stijn <thaj...@gmail.com> wrote:

Mrunal Patel

Sep 23, 2015, 11:54:41 AM
to Jojy Varghese, Sebastiaan van Stijn, W. Trevor King, Glyn Normington, dev
Thanks for creating the diagram Jose. A few corrections on the status quo:
  1. We don't have a pre-launch hook today. It could probably be called by higher orchestration without support in the spec.
  2. pre-exec is what we call Prestart.
  3. There is no Prestop either, today.
As an example, Prestart hooks could be used to set up the network namespace and networking of the container.
Poststop could tear those changes down.
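
A minimal sketch of how a host process might run such hooks, assuming each hook is an argv list and that the container state is handed to the hook as JSON on standard input. Both of those choices, and the `run_hooks` name, are assumptions of this sketch rather than statements about the spec.

```python
import json
import subprocess


def run_hooks(hooks, state):
    """Run each hook in order from the host process; stop at the first failure."""
    payload = json.dumps(state).encode()
    for argv in hooks:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(argv, input=payload, check=True)
```

A Prestart list could then hold a script that configures networking for the PID found in the state, and the Poststop list a script that undoes it.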

Thanks,
Mrunal

W. Trevor King

Sep 23, 2015, 12:37:06 PM
to Mrunal Patel, Jojy Varghese, Sebastiaan van Stijn, Glyn Normington, dev
On Wed, Sep 23, 2015 at 08:54:39AM -0700, Mrunal Patel wrote:
> Thanks for creating the diagram Jose. A few corrections on the
> status quo…

Any feedback / clarification / corrections on the textual version of
the “no explicit create step” lifecycle [1]? It seems like it might
be easier to iterate on the text, and then convert it to a diagram
once the text is finalized. Because text → diagram is going to be
lossy (unless you have things like the definition of “container
processes” in mouseovers ;).

Cheers,
Trevor

[1]: https://groups.google.com/a/opencontainers.org/d/msg/dev/frUXLljXy8Y/aKFRX56qAAAJ
Message-ID: <20150919210...@odin.tremily.us>

Jojy Varghese

Sep 23, 2015, 12:44:19 PM
to W. Trevor King, Mrunal Patel, Sebastiaan van Stijn, Glyn Normington, dev
I agree that we should solidify the text version first. The intent behind the diagram was to make the flow easier to follow for people like me who understand pictures better.

Regarding the hooks, the intent behind the pre-stop hook was to enable specialized behavior for special stops such as OOM.

-Jojy

Glyn Normington

Sep 24, 2015, 3:51:34 AM
to Jojy Varghese, W. Trevor King, Mrunal Patel, Sebastiaan van Stijn, dev
Hi Jojy

The diagram helps people see the wood for the trees, so thanks for drawing it, and the text adds detail, precision, etc., so I agree it's the thing to focus on next.

Regards,
Glyn

Jojy Varghese

Sep 24, 2015, 10:29:17 AM
to Glyn Normington, W. Trevor King, Mrunal Patel, Sebastiaan van Stijn, dev
Thanks Glyn. 

Is there a formal way of adding namespace context to the lifecycle? This would help us understand which states and hooks are executed in the context of the host and which in the context of the container.

-Jojy

Glyn Normington

Sep 24, 2015, 10:32:51 AM
to Jojy Varghese, W. Trevor King, Mrunal Patel, Sebastiaan van Stijn, dev
Use colo(u)rs?
--
Regards,
Glyn

Jojy Varghese

Sep 24, 2015, 10:35:04 AM
to Glyn Normington, W. Trevor King, Mrunal Patel, Sebastiaan van Stijn, dev
Sure. For the text part, does the spec say anything about hook contexts?

-Jojy

W. Trevor King

Sep 24, 2015, 12:52:24 PM
to Jojy Varghese, Glyn Normington, Mrunal Patel, Sebastiaan van Stijn, dev
On Thu, Sep 24, 2015 at 07:35:01AM -0700, Jojy Varghese wrote:
> Sure. For the text part, does the spec say anything about hook contexts?

The spec isn't clear (yet ;), but [1] is clear (and it matches with my
runC tests [2]).
[2]: https://github.com/opencontainers/runc/pull/160#issuecomment-138383886

W. Trevor King

Sep 28, 2015, 3:00:35 PM
to Mrunal Patel, Glyn Normington, dev
Here's v2 of my attempt at capturing runC's current lifecycle.
Changes since v1 [1]:

* Adding the:

“All hooks execute in the host environment (e.g. the same
namespace, cgroups, etc. that apply to the host process).”

clarification to the hook sections, since my previous “executed … by
the host process” doesn't necessarily mean “in the host
environment”.

* Reword the initial sentence in each hook section, fixing the
post-stop copy/paste error “after container creation” → “after it
completes the stop” in the post-stop section.

Here's the lifecycle:
The pre-start hooks are executed by the host process after container
creation. All hooks execute in the host environment (e.g. the same
namespace, cgroups, etc. that apply to the host process).
The post-stop hooks are executed by the host process after it
completes the stop. All hooks execute in the host environment
(e.g. the same namespace, cgroups, etc. that apply to the host
process).

[I'm not clear on what state.json looks like for these processes.
Does it still have a PID? The container process is dead by this
point and some of the container (e.g. PID namespaces) won't even
exist anymore. How are post-stop hooks supposed to get information
about the container?]

## Cleanup

The host process removes the container: unmounting file systems,
removing namespaces, etc. This is the inverse of create. The host
process then exits with the application's exit status [Julz has
pointed out that this makes it hard to report on the host-process's
own teardown errors].

# Joining existing containers

Joining an existing container looks just like the usual workflow,
except that the container process joins the target container [7] at
the beginning of step three. It can then, depending on its
configuration, continue to create an additional child cgroup
underneath the one it joined.

When exiting, the reaping logic in the ‘stop’ phase is the same. If
the container process created a child cgroup, all other processes in
that child cgroup are reaped. But no other processes in the joined
cgroup (which the container process did not create) are reaped.

Cheers,
Trevor

ps. And backing up my previous tests [8], on IRC today Alexander
confirmed the intended hook environment:

11:28 < jojy_mesos> i guess i was expecting an additional attribute
for hooks that specifies whether the hooks runs in the host
namespace or in the container's namespace
11:28 < lk4d4> they all run in host namespaces

There's also been some discussion of post-start hooks around [9], but
I've left that out until we have a clearer picture of whether it will
land and what it will look like.
[8]: https://github.com/opencontainers/runc/pull/160#issuecomment-138383886
[9]: https://github.com/opencontainers/specs/issues/20#issuecomment-143367767

W. Trevor King

Sep 30, 2015, 2:56:38 PM
to d...@opencontainers.org
On Mon, Sep 28, 2015 at 11:58:32AM -0700, W. Trevor King wrote:
> Here's v2 of my attempt at capturing runC's current lifecycle…

After their time cooking on the list, we decided at today's meeting to
move these docs into a PR [1], so I just filed [2]. Feedback welcome
:).

Cheers,
Trevor

[1]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-09-30-17.00.html
[2]: https://github.com/opencontainers/specs/pull/207