[PATCH] runtime: Document container ID charset and uniqueness domain

W. Trevor King

Sep 3, 2015, 6:09:34 PM
to d...@opencontainers.org
Allow the runtime to use its own scheme, but let the caller use UUIDs
if they want. Jonathan asked for clarification as part of #87, but
didn't suggest a particular approach [1]. When we discussed it in the
2015-08-26 meeting [2], the consensus was to just allow everything,
with container IDs like 'a/b/c' leading to state entries like
'/var/oci/containers/a/b/c/state.json'. But that could get ugly with
container IDs that contain '../' etc. And perhaps there are some
filesystems out there that cannot represent ASCII characters
(actually, I'm not even sure what charset our JSON is in ;). I'd
rather pick this minimal charset which can handle UUIDs, and make life
easy for runtime implementers and safe for bundle consumers at a
slight cost of flexibility for bundle-authors.
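
To illustrate the path concern, here is a minimal sketch. It assumes a
runtime that naively joins the caller's ID under the
'/var/oci/containers' root from the example above. The `state_path`
helper is hypothetical and not taken from any runtime.

```python
import posixpath

STATE_ROOT = '/var/oci/containers'  # root used in the example above


def state_path(container_id):
    """Hypothetical helper: naively join the ID under the state root."""
    return posixpath.normpath(
        posixpath.join(STATE_ROOT, container_id, 'state.json'))


for container_id in ['a/b/c', '../../etc']:
    path = state_path(container_id)
    inside = path.startswith(STATE_ROOT + '/')
    print('{!r} -> {} (inside state root: {})'.format(
        container_id, path, inside))
```

The second ID normalizes to a path outside the state root, which is the
kind of entry an unrestricted charset would permit.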

[1]: https://github.com/opencontainers/specs/pull/87#discussion_r35828046
[2]: https://github.com/opencontainers/specs/wiki/Meeting-Minutes-2015-08-26

Reported-by: Jonathan Boulle <jonatha...@gmail.com>
Signed-off-by: W. Trevor King <wk...@tremily.us>
---

This seems like a significant-enough semantic change to be worth
discussing on the list before I file a PR (although I've pushed this
commit if folks want to pull it [3]). I thought the patch format
would help clarify the changes I was suggesting.

Does anyone have a use case for container IDs outside the range I'm
suggesting?

Cheers,
Trevor

[3]: https://github.com/wking/opencontainer-specs/tree/container-ID-charset-and-uniqueness

runtime.md | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/runtime.md b/runtime.md
index be50458..673db9e 100644
--- a/runtime.md
+++ b/runtime.md
@@ -11,6 +11,9 @@ By providing a default location that container state is stored external applicat

* **version** (string) Version of the OCI specification used when creating the container.
* **id** (string) ID is the container's ID.
+ Only ASCII letters, numbers, and hyphens are valid.
+ This value must be unique for a given host, but need not be universally unique.
+ Runtimes must allow the caller to set this ID, so that callers may choose, for example, to use [UUIDs][uuid] for universal uniqueness.
* **pid** (int) Pid is the ID of the main process within the container.
* **root** (string) Root is the path to the container's bundle directory.

@@ -86,3 +89,5 @@ If a hook returns a non-zero exit code, then an error is logged and the remainin
```

`path` is required for a hook. `args` and `env` are optional.
+
+[uuid]: https://tools.ietf.org/html/rfc4122
--
2.1.0.60.g85f0837

W. Trevor King

Sep 3, 2015, 6:15:11 PM
to d...@opencontainers.org
On Thu, Sep 03, 2015 at 03:07:33PM -0700, W. Trevor King wrote:
> And perhaps there are some filesystems out there that cannot
> represent ASCII characters…

I meant *non-ASCII* characters.

Cheers,
Trevor

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

Glyn Normington

Sep 4, 2015, 7:08:28 AM
to dev
Would it be sensible to restrict "ASCII letters" in container IDs to a..z & A..Z, since letters like š may have varying codes depending on the character set in use? I guess it would only affect anyone trying to read a container ID, but even so...

Regards,
Glyn

Jonathan Boulle

Sep 4, 2015, 1:46:49 PM
to Glyn Normington, dev
Why not just skip the middle ground and stipulate that they should conform to the UUID format - if implementations don't care about the uniqueness requirement they can trivially generate something arbitrary that conforms to it anyway.


W. Trevor King

Sep 4, 2015, 11:38:06 PM
to Jonathan Boulle, Glyn Normington, dev
On Fri, Sep 04, 2015 at 10:46:29AM -0700, Jonathan Boulle wrote:
> Why not just skip the middle ground and stipulate that they should
> conform to the UUID format - if implementations don't care about the
> uniqueness requirement they can trivially generate something
> arbitrary that conforms to it anyway.

Then container IDs would have to be 36 characters long, with hyphens
in certain places [1]. While you could make a workable system like
that, it's harder to make human-readable IDs (I currently get
‘oci-gentoo-minimal’ as a container ID, and that's a lot easier to
recognize than a UUID ;).
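
As a rough illustration of that shape, the following sketch uses
Python's standard `uuid` module to show the length and hyphen positions
of a UUID in its canonical text form, next to the human-readable ID
mentioned above. The variable names are only for illustration.

```python
import uuid

example = str(uuid.uuid4())  # random UUID in canonical text form
hyphens = [i for i, char in enumerate(example) if char == '-']
print(len(example), hyphens)  # 36 [8, 13, 18, 23]

readable = 'oci-gentoo-minimal'
print(len(readable))  # 18
```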

The middle ground (restricted charset that allows, but doesn't
require, UUIDs) gives you the freedom to pick your personal balance
between universal uniqueness and human readability, but protects users
from path-based shenanigans.
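
A minimal sketch of such a check, assuming the rule is read as "one or
more of a-z, A-Z, 0-9, and hyphen" (the patch does not say whether an
empty ID is allowed, so rejecting it is an assumption). The
`is_valid_container_id` name is hypothetical.

```python
import re
import uuid

# Assumed reading of the proposed rule: one or more of a-z, A-Z, 0-9, '-'.
CONTAINER_ID = re.compile(r'[A-Za-z0-9-]+')


def is_valid_container_id(container_id):
    """Hypothetical check: the whole string must match the charset."""
    return CONTAINER_ID.fullmatch(container_id) is not None


candidates = ['oci-gentoo-minimal', str(uuid.uuid4()), 'a/b/c', '../x', 'š']
for candidate in candidates:
    print('{!r}: {}'.format(candidate, is_valid_container_id(candidate)))
```

Both the human-readable ID and the UUID pass, while anything containing
a path separator or a non-ASCII letter is rejected.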

On Fri, Sep 4, 2015 at 4:08 AM, Glyn Normington wrote:
> Would it be sensible to restrict "ASCII letters" in container IDs to
> a..z & A..Z since letters like š may have varying codes depending on
> the character set in use. I guess it would only affect anyone trying
> to read a container ID, but even so...

I don't think š is ASCII (it's just the 7-bit stuff [2], and š is
U+0161):

  $ python3 -c "'š'.encode('ascii')"
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
  UnicodeEncodeError: 'ascii' codec can't encode character '\u0161' in position 0: ordinal not in range(128)

But explicit is always good, so I'm happy to use other phrasing if
you'd prefer. In any case, the set I mean (a–z, A–Z, 0–9, and -) gets
spit out by the following Python 3 script:

  import unicodedata

  # http://www.unicode.org/reports/tr44/tr44-4.html#GC_Values_Table
  category = {
      'Ll': 'lowercase letter',
      'Lu': 'uppercase letter',
      'Nd': 'decimal number',
      'Pd': 'dash punctuation',
  }

  for i in range(1<<7):
      char = chr(i)
      abbr = unicodedata.category(char)
      if abbr[0] in ['L', 'N'] or abbr == 'Pd':
          cat = category[abbr]
          print('{:02x} {} {}'.format(i, char, cat))

Cheers,
Trevor

[1]: http://tools.ietf.org/html/rfc4122#section-3
[2]: https://en.wikipedia.org/wiki/ASCII#ASCII_printable_code_chart

W. Trevor King

Oct 5, 2015, 12:49:39 PM
to d...@opencontainers.org
On Thu, Sep 03, 2015 at 03:07:33PM -0700, W. Trevor King wrote:
> + Only ASCII letters, numbers, and hyphens are valid.

Vincent floated colons as well [1]. I doubt anyone serious about
containers is running on Mac OS 9 (where colons were the path
separators), but it looks like colons aren't allowed in Windows
filenames [2]. So I'd rather hear from someone with access to a
Windows box before adding colons to the list of container ID
characters.
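
As a rough sanity check (not a substitute for testing on a Windows
host), the sketch below compares the proposed charset against the
reserved filename characters listed on the page cited as [2]. The
`WINDOWS_RESERVED` set is transcribed from that page, and the variable
names are only for illustration.

```python
import string

# Reserved filename characters from the Windows naming page cited as [2].
WINDOWS_RESERVED = set('<>:"/\\|?*')

# The charset proposed in the patch: ASCII letters, digits, and hyphen.
proposed = set(string.ascii_letters + string.digits + '-')

print(sorted(proposed & WINDOWS_RESERVED))            # prints []
print(sorted((proposed | {':'}) & WINDOWS_RESERVED))  # prints [':']
```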

Cheers,
Trevor

[1]: https://github.com/opencontainers/specs/pull/211#issuecomment-145539887
[2]: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx#file_and_directory_names