Strawman for distribution-spec supporting arbitrary artifacts

Jimmy Zelinskie

Jan 18, 2019, 2:10:23 PM
to dev
This is a follow up from the last dev call where we discussed the two existing strategies for storing non-image content in container registries.

The first strategy discussed is used in cnab-to-oci and leverages the OCI index and annotations in order to store extra data. An example manifest for this strategy looks like this.

The second strategy discussed is used in oras and leverages all available mediatype fields present in the OCI manifest. An example manifest for this strategy looks like the first example here.

After some discussion of the pros and cons of each, the call seemed to have both proposers agreeing upon a strawman where the OCI manifests are left unchanged, but only a subset of the available mediatype fields are able to contain arbitrary identifiers: the mediatype field in the manifest's config block and the mediatype associated with each layer. This enables registries to continue parsing manifests for garbage collection purposes and allows them to determine what kind of content is being stored without adding additional code for every new kind of content that users want to store. Registry operators would also still be free to implement a white/blacklist for media types to prevent certain types of artifacts from being stored in their registry.

An example manifest appears as such:

{
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": {
        "mediaType": "application/vnd.cncf.helm.chart.config.v1+json",
        "size": 7023,
        "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7",
        "annotations": {
          "vnd.cncf.helm.chart.name": "nginx",
          "vnd.cncf.helm.chart.version": "0.1.0"
        }
    },
    "layers": [
        {
            "mediaType": "application/vnd.cncf.helm.chart.v1.tar+gzip",
            "size": 32654,
            "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"
        }
    ]
}
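
A minimal sketch of the registry-side behavior described above, in Python. It assumes the manifest layout from the example; the function names and the contents of the allowlist are hypothetical and are not taken from any registry implementation.

```python
import json

# Hypothetical operator policy: config media types this registry will store.
ALLOWED_CONFIG_TYPES = {
    "application/vnd.oci.image.config.v1+json",
    "application/vnd.cncf.helm.chart.config.v1+json",
}


def referenced_digests(manifest):
    """Blob digests the manifest points at; garbage collection must keep these.

    The walk is identical for every artifact type because the manifest
    structure itself is unchanged.
    """
    descriptors = [manifest["config"]] + list(manifest.get("layers", []))
    return [d["digest"] for d in descriptors]


def artifact_kind(manifest):
    """The config mediaType identifies what kind of content is stored."""
    return manifest["config"]["mediaType"]


def is_accepted(manifest, allowed=ALLOWED_CONFIG_TYPES):
    """Apply the operator's media type allowlist."""
    return artifact_kind(manifest) in allowed


manifest = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.cncf.helm.chart.config.v1+json",
    "size": 7023,
    "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7"
  },
  "layers": [
    {
      "mediaType": "application/vnd.cncf.helm.chart.v1.tar+gzip",
      "size": 32654,
      "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"
    }
  ]
}
""")

print(artifact_kind(manifest))
print(referenced_digests(manifest))
print(is_accepted(manifest))
```

The point of the sketch is that neither function needs to know what a Helm chart is: the digests come from the unchanged manifest structure and the policy decision comes from a string comparison.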

I believe that changes to both the image and distribution specification might have to occur in order to effectively codify this behavior. I yield that discussion to the floor.

Also, a huge thanks to everyone that worked on proposals and everyone who attended our discussion!

Jon Johnson

Jan 18, 2019, 2:36:17 PM
to Jimmy Zelinskie, dev
Thanks for putting this together!

Are the config annotations not redundant with the image repository and tag? This seems to suffer from the same problem that schema 1 images had, where embedding tags and names in manifests broke content-addressability when moving images between repos or re-tagging them. I'm not super familiar with helm, though, so I could be misunderstanding the point of the name/version.

Jimmy Zelinskie

Jan 18, 2019, 2:52:33 PM
to Jon Johnson, dev
So I agree with you that name and version are probably bad examples of metadata to store. I just wanted to demonstrate that you could store application-specific metadata in the annotations if need be. The people working on Helm's implementation will have to decide if they even want to use annotations.

FWIW, Helm charts have a chart.yaml file that contains metadata such as the name and version which could mismatch with whatever tag is used.

Stephen Day

Jan 18, 2019, 2:58:36 PM
to Jimmy Zelinskie, Jon Johnson, dev
While this can cause issues of naming, the main problem with schema1 was that tags would have several aliases, embedded within each version of the manifest. So, to implement the tags 1, 1.1, 1.1.1 and latest, you'd end up with a manifest for each one with a separate digest. With the annotations this way, you'll still have authority problems but there is a single, canonical manifest with a stable digest. However, the registry cannot and should not have to enforce uniqueness, as it can't be verified in the client.
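
To make the digest point concrete, here is a small Python sketch. The manifest body is a simplified stand-in and not the real schema 1 layout, and hashing a deterministic JSON encoding is an assumption of the sketch (a registry hashes the exact bytes that were pushed).

```python
import hashlib
import json


def digest(document):
    """sha256 over a deterministic JSON encoding of the document."""
    payload = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Simplified stand-in for a manifest body; the layer digests are made up.
body = {"layers": ["sha256:aaaa", "sha256:bbbb"]}
tags = ["1", "1.1", "1.1.1", "latest"]

# Tag embedded in the manifest: every alias hashes differently.
embedded = {tag: digest(dict(body, tag=tag)) for tag in tags}

# Tag kept outside the manifest: all aliases share one digest.
external = {tag: digest(body) for tag in tags}

print(len(set(embedded.values())))  # 4 distinct digests
print(len(set(external.values())))  # 1 canonical digest
```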

To address some of these problems, we could also add an annotation for the digest of the previous version. That may create more problems though.

Jimmy:

Do we have any idea of the size and scope of changes required in the specifications? Can these changes be done in a backwards compatible way, such that we can issue 1.1 of image-spec with expanded mediaTypes? If not, will you need help identifying that delta?

Stephen.

Gareth Rushgrove

Jan 20, 2019, 1:18:49 PM
to Jimmy Zelinskie, Jon Johnson, dev
On Fri, 18 Jan 2019 at 19:52, Jimmy Zelinskie <jimmyze...@gmail.com> wrote:
>
> So I agree with you that name and version are probably bad examples of metadata to store. I just wanted to demonstrate that you could store application-specific metadata in the annotations if need be. The people working on Helm's implementation will have to decide if they even want to use annotations.
>

I don't think they're bad examples, just incomplete.

I'd wager in lots of cases name might be the repository name, and
version might be the tag, but semantically the repository and tag
don't have specific meaning and could be anything. The namespaced
annotations would have a defined meaning.

For a fuller list of the two examples, it's worth looking at
https://docs.helm.sh/developing_charts/#the-chart-yaml-file for Helm
and https://github.com/deislabs/cnab-spec/blob/master/schema/bundle.schema.json#L7
for CNAB. It would probably be useful to build up full examples of
both Helm and CNAB annotations (not for the OCI spec, more to test the
theory). I think the schemas for these (linked to the mimetype) are
something Steve Lasker, myself and probably others have discussed
mainly in passing - ie. a shared (non exclusive) registry of content
types.

Gareth
--
Gareth Rushgrove
@garethr

devopsweekly.com
morethanseven.net
garethrushgrove.com

Jimmy Zelinskie

Jan 21, 2019, 2:15:27 PM
to Gareth Rushgrove, Jon Johnson, dev
> Do we have any idea of the size and scope of changes required in the specifications? Can these changes be done in a backwards compatible way, such that we can issue 1.1 of image-spec with expanded mediaTypes? If not, will you need help identifying that delta?

I could probably use the help of someone more familiar with the image spec to bounce ideas off until we can figure out the minimum delta. I suspect it's slight wording changes in the image spec and a section in the distribution spec describing what SHOULD and CAN happen when you evaluate mimetypes.

Sajay Antony

Jan 21, 2019, 8:13:00 PM
to dev
Could we get clarification on how the oras approach can handle multiple media types against the same name? 

Josh Dolitsky

Jan 21, 2019, 8:58:25 PM
to Sajay Antony, dev
> Could we get clarification on how the oras approach can handle multiple media types against the same name?

Jimmy and others may have a better idea of how we can/should handle this. It might be better to keep them separate (maybe not?)

In terms of oras specifically, there is an "allowedMediaTypes" arg on pull that allows you to filter which layers to download based on mediatype:
https://github.com/shizhMSFT/oras/blob/master/pkg/oras/pull.go#L16

So, in the case of "helm pull", if a single ref contains both Docker and Helm-related layers, only the ones with mediatypes recognized by Helm would be fetched from the remote.
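
The filtering idea can be sketched in a few lines of Python. This is not the oras API (oras is written in Go); `select_layers`, the mixed manifest, and the shortened digests are hypothetical and exist only to illustrate the behavior.

```python
def select_layers(manifest, allowed_media_types):
    """Return only the layer descriptors whose mediaType the caller recognizes."""
    allowed = set(allowed_media_types)
    return [
        layer
        for layer in manifest.get("layers", [])
        if layer.get("mediaType") in allowed
    ]


# Hypothetical manifest with Docker and Helm layers under a single ref.
mixed = {
    "layers": [
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "digest": "sha256:1111",
        },
        {
            "mediaType": "application/vnd.cncf.helm.chart.v1.tar+gzip",
            "digest": "sha256:2222",
        },
    ]
}

# Media types a Helm client would recognize (assumed for this sketch).
HELM_TYPES = ["application/vnd.cncf.helm.chart.v1.tar+gzip"]

helm_layers = select_layers(mixed, HELM_TYPES)
print([layer["digest"] for layer in helm_layers])
```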




Jimmy Zelinskie

Jan 21, 2019, 9:46:56 PM
to Josh Dolitsky, Sajay Antony, dev
Sajay,

The OCI Index (nee ManifestLists) is already used for determining which artifacts a tool should care about. For example, right now the docker client walks the list of manifests to determine whether or not $GOOS matches the OS specified in the platform block. In my proposal, we'd clarify that clients need only ignore unknown media types in this section of the index.

Here's an example:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 7143,
      "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
      "platform": {
        "architecture": "ppc64le",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.cncf.helm.manifest.v1+json",
      "size": 7682,
      "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    }
  ]
}
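
A rough Python sketch of that client behavior, using the index above. The `resolve` function and its parameters are hypothetical; the only rule it encodes is "skip entries whose media type you do not know, then match on platform".

```python
OCI_IMAGE_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
HELM_MANIFEST = "application/vnd.cncf.helm.manifest.v1+json"


def resolve(index, known_media_types, os_name, architecture):
    """Pick the first index entry this client understands for its platform."""
    for descriptor in index.get("manifests", []):
        if descriptor.get("mediaType") not in known_media_types:
            continue  # unknown artifact type: ignore rather than fail
        platform = descriptor.get("platform", {})
        if (platform.get("os") == os_name
                and platform.get("architecture") == architecture):
            return descriptor
    return None


index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": OCI_IMAGE_MANIFEST,
            "size": 7143,
            "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
            "platform": {"architecture": "ppc64le", "os": "linux"},
        },
        {
            "mediaType": HELM_MANIFEST,
            "size": 7682,
            "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270",
            "platform": {"architecture": "amd64", "os": "linux"},
        },
    ],
}

# An image client on linux/amd64 finds nothing: the amd64 entry is a Helm manifest.
image_match = resolve(index, {OCI_IMAGE_MANIFEST}, "linux", "amd64")
# A Helm client on the same platform finds its entry.
helm_match = resolve(index, {HELM_MANIFEST}, "linux", "amd64")
print(image_match, helm_match["digest"])
```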

Sajay Antony

Jan 22, 2019, 4:48:30 PM
to dev, jdol...@gmail.com, sajay....@gmail.com
Jimmy, 

Thank you for the clarification. This explains the pull scenarios for multiple mime types. I would also like to discuss how pushing to the same repo:tag would work. 

Assuming the tag doesn't exist - 

`docker push hello-world:latest`
  - This leads to a fat manifest of the form 

{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"config": {
"mediaType": "application/vnd.docker.container.image.v1+json",
"size": 1885,
"digest": "sha256:eb68d2e2f59a9e5ea880ccc5715672ba5238c3f03d0ad596689564c675a986b4"
},
"layers": [
{
"mediaType": "application/vnd.docker.image.rootfs.foreign.diff.tar.gzip",
"size": 92818888,
"digest": "sha256:e46172273a4e4384e1eec7fb01091c828a256ea0f87b30f61381fba9bc511371",
"urls": [
]
},

If we follow that with a `helm push` we need to handle the fact that there are artifacts already in that repo:tag which are not of the corresponding mime type.

Would this mean 
  1.  Clients pushing to an existing fat manifest will first have to create a manifestList before pushing their own artifact and in a way promote the manifestList to that name? 
  2. Or are clients expected to only push manifestLists(oci-indexes) to tags. 
Also moving a fat manifest down and repointing to a manifestList will potentially require tag resigning when composing with Notary.

Basically I'm trying to reason, what would the recommendation be for an artifact author when uploading to an existing tag. 

Thanks,
Sajay

Atlas Kerr

Jan 23, 2019, 12:31:34 PM
to Sajay Antony, dev, jdol...@gmail.com
Sajay,

Since the manifest and manifest index are computed blobs, pushing to
the same tag would require the client to recompute and reupload the
full manifest to push to the same tag.

Jimmy, what are your thoughts on the manifest and index being hosted
on the same endpoint? If a client is disrespectful, they could end up
wiping out a manifest index by replacing it with a plain ole manifest.
Seems like an easy mistake to make on the client side.

Best,
Atlas

Jimmy Zelinskie

Jan 23, 2019, 1:22:07 PM
to Atlas Kerr, Sajay Antony, dev, jdol...@gmail.com
The point being raised is an existing ergonomic issue with OCI Index right now and is orthogonal to my proposal for arbitrary artifacts.

Using the docker client today, you have to manage ManifestLists client-side and push them to the registry. In all honesty, Quay delayed implementation of the Docker v2-2 specification because of this. Before the docker client added commands for manipulating ManifestLists client-side, Quay considered having the ManifestLists be generated entirely by the registry. However, this breaks the trust model if you're using Notary.

A possible solution would be to require some kind of signal that the client wants to overwrite everything explicitly. Without this signal, the registry could reject ManifestLists or Manifests pushed to the tag that do not respect existing artifacts.
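
As a sketch of what that could look like on the registry side, in Python. Everything here is hypothetical: the `overwrite` flag stands in for whatever explicit signal would be chosen, and identifying an entry by its media type plus platform is just one possible definition of "respecting existing artifacts".

```python
class PushRejected(Exception):
    """Raised when a push would silently drop existing index entries."""


def entry_keys(document):
    """Identify each index entry by (mediaType, os, architecture).

    A plain manifest has no "manifests" list, so it yields no keys.
    """
    keys = set()
    for descriptor in document.get("manifests", []):
        platform = descriptor.get("platform", {})
        keys.add((
            descriptor.get("mediaType"),
            platform.get("os"),
            platform.get("architecture"),
        ))
    return keys


def check_push(existing, incoming, overwrite=False):
    """Reject pushes that drop entries unless the client asked to overwrite."""
    if existing is None or overwrite:
        return
    dropped = entry_keys(existing) - entry_keys(incoming)
    if dropped:
        raise PushRejected(
            "push would drop %d existing entries" % len(dropped))


image = {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "platform": {"os": "linux", "architecture": "amd64"},
}
chart = {"mediaType": "application/vnd.cncf.helm.manifest.v1+json"}

existing = {"manifests": [image]}
check_push(existing, {"manifests": [image, chart]})        # keeps the image: ok
check_push(existing, {"manifests": [chart]}, overwrite=True)  # explicit signal: ok
```

Pushing `{"manifests": [chart]}` without the flag raises `PushRejected`, as does pushing a plain manifest over the index.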

Stephen Day

Jan 23, 2019, 3:41:20 PM
to Jimmy Zelinskie, Atlas Kerr, Sajay Antony, dev, Josh Dolitsky
The main use case of a manifest index/list requires coordination. Let's take the common case of multi-platform. You can't release a new version of the image until all supported platforms are ready for a given version. This can't really be done in the registry unless it knows the complete list of platforms that it expects before serving the index under a given tag. The same applies for any process. If you make available to your users a helm template, a cnab, and a few docker images across platforms, you need to get all of that built before pushing the artifact or you push an incomplete product. Even if you want to push one platform or artifact before the other, and automatically update client-side, the consumer can pull the updated index, check if their "channel" (helm, cnab, image, etc.) is available and make a decision about an upgrade.

Based on this reasoning, the concept of supporting uncoordinated pushes of different content to the same tag into a single index isn't a required use case. The necessary coordination requires more context, such as that available in a build pipeline or other tool.

> Without this signal, the registry could reject ManifestLists or Manifests pushed to the tag that do not respect existing artifacts.

Having preventative validation is a reasonable compromise but the problem I described above still exists. Let's say that the validation requires it to cover the platform/artifact set of the previous version. How does that get represented? And what if one gets accidentally pushed with say, helm, but the next version doesn't have it?

The solution to this is really to have a build agent enumerating the expected content of the index. When the full content is built and ready, that agent then sends out the final product.

Atlas Kerr

Jan 23, 2019, 5:13:05 PM
to Stephen Day, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
> Based on this reasoning, the concept of supporting uncoordinated pushes of different content to the same tag into a single index isn't a required use case. The necessary coordination requires more context, such as that available in a build pipeline or other tool.

It seems like manifest/index coordination is somewhat limiting, right?
What if a project starts out on Linux and gains more architecture
support on the same version after-the-fact?

Could you also expand on the "build agent" concept? Would this be an
addition to the current spec or is it an out-of-process thing?

Stephen Day

Jan 23, 2019, 5:33:27 PM
to Atlas Kerr, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
> It seems like manifest/index coordination is somewhat limiting, right?

There are limits to what can be done with CAS. The more you put in the same bucket, the more coordination is required to generate what goes there. If you don't want a combined index, you can use different tags or repos to avoid the coordination requirement.

> What if a project starts out on Linux and gains more architecture support on the same version after-the-fact?

This isn't a problem. You just go back and push the new architecture on each tag you want to add support for. The new architecture gets added to the indexes and that index gets an updated digest. The original clients use the version they were using before and the new architecture clients use the new one.

There is only a problem when you have supported an architecture in one version and remove it from the next or don't coordinate the push and overwrite descriptor references in an index.

> Could you also expand on the "build agent" concept? Would this be an addition to the current spec or is it an out-of-process thing?

A build agent would be part of the CI/CD pipeline for an application. You could do something in Jenkins et al that coordinates builds across a bunch of architectures then assembles the manifest at the end of the pipeline. There are a ton of different ways to do it and they all have different properties that may be better suited to certain use cases, so adding a spec doesn't make a whole lot of sense. It's effectively a tooling problem that can be solved in one of many products that employ OCI. Build is way outside the scope of OCI.

Steve Lasker

Jan 24, 2019, 1:09:25 PM
to Stephen Day, Atlas Kerr, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky

This looks like a classic discussion of tech and scenarios.

We seem to be saying, the existing tech supports A. Can we get scenario B to fit into the existing tech?

As opposed to, we have this new/evolved scenario. How can Tech A support Scenario B, and what, if anything, do we need to do to change Tech A to properly support scenario B?

 

So, let me bubble up and ask a scenario based question:

If we’re looking at registries to support multiple artifact types, do we think the various tools that want to push/pull to registries should know about all the other artifact types?

As an experience, (exact CLI details not important) can I:

Or, in docker-hub syntax

  • docker push stevelasker/helloworld:1.0
  • helm push stevelasker/helloworld:1.0

 

As we expand registries to be artifact stores, I think of them as cloud based file systems.

For the git example, I can save a config.json file to the same git repo as config.yaml. I can save a Foo.doc to the same directory as Foo.ppt.

The problem with this analogy is registries are 3 dimensional.

To date, registries only dealt with them as 2 dimensional

  • Name (stevelasker/helloworld)
  • Version : (1.0)

As we evolve registries to artifact stores, a given artifact has three elements:

  • Name (stevelasker/helloworld)
  • Version : (1.0)
  • Type : (dockerImage or helmchart)

 

You could argue git has versions as well, but they’re not as prominent as a tag/version. You can get the older version, but it’s not a mainline scenario. It’s a fallback. Registries support versions as a primary scenario. We’re saying artifact type is a new primary scenario.

 

I agree a specific client should deal with all the conflicts and resolutions of various elements that make up a named reference. If I push a windows and linux image to the same repo with a multi-arch tag, I’d expect the “docker tooling” to cope with this. It knows these because they have the docker image artifact type.

 

However, if I push a helm chart, a CNAB, docker-compose file, an msix, (name several new artifact types), should I expect each one of those tools to manipulate a common index?

Or, can we hash the names by artifact types?

 

I realize this is a change, possibly a BIG change to what currently exists. If we can agree on the scenario, we can then figure out how to achieve the scenario.

I also realize that getting OCI Distribution 1.0 out is a pressing concern. Stephen, rightly so, keeps bringing this up. Just as getting Helm 3.0 and CNAB out is a priority.  

What I’m asking is can we agree on the scenario, then figure out how to sequence the steps to get to that wonderful place?

Atlas Kerr

Jan 24, 2019, 1:46:15 PM
to Steve Lasker, Stephen Day, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
I'm currently working on fetching manifests from OCI-compliant
registries for distribution-client, so all of this is really fresh in my
mind.

The reason why new artifact types are hard to use with the spec is
because the image index is not flexible enough to handle it. The
reason why we don't have custom mimetypes for linux, darwin, or
windows is because operating systems have first class support since
the image index exposes the `platform` field.

For example:
{
"manifests": [
{
"digest":
"sha256:e3161859d1779d8330428ed745008710a1ecfb9f494c2e1b062be4cc0ba9ee2a",
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"platform": {
"architecture": "amd64",
"os": "linux"
},
"size": 642,
"urls": [
"http://example.com"
]
},
{
"digest":
"sha256:caf3af6d893b5cb8eae9a90a3054f370a92130863450e3299d742c7a65329d94",
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"platform": {
"architecture": "amd64",
"os": "windows"
},
"size": 582,
"urls": [
"http://example.com"
]
}
],
"mediaType": "application/vnd.oci.image.index.v1+json",
"schemaVersion": 2
}



I think the key to making this all work is to find a way to refactor
the `platform` field in the image index to allow for a richer type
system. To represent a helm chart and a linux bundle in the same
index, it could look something like this:

{
"manifests": [
{
"digest":
"sha256:e3161859d1779d8330428ed745008710a1ecfb9f494c2e1b062be4cc0ba9ee2a",
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"type": "linux/amd64",
"size": 642,
"urls": [
"http://example.com"
]
},
{
"digest":
"sha256:caf3af6d893b5cb8eae9a90a3054f370a92130863450e3299d742c7a65329d94",
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"platform: "helm",
"size": 582,
"urls": [
"http://example.com"
]
}
],
"mediaType": "application/vnd.oci.image.index.v1+json",
"schemaVersion": 2
}

Now clients can know how to fetch specific content without mucking
about with the mediatype fields.

Atlas Kerr

Jan 24, 2019, 1:48:28 PM
to Steve Lasker, Stephen Day, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
Typo! The proposed example should be as follows:

{
"manifests": [
{
"digest":
"sha256:e3161859d1779d8330428ed745008710a1ecfb9f494c2e1b062be4cc0ba9ee2a",
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"type": "linux/amd64",
"size": 642,
"urls": [
"http://example.com"
]
},
{
"digest":
"sha256:caf3af6d893b5cb8eae9a90a3054f370a92130863450e3299d742c7a65329d94",
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"type": "helm",
"size": 582,
"urls": [
"http://example.com"
]
}
],
"mediaType": "application/vnd.oci.image.index.v1+json",
"schemaVersion": 2
}

Stephen Day

Jan 24, 2019, 2:01:37 PM
to Steve Lasker, Atlas Kerr, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
You're confusing scenarios. Type resolution is different than assembly. Having a "new" scenario won't change that.

This is simply how oci provides end to end provenance on content addressable technology. You cannot have the registry assemble the artifact. Artifact type is already supported and these are the limitations. There is no way of changing it without breaking the provenance chain, even if you pull the type up to the index.

However, based on what you're saying, it seems reasonable to pull a type up into the index. Previously, we have solved this by having different manifest types. This is how we can pack docker and oci images into the same index, but since the proposal is reusing the existing manifest, the resolver doesn't know the difference. The options are to create a new manifest type for helm/cnab or pull the types up into annotations.

Jon Johnson

Jan 24, 2019, 2:03:36 PM
to Steve Lasker, Stephen Day, Atlas Kerr, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
I pretty strongly disagree that git's versioning is not prominent. It's a version control system, and its branching model is why it's good. I don't understand what you're saying there.

It seems to me that this is a tooling problem. Right now, "docker push" is essentially the equivalent of "git push --force". Some folks have asked us for "immutable tags" because they don't want to accidentally overwrite tags (also because kubernetes has broken pull policies).

I don't expect that registries would somehow auto-merge artifacts for me (git doesn't), and I don't want to push this artifact-type dimension work onto registry operators if we can avoid it.

If you want OCI registries to "feel" more like git, I'd propose doing the following:

1. Disallow overwriting tags (or make this configurable).
You could even have a "fast forward only" policy where overwriting a tag is only allowed if the new artifact is a superset of the old artifact.
This isn't something I'd expect all registries to want to do. The developer loop of just pushing to "latest" is a nice thing to keep working. We could add a new error type to the distribution spec to indicate that your request was rejected due to a conflicting artifact (or something). 

2. Add new client-side operations.
For manipulating a common index, we really need some equivalent to "git merge" or "git rebase" that can combine two artifacts into an OCI index or append an artifact to an existing OCI index.
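
A client-side "append" operation might look roughly like the following Python sketch. The function name is hypothetical, and treating two entries as the same slot when their media type and platform match is an assumption made for illustration.

```python
import copy


def slot(descriptor):
    """Entries with the same media type and platform occupy the same slot."""
    platform = descriptor.get("platform", {})
    return (
        descriptor.get("mediaType"),
        platform.get("os"),
        platform.get("architecture"),
    )


def append_to_index(index, descriptor):
    """Return a new index with the descriptor added.

    An existing entry in the same slot is replaced; everything else is
    carried over untouched. The input index is not modified.
    """
    merged = copy.deepcopy(index)
    kept = [d for d in merged.get("manifests", []) if slot(d) != slot(descriptor)]
    merged["manifests"] = kept + [copy.deepcopy(descriptor)]
    return merged


base = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:1111",
            "platform": {"os": "linux", "architecture": "amd64"},
        }
    ],
}

# Hypothetical Helm entry; the media type and digest are made up.
chart = {
    "mediaType": "application/vnd.cncf.helm.manifest.v1+json",
    "digest": "sha256:2222",
}

combined = append_to_index(base, chart)
print(len(base["manifests"]), len(combined["manifests"]))
```

The client would then push `combined` under the tag, which is exactly the step a "fast forward only" registry policy could verify.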

Atlas Kerr

Jan 24, 2019, 2:07:49 PM
to Jon Johnson, Steve Lasker, Stephen Day, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
Stephen, thanks for completing my thought!

So a potential solution could be to add a "type" field in the image
index and maybe add some keywords in the platform object that tells
clients that a bundle can work on any os/arch combo?

Stephen Day

Jan 24, 2019, 2:17:56 PM
to Atlas Kerr, Jon Johnson, Steve Lasker, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
Atlas:

Yes. I don't think we should overload platform, because a chart might want to qualify itself with that information, but the solution runs along the lines of pulling the type up into the index.

Jimmy's proposal proposes using the same manifest but has the limitation that we need to pull up the type if we want to do a resolution with *only* the index. If the client walks down the image, it can figure this out with one more look at the manifest.
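
The "one more look at the manifest" step could be sketched like this in Python. `fetch_manifest` is a hypothetical callable standing in for a registry GET by digest, and the in-memory store and shortened digests are made up for the example.

```python
def artifact_type(descriptor, fetch_manifest):
    """Walk one level down: fetch the manifest and read its config mediaType."""
    manifest = fetch_manifest(descriptor["digest"])
    return manifest["config"]["mediaType"]


def find_by_type(index, wanted_config_type, fetch_manifest):
    """Return the first index entry whose manifest carries the wanted type."""
    for descriptor in index.get("manifests", []):
        if artifact_type(descriptor, fetch_manifest) == wanted_config_type:
            return descriptor
    return None


# In-memory stand-in for a registry's manifest store.
store = {
    "sha256:1111": {
        "config": {"mediaType": "application/vnd.oci.image.config.v1+json"}
    },
    "sha256:2222": {
        "config": {"mediaType": "application/vnd.cncf.helm.chart.config.v1+json"}
    },
}

# Both entries reuse the same manifest media type, so the index alone
# cannot tell them apart.
index = {
    "manifests": [
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "digest": "sha256:1111"},
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "digest": "sha256:2222"},
    ]
}

chart_entry = find_by_type(
    index, "application/vnd.cncf.helm.chart.config.v1+json", store.__getitem__)
print(chart_entry["digest"])
```

The cost is one extra manifest fetch per candidate entry, which is what pulling the type up into the index would avoid.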

The safest way is to create a new manifest type for helm/cnab, though.

Stephen Day

Jan 24, 2019, 2:25:11 PM
to Atlas Kerr, Jon Johnson, Steve Lasker, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
Jon:

Great analogy! I think that highlights the main problem with an analogy on git.

There is also a way to prevent the overwrite by having a common value for a tag on update.

{
  "name": "1.0",
  "digest": "sha256:deadbeef",
  "mediaType": "...oci.index",
  "previous": "sha256:deadbeefprev"
}

Basically, if the value of "previous" isn't the current value of digest of this tag, then the update can be rejected. In the case of a race, one client gets rejected, re-downloads the manifest and pushes the new versions with the updates. It creates a strong consistency guarantee on the registry, which makes it harder to deploy, but solves this problem slightly.
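
In other words, a compare-and-swap on the tag. A minimal single-process Python sketch follows; the `TagStore` class is hypothetical, and a real registry would need the check and the write to be atomic across all of its replicas, which is where the deployment cost comes from.

```python
class TagConflict(Exception):
    """Raised when "previous" does not match the tag's current digest."""


class TagStore:
    def __init__(self):
        self._digests = {}

    def current(self, name):
        """Current digest for the tag, or None if the tag does not exist."""
        return self._digests.get(name)

    def update(self, name, digest, previous):
        """Move the tag only if the caller saw its latest value."""
        if self._digests.get(name) != previous:
            raise TagConflict("tag %r has moved since it was read" % name)
        self._digests[name] = digest


store = TagStore()
store.update("1.0", "sha256:aaaa", previous=None)  # tag does not exist yet

# Two clients read the same starting point.
seen_by_a = store.current("1.0")
seen_by_b = store.current("1.0")

store.update("1.0", "sha256:bbbb", previous=seen_by_a)  # client A wins the race

try:
    store.update("1.0", "sha256:cccc", previous=seen_by_b)
except TagConflict:
    # Client B re-reads, rebuilds its index on top of A's, and retries.
    # The digest below is a placeholder for that rebuilt index.
    store.update("1.0", "sha256:dddd", previous=store.current("1.0"))

print(store.current("1.0"))
```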

Stephen.

Atlas Kerr

Jan 24, 2019, 3:26:21 PM
to Stephen Day, Jon Johnson, Steve Lasker, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
I just don't understand why there is a need for separation between OS
types and custom types. As a registry consumer, I have to use the
`platform` field for OS images but annotations for everything
else?? That's really annoying. They both serve the same purpose in the
context of an image index but are located in different places and in
different formats.

Could the `platform` field not serve the purpose of assembly AND type
resolution? Doesn't it already do that? I disagree that adding an
optional and backwards-compatible field `type` in the `platform`
object would be considered "overloading."

If helm gets its own mediatype, hundreds of other
communities are gonna want to implement one as well. In the future, I
believe it will be very difficult for registries to keep up with the
explosion of custom artifact types that users want to store alongside
their images. On the other hand, if the image-spec extends the image
index type system to support more than OS/arch options then registries
don't have to implement such complicated search mechanisms and
annotation formats. Helm doesn't need a custom mimetype if it knows
that the manifest it pulled is of type helm. Then it can pull a
regular tar blob and unpack it just like everyone else. Robust client
libraries can make the process opaque for package management
implementers.

Steve Lasker

Jan 24, 2019, 5:16:14 PM
to Atlas Kerr, Stephen Day, Jon Johnson, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
Hey Atlas,
We've crossed some threads here.
In this thread, we've been discussing how a registry can support different artifact types. There's an important nuance to whether they're independent artifacts, or a collection of artifacts, defined as a single named thing.

After some offline conversations, let me try to summarize:

The OCI spec allows a namespace/name:tag to represent a collection of things. However, it's still a single index.

(please accept this as a mock; exact types aren't as important)
The name: stevelasker/helloworld:1.0 can represent a CNAB which contains a collection of different types:
application/vnd.oci.cnab.index.v1+json
application/vnd.oci.cnab.manifest.v1+json
application/vnd.oci.helm.manifest.v1+json
application/vnd.oci.image.manifest.v1+json
application/vnd.oci.image.manifest.v1+json

This requires coordination from a tool to edit the single OCI index for stevelasker/helloworld:1.0. In the CNAB case, this makes perfect sense as it is a single thing that has a collection of objects and tools like duffle or docker-application can own that coordination.
Just as the multi-arch manifest can take a single name and represent windows and linux.

What we don't have is the ability to save independent things to the same name, without them coordinating on the single OCI Index:
docker push stevelasker/helloworld:1.0
application/vnd.oci.image.index.v1+json
application/vnd.oci.image.manifest.v1+json

helm push stevelasker/helloworld:1.0
application/vnd.oci.helm.index.v1+json
application/vnd.oci.helm.manifest.v1+json

duffle push stevelasker/helloworld:1.0
application/vnd.oci.cnab.index.v1+json
application/vnd.oci.helm.manifest.v1+json
application/vnd.oci.image.manifest.v1+json

Running each of those commands independently would overwrite the name with different artifact types.
The way to accomplish the above would require the three independent tools to edit the single index. And the pull of each command to find their thing they support.

As a multi-artifact registry, I would like to support different independent types, and provide some amount of safety so customers don't accidentally change the meaning of an artifact.
I was hoping the underlying indexing system could support different types.
Just as I can save a word, excel and powerpoint document to the same directory, all named myThing. The word, excel and powerpoint client runtimes don't coordinate over the file system to merge the filesystem index. The file system maintains an index for that directory, independent of the tool.

Without the type being part of the underlying index, it seems each registry can decide how they want to handle this.
I'd suggest we should likely have an option, similar to tag locking, that provides a safety for a user to not accidentally override a named thing. But, that's a registry by registry decision.

So, to close this long thread, while an OCI index can represent a collection of things, that collection should be thought of as a single thing. Such as a multi-arch manifest for an image. A CNAB bundle of things. But independent tools should not attempt to update an existing index, changing the type.

Stephen Day

Jan 24, 2019, 5:18:06 PM
to Atlas Kerr, Jon Johnson, Steve Lasker, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
They are separate because the type describes the actual thing (json, tar, etc.) and the annotations/platform describe the context.

You can influence the process by teaching the client more about the context or encoding more functionality into the type. The right choice depends on what you are trying to accomplish.

Search-wise, this isn't that complicated. The main expense is the number of indexes created and how each of those fields are projected during indexing analysis.

Stephen Day

Jan 24, 2019, 5:22:55 PM
to Steve Lasker, Atlas Kerr, Jon Johnson, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
You can safely push all artifacts if the tool grabs the existing index, so if those commands are run in succession, there is no danger:

docker push stevelasker/helloworld:1.0
helm push stevelasker/helloworld:1.0
duffle push stevelasker/helloworld:1.0

The other approach is to assemble each independently, then pack an index:

docker push stevelasker/helloworld:1.0-docker
helm push stevelasker/helloworld:1.0-helm
duffle push stevelasker/helloworld:1.0-duffle
sometool pack stevelasker/helloworld:1.0 stevelasker/helloworld:1.0-docker stevelasker/helloworld:1.0-helm stevelasker/helloworld:1.0-duffle

This workflow also ensures that it will work on a registry with immutable tags.
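
A minimal sketch of what the pack step could do, assuming the tool has already fetched the raw manifest bytes for each per-tool tag. `sometool` is a placeholder above, and `descriptor` and `pack_index` are illustrative names here, not an existing CLI or library API. Recording the source tag under the `org.opencontainers.image.ref.name` annotation is one possible choice, not something the workflow requires.

```python
import hashlib

INDEX_MEDIA_TYPE = "application/vnd.oci.image.index.v1+json"


def descriptor(manifest_bytes, media_type, annotations=None):
    """Build an OCI descriptor (mediaType, digest, size) for raw manifest bytes."""
    desc = {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(manifest_bytes).hexdigest(),
        "size": len(manifest_bytes),
    }
    if annotations:
        desc["annotations"] = annotations
    return desc


def pack_index(manifests):
    """Assemble an index from (manifest_bytes, media_type, source_tag) tuples.

    Nothing existing is mutated: the index is built from scratch and can be
    pushed once under a new tag, which is why immutable tags are not a problem.
    """
    return {
        "schemaVersion": 2,
        "mediaType": INDEX_MEDIA_TYPE,
        "manifests": [
            descriptor(raw, media_type, {"org.opencontainers.image.ref.name": tag})
            for raw, media_type, tag in manifests
        ],
    }
```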

Stephen.

Steve Lasker

Jan 24, 2019, 5:32:54 PM
to Stephen Day, Atlas Kerr, Jon Johnson, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky

Safely is the concern.

While each client tool can technically pull and update the index, it’s like saying different tools that save files to a directory should be responsible for updating the file system index.

“Just because you can, doesn’t mean you should”

The second example is more realistic: it matches the multi-arch scenario and assumes each sub-object is an intentional reference to the name:tag.

Atlas Kerr

Jan 24, 2019, 7:08:42 PM
to Steve Lasker, Stephen Day, Jon Johnson, Jimmy Zelinskie, Sajay Antony, dev, Josh Dolitsky
Sorry for taking this thread off-topic once a day. I yield.

Sajay Antony

Jan 24, 2019, 7:21:11 PM
to dev, steven...@hotmail.com, stev...@gmail.com, jonjo...@google.com, jimmyze...@gmail.com, sajay....@gmail.com, jdol...@gmail.com
Bringing back the original context and proposal that Jimmy was discussing. 
I believe using annotations and media types to describe the artifact will solve the issue of storing artifacts, and that is a good first place to land and support this.

I would recommend that the remaining issues, either limiting artifact pushes to the same media type or manifest list curation/validation, be tackled as a separate item so that we can make progress with this proposal.

Does that sound reasonable? 

Chen Shou

Jan 24, 2019, 7:28:17 PM
to dev, atla...@gmail.com, stev...@gmail.com, jonjo...@google.com, jimmyze...@gmail.com, sajay....@gmail.com, jdol...@gmail.com
I wonder what the realistic need is for having the same name for different artifact types (even though it could be supported)?

christop...@docker.com

Jan 25, 2019, 8:26:39 AM
to dev

Hi,


Our team is responsible for the CNAB to OCI PoC and I’ve been following this conversation and the various other discussions (OCI meeting, Cloud Native Slack, etc.).


A CNAB references at least an invocation image and, optionally, other images that the bundle requires. This makes an OCI image index a natural starting point.


Our proposal for packaging CNAB is to:

  • Use an OCI image index with media type “application/vnd.oci.image.index.v1+json”

  • Use a CNAB specific config type, something like “application/vnd.oci.cnab.config.v1+json”

    • This descriptor would reference an OCI image manifest (“application/vnd.oci.image.manifest.v1+json”) that wraps the config blob

  • Use a CNAB specific type for the invocation image manifest descriptor, something like “application/vnd.oci.cnab.invocation.v1+json”

    • This descriptor would reference an OCI image manifest (“application/vnd.oci.image.manifest.v1+json”) or an OCI image index (“application/vnd.oci.image.index.v1+json”)

  • Use a CNAB specific type for the component image manifest descriptor, something like “application/vnd.oci.cnab.component.v1+json”

    • This descriptor would also reference an OCI image manifest or an OCI image index

    • Add annotations here for metadata about the component, like its original name and the corresponding component name in the bundle.json.

  • Add top-level annotations with application metadata

    • We should reuse existing annotations where we can

    • We should define some new ones where Helm, CNAB and other things intersect


Note: The current CNAB to OCI code does not reflect this proposal but the changes would be relatively simple.


An example manifest follows:


{
 "schemaVersion": 2,
 "mediaType": "application/vnd.oci.image.index.v1+json",
 "manifests": [
   {
     "mediaType": "application/vnd.oci.cnab.config.v1+json",
     "digest": "sha256:d59a1aa7866258751a261bae525a1842c7ff0662d4f34a355d5f36826abc0341",
     "size": 291
   },
   {
     "mediaType": "application/vnd.oci.cnab.invocation.v1+json",
     "digest": "sha256:196d12cf6ab19273823e700516e98eb1910b03b17840f9d5509f03858484d321",
     "size": 506
   },
   {
     "mediaType": "application/vnd.oci.cnab.component.v1+json",
     "digest": "sha256:6bb891430fb6e2d3b4db41fd1f7ece08c5fc769d8f4823ec33c7c7ba99679213",
     "size": 507,
     "annotations": {
       "io.cnab.component_name": "component-1",
       "io.cnab.original_name": "nginx:2.12"
     }
   },
   {
     "mediaType": "application/vnd.oci.cnab.component.v1+json",
     "digest": "sha256:6bb891430fb6e2d3b4db41fd1f7ece08c5fc769d8f4823ec33c7c7ba99679213",
     "size": 507,
     "annotations": {
       "io.cnab.component_name": "component-2",
       "io.cnab.original_name": "backend:1.5"
     }
   }
 ],
 "annotations": {
   "com.docker.app.format": "cnab",
   "io.cnab.keywords": "[\"keyword1\",\"keyword2\"]",
   "io.cnab.runtime_version": "v1.0.0-WD",
   "io.cnab.type": "io.docker.app",
   "org.opencontainers.image.authors": "[{\"name\":\"docker\",\"email\":\"doc...@docker.com\",\"url\":\"docker.com\"}]",
   "org.opencontainers.image.description": "description",
   "org.opencontainers.image.title": "my-app",
   "org.opencontainers.image.version": "0.1.0"
 }
}


The associated CNAB config manifest would be:


{
   "schemaVersion": 2,
   "mediaType": "application/vnd.oci.image.manifest.v1+json",
   "config": {
       "mediaType": "application/vnd.oci.cnab.config.v1+json",
       "size": 7023,
       "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7"
   }
}


From my reading of the spec, we would require changes to:

  • The expected media type for index manifest descriptors

  • The expected media type for image manifest config descriptors


More practically, this requires that registries:

  1. Support OCI indices

  2. Support new descriptor media types


I’m not particularly worried about (1). For (2), it will likely be a vendor choice if they want to whitelist a set of media types, coordinate this whitelisting (as suggested by Steve Lasker), or allow any media type.
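
To make the vendor choice in (2) concrete, here is a minimal sketch of a media type allowlist check a registry could run over the descriptors of an incoming index. The glob-style patterns and the function names are assumptions for illustration. No registry is known to implement it this way.

```python
import fnmatch


def is_allowed(media_type, allowlist):
    """True if media_type matches any glob-style pattern in the allowlist."""
    return any(fnmatch.fnmatchcase(media_type, pattern) for pattern in allowlist)


def rejected_media_types(index, allowlist):
    """Return the descriptor media types in an index that the allowlist rejects."""
    return [
        desc["mediaType"]
        for desc in index.get("manifests", [])
        if not is_allowed(desc["mediaType"], allowlist)
    ]
```

A registry that wants to allow any media type would use the single pattern `*`, while a stricter one might list only `application/vnd.oci.image.*` and `application/vnd.oci.cnab.*`.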


The CNAB to OCI PoC shows how to work around both of these requirements: it can fall back to a Docker Manifest List and use annotations to duck type.
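
A minimal sketch of the duck-typing idea, assuming a client holds the parsed top-level document as a dict and that the annotations are preserved on it. The annotation keys are the ones from the example manifest above. The PoC may check different keys, and `looks_like_cnab` is an illustrative name rather than a function from that code base.

```python
OCI_INDEX = "application/vnd.oci.image.index.v1+json"
DOCKER_MANIFEST_LIST = "application/vnd.docker.distribution.manifest.list.v2+json"


def looks_like_cnab(document):
    """Guess whether an index or manifest list carries a CNAB bundle.

    The media type alone cannot tell (both are generic list types), so the
    decision rests on the annotations: that is the duck typing.
    """
    if document.get("mediaType") not in (OCI_INDEX, DOCKER_MANIFEST_LIST):
        return False
    annotations = document.get("annotations") or {}
    return (
        annotations.get("com.docker.app.format") == "cnab"
        or "io.cnab.runtime_version" in annotations
    )
```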


Relating this proposal to Jimmy’s proposal for Helm charts, I see:

  • CNAB and Helm can both be packaged as existing high-level OCI manifests (OCI index and OCI image respectively)

  • CNAB and Helm both need support for new descriptor media types

  • Helm needs a new media type for layers

    • I wonder if we could just use “application/vnd.oci.image.layer.v1.tar+gzip” for this?

  • CNAB and Helm can likely find common ground for annotations


If this proposal looks like the right direction, we’re happy to update CNAB to OCI (or get a PR doing so!) or to use whatever library implements the functionality in Docker App. You can find me and some of our team at Docker on the Cloud Native Slack so feel free to reach out. I don’t want to fragment the conversations around the format so we must be sure to post any decisions here too.


Regards,


Chris Crone

Steve Lasker

Jan 25, 2019, 1:20:23 PM
to dev, atla...@gmail.com, stev...@gmail.com, jonjo...@google.com, jimmyze...@gmail.com, sajay....@gmail.com, jdol...@gmail.com, Chen Shou

For all those who feel this thread is TL;DR:

With this wave of proposals, container registries will quickly evolve into a new cloud-based file system. Yes, it’s a registry, but once you can store 2-3 different artifact types, customers will want to store hundreds of different artifact types. And, why wouldn’t they?

The details of these emails surface issues with storing multiple artifact types in a single location.

- If registries support all sorts of artifacts (multiple dozens of types) how do tools and users reason over them?

- how would duffle search know to only list CNABs, Helm search list charts, msix only list msix, ARM or Cloud Formation tooling only list their templates?

- how does a registry listing (like docker hub) list the various artifacts, showing them as the types they are?

- how do vulnerability scanners know what they’re pulling, so they know how to scan them?

- how do registry listings know what actions to put on an artifact? (Run an image, deploy a Chart, install a foo2)?

- how do different teams, who work on independent components of a swath of services, push their independent components to the registry?

- how do we allow tools to work independently? Meaning, duffle, Helm, Docker-application, msix, foo2, shouldn’t have to know about the other artifacts to push to a registry?

- how do we sign artifacts, individually or as a group?

- how easy is it for the next artifact type to be supported by OCI registries? Does a registry owner need to do anything more than possibly know what icon to associate with the artifact type? Can the registry and tools easily understand it’s a foo2, compared to an image or other known artifacts?

- how does this all look when we have dozens and dozens of types? This is the file system directory comparison I’m trying to make with extensions.
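
As one possible answer to the listing questions above, here is a minimal sketch, assuming the strawman from the start of this thread where the config media type identifies the artifact. The `catalog` mapping of name:tag to parsed manifest is a stand-in for whatever a registry or tool actually has on hand, and the function names are illustrative.

```python
def artifact_type(manifest):
    """Return the config media type if present, else the manifest's own media type."""
    config = manifest.get("config")
    if config:
        return config.get("mediaType")
    # Indexes and manifest lists have no config block, so fall back to the top level.
    return manifest.get("mediaType")


def list_by_type(catalog, wanted):
    """List the references in a {reference: manifest} mapping whose type matches."""
    return sorted(
        ref for ref, manifest in catalog.items() if artifact_type(manifest) == wanted
    )
```

A tool like Helm would then pass its own config media type and never need to know about the others. The same key could drive icons, scanners, or actions in a registry listing.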

 

The decisions we make today will have an impact for a loooong time and on a lot of people. We can put a thing out there and let people interpret it on their own, and we wind up with a wall of lockers stuffed with awkwardly sized backpacks and duffle bags, all shoved in willy-nilly, with the bag straps hanging out.

Take a look at :tags. Are they versions, platforms, architectures? All of the above? Can you build tooling to understand the tag in a consistent fashion?

 

I’m hopeful we can do better, and agree to a structure that can be easily interpreted by humans and tools alike.
