👀 CNDO #31: Registry changes are coming

Thanks to today's sponsor, Uffizzi
Uffizzi provides a multi-tenant Kubernetes-native foundation for platform teams. Every engineer gets self-service, sandboxed virtual clusters. It runs on our infrastructure or yours.
Spin up your cluster in under a minute on the Free Starter Tier at uffizzi.com
Preview Environments | The Definitive Guide
Uffizzi helps software team leaders approve code changes in half the time with live preview environments for every pull request. Test branches in isolation, then merge with confidence.

๐Ÿ—“๏ธ What's new this week

🔴 Live show: Cloud Native DevOps Q&A (#234)

This week's livestream is my monthly Q&A about cloud native DevOps. Bring your questions! Thursday at 13:00 US ET (UTC-4)

Cloud Native DevOps: Live Q&A (Ep 234)
Ask-me-anything this week. I'm focused on your cloud native DevOps questions: containerization, orchestration, automation, infrastructure, and more. A specia…

🎧 Podcast #140 OpenSauced with Brian Douglas

We released a podcast last week (Sept 8) in which Nirmal and I talked with Brian Douglas of OpenSauced.

Brian, of GitHub fame, founded OpenSauced, a cool web app and community for open source developers looking for their next contribution and maybe their next job. Brian has many stories from working with open source projects and talking with leading open source contributors during his time as a lead developer advocate at GitHub. So we definitely spent time talking through some of those stories and learning a few things we didn't know about GitHub and the open source projects run on it. We then dig into how to use the OpenSauced platform he's creating to find your next open source project and get noticed by employers.


👉 How artifacts are changing the OCI registry

Last week, I wrote about the problem of all these package and artifact storage systems in a typical organization. I rarely work with a team with only one container registry, much less a single artifact storage solution.

Well, I think we're on the cusp of the OCI (Docker) Registry becoming the one artifact and package storage system to rule them all... eventually.

As a refresher, the OCI registry you know and love is full of two main data objects: manifests (metadata) and layer blobs (for a container image, those are gzipped tarballs). We've also got tags in the API that point to a manifest, which may point to another set of manifests if you're building multi-arch images, and each image manifest then points to one or more layers. Here's a "full-featured" example of the data relationships from Brandon Mitchell's recent OCI Distribution 1.1 RC talk on how registries relate data objects today.
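Separately from that talk, here's a rough Python sketch of the tag -> index -> manifest -> layers chain described above. It's a toy in-memory model, not how any real registry is implemented: the class, the helper names, and the fake blobs are all made up for illustration.

```python
import hashlib
import json

INDEX = "application/vnd.oci.image.index.v1+json"
MANIFEST = "application/vnd.oci.image.manifest.v1+json"
CONFIG = "application/vnd.oci.image.config.v1+json"
LAYER = "application/vnd.oci.image.layer.v1.tar+gzip"


def digest_of(data: bytes) -> str:
    """Content address of a blob: the algorithm name plus the hex hash."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class ToyRegistry:
    """Hypothetical in-memory stand-in for one repository in a registry."""

    def __init__(self):
        self.blobs = {}      # digest -> raw bytes (layers and configs)
        self.manifests = {}  # digest -> parsed manifest or index
        self.tags = {}       # tag name -> manifest digest

    def put_blob(self, data: bytes, media_type: str) -> dict:
        d = digest_of(data)
        self.blobs[d] = data
        return {"mediaType": media_type, "digest": d, "size": len(data)}

    def put_manifest(self, doc: dict) -> dict:
        raw = json.dumps(doc, sort_keys=True).encode()
        d = digest_of(raw)
        self.manifests[d] = doc
        return {"mediaType": doc["mediaType"], "digest": d, "size": len(raw)}

    def resolve_layers(self, tag: str, architecture: str) -> list:
        """Follow tag -> (index ->) image manifest -> layer digests."""
        doc = self.manifests[self.tags[tag]]
        if doc["mediaType"] == INDEX:
            match = next(
                (d for d in doc["manifests"]
                 if d["platform"]["architecture"] == architecture),
                None,
            )
            if match is None:
                raise LookupError(f"no manifest for {architecture}")
            doc = self.manifests[match["digest"]]
        return [layer["digest"] for layer in doc["layers"]]


def build_demo() -> ToyRegistry:
    """Push a fake two-architecture image and tag its index as 'latest'."""
    reg = ToyRegistry()
    entries = []
    for arch in ("amd64", "arm64"):
        config = reg.put_blob(json.dumps({"architecture": arch}).encode(), CONFIG)
        layer = reg.put_blob(f"fake layer for {arch}".encode(), LAYER)
        desc = reg.put_manifest({
            "schemaVersion": 2,
            "mediaType": MANIFEST,
            "config": config,
            "layers": [layer],
        })
        desc["platform"] = {"architecture": arch, "os": "linux"}
        entries.append(desc)
    index = reg.put_manifest(
        {"schemaVersion": 2, "mediaType": INDEX, "manifests": entries}
    )
    reg.tags["latest"] = index["digest"]
    return reg
```

Calling `build_demo().resolve_layers("latest", "arm64")` walks the whole chain and returns the digest of the single fake arm64 layer. The tag is the only mutable pointer here; everything below it is addressed by digest.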

The beginning of the OCI artifact sprawl

Helm was one of the first teams and CLIs to use registries to store non-container image data at a large scale. Helm now has OCI registry support by default in recent versions. It took advantage of the fact that the OCI distribution standard allowed for different media types in the image layers, even if it wasn't a true container image layer. Here are the Helm docs on how to use your existing registries to store charts.
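To see what "different media types" means in practice, here's a small Python sketch of roughly what a chart looks like once it's wrapped in an image manifest. The two Helm media type strings are the ones Helm uses for charts in OCI registries; the helper names and the fake chart bytes are hypothetical, and a real Helm push includes more detail than this.

```python
import hashlib
import json

IMAGE_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
IMAGE_CONFIG = "application/vnd.oci.image.config.v1+json"
HELM_CONFIG = "application/vnd.cncf.helm.config.v1+json"
HELM_CHART = "application/vnd.cncf.helm.chart.content.v1.tar+gzip"


def descriptor(media_type: str, data: bytes) -> dict:
    """Describe a blob by type, content digest, and size."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


def chart_manifest(chart_meta: dict, chart_tgz: bytes) -> dict:
    """Wrap a chart archive in an ordinary image manifest (illustrative only)."""
    config_bytes = json.dumps(chart_meta).encode()
    return {
        "schemaVersion": 2,
        "mediaType": IMAGE_MANIFEST,
        # The "config" slot carries chart metadata instead of image config.
        "config": descriptor(HELM_CONFIG, config_bytes),
        # The "layer" slot carries the chart archive instead of a filesystem layer.
        "layers": [descriptor(HELM_CHART, chart_tgz)],
    }


def is_container_image(manifest: dict) -> bool:
    """A client can only tell the two apart by looking at the media types."""
    return manifest["config"]["mediaType"] == IMAGE_CONFIG
```

The registry stores this like any other image manifest. Only a client that recognizes the Helm media types knows the "layer" is a chart and not something a container runtime can unpack.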

As more people wanted to put more things in their registry besides containers, we started to see conference talks and official proposals about various tools moving to support "OCI artifact" storage rather than their traditional storage.

Two prominent use cases are driving this evolution of container registries. 👉 The first is storing and connecting data related to a specific image (SBOMs, CVE scans, image signatures). 👉 The second is data objects that are semi-related (or completely unrelated) to containers (Helm, Tekton, Homebrew), where the tools want to take advantage of the ubiquitous and content-addressable nature of OCI registries.

First use case: Image-manifest-adjacent artifacts

From SBOMs to image signing, there's a growing list of things directly related to a specific container image manifest. Driven primarily by the software supply chain security movement, many artifacts now relate to an image in a registry and need to be stored somewhere for reference. Turns out OCI registries are good places to store those too.

That's already happening, even with the existing OCI Distribution 1.0 spec. Although it's not ideal, current tools create the relationship between the image and the adjacent artifacts using various tricks, like specially crafted image tags, to connect the dots.
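One example of the tag trick is the naming convention Sigstore's cosign uses: it derives a tag from the image digest, so a signature for sha256:abc... is pushed under the tag sha256-abc....sig. A minimal Python sketch of that mapping follows; the function name is my own, and other tools may use different schemes.

```python
def digest_tag(image_digest: str, suffix: str) -> str:
    """Turn an image digest into a tag name that points at a related artifact.

    Tags can't contain ':', so the separator after the algorithm becomes '-'.
    The suffix (for example 'sig' or 'sbom') says what kind of artifact it is.
    """
    algorithm, _, hex_value = image_digest.partition(":")
    if not algorithm or not hex_value:
        raise ValueError(f"not a digest: {image_digest!r}")
    return f"{algorithm}-{hex_value}.{suffix}"
```

A client that wants the signature for an image computes this tag and pulls it like any other image tag. The registry itself has no idea the two manifests are related, which is why these tags show up as odd-looking entries in registry UIs.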

Recently, Docker's own image builder started generating provenance attestations adjacent to the image it builds, and it pushes those to the registry along with the image.

But, once we have all these signatures, scan reports, and SBOMs stored in the registry, how can we (more cleanly) find them with an official API to connect all the manifest pieces?

The solution to that problem is called the Referrers API, and it's planned for GA with OCI Distribution 1.1, which will hopefully be released in the second half of 2023.
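As a rough idea of what a client does with it, here's a Python sketch assuming the endpoint shape in the 1.1 release candidates: a GET on /v2/<name>/referrers/<digest> that returns an image index listing every manifest pointing at that digest. The registry host, repository, and artifact types below are placeholders, and details could still change before GA.

```python
from urllib.parse import urlencode


def referrers_url(registry: str, repository: str, digest: str,
                  artifact_type: str = "") -> str:
    """Build the referrers URL for an image digest (release-candidate shape)."""
    url = f"https://{registry}/v2/{repository}/referrers/{digest}"
    if artifact_type:
        # Optional server-side filter; a registry is allowed to ignore it.
        url += "?" + urlencode({"artifactType": artifact_type})
    return url


def filter_referrers(index: dict, artifact_type: str) -> list:
    """Filter client-side too, in case the registry didn't apply the filter."""
    return [
        desc for desc in index.get("manifests", [])
        if desc.get("artifactType") == artifact_type
    ]


# Hypothetical response body: an image index whose entries are the artifacts
# (signature, SBOM, ...) that reference the image we asked about.
SAMPLE_RESPONSE = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "1" * 64,
            "size": 512,
            "artifactType": "application/vnd.example.signature.v1",
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "2" * 64,
            "size": 734,
            "artifactType": "application/vnd.example.sbom.v1",
        },
    ],
}
```

With that, `filter_referrers(SAMPLE_RESPONSE, "application/vnd.example.sbom.v1")` returns just the SBOM descriptor, and no specially crafted tags are needed to find it.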

This has been in the works for a while, and I'll let OCI maintainer (and Docker Captain) Brandon Mitchell explain it better in a recent talk about the challenge of connecting all these new artifact types to an OCI image:

Modifying the Immutable: Attaching Artifacts to OCI Images - Brandon Mitchell, BoxBoat, an IBM Company
Images are now being pushed to OCI registries with more…

2023 latest breakdown of the future Referrers API technicals. 44 minutes

Second use case: General artifact support

As we started deploying containers en masse, we needed other files and objects semi-related to those containers. Sure, we could always use S3 or Git, but you may not have access to those from production container clusters. Helm, Tekton, seccomp/SELinux/AppArmor, OPA/Gatekeeper, Flux, Wasm, Compose, and Pulumi all fall into this category and are at various stages of supporting OCI registries as a distribution model.

Then there are the wild-wild-west artifacts of "anything that needs common HTTP storage for distribution." Packaged artifacts often need the kind of guarantees that OCI registries can make, such as SHA-hashing everything and making data universally content-addressable. The Homebrew package manager is a great example, as it switched to using the GitHub Container Registry in 2021 to serve over 50 million packages a month. Here's an example of the metadata returned for a Homebrew package image. It's not perfect (notice the mediaType says it's an image), but it clearly works.

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:3b7ebf540cd60769c993131195e796e715ff4abc37bd9a467603759264360664",
      "size": 1977,
      "platform": {
        "architecture": "amd64",
        "os": "darwin",
        "os.version": "macOS 13.0"
      },
      "annotations": {
        "org.opencontainers.image.ref.name": "3.40.1.ventura",
        "sh.brew.bottle.digest": "d3092d3c942b50278f82451449d2adc3d1dc1bd724e206ae49dd0def6eb6386d",
        "sh.brew.tab": "{\"homebrew_version\":\"3.6.16-97-ge76c55e\",\"changed_files\":[\"lib/pkgconfig/sqlite3.pc\"],\"source_modified_time\":1672237605,\"compiler\":\"clang\",\"runtime_dependencies\":[{\"full_name\":\"readline\",\"version\":\"8.2.1\",\"declared_directly\":true}],\"arch\":\"x86_64\",\"built_on\":{\"os\":\"Macintosh\",\"os_version\":\"macOS 13.0\",\"cpu_family\":\"penryn\",\"xcode\":\"14.1\",\"clt\":\"14.1.0.0.1.1666437224\",\"preferred_perl\":\"5.30\"}}"
      }
    }
  ]
}

Part of the Homebrew OCI manifest JSON for sqlite
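The "digest" field in that JSON is what makes the content addressable: the name of an object is the hash of its bytes, so any client can check what it downloaded. A minimal Python sketch of that check follows, with helper names of my own choosing and only sha256 handled.

```python
import hashlib


def compute_digest(data: bytes) -> str:
    """Return the OCI-style digest string for a blob."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_blob(data: bytes, expected_digest: str) -> bytes:
    """Return the blob unchanged if it matches the digest it was requested by."""
    algorithm, _, expected_hex = expected_digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    actual_hex = hashlib.sha256(data).hexdigest()
    if actual_hex != expected_hex:
        raise ValueError(f"digest mismatch: expected {expected_hex}, got {actual_hex}")
    return data
```

Because the manifest lists its blobs by digest and the manifest itself is fetched by digest, a single trusted digest at the top is enough to verify everything beneath it, no matter which mirror or cache served the bytes.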

Unlike my "first use case" above, these generic artifact types will likely not need the Referrers API to indicate a relationship to a container image directly, though they may find some internal benefit by using the Referrers API for other manifest-to-manifest relationships. That's yet to be seen.

But what these tools do need is a clear path for how to officially store their various artifact types in a registry. We didn't really have that before, and it always felt a bit hacky to overload the existing registry metadata objects of image and layers to store non-image and non-layer data.

As a workaround years ago, the ORAS project was created and eventually accepted by the CNCF. "OCI Registry As Storage" is both a CLI and a Go library that lets you push/pull any data type you want into an existing OCI registry. It's quite popular and used in many other tools and cloud registries to store artifacts.

This idea of overloading the existing OCI Distribution 1.0 spec for "general artifact storage" had side effects. Mainly, support beyond container images doesn't work everywhere because some registries don't support various changes to the manifest. Also, many registry UIs don't know how to display these data types, often resulting in weird-looking image tags, unknown/unknown types, or artifacts not being displayed in the UI at all.

The ORAS, CNCF, and OCI teams have a vision though. And this talk by Docker's Steve Lasker (who worked at Azure before Docker) is a great story of all the needs we have for artifacts and what they are doing about it. Note that some of the stuff about new features near the end of this 2022 talk is outdated in the implementation details as we get closer to the OCI Distribution 1.1 release, but it's still good to see the examples.

Distributing Supply Chain Artifacts with OCI & ORAS Artifacts

2022 KubeCon EU overview, vision, and walkthrough of OCI artifacts. 40 minutes

As we get close to the OCI 1.1 release of the image spec and distribution spec, it looks like they will add a few fields to the manifest to address the drawbacks of the previous generic artifact implementation, including an artifactType field that lets tool creators define their own type, which should eventually be supported by any registry that updates to OCI Distribution 1.1.
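Here's a hedged Python sketch of what such a manifest might look like, assuming the field names in the release candidates (artifactType, an optional subject, and an "empty" config descriptor) survive to GA. The artifact type and payload are invented for illustration.

```python
import hashlib

IMAGE_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
EMPTY_MEDIA_TYPE = "application/vnd.oci.empty.v1+json"
EMPTY_JSON = b"{}"  # placeholder blob used when an artifact has no real config


def descriptor(media_type: str, data: bytes) -> dict:
    """Describe a blob by type, content digest, and size."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


def artifact_manifest(artifact_type: str, payload: bytes,
                      payload_media_type: str, subject: dict = None) -> dict:
    """Build a 1.1-style artifact manifest (a sketch, not a reference encoder)."""
    manifest = {
        "schemaVersion": 2,
        "mediaType": IMAGE_MANIFEST,
        # The tool-defined type, so registries and UIs can label the artifact.
        "artifactType": artifact_type,
        "config": descriptor(EMPTY_MEDIA_TYPE, EMPTY_JSON),
        "layers": [descriptor(payload_media_type, payload)],
    }
    if subject is not None:
        # Descriptor of the manifest this artifact is about; this is the link
        # that the Referrers API walks in the other direction.
        manifest["subject"] = subject
    return manifest
```

A standalone artifact (the "second use case") would leave subject out, while an SBOM or signature for a specific image would set it to that image's manifest descriptor.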

How can you take advantage of OCI artifacts now and be ready for future OCI changes?

Next week, I'll discuss that as I wrap up this multi-part series on OCI registries.


🗓 Next big thing

๐Ÿณ DockerCon 2023 in Los Angeles, CA

Three weeks to go 'til DockerCon. My two talks have been scheduled.

Both talks will be on Wednesday, October 4, 2023: "Docker GitHub Actions for Production-Ready Images" runs 10:25 AM - 11:10 AM, and "Docker rocks in Node.js, 2023 Ed." runs 1:10 PM - 1:55 PM.

Pick up some of my fun 80s-themed swag before you head to DockerCon!

There's free [standard US] shipping on purchases made between Sept 12-17.
โšก๏ธ๐Ÿ’ก Here's an idea! If you're not in the US and you're going to DockerCon, consider buying something and ship it to your hotel in Los Angeles now and take advantage of the free shipping promo.

👀 In case you missed last week's newsletter

Did you miss last week's newsletter? Read it here.
👉 Part 1 of my write-up about OCI Registries for everything