My Other Registry is a Proxy

November 16, 2022

Dan Mangum


Building the Upbound Marketplace to Run Anywhere.

When we set out to build the Upbound Marketplace, we knew the functionality it offered would be useful in a variety of contexts. With that in mind, we architected a system that could fit into existing customer environments while still being able to scale to handle large amounts of traffic from the Crossplane community, which continues to grow at an impressive rate. Today we are going to take a deeper look at some of our design decisions, explore how they were implemented, and talk about the places we can go from here.

Just an OCI Image

One of the decisions we made early on in the Crossplane ecosystem was to standardize on OCI images for package distribution. This choice has paid dividends as we continue to leverage the features offered by the image and distribution specifications. One of the immediate benefits it offered was access: nearly every Crossplane user already had access to a registry, and to tooling for interacting with registries generically.

The generic aspect is not to be overlooked. When Crossplane packages were first being designed, we considered leveraging OCI artifacts, which were in their early stages of development, as a solution. However, until recently, some registries did not support artifacts and others supported only a subset of them. In contrast, every registry implementation that we know Crossplane users to rely on has supported Crossplane packages from day one because they are just OCI images.

Thanks to the tremendous efforts of working groups in the OCI organization, much progress has been made around the standardization of artifacts, as well as the more recently introduced reference types. We are excited to explore how this work may be leveraged in the Crossplane community and the Upbound Marketplace in the future.

Making this decision wasn’t without a cost. Namely, we needed to be explicit about how Crossplane packages bundled as OCI images would be processed by any conformant Crossplane distributions. Like many open source projects, Crossplane is both an API specification and a reference implementation. Downstream distributions, such as Upbound Universal Crossplane (UXP), must handle Crossplane packages in a uniform manner. To enable this, the xpkg specification was proposed and ratified.

Diagram showing that an image manifest points to multiple layers, in this case with one annotated as base and one annotated as upbound.

Defining a dedicated standard in the context of OCI images allowed for some creative optimizations and extension points. For example, Crossplane Provider packages include both YAML content, which is processed by the Crossplane package manager, and controller binaries, which ultimately get run as a container on a node in the cluster. Typically, the binary and supporting content are much larger than the YAML content, and because Crossplane only needs to consider the latter, a convention has been established to store the YAML content in a dedicated layer, which is annotated in the image manifest. Crossplane first fetches the manifest, and if any layer is annotated with io.crossplane.xpkg: base, only that layer is fetched and processed.
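The layer selection described above can be sketched in a few lines of Python. This is a minimal illustration rather than Crossplane's actual implementation: the annotation key and the base and upbound values come from this post, while the manifest contents, digests, sizes, and the fall-back to all layers when nothing is annotated are assumptions made for the example.

```python
# Annotation key taken from the text; everything else here is illustrative.
XPKG_ANNOTATION = "io.crossplane.xpkg"


def layers_to_fetch(manifest: dict, wanted: str = "base") -> list:
    """Return only the layers annotated for this consumer, else all layers."""
    layers = manifest.get("layers", [])
    annotated = [
        layer
        for layer in layers
        if (layer.get("annotations") or {}).get(XPKG_ANNOTATION) == wanted
    ]
    # Assumed behavior: with no matching annotation, fall back to every layer.
    return annotated or layers


# Hypothetical manifest; digests and sizes are made up for the example.
manifest = {
    "schemaVersion": 2,
    "layers": [
        {"digest": "sha256:aaa", "size": 52_000_000},
        {
            "digest": "sha256:bbb",
            "size": 310_000,
            "annotations": {XPKG_ANNOTATION: "base"},
        },
        {
            "digest": "sha256:ccc",
            "size": 95_000,
            "annotations": {XPKG_ANNOTATION: "upbound"},
        },
    ],
}

selected = layers_to_fetch(manifest)
print([layer["digest"] for layer in selected])  # ['sha256:bbb']
```

A different consumer can pass its own annotation value, for example layers_to_fetch(manifest, "upbound"), and skip the large controller layer in the same way.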

The xpkg specification also defines extension points that can be optionally honored by third-party systems. Extensions can be useful if a given Crossplane distribution or registry wants to add custom functionality to packages. Defining extensions within the bounds of the OCI image specification decouples image storage and distribution from other features that may enrich the package management experience. This model provides the foundation for the Upbound Marketplace. If we take a look at the manifest for Upbound’s Official AWS Provider, we’ll notice not only the base layer, but also a layer annotated as upbound:

// =================================
// Layers omitted for brevity.
// =================================

This layer is not considered by the Crossplane package manager, but includes data that can enrich the Upbound Marketplace experience. In this case, the layer contains a YAML stream of examples that are processed when packages are pushed. The secret behind the Upbound Marketplace is that it is not an OCI registry at all (per se), but rather a proxy that sits in front of any OCI registry. This proxy is internally named shimmer – it is a registry shim that implements the OCI distribution specification but doesn’t handle storing blobs or manifests; it simply passes them along to another registry.

Bring Your Own Registry

This architecture provides a number of advantages. Perhaps the greatest benefit is that it, in tandem with Crossplane’s package specification, allows for flexibility in which registry is used to actually store and serve any given image. This is useful internally as we can easily migrate from one registry to another, or, even more interestingly, proxy across multiple registries. For example, for packages used internally at Upbound we may opt to proxy the images to the registry offered by the cloud provider where our services are running, while for some customers we may proxy to a registry running in a specific geographic region. Similarly, Upbound customers who operate in an air-gapped or heavily regulated environment could deploy an on-prem version of the Upbound Marketplace in front of their own internal registry, allowing them to experience all the benefits of automatic documentation, discovery, and searchability, without compromising on compliance.

Conceptual architecture of shimmer sitting in front of multiple OCI distribution conformant registries.
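To make the routing idea concrete, here is a small Python sketch of how a proxy might choose a backing registry per repository. It is not how shimmer is implemented: the Route type, the prefix-matching rule, and the example.com hosts are all hypothetical. The only detail taken from a standard is the /v2/<name>/manifests/<reference> path defined by the OCI distribution specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str    # repository prefix, e.g. "upbound/"
    upstream: str  # base URL of the backing registry (hypothetical)


def pick_upstream(repository: str, routes: list, default: str) -> str:
    """Choose the backing registry with the longest matching prefix."""
    matches = [r for r in routes if repository.startswith(r.prefix)]
    if not matches:
        return default
    return max(matches, key=lambda r: len(r.prefix)).upstream


def upstream_manifest_url(
    repository: str, reference: str, routes: list, default: str
) -> str:
    """Build the upstream URL a proxied manifest request would be sent to."""
    base = pick_upstream(repository, routes, default)
    return f"{base}/v2/{repository}/manifests/{reference}"


# Hypothetical routing table: one upstream per organization or region.
DEFAULT_UPSTREAM = "https://registry-default.example.com"
routes = [
    Route("upbound/", "https://registry-a.example.com"),
    Route("acme-eu/", "https://registry-eu.example.com"),
]

print(upstream_manifest_url("acme-eu/platform", "v1.2.0", routes, DEFAULT_UPSTREAM))
```

Because clients only ever talk to the proxy, changing the routing table migrates a repository to a different backing registry without any change on the client side.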

When packages pass through shimmer, in addition to deciding where to direct them, shimmer enqueues the image for parsing and indexing. Much like Crossplane’s package manager, shimmer’s indexer only fetches the layers that it needs. In this case, the base layer is required to serve documentation about the package and the API types it offers, while the upbound layer provides additional content that is only relevant in the context of the Upbound Marketplace. In other words, a package may support extensions for many different consumers, without increasing the overhead for any one consumer to process just the bits that they care about.

An alternative model to the shimmer proxy approach would be to utilize webhook functionality supported by some existing OCI distribution implementations. However, doing so would both eliminate guaranteed compatibility across all implementations (i.e. webhooks are not part of the OCI distribution specification), and restrict introducing custom behavior when pulling a package. Up to this point we have primarily been concerned with validating and indexing content from packages that are already formatted as OCI images, but sitting in between the client and some registry means that what happens when a user pulls can vary widely. In fact, an “underlying image” doesn’t even have to exist as long as content is returned to the user in a manner conformant with the distribution specification. This opens up some interesting possibilities, such as just-in-time (JIT) package building from source on pull… but that’s a post for another day.

Expanding the Use of Content Addressability

Much is made of the benefits of leveraging content addressability, the practice of referencing a set of data by the digest of its content, to improve security posture. However, two of the other major benefits of such a system are storage and caching. When data is stored with a content addressable identifier, subsequent uploads of the same data do not result in occupying twice the storage space. Instead, the system will simply add another reference to the existing data.

Some systems will only partially deduplicate storage. For example, a multi-tenant service may only deduplicate data within the context of a single tenant.
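The deduplication described above can be illustrated with a minimal in-memory store. This is a sketch under simple assumptions (SHA-256 digests, a single tenant, no garbage collection); the ContentStore class and its method names are hypothetical and not part of any registry implementation.

```python
import hashlib


class ContentStore:
    """In-memory content-addressable store with reference counting."""

    def __init__(self) -> None:
        self._blobs = {}  # digest -> bytes
        self._refs = {}   # digest -> number of references

    def put(self, data: bytes) -> str:
        """Store data under its digest; repeated uploads only add a reference."""
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data  # first upload: keep the bytes
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def references(self, digest: str) -> int:
        return self._refs.get(digest, 0)

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


store = ContentStore()
layer = b"identical layer content"
first = store.put(layer)
second = store.put(layer)  # same bytes, so same digest

print(first == second)          # True
print(store.references(first))  # 2
print(store.stored_bytes())     # 23
```

A multi-tenant variant that deduplicates only within a tenant could keep one such store per tenant, at the cost of storing shared content once per tenant.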

While avoiding storing the same data twice is all about space, caching is all about speed. You have likely experienced the benefits of caching content addressable data if you have ever attempted to docker pull the same image twice. If you haven’t, give it a try right now:

$ time docker pull
v0.20.0: Pulling from upbound/provider-aws
d2b9ba0977cf: Already exists 
0b257d565924: Pull complete 
78064cd52c78: Pull complete 
4cc7a770d0c9: Pull complete 
7003c509c3c8: Pull complete 
d985f78ccd60: Pull complete 
a53973be610e: Pull complete 
9b4b6ab8dc0d: Pull complete 
88815e5d0cbc: Pull complete 
90a5a437912a: Pull complete 
Digest: sha256:2c9d049f81e00ab8266d98180c59ab025621f1e6f47e37217e2db85079856a47
Status: Downloaded newer image for

real	0m9.164s
user	0m0.043s
sys	0m0.021s

$ time docker pull
v0.20.0: Pulling from upbound/provider-aws
Digest: sha256:2c9d049f81e00ab8266d98180c59ab025621f1e6f47e37217e2db85079856a47
Status: Image is up to date for

real	0m1.124s
user	0m0.027s
sys	0m0.011s

The second pull was drastically faster because we already had all of the layers cached. How did we know that? The client, in this case Docker, fetched the manifest associated with the v0.20.0 tag for upbound/provider-aws, and because all of its layers are identified by the hash of their content, the client could compare them against the locally stored content.

In reality, evaluating whether local content is up to date can be performed even more quickly by caching the manifest and checking whether its digest has changed via a HEAD request.
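The two checks described above can be sketched as a single function. This is an illustration and not how Docker is implemented: remote_digest stands in for the digest a HEAD request would report, fetch_manifest is a hypothetical callable that downloads the manifest only when needed, and the manifest contents are made up. The image config blob is ignored for brevity.

```python
import json
from typing import Callable


def plan_pull(
    remote_digest: str,
    cached_digest: str,
    fetch_manifest: Callable[[], bytes],
    local_blobs: set,
) -> list:
    """Return the layer digests that still need to be downloaded."""
    # Fast path: an unchanged manifest digest means nothing to do,
    # so the manifest body is never fetched.
    if remote_digest == cached_digest:
        return []
    # Slow path: fetch the manifest and skip layers already stored locally.
    manifest = json.loads(fetch_manifest())
    return [
        layer["digest"]
        for layer in manifest["layers"]
        if layer["digest"] not in local_blobs
    ]


# Hypothetical manifest for a newer version sharing one layer with the old one.
manifest_bytes = json.dumps(
    {"layers": [{"digest": "sha256:l1"}, {"digest": "sha256:l2"}]}
).encode("utf-8")

missing = plan_pull(
    remote_digest="sha256:new",
    cached_digest="sha256:old",
    fetch_manifest=lambda: manifest_bytes,
    local_blobs={"sha256:l1"},
)
print(missing)  # ['sha256:l2']
```

The first pull in the transcript above corresponds to the slow path with one layer already present locally, and the second pull corresponds to the case where nothing needs to be downloaded.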

The behavior of the Upbound Marketplace is not so different from that of an OCI registry. It ingests content (Kubernetes manifests) and serves it to a wide variety of consumers via client applications (browsers). Much like a registry doesn’t want to store unchanged layers every time a new version of an image is pushed, we do not want to store the same manifests used to generate documentation every time a new package is pushed and indexed. To avoid doing so, shimmer stores manifests by digest, and generates a document that somewhat resembles an image manifest for each package version. For example, the resource manifests for the previously referenced upbound/provider-aws:v0.20.0 package can be observed here, and the content of a single manifest, Certificate, can be viewed here.
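As a rough illustration of this idea, the sketch below stores resource manifests by digest and produces a per-version index that references them. The shape of the index document, the function names, and the sample manifests are all assumptions made for the example; the post does not describe shimmer's actual format.

```python
import hashlib


def digest_of(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def index_package(version: str, resources: dict, store: dict) -> dict:
    """Store each resource manifest by digest and return an index document."""
    entries = []
    for name, body in sorted(resources.items()):
        data = body.encode("utf-8")
        digest = digest_of(data)
        store.setdefault(digest, data)  # unchanged manifests are stored once
        entries.append({"name": name, "digest": digest, "size": len(data)})
    return {"version": version, "resources": entries}


# Hypothetical content: Certificate is unchanged between the two versions.
store = {}
v1 = index_package(
    "v0.19.0",
    {"Certificate": "kind: Certificate\n", "Bucket": "kind: Bucket\nv: 1\n"},
    store,
)
v2 = index_package(
    "v0.20.0",
    {"Certificate": "kind: Certificate\n", "Bucket": "kind: Bucket\nv: 2\n"},
    store,
)

print(len(store))  # 3 stored manifests for 4 references
```

Two versions that share an unchanged manifest end up pointing at the same digest, which mirrors how image manifests share layers.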

The reduction in storage is also paired with the ability to perform aggressive caching in many cases. Perhaps the best example of this is the written documentation you see when visiting one of the Official Provider pages. While currently an alpha feature, we intend to support providing documentation for any package in the Marketplace, once again leveraging the Crossplane package extension model to do so. The existing documentation is fairly lightweight, but the service is designed to serve more extensive content, also backed by content addressable storage.

Screengrab of the documentation page for the Upbound Official AWS Provider at v0.20.0.

The documentation is especially interesting because the Marketplace is built on Next.js, allowing us to Server-Side Render (SSR) most pages for efficient search engine indexing. However, with potentially large content, such as documentation and its supporting assets, we don’t want to load all of it up front and we want to fetch it as infrequently as possible. To achieve both of those goals, documentation is served from a content addressable URL on a Content Delivery Network (CDN) and fetched client-side. This enables caching the content both at the edge node of the CDN (i.e. a server close to the user) and in the browser itself. Invalidation is straightforward as the documentation index is rendered server-side, meaning that any change in documentation content will be reflected by a new content-addressable URL being supplied to the client. If that URL has been accessed before by the client, it may still be in the browser cache. If it has been accessed by another user in the region, it may be cached at the edge node. Making a full round-trip to the origin server becomes an infrequent operation.
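The invalidation property described above follows directly from deriving the URL from the content. The Python sketch below shows the idea under stated assumptions: the CDN host and path layout are hypothetical, and the Cache-Control value is one reasonable choice for immutable content rather than the header the Marketplace is known to send.

```python
import hashlib

# Hypothetical CDN base URL; the real Marketplace URL scheme is not shown here.
CDN_BASE = "https://cdn.example.com/docs"


def docs_url(content: bytes) -> str:
    """Derive a content-addressable URL from the documentation bytes."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{CDN_BASE}/sha256/{digest}"


def cache_headers() -> dict:
    """Headers suited to content whose URL changes whenever the bytes change."""
    return {"Cache-Control": "public, max-age=31536000, immutable"}


old_docs = b"# Provider AWS\nInstall the provider.\n"
new_docs = b"# Provider AWS\nInstall and configure the provider.\n"

print(docs_url(old_docs) == docs_url(old_docs))  # True: cache hit
print(docs_url(old_docs) == docs_url(new_docs))  # False: new URL, no purge needed
```

Since a given URL can only ever refer to one set of bytes, both the browser and the edge node can cache it for as long as they like, and the server-rendered index decides which URL the client asks for.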

Where We Are Going

So where do we go from here? Ultimately, that is up to you! We are excited about continuing to build on the foundational architecture of the Upbound Marketplace. Features such as automatic identification of breaking API changes from one package version to another, searching by API types, and enhanced package consumption metrics are all enabled by the design decisions outlined in this post. If these features, or any others, are top of mind for you, make sure to create an Upbound account today and drop us a note. If you love writing and reading design documents, working on complex multi-tenant systems, and building highly scalable services used by platform builders all over the world, check out our open roles!