Verifiable Kubernetes API Compatibility

September 6, 2022

Dan Mangum

Bringing structure and security to the Kubernetes type system with Crossplane packages.

Organizations running on Kubernetes have been steadily moving to a multi-cluster approach. The lifecycle of any individual cluster is relatively short, and provisioning new clusters is a common occurrence. A variety of factors are driving this shift, including:

Availability: more clusters means that losing any one cluster is not catastrophic.
Isolation: while RBAC and Namespaces provide isolation within a single cluster, utilizing multiple clusters allows for cluster-scoped isolation, and physical isolation when necessary.
Scalability: depending on the nature of your workloads, the various components that make up a Kubernetes cluster may not scale to accommodate your use case. One such example is the API server struggling to handle a large number of Custom Resource Definitions (CRDs), as detailed in a recent post by my colleague Nic Cope.

This is not a comprehensive list, but the reasons listed are broadly applicable to many organizations. Mitigating these issues has been compelling enough to drive organizations to adopt a more complex architecture.

As organizations have moved into this new world, there has simultaneously been an explosion in the number of CRDs and controllers that are installed in a given cluster. This greatly enhances the capabilities of Kubernetes, but also increases the management burden of replicating the same custom API across many clusters. As a result, interacting with a Kubernetes cluster has become more like writing a program, but without any of the safety or guarantees offered by a modern programming language. In this post, we’ll explore why a lack of API compatibility can lead to dangerous and unpredictable behavior, and how Crossplane packages make it possible to present a verifiably consistent programming interface across multiple clusters.

The Perils of an Unknown API

What can actually go wrong when APIs diverge across clusters? There are multiple possible outcomes, listed below in order of increasing severity.

The “Early Conflict”: An object being created violates the type definition (CRD) in one cluster, but not the other. This is the best-case scenario, but it still presents a poor user experience to the developer.
The “Hope I’m Not On-Call”: The object is successfully created in each cluster, but only reaches a healthy status in one due to divergence in reconciliation logic, which is decoupled from the definition of the type.
The “Sleeper Agent”: The object is successfully created in each cluster, and both reach a healthy status, but reconciliation logic has performed different operations in response.

In each of these scenarios, a stronger type system could surface these issues prior to creating the object. We are specifically looking for the following attributes:

Verification that type definitions are equivalent across clusters.
Verification that reconciliation logic is equivalent across clusters.

Crossplane packages can offer these guarantees, but before exploring how, let’s examine how the Kubernetes type system compares to that of popular programming languages.

What’s a type?

Despite the ever-growing API surface area and the movement to a multi-cluster world, we have not seen a fundamental change in Kubernetes architecture. This is mostly a good thing; the stability and predictability of Kubernetes is a major factor in its widespread adoption. However, when we drastically change how we consume Kubernetes, we must also consider the impact that has on how we configure it. One of the most pronounced differences in configuring a Kubernetes cluster in a multi-cluster world is the scoping of types.

Kubernetes comes with some built-in types - Pods, Deployments, etc. - and allows you to define your own with CRDs. This is not dissimilar from most programming languages: there are built-in primitive types, such as boolean, integer, etc., but there are also user-defined types, which may be defined via a built-in type, such as a struct.

I have argued that types defined by CRDs directly are, or should be, considered built-in types, but that’s a conversation for another day.

User-defined types are typically scoped - they are identified relative to where they are defined. For example, in the Go programming language I can define a MyType in the cooltypes package. I can also define a MyType in the okaytypes package. The identifiers for these types internally are cooltypes.MyType and okaytypes.MyType, allowing them to be distinguished when consumed.

package cooltypes

// MyType is a user-defined type.
type MyType struct {
	Message string
	ID int
}

package okaytypes

// MyType is a user-defined type.
type MyType struct {
	Message string
	ID int
}

Zooming out even further, the packages are hosted somewhere on the internet, or at a location on my local machine. This adds further specificity to their scope, guarding against any collisions with another cooltypes package that may be hosted elsewhere. However, with these two types having identical definitions, do we really need to distinguish one from the other? As with many topics in computing: it depends.

Defining Equivalence

There are two popular mechanisms for defining equality in a type system: nominal equivalence and structural equivalence. Nominal equivalence defines two types as being the same if they have the same name. In the following example, the types of variables a and b are considered equivalent, but the type of c is not considered equivalent to that of a or b.

package main

import (
	"fmt"
	"reflect"

	"github.com/upbound/blog-posts/cooltypes"
	"github.com/upbound/blog-posts/okaytypes"
)

func main() {
	a, b := cooltypes.MyType{
		Message: "hello",
		ID: 1,
	}, cooltypes.MyType{
		Message: "world",
		ID: 2,
	}
	c := okaytypes.MyType{
		Message: "goodbye",
		ID: 3,
	}
	fmt.Println(reflect.TypeOf(a) == reflect.TypeOf(b))
	fmt.Println(reflect.TypeOf(b) == reflect.TypeOf(c))
}

Output:

true
false

This is because Go, like many languages, uses nominal equivalence when evaluating equality of types. Importantly, the name of the type in this context is referring to the fully qualified name. If we compile this program and look at the symbol table, we can see each of the two unique type names defined:

$ readelf -Ws main | grep MyType
  1442: 000000000047dfe0   100 FUNC    GLOBAL DEFAULT    1 type..eq.github.com/upbound/blog-posts/cooltypes.MyType
  1443: 000000000047e060   100 FUNC    GLOBAL DEFAULT    1 type..eq.github.com/upbound/blog-posts/okaytypes.MyType

In a language that uses structural equivalence, such as TypeScript (or OCaml for its object types), cooltypes.MyType and okaytypes.MyType would be considered equivalent because their members (i.e. Message and ID) are equivalent. In fact, Go itself uses structural equivalence when evaluating whether a given type implements an interface (i.e. whether the type’s method set contains every method the interface declares).
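To make the interface case concrete, here is a minimal sketch. The Describer interface and the Cool and Okay types are hypothetical names invented for this illustration; they stand in for cooltypes.MyType and okaytypes.MyType so that the example fits in a single file.

```go
package main

import "fmt"

// Describer is a hypothetical interface used only for this illustration.
type Describer interface {
	Describe() string
}

// Cool and Okay are distinct named types with identical fields.
type Cool struct {
	Message string
	ID      int
}

type Okay struct {
	Message string
	ID      int
}

func (c Cool) Describe() string { return fmt.Sprintf("%d: %s", c.ID, c.Message) }
func (o Okay) Describe() string { return fmt.Sprintf("%d: %s", o.ID, o.Message) }

// describeAll accepts any value whose method set includes Describe() string.
// Neither Cool nor Okay declares that it implements Describer; the match is
// purely structural.
func describeAll(ds ...Describer) []string {
	out := make([]string, 0, len(ds))
	for _, d := range ds {
		out = append(out, d.Describe())
	}
	return out
}

func main() {
	c := Cool{Message: "hello", ID: 1}
	// Assigning c to a variable of type Okay would not compile (nominal
	// equivalence), but an explicit conversion is allowed because the two
	// types have identical underlying struct types.
	o := Okay(c)
	o.Message = "goodbye"
	for _, s := range describeAll(c, o) {
		fmt.Println(s)
	}
}
```

The explicit conversion in main is the one other place where Go looks at structure rather than names for struct types, and it still requires the programmer to opt in.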

In Go, dependencies are pinned at the module level (we are going to skip over the new workspaces flow added in 1.18 for this post). When a new dependency is added, the go.mod file is updated with the path and version of the module, and a checksum of the module’s content (and that of any transitive dependencies) is added to the go.sum file. This means that you, or anyone else who is adding code to the module, can verify that the type you are creating an instance of is exactly the same as it was last time, barring any change in the go.mod or go.sum files. It also means that we can easily verify whether two types in separate modules are equivalent by ensuring that they are nominally equivalent, then comparing the checksums of their source modules. This is not typically required when writing programs because creating instances of types is usually scoped to a single code base (i.e. module).
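As a rough sketch of that second check, the snippet below compares the checksum that two go.sum files record for the same module version. The function names, the v0.1.0 version, and the hash values are all placeholders made up for this example; only the general "module version hash" line layout of go.sum is assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// moduleSum returns the content hash recorded for module@version in the text
// of a go.sum file. Each line has the form "<module> <version> <hash>". The
// companion "<version>/go.mod" lines are skipped because their version field
// does not match exactly.
func moduleSum(goSum, module, version string) (string, bool) {
	for _, line := range strings.Split(goSum, "\n") {
		f := strings.Fields(line)
		if len(f) == 3 && f[0] == module && f[1] == version {
			return f[2], true
		}
	}
	return "", false
}

// sameModuleContent reports whether both go.sum files pin module@version to
// the same content hash.
func sameModuleContent(sumA, sumB, module, version string) bool {
	a, okA := moduleSum(sumA, module, version)
	b, okB := moduleSum(sumB, module, version)
	return okA && okB && a == b
}

func main() {
	// The hashes below are placeholders, not real checksums.
	projectA := "github.com/upbound/blog-posts v0.1.0 h1:PLACEHOLDERHASHONE=\n" +
		"github.com/upbound/blog-posts v0.1.0/go.mod h1:PLACEHOLDERMODONE=\n"
	projectB := "github.com/upbound/blog-posts v0.1.0 h1:PLACEHOLDERHASHTWO=\n"

	const mod, ver = "github.com/upbound/blog-posts", "v0.1.0"
	fmt.Println(sameModuleContent(projectA, projectA, mod, ver))
	fmt.Println(sameModuleContent(projectA, projectB, mod, ver))
}
```

If the names match and the checksums match, cooltypes.MyType means the same thing in both code bases.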

Kubernetes also uses nominal equivalence in its type system, with each type having a name consisting of its Group, Version, and Kind (GVK). However, unlike a statically compiled binary, Kubernetes has types dynamically added and removed at runtime. This means when you are programming against a cluster’s API, the name of the type you are creating an instance of does not fully reflect the type’s identity—the API may have been modified since the last time you interacted with the cluster, and changing the source or content of the type is decoupled from creating new instances of the type. Furthermore, unlike writing a program, there are typically multiple personas involved when configuring and consuming a Kubernetes cluster.

This becomes an even larger problem when multiple clusters are involved, as the constraint of only one type existing for a single GVK is no longer upheld. When we are treating clusters as ephemeral, the inability to verify whether two types with the same name are equivalent becomes an issue, as the semantics of the “language” we are using are changing beneath our feet. A multi-cluster environment is akin to programming across multiple Go modules, where someone else may be changing types, and the only way to ensure consistency is by a literal byte-for-byte comparison.

In fact, the situation is even more complex than described thus far. While we have primarily been discussing their structure (i.e. definition), types also have associated behavior. In programming languages this is frequently represented as methods that may be invoked imperatively on an instance of the type. In a Kubernetes cluster, associated behavior takes the form of one or more controllers that are invoked based on the declarative lifecycle of the type. Despite these controllers being dependent upon the existence of the types they reconcile, Kubernetes does not natively create a strong association between types and their controllers. Because of this, even comparing the content of a CRD with another with the same GVK does not necessarily indicate equivalence in both structure and behavior.

In an ideal world, we could verify whether creating an instance of a type in one cluster would result in the same operations being performed as in another cluster.
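One way to picture that ideal is to treat a type’s identity as its name plus a digest of its definition plus a digest of its behavior. The sketch below is purely illustrative: the InstalledType struct, the example.org group, and the placeholder CRD and controller contents are assumptions made for this example, not anything Kubernetes provides.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// GVK is the nominal identity of a Kubernetes type.
type GVK struct{ Group, Version, Kind string }

// InstalledType is a hypothetical record of a type as it exists in one
// cluster: its name, a digest of its definition, and a digest of the
// controller that reconciles it.
type InstalledType struct {
	GVK              GVK
	SchemaDigest     string
	ControllerDigest string
}

// digest returns the SHA-256 digest of content in the familiar "sha256:" form.
func digest(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// nominallyEqual is all that a GVK alone can tell us.
func nominallyEqual(a, b InstalledType) bool { return a.GVK == b.GVK }

// verifiablyEqual additionally requires identical structure and behavior.
func verifiablyEqual(a, b InstalledType) bool {
	return nominallyEqual(a, b) &&
		a.SchemaDigest == b.SchemaDigest &&
		a.ControllerDigest == b.ControllerDigest
}

func main() {
	bucket := GVK{Group: "example.org", Version: "v1", Kind: "Bucket"}
	crd := []byte("placeholder CRD bytes")

	// Same name and same definition, but different controller builds: the
	// "Sleeper Agent" scenario.
	east := InstalledType{bucket, digest(crd), digest([]byte("controller build A"))}
	west := InstalledType{bucket, digest(crd), digest([]byte("controller build B"))}

	fmt.Println(nominallyEqual(east, west))  // true
	fmt.Println(verifiablyEqual(east, west)) // false
}
```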

Crossplane Packages and OCI Images

Crossplane packages (xpkgs) are how new types are added to Kubernetes clusters where Crossplane is installed. The xpkg specification builds on the OCI image specification: every xpkg is a valid OCI image, but xpkgs impose additional requirements, so valid xpkgs are a subset of valid OCI images. This allows users to take advantage of the vast ecosystem of OCI tooling and distribution.

OCI images are made up of content and manifests. Traditionally, the content includes filesystem changeset layers (i.e. blobs), which can be extracted to form a complete filesystem for a container at runtime. Manifests tie these layers together, specifying how to obtain the content, as well as the order in which the changesets should be applied. Crossplane takes advantage of this flexible model by assigning significance to content in the blobs, specifically a package.yaml file that exists in the root of the final filesystem. This file is a YAML stream containing package metadata, and all the types that Crossplane should install. Crossplane is indifferent as to the content of the rest of the image.

A Provider or Configuration package can be installed by creating an instance of the Provider or Configuration type in the cluster with a reference to the package image.

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  package: xpkg.upbound.io/upbound/provider-aws:v0.11.0
  packagePullSecrets:
    - name: package-pull-secret
    

When installing, Crossplane first fetches the digest of the package image manifest, and creates either a ProviderRevision or ConfigurationRevision, depending on the package type, with a name constructed from the manifest digest. Next, the JSON content of the manifest is fetched. As mentioned previously, the manifest contains a list of content blobs in the form of descriptors. A descriptor contains the media type, size, and digest of the layer. It may also include annotations, among other fields, with additional information.

Note that both the manifest, and descriptors in the manifest, may include annotations. In this case we are talking about annotations on descriptors in the manifest.

If one of the layer descriptors in the package image manifest has an io.crossplane.xpkg: base annotation, Crossplane only fetches that layer, then extracts the package.yaml file from it. If none of the layer descriptors in the manifest have the annotation, all layers in the manifest are fetched, the full filesystem is constructed, then the package.yaml is extracted from the root directory.

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 7498,
    "digest": "sha256:07c5de8a4b93a2f548bb3e88ff682cdedeb1fe713f208cd0ced7300e70e21da5"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 3068086,
      "digest": "sha256:c31dc85b098bd8ff0742051c85528ab36f7f24043438087cdbc726d1d0cb0a1e"
    },
	...
  ## Intermediate layers omitted for brevity ##
	...
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 22054194,
      "digest": "sha256:5a4239fc85edc18ca835a2159cb143ed063d88ed64307ad501f1740aea068b50"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 936833,
      "digest": "sha256:e897802575bbd7afe37fe086292f64dcb02d0884429d8765482c94b3dc31d502",
      "annotations": {
        "io.crossplane.xpkg": "base"
      }
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 43948,
      "digest": "sha256:a346c3baba3935f3256f4ca7a235885e07988cba3a507e311ae13a4bd48af799",
      "annotations": {
        "io.crossplane.xpkg": "upbound"
      }
    }
  ]
}

Note that this manifest also includes an io.crossplane.xpkg: upbound annotation, which is an xpkg extension. We’ll cover extensions in a future post, but you can read about how to implement your own in the specification.
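The layer selection rule described above can be sketched in a few lines of Go. This is not Crossplane’s implementation; the type and function names are hypothetical, and the example manifest is a trimmed copy of the one shown earlier with only the fields the sketch needs.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

const xpkgAnnotation = "io.crossplane.xpkg"

// descriptor mirrors the subset of OCI descriptor fields used in this sketch.
type descriptor struct {
	Digest      string            `json:"digest"`
	Size        int64             `json:"size"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

type manifest struct {
	Layers []descriptor `json:"layers"`
}

// layersToFetch returns only the layer annotated as the xpkg base layer if
// one exists, and every layer in the manifest otherwise.
func layersToFetch(raw []byte) ([]descriptor, error) {
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	if len(m.Layers) == 0 {
		return nil, errors.New("manifest has no layers")
	}
	for _, l := range m.Layers {
		// Indexing a nil map is safe in Go and yields the empty string.
		if l.Annotations[xpkgAnnotation] == "base" {
			return []descriptor{l}, nil
		}
	}
	return m.Layers, nil
}

// exampleManifest is a trimmed version of the manifest shown in this post.
const exampleManifest = `{
  "layers": [
    {"size": 22054194, "digest": "sha256:5a4239fc85edc18ca835a2159cb143ed063d88ed64307ad501f1740aea068b50"},
    {"size": 936833, "digest": "sha256:e897802575bbd7afe37fe086292f64dcb02d0884429d8765482c94b3dc31d502",
     "annotations": {"io.crossplane.xpkg": "base"}},
    {"size": 43948, "digest": "sha256:a346c3baba3935f3256f4ca7a235885e07988cba3a507e311ae13a4bd48af799",
     "annotations": {"io.crossplane.xpkg": "upbound"}}
  ]
}`

func main() {
	layers, err := layersToFetch([]byte(exampleManifest))
	if err != nil {
		panic(err)
	}
	for _, l := range layers {
		fmt.Println(l.Digest, l.Size)
	}
}
```

For this manifest, only the roughly 0.9 MB base layer would be fetched rather than the much larger controller layers.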

Crossplane parses the contents of the package.yaml file, caches it, then begins installing the package. If the package is a Provider, the package.yaml contains metadata and a stream of CRDs, Validating Webhook Configurations, and Mutating Webhook Configurations. If the package is a Configuration, it contains metadata and a stream of Composite Resource Definitions (XRDs) and Compositions. Before applying the objects to the cluster, Crossplane first adds the package revision to a singleton cluster-scoped object of kind Lock. This object plays a similar role to go.mod and go.sum, defining the packages that are installed in the cluster, the dependency relationships between them, and information about their content.

For example, installing xpkg.upbound.io/upbound/provider-aws:v0.11.0 would result in the following entry in the singleton Lock, where the package revision reference, type, and name are defined. The name includes the resolved short digest of the referenced image (e1dde2d0d249).

apiVersion: pkg.crossplane.io/v1beta1
kind: Lock
metadata:
  name: lock
packages:
- dependencies: []
  name: provider-aws-e1dde2d0d249
  source: xpkg.upbound.io/upbound/provider-aws
  type: Provider
  version: v0.11.0
  

Installing objects from a package into the cluster consists of first verifying that the given revision can obtain sole control of each object, and, if so, applying them to the cluster with a controller owner reference to the package revision. This ensures that, in a cluster where the Kubernetes API surface is only expanded via Crossplane packages, only one package can ever act on a given type.
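The verify-then-apply flow can be modeled roughly as follows. This is a simplified sketch with hypothetical types standing in for Kubernetes objects and owner references; it is not Crossplane’s code and ignores details such as multiple revisions of the same package.

```go
package main

import (
	"errors"
	"fmt"
)

// ownerRef is a pared-down stand-in for a Kubernetes owner reference.
type ownerRef struct {
	UID        string
	Controller bool
}

// object is a pared-down stand-in for an object in the cluster.
type object struct {
	Name   string
	Owners []ownerRef
}

var errConflict = errors.New("object is controlled by another package revision")

// controllerOf returns the UID of the object's controller owner, if any.
func controllerOf(o object) (string, bool) {
	for _, r := range o.Owners {
		if r.Controller {
			return r.UID, true
		}
	}
	return "", false
}

// establish first verifies that the revision can control every desired
// object, and only then records the revision as the controller owner of
// each one. Nothing is applied if any object conflicts.
func establish(revisionUID string, cluster map[string]object, desired []string) error {
	for _, name := range desired {
		if o, ok := cluster[name]; ok {
			if uid, has := controllerOf(o); has && uid != revisionUID {
				return fmt.Errorf("%s: %w", name, errConflict)
			}
		}
	}
	for _, name := range desired {
		cluster[name] = object{
			Name:   name,
			Owners: []ownerRef{{UID: revisionUID, Controller: true}},
		}
	}
	return nil
}

func main() {
	cluster := map[string]object{}
	// The CRD name and revision identifiers are illustrative only.
	fmt.Println(establish("revision-a", cluster, []string{"buckets.example.org"}))
	fmt.Println(establish("revision-b", cluster, []string{"buckets.example.org"}))
}
```

The first call succeeds and the second is rejected, because a different revision already holds the controller reference.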

This is especially important with Providers, as installing a Provider not only adds new types to the cluster, but also implements each type’s corresponding functionality in the form of a Deployment that runs controllers to watch and take action based on the lifecycle of instances of the installed types. Unless overridden, Crossplane and the controller image from the same package are the only system components that are given permission on any installed type.

(Content) Addressing the Issue

A more subtle advantage of being built on top of OCI is the usage of a content-addressable API. Just like any OCI image, Crossplane packages can be installed by digest rather than by tag (e.g. v0.11.0) to ensure that the content that is installed matches what was expected.

If you don’t think pulling by digest is important, I encourage you to listen to Jon Johnson describe all of the reasons why you can’t trust a registry when pulling by tag.

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  package: xpkg.upbound.io/upbound/provider-aws@sha256:e1dde2d0d249fb43d984b09ddcfb0146954a7507cbd5966e1157b0f06236e0f9
  packagePullSecrets:
    - name: package-pull-secret

As seen below, the package revision still uses the short digest in the name (e.g. e1dde2d0d249), but the version corresponds to the full digest of the package image.

apiVersion: pkg.crossplane.io/v1beta1
kind: Lock
metadata:
  name: lock
packages:
- dependencies: []
  name: provider-aws-e1dde2d0d249
  source: xpkg.upbound.io/upbound/provider-aws
  type: Provider
  version: sha256:e1dde2d0d249fb43d984b09ddcfb0146954a7507cbd5966e1157b0f06236e0f9
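The relationship between the full digest and the revision name in the Lock entries above can be sketched as follows. The revisionName helper is hypothetical, and the 12-character truncation is inferred from the examples in this post rather than taken from Crossplane’s source.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// shortLen is the number of hex characters kept from the digest, as observed
// in the revision names shown in this post.
const shortLen = 12

// revisionName builds a name such as provider-aws-e1dde2d0d249 from a package
// name and a full "sha256:<hex>" manifest digest.
func revisionName(pkg, digest string) (string, error) {
	const prefix = "sha256:"
	if !strings.HasPrefix(digest, prefix) || len(digest) < len(prefix)+shortLen {
		return "", errors.New("expected a sha256 digest")
	}
	return pkg + "-" + digest[len(prefix):len(prefix)+shortLen], nil
}

func main() {
	name, err := revisionName("provider-aws",
		"sha256:e1dde2d0d249fb43d984b09ddcfb0146954a7507cbd5966e1157b0f06236e0f9")
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // provider-aws-e1dde2d0d249
}
```

Because the name is derived from content, installing the same manifest by tag or by digest yields the same revision name.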
  

Pulling by digest guards against a number of potential issues and vulnerabilities, including tags being updated to point at a different manifest, or a malicious party intercepting requests. However, it does make utilizing semantic versioning for upgrades and dependency resolution more difficult. It would be nice if we could have the convenience of using versions when referencing packages, but ensure that the content is still what we expect, similar to the guarantees we get with a programming language like Go. Fortunately, Crossplane does not force you to choose between convenience and security. When installing a Provider or Configuration, users can choose a revisionActivationPolicy of either Automatic or Manual.

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  package: xpkg.upbound.io/upbound/provider-aws:v0.11.0
  packagePullSecrets:
    - name: package-pull-secret
  revisionActivationPolicy: Manual
  

If set to Manual, Crossplane resolves the image tag to its digest, but waits until the revision is manually updated from Inactive to Active before continuing with installation. This allows the human to interact with human-readable identifiers (tags), while still relying on the machine to verify the content.

$ kubectl get providerrevision
NAME                        HEALTHY   REVISION   IMAGE                                          STATE      DEP-FOUND   DEP-INSTALLED   AGE
provider-aws-e1dde2d0d249   True      1          xpkg.upbound.io/upbound/provider-aws:v0.11.0   Inactive                               38s

This strategy is particularly useful when installing or updating packages across clusters where a consistent API is desired. It also means that if package contents are lost from the cache and the package source has updated the tag to point at a different manifest, Crossplane will wait for approval before rolling to a new revision of the package.
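The effect of the two policies can be modeled with a small sketch. The observe function and revision struct are hypothetical, the second digest is a made-up placeholder, and the Automatic branch (activate the new revision, deactivate the others) is my simplified reading of the behavior rather than a copy of Crossplane’s logic.

```go
package main

import "fmt"

// revision is a simplified stand-in for a ProviderRevision.
type revision struct {
	Digest string
	State  string // "Active" or "Inactive"
}

// observe models the package manager resolving a tag to digest. It returns a
// new slice and leaves its input untouched.
func observe(revs []revision, policy, digest string) []revision {
	out := make([]revision, len(revs), len(revs)+1)
	copy(out, revs)
	for _, r := range out {
		if r.Digest == digest {
			return out // the tag still points at a known revision
		}
	}
	if policy == "Manual" {
		// A new revision is recorded but waits for a human to activate it.
		return append(out, revision{Digest: digest, State: "Inactive"})
	}
	// Automatic: the new revision takes over from any existing one.
	for i := range out {
		out[i].State = "Inactive"
	}
	return append(out, revision{Digest: digest, State: "Active"})
}

func main() {
	current := []revision{{
		Digest: "sha256:e1dde2d0d249fb43d984b09ddcfb0146954a7507cbd5966e1157b0f06236e0f9",
		State:  "Active",
	}}
	// "sha256:placeholder" stands in for a tag that now points elsewhere.
	for _, policy := range []string{"Manual", "Automatic"} {
		fmt.Println(policy, observe(current, policy, "sha256:placeholder"))
	}
}
```

Under Manual, the tag moving never changes which revision is Active; it only adds a candidate for review.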

Combining Manual revision activation with packagePullPolicy: Always can be used in interesting ways, such as pinning to a channel tag (e.g. latest) and letting Crossplane create new Inactive revisions any time the tag is updated to point at a new manifest. It is highly recommended that users consult the packagePullPolicy documentation before enabling this functionality.

Most importantly, using a content-addressable API and a unit of installation (a package) that contains both type definitions and their corresponding reconciliation logic enables the verifiable equivalence of types across clusters we are seeking. However, though we do want to couple definition and behavior, sometimes it is advantageous to be able to verify them independently. Let’s take a look at a scenario where this would be useful.

Case Study: Multi-Arch Bundled Provider Packages

In the v1.6.0 release, Crossplane added support for bundled Provider packages. Prior to this change, Provider package metadata was required to include a reference to the controller image for the types included in the package image. Bundled packages allow for omitting this reference, and instead building the package content and controller machinery into a single image. The previous system of decoupling the images had a number of advantages:

Crossplane only needed to download and cache the package contents from the package image, while the often larger controller image was only pulled by the node that the Deployment was scheduled to.
If only the content of the package changed, or only the content of the controllers changed, the digest of the other would remain the same, making it simple to determine if changes occurred to the API surface, the controller logic, or both. This is especially useful when evaluating whether an upgrade includes breaking changes.

Actually performing the verification described in the second point proved to be cumbersome, as the digest of the package image would change if its controller image reference changed. The ability to change the controller image reference at runtime via a ControllerConfig somewhat mitigated this issue, but it is not a recommended pattern in production settings.

However, the flexibility of the OCI image specification, as well as the formalization of the xpkg specification in the subsequent v1.7.0 release, made it possible to achieve the above benefits while also reducing the complexity of managing multiple images. As previously described, Crossplane packages may annotate a descriptor with the io.crossplane.xpkg: base annotation to indicate that the package manager should only consider that single layer when extracting package contents. In doing so, Crossplane does not increase the amount of data it maintains in its package cache, and, importantly, can easily verify if two packages install an equivalent set of type definitions.

This functionality is useful in a number of scenarios, such as verifying whether a patch release of a Provider presents the same API as previous patches for the same minor version. One area where this is especially useful is multi-architecture Provider images. Prior to supporting bundled Provider images, there was no need to publish variants of package images for multiple platforms as they only contained YAML content. However, when the controller and package images are bundled into a single image, ensuring that the package content layer is consistent across variants for each platform is critical. It would be highly confusing if installing a package on a linux/arm64 host resulted in different types being installed than when installed on a linux/amd64 host.

Multi-architecture OCI images are supported by image indexes (sometimes referred to as manifest lists). When you pull a multi-architecture image, your client first fetches an image index, then resolves it to the manifest for your host platform, and finally downloads the layers specified in that platform-specific manifest. The image index for the xpkg.upbound.io/upbound/provider-aws:v0.11.0 image we have been using throughout this post looks like this:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 1988,
      "digest": "sha256:bce75989b8984d4e80c39f2251acfa4a4b5deec775dff0728645767da93694c3",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 1988,
      "digest": "sha256:0906d635658cbc64a844029bdcd8ddcb12dcc92ef7724c3e1f3586a7051d4b0d",
      "platform": {
        "architecture": "arm64",
        "os": "linux"
      }
    }
  ]
}
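The resolution step a client performs against this index can be sketched as follows. The type and function names are hypothetical, and the embedded index is a trimmed copy of the one above; real clients also handle variants, OS versions, and fallbacks that this sketch ignores.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type platform struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
}

// indexEntry mirrors the subset of index descriptor fields used here.
type indexEntry struct {
	Digest   string   `json:"digest"`
	Platform platform `json:"platform"`
}

type imageIndex struct {
	Manifests []indexEntry `json:"manifests"`
}

// resolve returns the digest of the manifest for the given OS and
// architecture, or an error if the index has no matching entry.
func resolve(raw []byte, osName, arch string) (string, error) {
	var idx imageIndex
	if err := json.Unmarshal(raw, &idx); err != nil {
		return "", err
	}
	for _, m := range idx.Manifests {
		if m.Platform.OS == osName && m.Platform.Architecture == arch {
			return m.Digest, nil
		}
	}
	return "", fmt.Errorf("no manifest for %s/%s", osName, arch)
}

// exampleIndex is a trimmed version of the image index shown in this post.
const exampleIndex = `{
  "manifests": [
    {"digest": "sha256:bce75989b8984d4e80c39f2251acfa4a4b5deec775dff0728645767da93694c3",
     "platform": {"architecture": "amd64", "os": "linux"}},
    {"digest": "sha256:0906d635658cbc64a844029bdcd8ddcb12dcc92ef7724c3e1f3586a7051d4b0d",
     "platform": {"architecture": "arm64", "os": "linux"}}
  ]
}`

func main() {
	for _, arch := range []string{"amd64", "arm64"} {
		digest, err := resolve([]byte(exampleIndex), "linux", arch)
		if err != nil {
			panic(err)
		}
		fmt.Println(arch, digest)
	}
}
```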

As expected, the digest for the linux/amd64 image manifest does not match the linux/arm64 image manifest digest. However, if we were to examine the layers for each of these manifests, we would see the final two layers are identical:

...
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 936833,
      "digest": "sha256:e897802575bbd7afe37fe086292f64dcb02d0884429d8765482c94b3dc31d502",
      "annotations": {
        "io.crossplane.xpkg": "base"
      }
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 43948,
      "digest": "sha256:a346c3baba3935f3256f4ca7a235885e07988cba3a507e311ae13a4bd48af799",
      "annotations": {
        "io.crossplane.xpkg": "upbound"
      }
    }
  ]
}

The ability to both couple the definition and behavior of types (image digest), as well as decouple the verification of the definitions (base layer digest) is critical in ensuring the integrity of multi-architecture images. When pushing multi-architecture packages to Upbound’s registry (xpkg.upbound.io), different digests between the base layers in each variant cause packages to be rejected during verification.
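A check along these lines can be sketched in Go. I am not showing the registry’s actual verification code here; the types, function names, and the platform-specific layer digests are placeholders, and only the base layer digest is taken from the manifests above.

```go
package main

import (
	"errors"
	"fmt"
)

type layer struct {
	Digest      string
	Annotations map[string]string
}

// variant is one platform-specific manifest from an image index.
type variant struct {
	Platform string
	Layers   []layer
}

// baseDigest returns the digest of the layer annotated as the xpkg base.
func baseDigest(v variant) (string, error) {
	for _, l := range v.Layers {
		if l.Annotations["io.crossplane.xpkg"] == "base" {
			return l.Digest, nil
		}
	}
	return "", fmt.Errorf("%s: no base layer annotation", v.Platform)
}

// verifyBase returns an error unless every variant carries a base layer with
// the same digest, i.e. every platform installs the same type definitions.
func verifyBase(variants []variant) error {
	if len(variants) == 0 {
		return errors.New("no variants to verify")
	}
	want, err := baseDigest(variants[0])
	if err != nil {
		return err
	}
	for _, v := range variants[1:] {
		got, err := baseDigest(v)
		if err != nil {
			return err
		}
		if got != want {
			return fmt.Errorf("%s: base layer %s does not match %s", v.Platform, got, want)
		}
	}
	return nil
}

func main() {
	base := layer{
		Digest:      "sha256:e897802575bbd7afe37fe086292f64dcb02d0884429d8765482c94b3dc31d502",
		Annotations: map[string]string{"io.crossplane.xpkg": "base"},
	}
	// The controller layer digests are placeholders for platform-specific content.
	variants := []variant{
		{Platform: "linux/amd64", Layers: []layer{{Digest: "sha256:placeholder-amd64"}, base}},
		{Platform: "linux/arm64", Layers: []layer{{Digest: "sha256:placeholder-arm64"}, base}},
	}
	fmt.Println(verifyBase(variants))
}
```

The controller layers are free to differ per platform; only the annotated base layer has to agree.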

Where We’re Going

This post outlines ways in which Crossplane enables a more robust Kubernetes type system, but the current functionality only serves as a foundation for future tooling and machinery. This is similar to how OCI images have helped enable more secure distribution of containerized applications and, as we’ve seen today, of any content. A small sample of improvements that can be made to this system includes:

Supporting installing by type, rather than by package.
Verifying controller logic on a per-type basis.
Verifying which types are not equivalent across packages, rather than just that some types are not.
Supplementing the apiVersion and kind of objects with the digest of the type and its corresponding controller logic.
Supporting package image signature verification.

If you are interested in joining Upbound in doing our part to enable this future, check out our open roles.