Mastering API Evolution: Best Practices for Crossplane XR Design

December 3, 2024

Nic Cope

Background

Crossplane lets you define your own platform APIs. We call these composite resources, or XRs.

Platform teams use XRs to build abstractions on cloud APIs. For example, instead of giving their developers access to AWS’s RDS API, the Acme platform team can give their developers access to an AcmeDatabase API.

Under the hood, Crossplane translates AcmeDatabase API calls to RDS API calls. Developers don’t need to be concerned with this though. Developers can think about databases in the terms that the Acme platform team frames them.

You define an XR API by authoring an OpenAPI v3 schema. The schema tells the API server what fields an XR API request can have, and what values are valid for those fields.

Platform teams seldom design the perfect API on the first try. Eventually they need to evolve their API, for example by adding, removing, or renaming fields.

You include a version when you specify your schema - e.g. v1alpha1. Users specify this version when interacting with the API. Many platform teams see this and assume the best way to evolve their APIs is to introduce a new version - e.g. v1alpha2, v1beta1, or v1. We at Upbound are frequently asked how to do this.

I want to make the case that folks overestimate what they can do with a new version. I believe platform teams would be better served by putting in the up-front effort to design a future-proof API that they can evolve without breaking changes.

How XR Versions Work

You teach Crossplane about a new type of XR by defining it - providing its schema. You do this using a Composite Resource Definition (XRD). An XRD is really a thin wrapper on a Kubernetes Custom Resource Definition (CRD).

This means changing a Crossplane XR schema is really the same problem as changing a Kubernetes custom resource (CR) schema.

There were two great talks on changing CR schemas at KubeCon Salt Lake City.

I found both talks to be pretty advanced. I’ll attempt to simplify here, with a focus on the parts that impact XRD authors most.

The key thing that folks new to CRDs miss is that two different CRD versions are just two different views into the same data.

I found this really surprising when I first learned about CRDs, but it makes a lot more sense when you think about what a CR is. It’s a JSON REST API backed by a database.

Forget Kubernetes for a moment and imagine you’re building a web app. The app serves a JSON REST API backed by data stored in a PostgreSQL database. Let’s say you can create a new Widget in this API by doing an HTTP POST to https://example.org/v1/widgets, and list Widgets by doing an HTTP GET to https://example.org/v1/widgets.

Let’s say a Widget looks like this:
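As a stand-in illustration, here is a hypothetical Widget sketched as a Python dict and serialized to JSON. The field names (`name`, `shape`, `count`) are assumptions made up for this sketch, not part of any real API.

```python
import json

# A hypothetical Widget. The field names are assumptions for illustration.
widget = {
    "name": "my-widget",
    "shape": "round",
    "count": 3,
}

# This is the JSON blob the web app would accept and return.
widget_json = json.dumps(widget, indent=2)
print(widget_json)
```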

Imagine that when you GET or POST Widgets, your web app is just reading and writing this JSON blob to the Postgres database.

Now imagine you want to introduce a new version - https://example.org/v2/widgets. You can’t stop serving https://example.org/v1/widgets though. That would break your existing users. In fact, whenever someone interacts with the v1 API, those changes must appear in the v2 API and vice versa.
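A minimal sketch of that constraint, assuming a hypothetical v2 that renames a `count` field to `quantity`. The dict stands in for the PostgreSQL table, and the handler names are invented for illustration. Both versions read and write the same stored record; each one only translates at the edge.

```python
# In-memory stand-in for the PostgreSQL table. Only one shape is ever stored.
DATABASE = {}


def post_v1(widget):
    """Handle a v1 POST: the stored shape matches v1, so store it as-is."""
    DATABASE[widget["name"]] = dict(widget)


def get_v1(name):
    """Handle a v1 GET: return the stored record unchanged."""
    return dict(DATABASE[name])


def post_v2(widget):
    """Handle a v2 POST: translate the v2 shape into the stored shape."""
    stored = dict(widget)
    stored["count"] = stored.pop("quantity")
    DATABASE[stored["name"]] = stored


def get_v2(name):
    """Handle a v2 GET: translate the stored shape into the v2 shape."""
    widget = dict(DATABASE[name])
    widget["quantity"] = widget.pop("count")
    return widget


post_v1({"name": "a", "shape": "round", "count": 3})
post_v2({"name": "b", "shape": "square", "quantity": 5})

print(get_v2("a"))  # the v1 write is visible through v2
print(get_v1("b"))  # the v2 write is visible through v1
```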

This is exactly how Kubernetes CR versioning works. It’s therefore exactly how Crossplane XR versioning works.

Let’s say your CRD has two schema versions - v1alpha1 and v1beta1. You mark one of these schemas as the “storage version”. Kubernetes will use this schema when storing CRs in its database - etcd. If you mark v1beta1 as the storage version:

Kubernetes converts every v1alpha1 write to v1beta1 before writing it to the database
Kubernetes converts the stored v1beta1 data back to v1alpha1 before returning it for a v1alpha1 read

This means all versions of a CR must be “round-trippable”. It must be possible to convert from v1alpha1 to v1beta1 and back without any data loss.
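The read and write paths can be sketched in plain Python. This is only a model of the idea, not how the API server is implemented: in a real cluster, field-level conversion between CRD versions is done by a conversion webhook. The `example.org` group, the `Widget` kind, and the rename from `spec.widgets` to `spec.widgetCount` are assumptions for illustration.

```python
import copy

ETCD = {}  # stand-in for etcd: name -> object in the storage version (v1beta1)


def alpha_to_beta(obj):
    """Convert v1alpha1 -> v1beta1 (spec.widgets becomes spec.widgetCount)."""
    out = copy.deepcopy(obj)
    out["apiVersion"] = "example.org/v1beta1"
    out["spec"]["widgetCount"] = out["spec"].pop("widgets")
    return out


def beta_to_alpha(obj):
    """Convert v1beta1 -> v1alpha1 (spec.widgetCount becomes spec.widgets)."""
    out = copy.deepcopy(obj)
    out["apiVersion"] = "example.org/v1alpha1"
    out["spec"]["widgets"] = out["spec"].pop("widgetCount")
    return out


def write(obj):
    """Convert to the storage version if needed, then store."""
    if obj["apiVersion"].endswith("/v1alpha1"):
        obj = alpha_to_beta(obj)
    ETCD[obj["metadata"]["name"]] = copy.deepcopy(obj)


def read(name, version):
    """Read the stored object and convert it to the requested version."""
    obj = copy.deepcopy(ETCD[name])
    if version == "v1alpha1":
        obj = beta_to_alpha(obj)
    return obj


original = {
    "apiVersion": "example.org/v1alpha1",
    "kind": "Widget",
    "metadata": {"name": "example"},
    "spec": {"widgets": 3},
}
write(original)

stored = ETCD["example"]
round_tripped = read("example", "v1alpha1")
print(stored["apiVersion"], stored["spec"])
print(round_tripped == original)  # True means nothing was lost on the way
```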

The Implications of Round-Tripping

Introducing a new CR version lets you make breaking schema changes, but not as breaking as you might think. It’s not like introducing a new version of a software library. Your new version needs to be round-trippable to all old versions. This has some implications:

You CAN rename a field - e.g. spec.widgets to spec.widgetCount
You CAN move a field - e.g. spec.shape to spec.properties.shape

However…

You CAN’T drop a field that was required by an old version
You CAN’T introduce a new required field that doesn’t exist in older versions¹

Doing either of these would prevent the CR being round-tripped. If you drop a required field in the v2 API, that resource can’t be stored as a v1 API object (where the field is still required). If you introduce a required field in the v2 API, any resources created by the v1 API (where that field isn’t required) aren’t valid in the v2 API.
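Both failure cases can be checked with a few lines of Python. This is a rough sketch: the field names (`widgets`, `shape`, `color`) and the two hypothetical v2 schemas are assumptions, and each schema is reduced to its set of required field names.

```python
# Required fields per version, reduced to sets of names (all hypothetical).
V1_REQUIRED = {"widgets", "shape"}
V2_REQUIRED_AFTER_ADD = {"widgets", "shape", "color"}  # v2 adds required spec.color


def missing_required(spec, required):
    """Return the required field names that spec does not set."""
    return required - set(spec)


# Case 1: a hypothetical v2 drops spec.shape, so an object created through
# v2 never sets it. The same object must still be servable as v1.
created_in_v2 = {"widgets": 3}
print(missing_required(created_in_v2, V1_REQUIRED))

# Case 2: v2 adds a required spec.color. An object created through v1
# never set it, but must still be servable as v2.
created_in_v1 = {"widgets": 3, "shape": "round"}
print(missing_required(created_in_v1, V2_REQUIRED_AFTER_ADD))
```

In both cases the result is non-empty, which means the object is invalid in one of the versions that has to serve it.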

There are also more complex operations, like turning a scalar field in v1 of the API into an array in v2 of the API (e.g. spec.widget: “foo” to spec.widgets: [“foo”, “bar”]). Operations like this are possible, but require careful and unintuitive updates to v1. In short, v1 has to have both spec.widget and a spec.widgets that’s a superset of spec.widget.
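One way to picture the scalar-to-array case is the sketch below. It assumes the full list is what gets stored, that the list always has at least one element, and that a v1 write which sets only spec.widget means "a list containing just this value". That merge policy is an assumption for illustration, not a documented rule.

```python
def v1_view(widgets):
    """v1 keeps the old scalar and also exposes the full list."""
    return {"widget": widgets[0], "widgets": list(widgets)}


def v2_view(widgets):
    """v2 only exposes the list."""
    return {"widgets": list(widgets)}


def from_v1(spec):
    """Rebuild the stored list from a v1 write.

    Old clients may set only spec.widget. Make sure its value ends up in
    the list, so spec.widgets stays a superset of spec.widget.
    """
    widgets = list(spec.get("widgets", []))
    widget = spec.get("widget")
    if widget is not None and widget not in widgets:
        widgets.insert(0, widget)
    return widgets


# An old client that only knows about the scalar field.
from_old_client = from_v1({"widget": "foo"})
print(v2_view(from_old_client))

# A list written through v2 survives a trip through the v1 view.
stored = ["foo", "bar"]
print(from_v1(v1_view(stored)) == stored)
```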

The key takeaway here is that you don’t get much from a new version. Introducing a new version is only useful if you want to rename or move fields.

Evolving Your XR

By now I hope I’ve shown that there are diminishing returns to introducing a new API version. So what do you do instead?

You should design APIs that you can evolve with backward compatible changes. APIs that will seldom or never require breaking changes.

We know this is possible. Evolving an XR is the same problem as evolving the Crossplane or Kubernetes APIs. Kubernetes has evolved the Deployment v1 API since it was released 7 years ago in Kubernetes v1.9, but none of those changes has required a v2 release. Crossplane has been evolving its core APIs since its v1.0.0 release 4 years ago.

Kubernetes has documented very detailed API conventions and API change best practices. I’ll summarize the ones I think about most when designing Crossplane APIs. I think they’ll work well for XR API designers.

Use Required Fields Sparingly

When you add a required field, assume you’ll never be able to remove it. Does it really need to be required? Could it be optional with a default?
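As a small illustration, here is an OpenAPI v3 style schema fragment written as a Python dict, with one required field and one optional field that has a default. The field names and the two helper functions are hypothetical. The helpers only mimic defaulting and required-field checks so the effect is visible.

```python
# Schema fragment for a hypothetical spec: size is required,
# shape is optional and falls back to a default.
SPEC_SCHEMA = {
    "type": "object",
    "required": ["size"],
    "properties": {
        "size": {"type": "string"},
        "shape": {"type": "string", "default": "round"},
    },
}


def apply_defaults(spec, schema):
    """Fill in defaults for any property the request left unset."""
    out = dict(spec)
    for name, prop in schema["properties"].items():
        if name not in out and "default" in prop:
            out[name] = prop["default"]
    return out


def missing_required(spec, schema):
    """List required properties that the spec does not set."""
    return [name for name in schema.get("required", []) if name not in spec]


request = {"size": "large"}  # the user never mentions shape
defaulted = apply_defaults(request, SPEC_SCHEMA)

print(defaulted)
print(missing_required(defaulted, SPEC_SCHEMA))
```

Because shape is optional with a default, it could later be given more allowed values or a different default without rejecting requests that omit it.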

Think Twice About Boolean Fields

The Kubernetes API conventions say it best:

Many ideas start as boolean but eventually trend towards a small set of mutually exclusive options. Plan for future expansions by describing the policy options explicitly.

Instead of booleans, consider a string enum field.

Imagine you’re developing a new Widget API. Widgets can only be regular or fast, so you add a fast boolean field:
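A sketch of what that might look like, with the schema fragment and two example specs written as Python dicts. The exact shape is an assumption for illustration.

```python
# Schema fragment for a hypothetical Widget spec with a boolean toggle.
SPEC_SCHEMA = {
    "type": "object",
    "properties": {
        "fast": {"type": "boolean", "default": False},
    },
}

regular_widget = {"spec": {"fast": False}}
fast_widget = {"spec": {"fast": True}}

# A boolean can only ever express two states.
states = {w["spec"]["fast"] for w in (regular_widget, fast_widget)}
print(len(states))
```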

What happens if later you want to support super fast widgets, or slow widgets?

If you instead use an enum field, you can add more options later:
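Here is a minimal sketch, assuming a hypothetical `speed` field with made-up option names. The first schema is the initial release; the second adds options later. Every value that was valid before is still valid afterwards, so the change is backward compatible.

```python
# Initial enum for a hypothetical spec.speed field.
SPEED_NOW = {
    "type": "string",
    "enum": ["Regular", "Fast"],
    "default": "Regular",
}

# The same field after adding options in a later release.
SPEED_LATER = {
    "type": "string",
    "enum": ["Slow", "Regular", "Fast", "SuperFast"],
    "default": "Regular",
}


def is_valid(value, schema):
    """Check a value against a string enum schema."""
    return isinstance(value, str) and value in schema["enum"]


existing_values = ["Regular", "Fast"]

# Everything users already wrote still validates against the new schema.
print(all(is_valid(v, SPEED_LATER) for v in existing_values))

# The new option is only accepted once the enum has been extended.
print(is_valid("SuperFast", SPEED_NOW), is_valid("SuperFast", SPEED_LATER))
```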

If In Doubt, Use an Array

Is there a chance a field could have multiple values in future? Could you imagine spec.widget: “foo” becoming spec.widgets: [“foo”, “bar”]? If so, start with an array.

Worst case, you’ll end up with 99% of users using a single element array forever. Best case, it’ll become common to need to specify multiple elements and you’ll have saved yourself and your users the pain of introducing a new array variant of the field later.

Leave Room for Variants

Often after your API has been in production for some time you’ll realize you need to support a variant.

Let’s say you have an AcmeDatabase API that was originally designed to model PostgreSQL database instances, but now you want that same API to support MySQL too. You can plan for this eventuality by:

Putting everything specific to Postgres in its own object - e.g. spec.postgresql
Using a “toggle” field to toggle that object “on” and others off

In practice this looks like this:
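A minimal sketch of the pattern in Python. The resource below is a hypothetical AcmeDatabase; the `example.org` group, the `version` field, and the `validate` helper are all invented for illustration. The helper models the rule that the engine-specific object must be present when its engine is selected.

```python
acme_database = {
    "apiVersion": "example.org/v1alpha1",
    "kind": "AcmeDatabase",
    "metadata": {"name": "my-db"},
    "spec": {
        "engine": "PostgreSQL",           # the toggle field
        "postgresql": {"version": "16"},  # everything Postgres-specific
    },
}

# Maps each supported engine to the spec field holding its settings.
ENGINE_FIELDS = {"PostgreSQL": "postgresql"}


def validate(spec):
    """Return a list of validation errors for an AcmeDatabase spec."""
    engine = spec.get("engine")
    if engine not in ENGINE_FIELDS:
        return [f"spec.engine must be one of {sorted(ENGINE_FIELDS)}"]
    field = ENGINE_FIELDS[engine]
    if field not in spec:
        return [f"spec.{field} is required when spec.engine is {engine}"]
    return []


print(validate(acme_database["spec"]))
print(validate({"engine": "PostgreSQL"}))
```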

In the example above, the top-level spec.postgresql field is optional unless spec.engine is set to PostgreSQL.

Worst case, you never introduce MySQL support and some API fields are slightly more nested than they need to be.

Best case, when you need to support MySQL you can do that in a backward compatible way by adding a new engine, and a new spec.mysql top-level field:
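Sketched in Python under the same kind of assumptions (hypothetical field names, a made-up `validate` helper), adding MySQL is one new entry in the engine table and one new optional object. An existing PostgreSQL resource needs no changes.

```python
# Maps each supported engine to the spec field holding its settings.
# "MySQL" is the only addition; the PostgreSQL entry is untouched.
ENGINE_FIELDS = {
    "PostgreSQL": "postgresql",
    "MySQL": "mysql",
}


def validate(spec):
    """Return a list of validation errors for an AcmeDatabase spec."""
    engine = spec.get("engine")
    if engine not in ENGINE_FIELDS:
        return [f"spec.engine must be one of {sorted(ENGINE_FIELDS)}"]
    field = ENGINE_FIELDS[engine]
    if field not in spec:
        return [f"spec.{field} is required when spec.engine is {engine}"]
    return []


# A resource written before MySQL support existed.
existing_spec = {"engine": "PostgreSQL", "postgresql": {"version": "16"}}

# A resource using the new variant.
new_spec = {"engine": "MySQL", "mysql": {"version": "8.0"}}

print(validate(existing_spec))
print(validate(new_spec))
```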

Peer Review Your APIs

Finally, ask a coworker to review your API before you commit to it. Sketch out some example claims and show them to your customers, the people who’ll be interacting with your APIs. Ask them how they can foresee their needs from the API changing in the future.

Summary

Introducing a new XR version is trickier than it might seem. Once you know how it works, the nuances and limitations make sense, but it’s not always intuitive. I hope this article has helped clarify what you can and can’t do by introducing a new XR version.

My advice is to avoid introducing new XR versions if you can. A little up-front design thinking goes a long way toward building a future-proof API schema that you can evolve over time without needing to make breaking changes.

Crossplane XRs are powered by Kubernetes CRDs. Check out this article to learn more about CRDs and how they work.

If you’d like to have our team review your APIs, let us know here. We can hop on a call to talk through how Upbound can help you with your specific use case today.

¹ Okay, technically you can, but it’s nuanced. You have to add the field to the old API version too, to avoid losing data when someone reads and writes back a resource using the old API. You can’t make it required in the old API though - that would be a breaking change. So you need to add it to the old version as an optional field with a default value. With all of this in mind I’d argue there are diminishing returns in adding a new version to make a field required.