The Crossplane Resource Graph

December 12, 2021

Nic Cope

At Upbound we’re building your Universal Cloud Platform. Platform teams use Upbound to build bespoke cloud APIs and cloud consoles for the application development teams they support.

The Upbound Cloud Console

At the heart of Upbound is Crossplane, the open source cloud control plane we founded and donated to the CNCF. Crossplane can take anything with an API (like an AWS service, an SQL server, or a Kubernetes cluster) and let you build your own declarative API for it. For example, as a platform engineer you might want to let your application developers deploy and manage their own AWS RDS instances, but only through a very simple, opinionated API where storage size is the sole configurable setting.

Crossplane enables a simpler database provisioning API

Upbound helps you design and deploy Crossplane-powered cloud APIs. It also offers a cloud console to help manage them. Think AWS management console, but all the services and concepts are designed by you to suit your organisation’s unique needs. We can do this in part thanks to the Crossplane Resource Model, or XRM. While Crossplane encourages you to be opinionated about the content of your APIs (e.g. what knobs they expose), the XRM ensures the shape of your APIs is always consistent. For example, metadata is always distinct from desired and observed state, while external identity and secret properties like credentials are always presented the same way. Crossplane is built on Kubernetes, and the XRM is an extension of the Kubernetes Resource Model (KRM).

A typical production API powered by Crossplane will consist of bespoke types designed by your platform team, as well as types installed by a Provider. The latter are called “Managed Resources” (MRs) and represent external systems like an AWS service. They are the building blocks used to create the former, which we call “Composite Resources” (XRs). Crossplane’s API is powered by the Kubernetes API server, which serves a JSON based HTTP REST API.

The API Server serves a JSON REST API

The Upbound Console’s queries to the API server can be expensive. Take for example a platform engineer who has defined a PostgreSQLInstance XR. Each PostgreSQLInstance is composed of various MRs: say an RDSInstance, a DBSubnetGroup, and a SecurityGroup. If there were 10 PostgreSQLInstances the Upbound Console would have to make 31 REST API calls to show an application developer everything there was to know about them: one call to list the 10 XRs, then three calls per XR to look up the MRs of which it is composed. To make matters worse, the response to each request always includes every detail about the queried resource. Even if the developer just wanted to know whether or not the resources were all healthy, the Console would be forced to retrieve the entire JSON documents. These large queries get even slower as network latency between the Upbound Console and the API server it’s querying increases.
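
The arithmetic behind that figure can be sketched in a few lines of Go. This is only an illustration of the counting argument above; the function name restCalls is made up for this example and is not part of Crossplane or the Upbound Console.

```go
package main

import "fmt"

// restCalls returns how many REST requests are needed to list a set of
// composite resources (XRs) and then fetch, one request at a time, every
// managed resource (MR) each XR is composed of.
// Hypothetical helper, used only to illustrate the counting argument.
func restCalls(xrs, mrsPerXR int) int {
	const listCall = 1 // a single call lists all XRs
	return listCall + xrs*mrsPerXR
}

func main() {
	// 10 PostgreSQLInstances, each composed of 3 MRs.
	fmt.Println(restCalls(10, 3)) // prints 31
}
```

The cost grows with the product of the number of XRs and the number of MRs per XR, which is why a console view over many resources quickly becomes chatty.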

Enter GraphQL. GraphQL is a strongly typed query language that is frequently used to build APIs. Compared to a typical JSON REST API a GraphQL API allows the caller to define the structure of the data it requires, allowing the server to omit any irrelevant data from the response. As the name suggests, GraphQL can query a graph of related types — so in our example above it would be possible to retrieve only the health of our 10 PostgreSQLInstances and all the Managed Resources of which they are composed in a single GraphQL query. GraphQL enjoys broad adoption. Invented at Facebook a decade ago, it’s now an open standard and powers high profile APIs like those of GitHub and Shopify. Often large organisations will build a GraphQL aggregation layer to tie their many microservices together.
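
As a rough sketch of what such a single query might look like from a Go client, the snippet below builds the JSON body commonly used to send a GraphQL query over HTTP. The field names in the query (postgreSQLInstances, composedResources, ready) are invented for illustration and are not taken from the xgql schema; the point is only that the caller names exactly the fields it wants.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthQuery asks only for names and readiness of each instance and of the
// resources it is composed of. All field names here are hypothetical and do
// not come from the real xgql schema.
const healthQuery = `
query {
  postgreSQLInstances {
    name
    ready
    composedResources {
      kind
      name
      ready
    }
  }
}`

// graphQLRequest is the conventional JSON envelope for a GraphQL query sent
// over HTTP POST.
type graphQLRequest struct {
	Query string `json:"query"`
}

// buildRequest serialises a query into a request body.
func buildRequest(query string) ([]byte, error) {
	return json.Marshal(graphQLRequest{Query: query})
}

func main() {
	body, err := buildRequest(healthQuery)
	if err != nil {
		panic(err)
	}
	// One request body replaces the list call plus the per-resource lookups.
	fmt.Println(string(body))
}
```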

An example GraphQL query with xgql

The Upbound Console team identified the benefits of GraphQL early on, but despite its broad adoption GraphQL has seen little use within the Kubernetes ecosystem. Perhaps this is due to a technology stack disconnect. While language agnostic as a standard, much of the GraphQL ecosystem favors TypeScript whereas the Kubernetes ecosystem favors Go. Nascent projects such as qlkube and kubernetes-graphql weren’t a good fit as they are primarily auto-generated wrappers around “built in” Kubernetes API types like Pod and Deployment. Upbound needed a solution to support the XRM’s more advanced type system, which has subtypes (e.g. RDSInstance is a subtype of Managed Resource) and dynamic types that are added and removed at runtime (e.g. by defining a new Composite Resource, or installing a Provider). We decided to build our own GraphQL representation of the XRM — a Crossplane Resource Graph.

The GraphQL schema of a Crossplane Composite Resource

xgql is Upbound’s implementation of the Crossplane Resource Graph. It’s open source, and part of our Universal Crossplane distribution. xgql is written in Go using gqlgen. This approach has two key benefits:

  1. gqlgen is “schema first” — it generates much of its server code from a schema.
  2. xgql can leverage the extensive caching support of the Kubernetes Go libraries.

GraphQL schemas are all about types and relationships. Designing the schema up-front was an excellent test of the XRM, forcing us to think deeply about an idiomatic GraphQL representation of concepts like Providers, Configurations, Composite Resources, Managed Resources, and the relationships between them. In fact doing so highlighted one or two small gaps in the XRM that we intend to address in upstream Crossplane. The resulting schema enables GraphQL queries like:

  • “Show me the ‘Ready’ status condition for the tree of resources created by this Claim.”
  • “Show me all events pertaining to provider-aws’s Managed Resources.”
  • “Tell me whether this is a Managed Resource or a Composite Resource.”
  • “Fetch the definition of this Composite Resource.”
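
To make the idea of subtypes and resource trees more concrete, here is a minimal Go sketch. It is not xgql code: the Resource interface, the two struct types, and the kindOf and walk helpers are all hypothetical names chosen for this example. It shows how "is this a Managed Resource or a Composite Resource?" and "walk the tree of resources" can be modelled once both kinds share a common type.

```go
package main

import "fmt"

// Resource is the common supertype. Hypothetical, for illustration only.
type Resource interface {
	Name() string
}

// ManagedResource represents an external system such as an RDS instance.
type ManagedResource struct {
	name string
}

// CompositeResource is built from other resources.
type CompositeResource struct {
	name     string
	composed []Resource
}

func (m ManagedResource) Name() string   { return m.name }
func (c CompositeResource) Name() string { return c.name }

// kindOf answers "is this a Managed Resource or a Composite Resource?".
func kindOf(r Resource) string {
	switch r.(type) {
	case ManagedResource:
		return "ManagedResource"
	case CompositeResource:
		return "CompositeResource"
	default:
		return "Unknown"
	}
}

// walk visits r and then, depth first, everything composed beneath it.
func walk(r Resource, visit func(Resource)) {
	visit(r)
	if c, ok := r.(CompositeResource); ok {
		for _, child := range c.composed {
			walk(child, visit)
		}
	}
}

func main() {
	db := CompositeResource{name: "my-db", composed: []Resource{
		ManagedResource{name: "rds-instance"},
		ManagedResource{name: "subnet-group"},
		ManagedResource{name: "security-group"},
	}}
	walk(db, func(r Resource) {
		fmt.Printf("%s (%s)\n", r.Name(), kindOf(r))
	})
}
```

A GraphQL schema can express the same relationship with interface or union types, which is what lets one query descend from a composite resource to whatever it is composed of.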

Building xgql in Go allows us to use best-in-class Kubernetes client machinery. xgql maintains an in-memory “watch cache” for each unique caller, meaning it subscribes to API server updates rather than making tens or hundreds of REST API calls to resolve each GraphQL query. This often results in xgql answering queries 10x faster with a warm cache.
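
The idea behind a watch cache can be sketched with nothing but the Go standard library. The example below is a simplified stand-in, not the Kubernetes client machinery xgql uses: the event and watchCache types are hypothetical, and the events are fed in by hand rather than streamed from an API server. It illustrates how queries are answered from memory once updates have been applied.

```go
package main

import (
	"fmt"
	"sync"
)

type eventType int

const (
	upsert  eventType = iota // resource was added or modified
	deleted                  // resource was removed
)

// event is a simplified stand-in for a watch notification.
type event struct {
	kind  eventType
	name  string
	ready bool
}

// watchCache keeps the latest known readiness of each resource in memory.
type watchCache struct {
	mu    sync.RWMutex
	ready map[string]bool
}

func newWatchCache() *watchCache {
	return &watchCache{ready: map[string]bool{}}
}

// apply folds one update into the cache.
func (c *watchCache) apply(e event) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e.kind == deleted {
		delete(c.ready, e.name)
		return
	}
	c.ready[e.name] = e.ready
}

// isReady answers a query from memory, without any network round trip.
func (c *watchCache) isReady(name string) (ready, found bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	ready, found = c.ready[name]
	return ready, found
}

func main() {
	cache := newWatchCache()
	updates := []event{
		{kind: upsert, name: "db-a", ready: false},
		{kind: upsert, name: "db-a", ready: true},
		{kind: upsert, name: "db-b", ready: true},
		{kind: deleted, name: "db-b"},
	}
	for _, e := range updates {
		cache.apply(e)
	}
	ready, found := cache.isReady("db-a")
	fmt.Println("db-a ready:", ready, "found:", found)
	_, found = cache.isReady("db-b")
	fmt.Println("db-b found:", found)
}
```

The trade-off is memory and a warm-up period in exchange for fast reads: until the cache has received the relevant updates it has nothing to answer with.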

There’s a lot more to the Upbound Console, including another GraphQL layer that uses a novel approach to route queries to thousands of xgql instances. Let us know on Twitter if you’d be interested in another post about how we leverage GraphQL at Upbound. In the meantime if you’re building on the Crossplane API do try Universal Crossplane and xgql.

Finally, we’re hiring! We’re a fully remote company, deeply invested in open source, and recently closed a strong Series B to help us build a universal cloud platform. We have open engineering roles from frontend to distributed systems to management — apply today.
