TACOS Providers Review
This blog post was originally posted on Medium.
This article will help you understand what TACOS are and select a provider based on their capabilities. SREs, Ops Engineers or Architects will find this content useful.
If you are a passionate cook or would like to know how your cooking skills can help you with Infrastructure as Code, you will be disappointed.
Infrastructure as Code
Infrastructure as Code (IaC) is a common pattern in which virtualized infrastructure and auxiliary services are managed through configuration expressed in almost any language, usually hosted in a source code repository.
IaC enables automated, repeatable, and reliable creation and maintenance of virtualized infrastructure. This is especially important when creating environments on demand and when managing infrastructure across multiple providers.
Although IaC is a common solution these days, Crossplane, the open-source project contributed by Upbound, and Upbound's Universal Crossplane offer a different way to think about your infrastructure and ultimately avoid many of the hassles of Terraform. This blog post covers how to optimize the use of Terraform with the TACOS approach, but I encourage you to take a look at both Crossplane and Upbound to learn how you can future-proof your platform and avoid TACOS altogether!
HashiCorp's Terraform and the open-source ecosystem built around it are a common technology for Infrastructure as Code. The standalone Terraform workflow works well, but it quickly becomes unmanageable at scale.
If you want to learn about our cloud-native, Kubernetes-powered alternative that moves toward Infrastructure as Data, check out my recent blog post about Crossplane.
A simple implementation of a standard Terraform workflow typically consists of:
- integrating the Terraform CLI into CI/CD pipelines, using a Terraform runner or a standalone container
- configuring a Terraform remote backend to store state, which enables collaboration between teams
- locking the state file when concurrent runs are enabled, to avoid overwrites when several runs target the same environment
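The locking requirement can be illustrated with a minimal Python sketch. It assumes a local lock file and a hypothetical `state_lock` helper; real remote backends implement locking in their own way (for example, the S3 backend can use a DynamoDB table), but the idea is the same: acquiring the lock is atomic, and a second run against the same state fails instead of overwriting it.

```python
import os
import tempfile
from contextlib import contextmanager


class StateLockedError(RuntimeError):
    """Raised when another run already holds the lock for this state."""


@contextmanager
def state_lock(lock_path: str, run_id: str):
    """Hypothetical helper: hold an exclusive lock on one state for one run."""
    try:
        # O_CREAT | O_EXCL is atomic: creation fails if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        with open(lock_path) as f:
            raise StateLockedError(f"state is locked by {f.read()!r}")
    try:
        # Record which run holds the lock so a blocked run can report it.
        with os.fdopen(fd, "w") as f:
            f.write(run_id)
        yield
    finally:
        # Always release the lock, even if the run fails.
        os.remove(lock_path)


lock_file = os.path.join(tempfile.mkdtemp(), "prod.tfstate.lock")
with state_lock(lock_file, "run-1"):
    try:
        with state_lock(lock_file, "run-2"):
            pass
    except StateLockedError as err:
        blocked_message = str(err)

print(blocked_message)  # state is locked by 'run-1'
```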
The steps above are possible with standard Terraform tooling, but several challenges need to be addressed along the way:
- integration with common development practices, such as code reviews, updates via merge requests, and feature branches
- RBAC: as the solution matures, different roles need to perform different tasks in different environments, such as apply-only or deletion
- audit history: who ran what, what the result was, and the ability to revert to a previous state
- infrastructure governance and policies, for example using OPA or Kyverno
- efficient self-service for Dev/QA teams, such as creating and destroying testing environments
Doing IaC at scale the right way is hard! It is even harder in on-prem or hybrid scenarios. Doing it well means investing significant time and resources to design, develop, and maintain the solution.
Time for TACOS
TACOS stands for Terraform Automation and COllaboration Software. It provides a framework for solving the problems of operating IaC at scale.
Benefits of using TACOS
A TACOS is typically a SaaS product that provides a uniform layer of abstraction by integrating:
- the Terraform runtime environment, state, history, and secrets and variables management
- RBAC with fine-grained permissions
- policy management for each environment managed by Terraform
Remote operation mode
One benefit is the remote operation mode of the runtime. Vendors that offer cloud workspaces typically also let you set up runners in your own environment (on-prem or cloud), so only the management layer and the UI live in the vendor's cloud. The actual deployments and configuration data can be managed safely behind a firewall.
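To make the split between the cloud-hosted management layer and self-hosted runners more concrete, here is a minimal Python sketch. It assumes a pull-based model in which the runner behind the firewall polls the service for queued jobs over an outbound connection, so no inbound ports need to be opened. The `ControlPlane` and `Runner` classes are hypothetical stand-ins, not any vendor's API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    workspace: str
    command: str  # e.g. "plan" or "apply"


@dataclass
class ControlPlane:
    """Hypothetical SaaS side: holds the job queue, UI data and run results."""
    queue: deque = field(default_factory=deque)
    results: list = field(default_factory=list)

    def poll(self):
        # Called by the runner; returns None when nothing is queued.
        return self.queue.popleft() if self.queue else None

    def report(self, job: Job, output: str):
        self.results.append((job.workspace, job.command, output))


@dataclass
class Runner:
    """Hypothetical self-hosted runner: only makes outbound calls to the control plane."""
    control_plane: ControlPlane

    def execute(self, job: Job) -> str:
        # A real runner would invoke the Terraform CLI here, with credentials
        # that never leave the private network; only the output is reported back.
        return f"terraform {job.command} finished for {job.workspace}"

    def run_once(self) -> bool:
        job = self.control_plane.poll()
        if job is None:
            return False
        self.control_plane.report(job, self.execute(job))
        return True


cp = ControlPlane()
cp.queue.extend([Job("network-prod", "plan"), Job("network-prod", "apply")])
runner = Runner(cp)
while runner.run_once():
    pass
print(cp.results)
```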
Remote State management
Another benefit is the ability to manage different environments created with different versions of Terraform.
Remote state management is optional: we can store the state with a cloud provider, for example in an S3 bucket, or on-prem, and use only the cloud workspace.
Role-based access control defines who can plan and apply Terraform runs at the project or group level, and access can also be managed at the workspace level.
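The access model described above can be sketched in a few lines of Python. It assumes hypothetical role names (`viewer`, `planner`, `applier`) and a simple rule that a workspace-level grant, when present, takes precedence over the project-level one; actual products define their own roles and precedence rules.

```python
# Hypothetical roles and the Terraform actions each one allows.
ROLE_PERMISSIONS = {
    "viewer": set(),
    "planner": {"plan"},
    "applier": {"plan", "apply"},
}


def is_allowed(user, action, project, workspace, project_grants, workspace_grants):
    """Return True if `user` may perform `action` ("plan" or "apply") in `workspace`.

    A workspace-level grant overrides the project-level one; without any grant
    the user has no access.
    """
    role = workspace_grants.get((workspace, user)) or project_grants.get((project, user))
    return role is not None and action in ROLE_PERMISSIONS[role]


project_grants = {("payments", "alice"): "planner", ("payments", "bob"): "applier"}
workspace_grants = {("payments-prod", "bob"): "viewer"}  # bob is restricted in prod

print(is_allowed("alice", "plan", "payments", "payments-dev", project_grants, workspace_grants))   # True
print(is_allowed("alice", "apply", "payments", "payments-dev", project_grants, workspace_grants))  # False
print(is_allowed("bob", "apply", "payments", "payments-dev", project_grants, workspace_grants))    # True
print(is_allowed("bob", "apply", "payments", "payments-prod", project_grants, workspace_grants))   # False
```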
TACOS provide better visibility into the changes Terraform has made to an environment over its whole lifespan. Instead of searching through many pipeline runs to find out which resources were added, modified, or deleted, you can find this information easily in the TACOS, together with who or what triggered each action. The history of Terraform plans is available as well.
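As an illustration of the kind of query this enables, the following Python sketch keeps a run history with the trigger and the number of added, changed, and deleted resources per run, and answers "who deleted resources in prod?". The `RunRecord` structure, the status values, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical audit entry for one Terraform run."""
    workspace: str
    triggered_by: str  # a user, or e.g. "vcs:merge" for VCS-driven runs
    finished_at: datetime
    added: int
    changed: int
    deleted: int
    status: str  # e.g. "applied", "errored", "discarded"


def runs_with_deletions(history, workspace):
    """Applied runs in `workspace` that deleted resources, newest first."""
    matches = [r for r in history
               if r.workspace == workspace and r.status == "applied" and r.deleted > 0]
    return sorted(matches, key=lambda r: r.finished_at, reverse=True)


# Hypothetical sample history.
history = [
    RunRecord("prod", "alice", datetime(2022, 5, 2, 10, 0), 3, 0, 0, "applied"),
    RunRecord("prod", "vcs:merge", datetime(2022, 5, 3, 9, 30), 0, 1, 2, "applied"),
    RunRecord("prod", "bob", datetime(2022, 5, 4, 16, 15), 0, 0, 5, "discarded"),
    RunRecord("staging", "bob", datetime(2022, 5, 4, 17, 0), 0, 0, 1, "applied"),
]

for run in runs_with_deletions(history, "prod"):
    print(run.finished_at.isoformat(), run.triggered_by, f"-{run.deleted}")
```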
Policy as Code
TACOS can also leverage Policy as Code to improve governance and security. They can use tools such as Open Policy Agent (OPA), for example to block a merge request that would bring non-compliant Terraform code into the main branch. The same policy tooling, and some of the policies, can be shared with Kubernetes cluster policies.
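OPA policies themselves are written in Rego; the sketch below expresses an equivalent check in Python so the logic is easy to follow. It evaluates the JSON representation of a plan, as produced by `terraform show -json <planfile>`, using its `resource_changes` list with `change.actions` and `change.after`. The two rules (no deletion of protected resource types, and a mandatory `owner` tag) are hypothetical examples, not policies shipped by any provider. A CI job would run such a check on each merge request and fail, blocking the merge, when violations are returned.

```python
import json

# Hypothetical policy inputs.
PROTECTED_TYPES = {"aws_db_instance", "aws_s3_bucket"}  # must never be deleted by a run
REQUIRED_TAG = "owner"


def policy_violations(plan: dict) -> list:
    """Return human-readable violations for a `terraform show -json` plan document."""
    violations = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        # A replacement shows up as ["delete", "create"] or ["create", "delete"].
        if "delete" in actions and rc["type"] in PROTECTED_TYPES:
            violations.append(f"{rc['address']}: deleting {rc['type']} is not allowed")
        after = rc["change"].get("after") or {}
        # Only check resources that will exist after the run and expose tags.
        if "create" in actions or "update" in actions:
            if "tags" in after and REQUIRED_TAG not in (after["tags"] or {}):
                violations.append(f"{rc['address']}: missing '{REQUIRED_TAG}' tag")
    return violations


plan_json = json.loads("""
{"resource_changes": [
  {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
   "change": {"actions": ["delete"], "after": null}},
  {"address": "aws_instance.web", "type": "aws_instance",
   "change": {"actions": ["create"], "after": {"tags": {"env": "prod"}}}}
]}
""")

for violation in policy_violations(plan_json):
    print(violation)
```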
Recommended TACOS flow
The diagram below shows a recommended TACOS flow that follows GitOps principles.
Instead of interacting with the TACOS provider directly through its web UI, you can also use the CLI, the REST API, or webhooks.
The diagram captures only the infrastructure provisioning part. Once the VMs or other infrastructure are ready, workload deployments can start. Application deployments can be triggered by the TACOS provider, but they should run in a separate pipeline.
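The separation between infrastructure and application pipelines can be sketched as follows: the infrastructure run finishes first, and only a successful apply triggers a separate application pipeline, passing along the outputs it needs (here, hypothetical VM addresses). The `trigger_app_pipeline` function, the `vm_ips` output name, and the status values are assumptions for illustration; in practice the trigger could be a TACOS notification or webhook calling your CI/CD system.

```python
from dataclasses import dataclass, field


@dataclass
class InfraRun:
    workspace: str
    status: str    # final status of the Terraform apply (hypothetical values)
    outputs: dict  # Terraform outputs produced by the apply


@dataclass
class AppPipeline:
    """Hypothetical CI/CD pipeline for application deployments."""
    triggered_with: list = field(default_factory=list)

    def trigger(self, params: dict):
        self.triggered_with.append(params)


def trigger_app_pipeline(run: InfraRun, pipeline: AppPipeline) -> bool:
    """Start the separate application pipeline only after a successful infra apply."""
    if run.status != "applied":
        return False  # never deploy applications onto half-provisioned infrastructure
    pipeline.trigger({
        "environment": run.workspace,
        "hosts": run.outputs.get("vm_ips", []),
    })
    return True


app_pipeline = AppPipeline()
trigger_app_pipeline(InfraRun("staging", "errored", {}), app_pipeline)
trigger_app_pipeline(InfraRun("staging", "applied", {"vm_ips": ["10.0.1.4", "10.0.1.5"]}), app_pipeline)
print(app_pipeline.triggered_with)
```

Keeping the application pipeline separate means an application rollback never requires a Terraform run, and vice versa.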
Most TACOS providers offer a self-hosting option with TACOS runners behind a firewall. Variables and secrets, such as tokens for triggering pipelines and SSH credentials, can be stored either in a self-hosted vault or in the TACOS provider's vault.
TACOS Providers Overview
Terragrunt is not a TACOS, but it is probably the first open-source tool you find when searching for Terraform automation. It is another binary that adds a thin wrapper layer on top of plain Terraform, and its main goal is to keep Terraform code DRY. It was a relevant tool (and not only in my opinion) at a time when Terraform itself wasn't mature enough. Today, good use of Terraform modules and a well-designed infrastructure structure and configuration solve these concerns.
Atlantis is not a full TACOS either, but it was the first open-source tool that tried to add Terraform automation to PRs. Webhooks from PRs that change Terraform code can be configured to call the Atlantis binary (which must be hosted within your infrastructure), where terraform plan and eventually apply are run. Atlantis posts the output back to the PR for visibility, and the process can also be configured so that only PRs with a successful terraform plan can be merged.
This functionality is currently available in all TACOS (via VCS flow).
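The PR-driven flow described for Atlantis can be sketched as a small webhook handler: a pull request event triggers a plan for the new commit, the result is commented back on the PR, and the PR becomes mergeable only when the plan for the current head commit succeeded. This is a hypothetical model of the flow, not Atlantis's code; `run_plan` stands in for running the Terraform CLI, and the event fields are simplified.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    number: int
    head_sha: str
    comments: list = field(default_factory=list)
    last_plan: dict = field(default_factory=dict)  # commit SHA -> plan succeeded?


def run_plan(sha: str) -> tuple:
    """Hypothetical stand-in for `terraform plan`; returns (succeeded, output)."""
    if sha.startswith("bad"):
        return False, "Error: Invalid reference"
    return True, "Plan: 2 to add, 0 to change, 0 to destroy."


def on_pull_request_event(pr: PullRequest, new_sha: str):
    """Handle an "opened"/"updated" webhook: plan the new commit and report back."""
    pr.head_sha = new_sha
    ok, output = run_plan(new_sha)
    pr.last_plan[new_sha] = ok
    pr.comments.append(f"terraform plan for {new_sha}: {output}")


def can_merge(pr: PullRequest) -> bool:
    # Only a successful plan for the *current* head commit unlocks the merge.
    return pr.last_plan.get(pr.head_sha, False)


pr = PullRequest(number=42, head_sha="")
on_pull_request_event(pr, "bad1")
print(can_merge(pr))  # False
on_pull_request_event(pr, "c0ffee")
print(can_merge(pr))  # True
```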
Terraform Cloud (TFC) / Terraform Enterprise (TFE)
This is the offering from Terraform's original creators, HashiCorp. Both products provide the same functionality; the main difference is the hosting model: TFE is a private installation, whereas TFC is a classic multi-tenant SaaS offering. It covers all the areas mentioned at the beginning of this article, so I am going to use it as the reference solution and point out how the other products differ.
Another useful concept is Notifications, which can trigger webhooks, send emails, or notify a Slack channel after various events in a workspace.
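Conceptually, a notification configuration maps a set of workspace events to one destination. The minimal Python sketch below assumes hypothetical event names, destination types, and message format; the actual trigger names and payloads are defined in HashiCorp's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotificationConfig:
    """Hypothetical notification configuration attached to a workspace."""
    destination_type: str  # "webhook", "email" or "slack"
    target: str            # URL, address or channel
    triggers: frozenset    # events that should fire this notification


def notify(event: str, workspace: str, configs) -> list:
    """Return the messages that would be sent for `event` in `workspace`."""
    sent = []
    for cfg in configs:
        if event in cfg.triggers:
            sent.append((cfg.destination_type, cfg.target, f"[{workspace}] run {event}"))
    return sent


configs = [
    NotificationConfig("slack", "#infra-alerts", frozenset({"errored", "needs_attention"})),
    NotificationConfig("email", "oncall@example.com", frozenset({"errored"})),
    NotificationConfig("webhook", "https://ci.example.com/hooks/tf", frozenset({"applied"})),
]

for message in notify("errored", "network-prod", configs):
    print(message)
```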
Scalr has an offering very comparable to TFC/TFE, with all the main features included. It has openly advertised itself as a replacement for TFC/TFE with fairer pricing, and it provides both a multi-tenant SaaS and a self-managed solution.
Scalr has the concept of Custom Hooks, which extend the Terraform workflow: they can run other Terraform commands (such as fmt), shell scripts, and API calls before and after terraform plan or apply.
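The idea behind custom hooks can be sketched as commands wrapped around each Terraform phase. In this hypothetical Python model, the hook keys (`pre_plan`, `post_plan`, `pre_apply`, `post_apply`), the CMDB URL, and the in-memory log are assumptions; it is not Scalr's implementation. A failing pre-phase hook stops the run before the phase starts.

```python
from typing import Callable, Dict, List

Hook = Callable[[list], bool]  # receives the execution log, returns success


def run_phase(phase: str, hooks: Dict[str, List[Hook]], log: list) -> bool:
    """Run pre-hooks, the Terraform phase itself, then post-hooks."""
    for hook in hooks.get(f"pre_{phase}", []):
        if not hook(log):
            log.append(f"pre_{phase} hook failed, skipping terraform {phase}")
            return False
    log.append(f"terraform {phase}")  # stand-in for invoking the Terraform CLI
    for hook in hooks.get(f"post_{phase}", []):
        if not hook(log):
            return False
    return True


def fmt_check(log: list) -> bool:
    log.append("terraform fmt -check")  # e.g. reject unformatted code
    return True


def notify_cmdb(log: list) -> bool:
    log.append("POST https://cmdb.example.com/changes")  # hypothetical API call
    return True


hooks = {"pre_plan": [fmt_check], "post_apply": [notify_cmdb]}
log: list = []
if run_phase("plan", hooks, log):
    run_phase("apply", hooks, log)
print(log)
```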
Scalr also has classic webhooks that can be triggered after various events; internally, their configuration is split between webhooks and endpoints. Scalr also integrates with Zapier.
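Splitting webhooks from endpoints means a single endpoint (a URL plus a signing secret) can be reused by several webhooks that listen to different events. The following Python sketch models that split and signs each payload with HMAC-SHA256 so the receiver can verify it. The class names, event names, and signing scheme are assumptions for illustration, not Scalr's documented format.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Where to deliver and how to sign: defined once, reused by many webhooks."""
    url: str
    secret: bytes


@dataclass(frozen=True)
class Webhook:
    """Which events to deliver, and to which endpoint."""
    events: frozenset
    endpoint: Endpoint


def build_deliveries(event: str, payload: dict, webhooks) -> list:
    """Return (url, body, signature) tuples for every webhook subscribed to `event`."""
    body = json.dumps({"event": event, **payload}, sort_keys=True).encode()
    deliveries = []
    for wh in webhooks:
        if event in wh.events:
            sig = hmac.new(wh.endpoint.secret, body, hashlib.sha256).hexdigest()
            deliveries.append((wh.endpoint.url, body, sig))
    return deliveries


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


ops = Endpoint("https://ops.example.com/scalr", b"s3cr3t")  # hypothetical endpoint
webhooks = [Webhook(frozenset({"run:completed"}), ops),
            Webhook(frozenset({"run:errored"}), ops)]

for url, body, sig in build_deliveries("run:errored", {"workspace": "prod"}, webhooks):
    print(url, verify(ops.secret, body, sig))
```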
Scalr has the concept of RBAC layers with inheritance, which also applies to credentials and similar settings. Layers can be used, for example, to separate prod from non-prod or one project from another. Cloud credentials can be defined in the top layer and then propagated down to individual workspaces, and the same applies to authorization configuration.
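Inheritance across layers can be modeled as a lookup that walks from the workspace up to the top layer. In this hypothetical Python sketch, the layer names (account, environment, workspace), the setting names, and the rule that the closest layer wins are assumptions used to illustrate the idea.

```python
from collections import ChainMap

# Hypothetical settings defined at each layer; lower layers may override higher ones.
account = {"aws_credentials": "org-readonly", "policy_group": "baseline"}
environments = {
    "prod": {"aws_credentials": "prod-deployer", "approvers": "sre-team"},
    "non-prod": {"aws_credentials": "dev-deployer"},
}
workspaces = {
    "payments-prod": ("prod", {}),
    "payments-dev": ("non-prod", {"policy_group": "relaxed"}),
}


def effective_settings(workspace: str) -> dict:
    """Merge workspace, environment and account layers; the closest layer wins."""
    env_name, ws_settings = workspaces[workspace]
    return dict(ChainMap(ws_settings, environments[env_name], account))


print(effective_settings("payments-prod"))
print(effective_settings("payments-dev"))
```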
We can use workspace state sharing to pass information between “platform” and “project-scoped” environments. This is mostly a Terraform feature; Scalr makes it more accessible.
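Under the hood, state sharing comes down to a project workspace reading the root-module outputs of the platform workspace's state, which Terraform stores under the top-level `outputs` key as `{"name": {"value": ..., "type": ...}}`. In Terraform code this is usually done with the `terraform_remote_state` data source; the Python sketch below only illustrates the data flow, using a hypothetical state excerpt and output names.

```python
import json

# Hypothetical excerpt of the platform workspace's state file (format version 4).
platform_state = json.loads("""
{
  "version": 4,
  "outputs": {
    "vpc_id": {"value": "vpc-0a1b2c3d", "type": "string"},
    "private_subnet_ids": {"value": ["subnet-1", "subnet-2"], "type": ["list", "string"]}
  },
  "resources": []
}
""")


def read_outputs(state: dict, names: list) -> dict:
    """Return the requested root-module outputs, failing loudly if one is missing."""
    outputs = state.get("outputs", {})
    missing = [n for n in names if n not in outputs]
    if missing:
        raise KeyError(f"platform state does not export: {', '.join(missing)}")
    return {n: outputs[n]["value"] for n in names}


# The project-scoped workspace consumes only what the platform explicitly exports.
network = read_outputs(platform_state, ["vpc_id", "private_subnet_ids"])
print(network["vpc_id"], network["private_subnet_ids"])
```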
In my opinion, Scalr has a better UI with more information about Terraform runs, for example a view with the number of added, changed, and deleted resources per resource type and the option to drill down into details. Compare that with searching for something specific in the lengthy output of a plain terraform plan.
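A per-resource-type summary like the one described can be derived from the JSON plan (`terraform show -json <planfile>`). This Python sketch counts actions per resource type from `resource_changes`; treating a replacement (both `create` and `delete` in `actions`) as its own category is a choice made for this example, and the sample plan is hypothetical.

```python
import json
from collections import Counter, defaultdict


def classify(actions: list) -> str:
    if "create" in actions and "delete" in actions:
        return "replace"
    if actions == ["create"]:
        return "add"
    if actions == ["update"]:
        return "change"
    if actions == ["delete"]:
        return "destroy"
    return "other"  # ["no-op"] and ["read"] (data sources) are not counted


def summarize(plan: dict) -> dict:
    """Return {resource_type: Counter({action: count})} for a JSON plan."""
    summary = defaultdict(Counter)
    for rc in plan.get("resource_changes", []):
        kind = classify(rc["change"]["actions"])
        if kind != "other":
            summary[rc["type"]][kind] += 1
    return dict(summary)


plan = json.loads("""
{"resource_changes": [
  {"type": "aws_instance", "change": {"actions": ["create"]}},
  {"type": "aws_instance", "change": {"actions": ["create"]}},
  {"type": "aws_instance", "change": {"actions": ["delete", "create"]}},
  {"type": "aws_security_group", "change": {"actions": ["update"]}},
  {"type": "aws_iam_role", "change": {"actions": ["no-op"]}}
]}
""")

for rtype, counts in sorted(summarize(plan).items()):
    print(rtype, dict(counts))
```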
Env0 has TACOS capabilities, but it is quite different from TFC or Scalr. It is closer to a general-purpose CI/CD system, as it focuses heavily on Custom Flows, where any kind of script, Ansible playbook, etc. can be added to the Terraform workflow (before and after apply, etc.). It supports Terraform and Terragrunt templates (although I think Terragrunt support is no longer that important).
It can perform automatic drift detection, notifying you or taking action when an external process or user changes the cloud environment outside of Terraform. (If you switch to Crossplane, you can avoid configuration drift forever! Check out the Upbound Marketplace to see the Official Providers and resources available for Crossplane. Making the switch is easy with the Marketplace.)
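A common building block for drift detection is a scheduled `terraform plan -detailed-exitcode`, which exits with 0 when the plan has no changes, 1 on error, and 2 when the plan contains changes; run against an unchanged main branch, changes indicate drift. The Python sketch below shows how a scheduled job might turn that exit code into a notify-or-remediate decision. The `auto_remediate` flag and the action names are hypothetical, and this is not how any specific vendor implements it.

```python
import subprocess
from typing import Optional

NO_CHANGES, ERROR, CHANGES_PRESENT = 0, 1, 2  # `terraform plan -detailed-exitcode`


def drift_action(exit_code: int, auto_remediate: bool) -> str:
    """Map the plan exit code to what a scheduled drift check should do."""
    if exit_code == NO_CHANGES:
        return "ok"
    if exit_code == CHANGES_PRESENT:
        # Reality differs from the code: either re-apply the code or ask a human.
        return "apply" if auto_remediate else "notify"
    return "alert-error"  # the check itself failed (credentials, syntax, ...)


def check_drift(workdir: str, auto_remediate: bool = False,
                exit_code: Optional[int] = None) -> str:
    """Run the plan in `workdir` unless an exit code is supplied (handy for tests)."""
    if exit_code is None:
        exit_code = subprocess.run(
            ["terraform", "plan", "-detailed-exitcode", "-input=false"],
            cwd=workdir,
        ).returncode
    return drift_action(exit_code, auto_remediate)


print(check_drift(".", exit_code=2))                       # notify
print(check_drift(".", auto_remediate=True, exit_code=2))  # apply
```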
Env0 has an interesting approach to cost management: it can monitor actual costs and correlate them with Terraform deployments, and it can set spending limits for teams and users.
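Spending limits can be thought of as a budget check that runs before a deployment is approved. This hypothetical Python sketch adds the estimated monthly cost of a planned change to the team's month-to-date spend (a deliberately conservative simplification) and compares the result with a per-team limit; the figures, team names, and blocking behavior are assumptions, not Env0's documented logic.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    team: str
    monthly_limit: float  # in USD
    month_to_date: float  # actual spend correlated with the team's deployments


def check_deployment(budget: Budget, estimated_monthly_cost: float) -> tuple:
    """Return (allowed, message) for a deployment with the given cost estimate."""
    projected = budget.month_to_date + estimated_monthly_cost
    if projected > budget.monthly_limit:
        over = projected - budget.monthly_limit
        return False, f"{budget.team}: blocked, would exceed budget by ${over:,.2f}"
    remaining = budget.monthly_limit - projected
    return True, f"{budget.team}: allowed, ${remaining:,.2f} left this month"


print(check_deployment(Budget("qa", 1000.0, 850.0), 120.0)[1])
print(check_deployment(Budget("qa", 1000.0, 850.0), 200.0)[1])
```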
It can also apply a time-to-live (TTL) to whole environments and delete them automatically when it expires. TTL policies can help with setting up feature environments.
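A TTL policy can be reduced to a periodic sweep that selects environments whose expiry time has passed. In this hypothetical Python sketch, each environment records when it was created and its TTL, and the sweep returns the environments to destroy; a real implementation would queue a `terraform destroy` run for each of them. The environment names and timestamps are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Environment:
    name: str
    created_at: datetime
    ttl: Optional[timedelta]  # None means "keep until deleted manually"

    def expired(self, now: datetime) -> bool:
        return self.ttl is not None and now >= self.created_at + self.ttl


def sweep(envs: List[Environment], now: datetime) -> List[str]:
    """Names of environments whose TTL has passed and should be destroyed."""
    return [env.name for env in envs if env.expired(now)]


now = datetime(2022, 6, 1, 12, 0, tzinfo=timezone.utc)
envs = [
    Environment("feature-login", now - timedelta(hours=30), timedelta(hours=24)),
    Environment("feature-search", now - timedelta(hours=2), timedelta(hours=24)),
    Environment("staging", now - timedelta(days=90), None),
]
print(sweep(envs, now))  # ['feature-login']
```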
Besides the interesting features mentioned above, I also have quite a few concerns. Env0 doesn't provide access to state files; they are managed somehow behind the scenes. I also haven't found any option to integrate with the Terraform CLI, so it seems not to support classic remote runs like TFC or Scalr. The documentation doesn't provide enough detail.
Like Env0, Spacelift has TACOS capabilities and differs considerably from TFC and Scalr, though in a different way than Env0. It is also closer to a general-purpose CI/CD system, and it is probably the most customizable TACOS. Besides Terraform, it supports Pulumi, and support for CloudFormation, Ansible, and ARM templates has been announced. It aims to provide a common wrapper and a similar user experience regardless of the chosen “backend” technology.
Regarding customization, it offers Custom workflows, where shell scripts can be added before and after terraform plan, destroy, etc. It also supports a customized runner image, where practically anything can be added to the Docker image. Arbitrary files can also be mounted into the run container, but they must be uploaded to Spacelift first.
It can perform automatic drift detection, notifying you or taking action when an external process or user changes the cloud environment outside of Terraform.
In addition to the classic sharing of remote state between cloud workspaces (called stacks in Spacelift), it also allows sharing contexts. This looks like a better approach, but it would need more research.
Spacelift also seems to be moving forward with a private module registry that provides CI/CD-style testing for modules.
Spacelift also offers the most customizable options for working with OPA policies.
TACOS Providers Comparison Matrix
The table below provides a high-level overview of various IaC capabilities and their support by each provider.
TACOS externalize common platform-level tasks, and complex, production-grade Terraform code bases will often require TACOS support. Before going all-in on Terraform, consider the requirements above and how your organization is going to support them.
IaC is an established way of managing infrastructure, but a powerful new trend is emerging: control planes. A control-plane-based architecture gives you a declarative approach to defining and managing resources, continuous reconciliation of resources to eliminate configuration drift, and a pattern for achieving self-service.
I've said it a few times now, but I'll say it again: You can avoid all of this by switching to Crossplane and building your platform with control planes.
Upbound is the company that created the open-source Crossplane project, the cloud-native alternative mentioned above. Upbound's mission is to help customers build internal cloud platforms using control planes.
Every Upbound customer gets access to Upbound Universal Crossplane (UXP) and Official Providers so they can build their internal cloud platforms with Crossplane confidently.
UXP is Upbound’s downstream distribution of Crossplane, and Official Providers are production-ready versions of Providers available exclusively to Upbound customers. Both are maintained, tested, and supported by Upbound on behalf of our customers, and are included with an Upbound subscription.