Configuration Drift Explained
Put simply, configuration drift occurs when production environment settings change in ways the platform team didn’t anticipate and doesn’t want. This is usually a result of developers making changes on their own, without approval or oversight from the platform team. As software development and delivery systems grow more complex, the opportunity for platform changes to be made haphazardly also increases. The demand for continuous delivery, hotfixes and critical package updates creates a perfect storm where configuration drift becomes inevitable, no matter how proficient the company or team.
Configuration drift has major business implications, though they’re not always obvious to people higher up in the organization. When platform teams detect drift, projects get delayed, and projects already up and running can suffer significant downtime. The result is lost revenue in the short term and, in the long term, the financial impact of poor customer perception and negative reviews online.
In this article, we’ll look at some real-world examples of configuration drift, as well as how the financial losses associated with configuration drift can be avoided.
Configuration Entry from Multiple Sources
This is one of the most common causes of configuration drift. With configuration entered from potentially dozens of different sources, it’s not uncommon for a change made in an upstream system to cause problems downstream. Changes like these are usually unintentional, but when the upstream team fails to communicate them to its downstream peers, configuration drift predictably results.
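To make the upstream/downstream problem concrete, here is a minimal Python sketch of layered configuration, assuming a simple "later source wins" merge rule. The function names, source names, and settings are all hypothetical and are not taken from any particular tool.

```python
from typing import Any

# A configuration source is a (name, settings) pair. All names here are hypothetical.
Source = tuple[str, dict[str, Any]]


def merge_sources(sources: list[Source]) -> dict[str, tuple[Any, str]]:
    """Merge sources in order; later sources override earlier ones.

    Returns a mapping of key -> (effective value, name of the source that set it),
    so it is always possible to see where a setting came from.
    """
    effective: dict[str, tuple[Any, str]] = {}
    for name, settings in sources:
        for key, value in settings.items():
            effective[key] = (value, name)
    return effective


def conflicting_keys(sources: list[Source]) -> list[str]:
    """Return the keys that different sources set to different values."""
    values_by_key: dict[str, list[Any]] = {}
    for _, settings in sources:
        for key, value in settings.items():
            values_by_key.setdefault(key, []).append(value)
    return sorted(
        key
        for key, values in values_by_key.items()
        if any(value != values[0] for value in values[1:])
    )


if __name__ == "__main__":
    sources = [
        ("platform-defaults", {"timeout_s": 30, "replicas": 3}),
        ("upstream-service", {"timeout_s": 5}),  # an uncommunicated upstream change
        ("team-overrides", {"replicas": 3}),
    ]
    print(merge_sources(sources))
    print(conflicting_keys(sources))
```

Recording which source set each value is the design choice that matters here: when a downstream system misbehaves, the team can see that the effective timeout came from the upstream service rather than from the platform defaults.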
Hotfixes that Backfire
Due to tight deadlines or unclear requirements, major issues can often occur at the production level. In these cases, a hotfix will likely be deployed to resolve the issue.
Unfortunately, because hotfixes are often performed under pressure, with little time to follow procedure, configuration drift is almost inevitable. Once drift like this occurs, engineers have to spend critical time (and therefore money) fixing the problem.
Time Consuming Manual Audits
Without the ability to automate configuration drift detection, teams must perform regular manual audits to ensure infrastructure is properly deployed and configured throughout all of their environments. Naturally, manual audits can be costly and time consuming: engineers must spend time checking for errors and making changes, which diverts their attention from other projects.
Where configuration management tools are used to complete these audits, there is also the issue of compatibility: configuration management tools can often only detect changes in systems they have been configured to monitor. In a multi-cloud, hybrid environment, keeping track of and monitoring the totality of your assets may prove time consuming in itself.
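At its core, automated drift detection is a comparison between the configuration you declared and the configuration actually observed. The following Python sketch illustrates that idea under simplifying assumptions: both configurations are flat dictionaries, and the setting names are hypothetical. Real tools also have to discover resources and handle nested or provider-specific state.

```python
from typing import Any

# Marker used when a key exists on only one side of the comparison.
MISSING = "<missing>"


def detect_drift(
    desired: dict[str, Any], actual: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Compare declared settings with observed settings.

    Returns key -> (desired value, actual value) for every key that differs,
    including keys that are present on only one side.
    """
    drift: dict[str, tuple[Any, Any]] = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, MISSING)
        have = actual.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    desired = {"instance_type": "m5.large", "port": 443}
    actual = {"instance_type": "m5.xlarge", "port": 443, "debug": True}
    for key, (want, have) in detect_drift(desired, actual).items():
        print(f"{key}: expected {want!r}, found {have!r}")
```

Note that the sketch can only report on settings it is given, which mirrors the compatibility limit described above: anything outside the monitored set goes unnoticed.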
Critical Security Updates Under Pressure
Security updates are similar to hotfixes in that they are almost always performed under pressure. Because the goal is to preempt a potential problem (as opposed to fixing a current one), security updates are likely to be performed at high speed. Configuration drift is therefore likely to occur, as standard procedures are bypassed in a bid to complete the update as quickly as possible.
As with hotfixes, a simple mistake can leave servers configured inconsistently with one another. This can create a backlog of manual fixes, further delaying a deployment or live project.
Infrastructure as Code Solutions
Another problem can emerge when teams over-rely on Infrastructure as Code (IaC) tools such as Terraform. Terraform can be extremely useful, allowing you to create “execution plans” that outline exactly what will happen when you run your code. However, tools like this don’t continuously reconcile infrastructure: they compare and correct state only when someone runs them, so drift is still likely to occur between runs.
Only with a platform like Upbound can you continuously reconcile your configuration with resource state, eliminating any possibility of configuration drift and ensuring applications run reliably.
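The general idea of continuous reconciliation can be sketched as a control loop that repeatedly compares declared state with observed state and corrects any difference. The Python below is an illustrative sketch only, not Upbound's implementation: the function names, the `read_actual` and `apply` callbacks, and the polling interval are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Optional


def reconcile_once(
    desired: dict[str, Any],
    read_actual: Callable[[], dict[str, Any]],
    apply: Callable[[str, Any], None],
) -> list[str]:
    """Run one reconciliation pass and return the keys that were corrected.

    Only keys declared in `desired` are managed; extra keys in the live
    state are left untouched in this simplified sketch.
    """
    actual = read_actual()
    corrected: list[str] = []
    for key, want in desired.items():
        if key not in actual or actual[key] != want:
            apply(key, want)  # push the declared value back to the live system
            corrected.append(key)
    return corrected


def run_control_loop(
    desired: dict[str, Any],
    read_actual: Callable[[], dict[str, Any]],
    apply: Callable[[str, Any], None],
    interval_s: float = 30.0,
    max_iterations: Optional[int] = None,
) -> None:
    """Reconcile repeatedly; runs forever unless max_iterations is given."""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        reconcile_once(desired, read_actual, apply)
        iterations += 1
        time.sleep(interval_s)


if __name__ == "__main__":
    live = {"replicas": 1}  # stand-in for real infrastructure state
    desired = {"replicas": 3, "tls": True}
    print(reconcile_once(desired, lambda: dict(live), live.__setitem__))
    print(live)
```

The contrast with a run-on-demand tool is the loop itself: the comparison happens on a schedule rather than waiting for someone to trigger it, so a manual change is corrected on the next pass.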
The True Cost of Configuration Drift
Time is money, and getting SREs to redeploy takes more time than companies expect. The most common financial cost of configuration drift is lost productivity: applications stop functioning or suffer downtime, and engineers are pulled away to troubleshoot code and identify the cause of the fault.
Wherever the configuration drift originated, the cost of reconfiguration comes in many forms: for the team working on the project, for the client frustrated by continuing project delays, and for stakeholders understandably concerned about their partnership with a company struggling to meet its briefs.
Cost of Client Dissatisfaction
Delays and downtime due to configuration drift naturally lead to complaints from clients. The initial cost impact is fairly straightforward: as the client becomes frustrated with increasing delays and inconsistent service, they may choose to migrate their project to another company. In the long term, the financial cost is less obvious but no less significant: a dissatisfied client is less likely to recommend your service, which costs you future earnings.
Cost of Disrupted Workflow
The cost of a disrupted workflow within a team is twofold. Firstly, a disrupted workflow affects team morale. Team members are less likely to feel confident in their ability to complete their assigned tasks and, as a result, the smooth deployment of other projects is disrupted in turn. Secondly, lower team morale increases the likelihood of staff leaving their posts. New hires then have to be interviewed and trained, costing valuable time and directly affecting your bottom line.
Cost of Dissatisfied Investors
Reporting to investors may become strained where a project has been delayed due to configuration drift. This is particularly true where auditors have to be brought in to resolve the situation. As it becomes apparent that your team is struggling to meet its requirements, confidence is called into question, not only in the project but in your company as a whole. Investors may then seek to limit their involvement and, in turn, their investment.
Upbound: The Cloud on your Terms
Upbound is built on a control-plane-based architecture that gives customers a declarative approach to defining and managing resources, along with continuous reconciliation of those resources to eliminate configuration drift. It allows you to identify drift, set policies, and track and monitor progress, all through a fully integrated self-service platform. If and when configuration drift does occur, Upbound detects it and reverts the change to the previous setting. In addition, by using control planes, teams can unify their application and infrastructure deployment workflows. All of this saves valuable time, as you avoid the lengthy meetings and downtime involved in correcting code in house.
Upbound is also customisable from top to bottom. It allows you to modify any component as your business evolves, making re-platforming a thing of the past.