GitOps is the idea that entire systems should be declaratively specified in code and versioned in Git, and that repositories are a team’s source of truth. Some plumbing is required to take the specified code and reconcile the system to that specification.
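To make “reconcile” concrete, here is a minimal sketch of the core of that plumbing. It assumes desired state (from Git) and actual state (from the live system) can both be represented as flat dictionaries keyed by resource name; `plan_reconciliation` and `Action` are hypothetical names. Real reconcilers do far more, but the loop boils down to diffing desired against observed state and emitting actions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Action:
    kind: str  # "create", "update", or "delete"
    name: str
    spec: Any = None


def plan_reconciliation(desired: dict[str, Any], actual: dict[str, Any]) -> list[Action]:
    """Diff desired state (from Git) against actual state (from the live system).

    Hypothetical sketch: both states map resource name -> spec.
    """
    actions: list[Action] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(Action("create", name, desired[name]))
        elif actual[name] != desired[name]:
            actions.append(Action("update", name, desired[name]))
    # Resources running live that Git doesn't declare get pruned.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(Action("delete", name))
    return actions


desired = {"web": {"replicas": 3}, "queue": {"size": 10}}
actual = {"web": {"replicas": 2}, "old-cron": {"schedule": "@daily"}}
for action in plan_reconciliation(desired, actual):
    print(action.kind, action.name)
```

With these inputs the plan is to create `queue`, update `web`, and delete `old-cron`. Applying the plan (and re-running it on a loop) is the part that makes the repository the source of truth.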
In my experience, GitOps can be taken way too far and end up becoming a hassle.
This post discusses some drawbacks of GitOps that I’ve encountered.
First: GitOps isn’t awful
Before I get into GitOps drawbacks, let me be clear: GitOps-y approaches aren’t always bad.
Systems are not only an application’s code. There are lots of other components that make up a system: a simple application might also have a database, a load balancer, and some type of queue.
For application code, Git solves several problems for engineers. For example, it answers questions like:
- When was a module last changed?
- What else changed in that last change?
- Who authored the change?
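For reference, those three questions map onto plain `git log` and `git show` invocations. This Python sketch wraps them with `subprocess`; the helper names are hypothetical, and it assumes `git` is on `PATH` and that the path has at least one commit.

```python
import subprocess


def _git(repo: str, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout as text."""
    return subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    ).stdout


def parse_last_change(line: str) -> dict[str, str]:
    """Split a `--format=%H%x1f%an%x1f%aI` line (unit-separator delimited) into fields."""
    sha, author, date = line.strip().split("\x1f")
    return {"sha": sha, "author": author, "date": date}


def last_change(path: str, repo: str = ".") -> dict[str, str]:
    """When was `path` last changed, and who authored that change?"""
    out = _git(repo, "log", "-1", "--format=%H%x1f%an%x1f%aI", "--", path)
    return parse_last_change(out)


def files_in_commit(sha: str, repo: str = ".") -> list[str]:
    """What else changed in that commit? (empty --format suppresses the header)"""
    out = _git(repo, "show", "--name-only", "--format=", sha)
    return [line for line in out.splitlines() if line]
```

For example, `files_in_commit(last_change("src/app.py")["sha"])` lists everything that shipped alongside the most recent change to a (hypothetical) `src/app.py`.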
GitOps expands what Git provides for application code to other parts of a system. In other words, with GitOps, engineers can answer similar questions but about their entire system.
GitOps isn’t a silver bullet, and there are real trade-offs to weigh when deciding how far to take GitOps practices.
The GitOps Purity Test
GitOps evangelists, especially software companies that sell related products, talk about The Right Way to do GitOps. The Right Way usually boils down to a few points:
- If it’s not in Git, it doesn’t exist
- Disallow changes to systems from anywhere but CI/CD
- ClickOps or configuring anything manually is forbidden
I’ll refer to this methodology as Pure GitOps.
Drawback 1: CI/CD job execution is now on the critical path
Pretend there’s a company that runs its own local installation of GitHub/GitLab and has a Pure GitOps engineering team. In Pure GitOps, a team’s ability to deploy or roll back a service is gated on the ability to pull from Git and execute a continuous integration/continuous deployment (CI/CD) pipeline. GitHub has Actions and GitLab has Pipelines, so Git hosting and CI/CD execution are coupled to the same system.
Pure GitOps becomes painful when the Pure GitOps team expects more availability or throughput from GitHub/GitLab than the installation actually provides.
Why does this happen? In my experience, local installations of GitHub/GitLab are typically not owned by service engineering teams; they’re owned by a central corporate IT team. Those teams view GitHub and GitLab purely as source code management, not as a tier 0 piece of infrastructure. This viewpoint isn’t unreasonable… after all, Git is decentralized!
Engineering teams should understand their dependencies and model SLAs/SLOs accordingly (see “Modeling Dependencies” heading). GitOps is no exception.
It’s easy for this sort of expectation mismatch to creep in over time, since these systems tend to Just Work until they suddenly become a problem. If you’re in this situation, talk with your partner teams and make the case (hint: bring data!) that GitOps provides value to the business and that SCM and CI/CD execution deserve tier 0 treatment. Reconsider a Pure GitOps stance.
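As a rough illustration of modeling dependencies: if a deploy or rollback needs every link in a chain (SCM, CI/CD runners, deploy tooling) to be up at the same time, and failures are independent, the deploy path’s availability is the product of the individual availabilities. It is always lower than the weakest link. The numbers below are hypothetical targets, not measurements.

```python
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a path that needs ALL dependencies up at once.

    Assumes dependency failures are independent of each other.
    """
    return prod(availabilities)


def downtime_minutes_per_month(availability: float, days: int = 30) -> float:
    """Expected minutes per `days`-day month that the path is unavailable."""
    return (1 - availability) * days * 24 * 60


# Hypothetical: SCM at 99.5%, CI/CD runners at 99.5%, deploy tooling at 99.9%.
path = serial_availability([0.995, 0.995, 0.999])
print(f"{path:.4f}", round(downtime_minutes_per_month(path)))
```

Three individually respectable numbers multiply out to roughly 98.9%, or a bit under eight hours per 30-day month during which the team can neither deploy nor roll back.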
Drawback 1.1: Your CI/CD pipeline is also in the critical path!
Does your CI/CD pipeline occasionally fail because of a flaky test? In Pure GitOps, engineers cannot roll out changes without passing tests. This means you can’t roll back without passing CI/CD! In other words, your system’s uptime is a function of your CI/CD pipeline’s reliability.
Ideally, teams should just write reliable tests and run them on reliable test infrastructure. An unreliable test is worse than no test at all. I’ve seen teams leave a path open to deploy their systems outside of Git as an “in case of emergency, break glass” option, and those out-of-band changes later have to be reconciled. These alternate paths are useful, but they undermine the benefits of GitOps if the changes don’t make it back into the repository.
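A back-of-the-envelope model shows how quickly flakiness adds up. Assume a pipeline runs `n` tests, the change itself is good, and each test flakes independently with probability `p`. Then a single run (including a rollback) passes on the first try with probability (1 - p)^n, and the expected number of runs until one passes is 1 / (1 - p)^n. The test counts and flake rate are illustrative.

```python
def first_try_pass_rate(num_tests: int, flake_rate: float) -> float:
    """Probability that no test flakes in a single pipeline run.

    Assumes every test flakes independently with the same probability.
    """
    return (1 - flake_rate) ** num_tests


def expected_runs_to_pass(num_tests: int, flake_rate: float) -> float:
    """Expected pipeline runs until one passes (mean of a geometric distribution)."""
    return 1 / first_try_pass_rate(num_tests, flake_rate)


for n in (50, 200, 1000):
    rate = first_try_pass_rate(n, 0.001)
    print(f"{n:>5} tests at 0.1% flake rate: {rate:.1%} pass first try, "
          f"{expected_runs_to_pass(n, 0.001):.2f} runs expected")
```

Even a 0.1% per-test flake rate means roughly 82% of runs pass first try at 200 tests and about 37% at 1,000, and every retry is time added to a rollback.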
Drawback 2: GitOps can turn into Engineer Catnip
Engineer Catnip(tm) is what I call the unhealthy snacks that engineers love but that don’t provide enough business value for the effort involved.
It’s important to home in on why engineers love GitOps: a single source of truth makes it easier to reason about a system. This is A Good Thing! A single source of truth means more productive engineers. Sadly, engineers can lose sight of the business value they’re producing by over-indexing on their own desires.
Systems are composed of a long tail of components, and GitOps-ifying all of them provides diminishing returns. This discussion usually comes up when Pure GitOps teams have to manage components that aren’t already in Git:
Shouldn’t $FOO just be in Git?
Jr. Engineer
Ehh, Maybe Not. My recommendation is to let ClickOps become an acute pain before making bespoke GitOps-y investments.
Drawback 3: Trivial changes become hassles
I’ve worked on multiple teams at multiple large-scale companies that stored alert configurations in Git. Why would anyone want to do that? These companies had several regions, each with many data centers containing thousands of service deployments. Some sort of automation is necessary to stamp out consistent configurations and alarms at that scale.
The Pure GitOps solution is that all alert configuration is stored in Git and can only be changed by committing to the Git repository. What could go wrong?
Well… alerts are tricky to get right the first time. False alerts are unfortunate, but they are a fact of life for on-call responders. If an alert threshold needs adjusting at 2AM so I can go back to sleep and address it properly later, a Git round trip is punitive, especially if my repository requires another person to approve the change. ClickOps that shit and go back to bed. Oh, no, you can’t! Pure GitOps means every alert change requires a Git round trip, which means the CI/CD that materializes alerts can get in the way of going back to bed.
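For a sense of why automation is needed at that scale, here is a hypothetical sketch that renders one alert template into a concrete rule per region, data center, and service. The `AlertRule` fields, the `regions` schema, and the query syntax in `expr` are all assumptions for illustration; the point is that one template edit fans out to every deployment.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AlertRule:
    name: str
    expr: str  # hypothetical query syntax
    threshold: float


def render_alerts(
    regions: dict[str, list[str]], services: list[str], error_rate_threshold: float
) -> list[AlertRule]:
    """Stamp out one identical error-rate alert per region/data center/service.

    `regions` maps a region name to its data centers (hypothetical schema).
    """
    rules = []
    for region, datacenters in sorted(regions.items()):
        for dc, svc in product(sorted(datacenters), sorted(services)):
            rules.append(
                AlertRule(
                    name=f"{svc}-high-error-rate-{region}-{dc}",
                    expr=f'error_rate{{service="{svc}",region="{region}",dc="{dc}"}}',
                    threshold=error_rate_threshold,
                )
            )
    return rules


rules = render_alerts({"us-east": ["dc1", "dc2"], "eu-west": ["dc3"]}, ["api", "web"], 0.05)
print(len(rules))  # 3 data centers x 2 services = 6 rules
```

The same fan-out is what makes a one-off threshold tweak awkward: the template is the only sanctioned place to change it.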
Drawback 4: The outside world doesn’t care about GitOps (yet?)
GitOps is a means to an outcome. There are whole classes of software that are oblivious to what GitOps teams want. Yes, vendors are starting to care, but the ones that care the most are usually trying to sell you something.
- New or experimental features that teams need to use but that aren’t available in CFEngine, Puppet, Chef, Ansible, Terraform, CloudFormation, etc. Do you write your own custom module to make the feature manageable with GitOps? Do you say “no” if the feature can’t be expressed in GitOps? Is the new feature’s API even available, or is it ClickOps only?
- Bugs in GitOps-y providers are very real, even ones powered by world class engineering teams. What’s your plan when you encounter one that requires ClickOps workarounds?
- Contorting components that lack existing configuration management support (Puppet, Ansible, Terraform, etc.) by creating custom modules makes them your company’s maintenance burden. My favorite examples are legacy technologies, like older out-of-band management cards (iLO, DRAC, etc.). These cards were never designed to be managed by the tens of thousands, yet you want to manage their TLS certificates or login credentials via GitOps. Instead of GitOps, some elbow grease, scripting, and manual deployments are probably the right way to go.
Drawback 5: Conflating GitOps and good security practices
Pure GitOps says things like “Now we know who changed what when!”, but you really don’t. Git history is a workaround for systems that don’t provide robust audit logs (who changed what) or change diffs (what changed). Git is a cheap (maybe not simple?) way to approximate those kinds of logs, but don’t conflate that with the truth: it’s not an actual substitute for infrastructure- and application-level audit logs.