Infrastructure as Code, particularly with tools like Terraform, has become indispensable in managing cloud resources. Its widespread adoption is well-deserved due to its extensive ecosystem and compatibility with a wide range of platforms. However, while many teams excel in their Terraform implementations, numerous others find themselves grappling with technical debt and sprawling, insecure infrastructures. Even seasoned engineers are not immune to making mistakes. These errors often arise subtly, stemming from minor design decisions and seemingly innocuous implementation shortcuts. The challenges go beyond mere syntax errors or failed terraform apply commands. They lie in deeper, foundational issues that transform small oversights into significant problems. These troubles can accumulate to such an extent that the only viable option appears to be a total overhaul of the system. Let’s examine some common pitfalls that organizations frequently encounter in their Terraform journey.
It All Starts with Design (Or Lack Thereof)
The choices you make before you write a single line of HCL will echo through the entire lifecycle of your project. This is probably the most critical phase, yet it’s often the most neglected. There’s a persistent tendency to pour all the design effort into the application itself, leaving the infrastructure configuration as an afterthought. This is where the first seeds of technical debt are sown.
The “Not Invented Here” Trap
You’ve probably heard it called “reinventing the wheel.” An engineering team feels compelled to build their own Terraform configuration from the ground up because they “don’t trust” a third-party module or it isn’t a perfect fit for their specifications.
Now, sometimes a bespoke solution is genuinely necessary. But more often than not, this impulse is a sign of something else. It might be a cultural resistance to external ideas, or it might just be a failure to properly evaluate existing options. When teams insist on building everything themselves, they often end up duplicating work that’s already been solved, tested, and hardened by the community. This wastes time and can isolate the team from valuable innovations.
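Before committing to a bespoke build, it usually costs very little to at least trial a community option. As a rough sketch, here is how a network layer might be consumed from the widely used terraform-aws-modules VPC module on the public registry; the version constraint and input values are illustrative, not a recommendation for any particular environment.

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"   # pin to a major version you have actually evaluated

  name = "app-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]
}

If a module like this genuinely doesn’t fit after an honest evaluation, wrapping or forking it is still usually cheaper than rebuilding everything it already solves.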
The Monolith Configuration
One of Terraform’s best features is its support for modularity, but you wouldn’t know it from looking at many codebases. Too many infrastructure setups aren’t designed to scale.
The story usually goes something like this:
- The design phase for a new app wraps up. Time to build!
- There’s a massive push to get a working stack shipped. Yesterday.
- Some Terraform code is quickly thrown together in the root of the application repository.
- Success! Version 1 is live. The servers and databases are humming along.
The problem is, the process stops there. When it’s time to deploy a new application, the whole frantic cycle repeats. If the first app needs a new region, it’s just bolted onto the same root module. Over time, this thing grows into an unmanageable beast. What about a QA environment? Or disaster recovery? Without modules, engineers are forced to copy and paste code, creating a tangled web of configurations that nobody fully understands or wants to touch.
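A sketch of the alternative: the root configuration stays thin and only wires reusable modules together. The module path, its environment input, and the provider aliases below are hypothetical; the point is that adding a region (or a QA environment) becomes one more small module block instead of another copy of the stack.

provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "eu_west_1"
  region = "eu-west-1"
}

# Each deployment is a small module call, not a re-implementation.
module "app_us" {
  source      = "./modules/web_stack"
  environment = "prod"

  providers = {
    aws = aws.us_east_1
  }
}

module "app_eu" {
  source      = "./modules/web_stack"
  environment = "prod"

  providers = {
    aws = aws.eu_west_1
  }
}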
Designing in a Silo
One of the original promises of DevOps was to break down the walls between developers and operations. Yet, in many places, those walls are still standing strong. We still see application designs thrown over the wall to DevOps or Cloud engineers, who are then expected to just implement them, no questions asked.
This is the opposite of how it should work. Deployment velocity – how quickly and reliably you can ship code – is a key measure of success. If design feedback is bouncing between disconnected teams and people are arguing over technical ownership, that velocity grinds to a halt.
When Deadlines Get Unrealistic
Poor design and collaboration inevitably lead to another problem: engineers get squeezed by impossible deadlines. The effort to implement the infrastructure wasn’t properly factored into the plan because the plan was made in a vacuum.
Looming deadlines lead to shortcuts. Shortcuts lead to tech debt. And tech debt often leads to security issues. Engineers tend to be an optimistic bunch, underestimating the time and effort a task will take. The initial plan assumes a perfect world with no interruptions. Then reality hits, and the constant firefighting can doom a project before it even gets off the ground.
Good Intentions, Bad Practices
Even with a decent design, the implementation can go off the rails. And while design flaws are easier to fix when caught early, implementation failures tend to be far more expensive to untangle.
The Wild West of Naming Conventions
A Terraform codebase with no enforced standards is a special kind of nightmare. Once this chaos sets in at scale, it’s incredibly difficult to walk back.
You’ll see breakdowns like:
- No consistent way of naming resources (prod-db in one place, db-PROD in another).
- Hardcoded settings that should have been input variables from the start (see the sketch after this list).
- One app stack is organized by environment folders, another uses workspaces, and a third crams everything into one module with a mountain of parameters.
- An IAM policy might be a data source, a HEREDOC, or an aws_iam_role_policy resource, depending on who wrote it that day.
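The hardcoding point deserves a quick illustration. This is a minimal, hypothetical contrast between a value baked into a resource and the same value exposed as an input variable with a sensible default:

# Hardcoded: changing the instance size means editing and re-reviewing code.
resource "aws_instance" "app_hardcoded" {
  ami           = "ami-0abc12345example"
  instance_type = "m5.large"
}

# Parameterized: the same setting driven by an input variable.
variable "instance_type" {
  type        = string
  default     = "t3.medium"
  description = "EC2 instance size for the application tier"
}

resource "aws_instance" "app_parameterized" {
  ami           = "ami-0abc12345example"
  instance_type = var.instance_type
}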
Here’s a classic example that seems small but causes big headaches: hyphens versus underscores in resource names.
# Is it snake_case?
resource "aws_instance" "frontend_web_server" { ... }
# Or kebab-case? Or CamelCase?
resource "aws_instance" "frontend-webServer" { ... }
Both of these will create an EC2 instance, but their addresses within Terraform’s state will be different. This might seem trivial, but in a large codebase full of references and outputs, these little inconsistencies create friction. They make the infrastructure harder for new engineers to understand and can create a fear of making changes.
Letting Everything Sprawl
As mentioned before, Terraform gives us tools like modules and workspaces to keep our code DRY (Don’t Repeat Yourself). Deployments with tons of repeated code are brittle and a pain to work with.
When faced with a tangled mess of resources, an engineer’s first impulse is often to propose a complete rewrite. Unfortunately, this “scorched-earth” approach usually just makes the problem worse. Now you have the old, sprawling system and a new one to manage.
Resource sprawl isn’t just messy; it’s expensive and insecure. You can’t secure something if you don’t even know it exists. Using hardened, reusable Terraform modules creates a set of standard, opinionated building blocks. This reduces duplicated effort and tightens your security posture from the start.
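As a sketch of what such a building block can look like, here is a hypothetical opinionated S3 module that bakes in encryption and blocks public access, so every caller inherits the secure defaults instead of re-deciding them each time. The file layout and names are illustrative.

# modules/secure_bucket/main.tf
variable "bucket_name" {
  type        = string
  description = "Globally unique name for the bucket"
}

resource "aws_s3_bucket" "this" {
  bucket = var.bucket_name
}

# Encryption at rest is not optional for callers of this module.
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Neither is blocking public access.
resource "aws_s3_bucket_public_access_block" "this" {
  bucket                  = aws_s3_bucket.this.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}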
The Myth of the Laptop Deploy
CI/CD pipelines are the engine of modern software delivery. They let us check in, test, and deploy code quickly and reliably. Terraform code should be treated just like application code: put it in version control, lint it, test it, and deploy it automatically.
Running terraform apply from a laptop is fine when you’re learning, but it’s not a viable strategy for a team managing production environments. It creates a single point of failure (what happens when that person is on vacation?) and leaves no audit trail. Terraform has features like remote state and state locking that are specifically designed for teams and automation. Use them.
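A minimal sketch of the remote-state side, assuming an S3 backend with DynamoDB locking; the bucket, key, and table names are placeholders:

terraform {
  backend "s3" {
    bucket         = "example-terraform-state"
    key            = "app/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "example-terraform-locks"
    encrypt        = true
  }
}

With state stored remotely and locked centrally, any pipeline runner can plan and apply safely, and no single laptop is special anymore.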
Flying Blind Without Policy and Tests
Policy-as-code engines like Open Policy Agent (OPA) or HashiCorp’s Sentinel can act as guardrails. They can automatically prevent someone from deploying an S3 bucket that’s open to the world or an IAM role with overly permissive access.
Without these checks, you’re left relying on manual reviews, which are slow, error-prone, and simply don’t scale. Policies provide the safety net that allows teams to move quickly and autonomously without compromising on security or standards.
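You don’t even need a full policy engine to start putting up guardrails. As a small sketch in the same spirit (this is Terraform’s native variable validation, not OPA or Sentinel, and the variable name and allowed values are purely illustrative), obviously unsafe input can be rejected before a plan ever reaches review:

variable "bucket_acl" {
  type        = string
  default     = "private"
  description = "Canned ACL applied to the application bucket"

  validation {
    condition     = !contains(["public-read", "public-read-write"], var.bucket_acl)
    error_message = "Publicly readable or writable bucket ACLs are not allowed."
  }
}

A dedicated engine like OPA or Sentinel goes further, checking whole plans against organization-wide rules, but the principle is the same: encode the standard once and let automation enforce it.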
The Slow Decay of Operational Neglect
These failures aren’t unique to infrastructure; they plague all kinds of software projects. Organizations have a habit of forgetting about the “soft” work, like documentation and routine maintenance, once a project is live.
Forgetting to Build a Knowledge Base
Teams often don’t invest the time to build a solid foundation of knowledge. Things like documentation, architecture diagrams, and Architectural Decision Records (ADRs) are essential for a project’s long-term health.
The pain from this neglect might not be felt right away. The first generation of engineers on the project has all the context in their heads. But what happens a year or two later when those people have moved on? Without a written record of why certain decisions were made, the project’s continuity is at risk. New engineers will be tempted to start from scratch rather than trying to understand the existing system, leading to the “now you have two problems” situation all over again.
Chasing the “New” and Ignoring the Debt
Once a project ships, it’s often forgotten. All the attention and rewards go to the next shiny new thing. No engineer wants to get stuck maintaining old projects.
Over time, this culture leads to a massive accumulation of technical debt. For Terraform, which can have multiple releases with breaking changes each year, this means older configurations become brittle and locked into outdated versions.
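One small maintenance habit keeps that decay visible: explicit version constraints, so upgrades are deliberate, reviewable changes rather than surprises. A minimal sketch, with illustrative constraint values:

terraform {
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}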
The fix for this is cultural. If the only thing that gets rewarded is shipping new features, you’ll end up with a graveyard of unmaintained projects. A healthy engineering culture celebrates craftsmanship and maintenance just as much as it celebrates a greenfield launch.
Getting It Right Is More Than a Technical Problem
Avoiding these pitfalls isn’t just about writing better code. It’s about fostering an engineering culture that values pragmatic design, collaboration, and long-term maintenance. It means embracing Terraform’s ecosystem instead of trying to build everything from scratch.
Organizations that commit to this approach will end up with more than just high-performing infrastructure. They’ll build more empowered and effective engineering teams.