PipelineOps

CloudFormation vs Terraform: Which IaC Tool for AWS in 2026?

The day our CloudFormation stack crossed 497 resources, I still had three services left to migrate.

TL;DR: We hit CloudFormation's 500-resource-per-stack hard limit in production and had to migrate incrementally to Terraform. The question isn't which tool is better — it's where each one belongs. CloudFormation for pure AWS provisioning; Terraform when you need multi-cloud support, complex module reuse, or cross-team state sharing. Know that distinction upfront and you won't get stuck mid-migration.

What I Was Trying to Do

I joined a platform engineering team at a large SaaS company with one mandate: get all microservices onto Infrastructure as Code. At the time, 8 of our 12 services were being provisioned by hand in the AWS console, and environment drift had become chronic.

Context: This environment required all resources to be managed within a single AWS account for compliance and audit reasons. A multi-account migration was in progress on a separate team, but until it completed, we had to work within the existing constraints. Similar situations come up in regulated industries, large enterprise legacy environments, and any org mid-way through an AWS Organizations rollout.

Four services were already managed with CloudFormation. It's AWS-native, integrates cleanly with IAM, and has a low barrier to entry. "Let's standardize on CloudFormation" felt like the obvious call. It even supports multi-account deployments through StackSets and AWS Organizations. And frankly, nobody on the team had real Terraform experience.

So we started adding the remaining eight services to the stack, one by one.

What Went Wrong (and Why)

When I added the fifth new service to the stack, this came back:

Resource limit exceeded: You may not exceed 500 resources in a stack.

CloudFormation enforces a hard limit of 500 resources per stack. It is documented on the AWS CloudFormation quotas page and cannot be raised through Service Quotas. I'd assumed 500 would be plenty. It wasn't. A single ECS service consumes at minimum a Task Definition, Service, Target Group, Listener Rule, Security Group, IAM Role, and CloudWatch Log Group. Add auto-scaling, CloudWatch alarms, SSM parameters, and Route 53 records, and you're easily looking at 50+ resources per service.

The deeper failure was putting everything into one stack to begin with. Teams experienced with CloudFormation know to give each service its own stack. Ours didn't, and the idea of splitting something that was already working never came up. Had we done that from the start, each stack would have held roughly 50 resources, about a tenth of the limit.

We tried Nested Stacks as a workaround. They buy you more headroom, but they introduced new problems. Every change kicked off an update that walked the full parent-child hierarchy, and deploy times started creeping up. At one point, an undetected circular reference between child stacks left us stuck in UPDATE_ROLLBACK_FAILED. Getting out of it required manually deleting and recreating stacks, which cost us a full day of recovery.

The third problem we hadn't accounted for was drift. CloudFormation's drift detection doesn't run automatically — you have to trigger it manually or schedule it. An engineer made a direct Security Group change in the console. Nobody noticed. Six months later, that silent deviation exploded during a release.

The Fix — Step by Step

Rather than attempting a full migration at once, we adopted a split policy: new services go to Terraform, existing stacks stay in CloudFormation. That gave the team time to learn Terraform without blocking ongoing work.

Step 1: Lock down Terraform remote state first

We set up an S3 bucket (versioning enabled) with a DynamoDB table for state locking, and made every service's Terraform code point to the same backend.

terraform/backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-tfstate-prod"
    key            = "services/api-gateway/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "mycompany-tfstate-lock"
    encrypt        = true
  }
}
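
The backend block above assumes the bucket and lock table already exist. A minimal sketch of how they might be created, assuming AWS provider v4 or later and a separate one-time bootstrap configuration (the resource labels `tfstate` and `tfstate_lock` are hypothetical; the bucket and table names are the ones used in the backend block):

```hcl
# Hypothetical bootstrap config, applied once before any service uses the backend.
resource "aws_s3_bucket" "tfstate" {
  bucket = "mycompany-tfstate-prod"
}

# Versioning keeps earlier copies of each state file recoverable.
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend expects a string hash key named "LockID" on the lock table.
resource "aws_dynamodb_table" "tfstate_lock" {
  name         = "mycompany-tfstate-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Keeping this in its own configuration avoids the chicken-and-egg problem of storing the backend's state in the backend it creates.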

Step 2: Reference CloudFormation outputs from Terraform

Our VPC, subnets, and existing security groups were still CloudFormation-managed. The aws_cloudformation_stack data source lets Terraform read stack outputs directly, keeping the two tools loosely coupled.

terraform/data.tf
data "aws_cloudformation_stack" "network" {
  name = "mycompany-network-stack"
}
 
locals {
  vpc_id     = data.aws_cloudformation_stack.network.outputs["VpcId"]
  subnet_ids = split(",", data.aws_cloudformation_stack.network.outputs["SubnetIds"])
}

CloudFormation owns the network layer. Terraform owns the application layer. Neither tool manages the other's resources.
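
As an illustration of that boundary, here is a sketch of application-layer resources that consume the locals defined in data.tf. The resource names and the internal load balancer are assumptions for the example, not our actual configuration:

```hcl
# Hypothetical application-layer resources; names are illustrative.
resource "aws_security_group" "api" {
  name_prefix = "api-gateway-"
  vpc_id      = local.vpc_id # read from the CloudFormation network stack
}

resource "aws_lb" "api" {
  name               = "api-gateway"
  internal           = true
  load_balancer_type = "application"
  subnets            = local.subnet_ids # list produced by split() in data.tf
  security_groups    = [aws_security_group.api.id]
}
```

Terraform only reads the VPC and subnet IDs here. It never declares those resources itself, so a terraform destroy on the service cannot touch the network stack.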

Step 3: Bring drift detection into CI

For the CloudFormation side, we added a scheduled GitHub Actions job to run drift detection on a regular cadence. For Terraform, we made terraform plan a required check on every PR — no merge without a clean plan output.

What I'd Do Differently

Honestly, I don't regret the choice to use CloudFormation initially. It still has a real place: AWS SAM and CDK integration, self-service provisioning through Service Catalog, cross-account deployments with CloudFormation StackSets. These are areas where it genuinely excels.

What I regret is not doing the math. If you estimate the resource count for a single ECS service in production (50+ in our case), you'll see you hit the 500-resource wall somewhere around 9 or 10 services: 500 / 50 = 10 at best, and at 55 resources per service, 500 / 55 is about 9. That's a calculation I could have done before writing the first template.

The other lesson is that the root problem here wasn't about IaC tooling at all — it was account design. Once the single-account constraint was lifted, the next project I led started with an AWS Organizations structure that separated accounts by service and team. When you have natural account boundaries in place, stack limits stop being something you worry about.

Key Takeaways

CloudFormation makes sense when:

  • You're managing AWS resources only (no multi-cloud requirement)
  • You need tight integration with AWS Organizations, Service Catalog, or CDK
  • Your team has strong AWS expertise but limited Terraform exposure
  • Your stack resource count will stay comfortably under 400

Terraform makes sense when:

  • You're managing resources across multiple cloud providers
  • Module reuse and cross-team state sharing matter
  • You're expecting significant scale (10+ microservices)
  • You have Terraform experience on the team, or the capacity to build it

True for both:

  • Drift detection needs to be automated, not manual
  • State must be recoverable: S3 versioning for Terraform state, CloudFormation's built-in rollback for CFn stacks

FAQ

Q: Can I use CloudFormation and Terraform in the same project?

A: Yes, and during incremental migrations it's often the practical choice. Use the aws_cloudformation_stack data source to read stack outputs from Terraform, which keeps the two tools loosely coupled. The one hard rule: never manage the same resource with both tools. That leads to state conflicts that are painful to untangle.

Q: How do I get around CloudFormation's 500-resource limit?

A: The official workaround is Nested Stacks. But before you go that route, consider splitting stacks by functional domain first: network layer, data layer, application layer. That's a cleaner architecture and avoids the circular dependency issues that nested stacks can introduce. If you're regularly running into this limit, it's also worth evaluating whether Terraform's per-service state separation (a separate state file, or workspace, per service) is a better fit for your scale.
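
One way to read "state separation" on the Terraform side is one state file per service, distinguished only by the backend key. A sketch for a hypothetical second service (the `billing` name and path are assumptions; the bucket, region, and lock table match the backend shown in Step 1):

```hcl
# services/billing/backend.tf (hypothetical second service)
terraform {
  backend "s3" {
    bucket         = "mycompany-tfstate-prod"
    key            = "services/billing/terraform.tfstate" # only the key differs
    region         = "us-west-2"
    dynamodb_table = "mycompany-tfstate-lock"
    encrypt        = true
  }
}
```

Each service then plans and applies against its own state file, so no single state grows with the total number of services.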

Q: Why doesn't Terraform have a resource limit like CloudFormation's 500?

A: The architectures are fundamentally different. CloudFormation manages a "stack" object server-side on AWS infrastructure. AWS tracks resource dependencies and rollback state in that object, which is why limits exist. Terraform is a client-side tool — state is just a JSON file on S3, and it calls cloud provider APIs directly. There's no intermediate aggregation object, so there's no AWS-imposed hard limit. The practical constraints are different: large state files make terraform plan slower, and you can hit provider API rate limits at scale, but neither is a hard ceiling.

Q: Where and how should I manage Terraform state files?

A: On AWS, the standard approach is an S3 bucket with versioning and encryption enabled, plus a DynamoDB table for state locking. Don't use local state for team projects: it isn't shared between engineers, and without locking, concurrent applies can corrupt the state file. If you want to skip managing the backend yourself, HCP Terraform (formerly Terraform Cloud) handles all of this as a managed service.

Q: Is AWS CDK closer to CloudFormation or Terraform?

A: CDK is a CloudFormation synthesizer. Code you write in TypeScript or Python gets compiled down to a CloudFormation template, and CloudFormation handles the actual provisioning. That means CDK inherits all of CloudFormation's constraints — including the 500-resource limit per stack. The upside is writing infrastructure in a general-purpose language with real abstractions. The tradeoff is an added layer between your code and what's actually deployed, which can make debugging harder.

Q: Can I import existing AWS resources into Terraform without destroying them?

A: Yes. terraform import pulls an existing resource into Terraform's state. The catch is that import only updates the state file. It doesn't generate the corresponding HCL. You'll need to write that yourself, or, in Terraform v1.5+, declare an import block and run terraform plan -generate-config-out=generated.tf, which generates a starting point that still needs manual review and cleanup. Before importing anything in production, run through the full process in a staging environment first.
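
For the Terraform v1.5+ route, config generation works from an import block rather than the terraform import command. A minimal sketch, where the resource address `aws_security_group.legacy_api` and the security group ID are placeholders:

```hcl
# Hypothetical example: the resource address and ID below are placeholders.
import {
  to = aws_security_group.legacy_api
  id = "sg-0123456789abcdef0"
}
```

With this block in place and no matching resource block written yet, terraform plan -generate-config-out=generated.tf writes a draft resource block to generated.tf. After you review and clean up that draft, terraform apply records the import in state without recreating the resource.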