All Articles

Production incidents, AWS deep-dives, and CI/CD battle stories.

April 24, 2026
The Wrong SLI Almost Broke Our Reliability Culture
Our SLO was green while support was on fire. The real issue wasn't the SLI; it was using the SLO as a report card. Here's how we fixed both.
SRE SLO Observability
April 17, 2026
Setting Up AWS CodeBuild as a GitHub Actions Runner: No More Self-Managed EC2
Burned out on managing self-hosted EC2 runners, I switched to CodeBuild-managed runners. Here's the full setup, including the webhook and IAM gotchas that cost me a day.
CI/CD GitHub Actions AWS CodeBuild
April 11, 2026
AWS ECS vs EKS: Choosing Container Orchestration in 2026
Three months piloting EKS alongside ECS in production. What the upgrade overhead costs, what broke, and a four-question framework for the decision.
AWS ECS EKS Kubernetes Platform Engineering
April 7, 2026
CloudFormation vs Terraform: Which IaC Tool for AWS in 2026?
We hit CloudFormation's hard limit of 500 resources per stack mid-migration. Here's what broke, how we fixed it, and when to choose each tool.
AWS IaC CloudFormation Terraform
April 3, 2026
Why Your CI Pipeline Is Slow (And How to Fix It)
CI pipelines slow down for four reasons: a missing cache, sequential jobs, no path filtering, and a broken Docker layer cache. I diagnosed a 32-minute pipeline and cut it down to about 15 minutes.
CI/CD GitHub Actions DevOps
March 29, 2026
GitHub Actions Self-Hosted Runners on AWS EC2: What No One Tells You
I put a self-hosted runner on EC2 and it died at 2am. Here's what broke, why non-ephemeral runners are a trap, and the step-by-step path to a production-ready setup.
CI/CD GitHub Actions AWS EC2
March 27, 2026
Docker Multi-Stage Builds: Cut Image Size by 80%
My Spring Boot Docker image hit 1.2 GB. CI took 12 minutes per run and Trivy flagged 140 vulnerabilities. Multi-stage builds brought it down to 245 MB — here's exactly what I changed.
Docker CI/CD AWS DevOps
March 27, 2026
How to Set Up Zero-Downtime Deployment on AWS ECS
ECS rolling update defaults don't give you zero downtime. Here's the three-layer fix — graceful shutdown, ALB deregistration delay, and stopTimeout — that ended our deploy-time 502s.
AWS ECS CI/CD DevOps
March 27, 2026
GitHub Actions vs Jenkins: Which CI/CD Tool in 2026?
I ran Jenkins for years before switching to GitHub Actions. We saw a 20% drop in release work. Here's the reasoning I actually used.
CI/CD GitHub Actions Jenkins Platform Engineering