PipelineOps

Why Your CI Pipeline Is Slow (And How to Fix It)

"Is it just me, or is CI super slow?" By the time someone said that, our pipeline was taking 32 minutes. Six months earlier, it had been 8.

TL;DR: CI slowness almost always comes down to four things — no dependency cache, sequential jobs, missing path filters, and a Docker build that runs from scratch every time. This post walks through how I diagnosed a pipeline that had ballooned to 32 minutes and cut it down to around 15. None of the fixes required more than a few dozen lines of config.

What I Was Trying to Do

I was managing the CI pipeline for a SaaS product built on GitHub Actions. The workflow ran tests, lint, a Docker image build, and a staging deploy — all in one shot.

As the team grew and the codebase expanded, the pipeline went from 8 minutes to 32 over six months. Nobody made it slow on purpose. The test suite grew from around 80 tests to over 500. npm dependencies went from roughly 100 packages to over 300. The Docker image picked up layers with every new feature. By the time anyone noticed, you were waiting 30+ minutes for results after opening a pull request.

Developers stopped waiting. They'd open a PR, then immediately dive into something else. Reviewers, stuck waiting for green CI, would move on to other PRs. Context switches piled up. A slow CI pipeline isn't just an inconvenience — it quietly dismantles the feedback loop that keeps a team moving.

What Went Wrong (and Why)

First step was figuring out where the time was actually going. GitHub Actions shows per-step timing when you open a job from the workflow run page. Here's what I found:

Step                 Time
npm install          4–6 min
Unit tests           8 min
Integration tests    10 min
Docker build         7 min
Deploy to staging    3 min
Total                ~32–34 min

And almost all of it was running sequentially.

Cause 1: Dependencies were reinstalled every time

npm install ran in full on every CI run. Nothing was cached, so even when package-lock.json hadn't changed, every package was downloaded fresh. If CI runs 100 times a week, that's 400–600 minutes, roughly 7–10 hours, spent on npm install alone.

Cause 2: Tests were running sequentially

Unit tests and integration tests lived in the same job and ran one after the other. They had no dependency on each other — they just ended up together because that's how the workflow was first written and nobody revisited it. Classic "good enough when it was fast" design.

Cause 3: No path filtering

Fixing a typo in README.md triggered the full test suite, Docker build, and staging deploy. Every single time.

Cause 4: Docker layer cache wasn't working

The Dockerfile looked like this:

Dockerfile (before)
FROM node:20
WORKDIR /app
# copies everything first
COPY . .
RUN npm ci
RUN npm run build

Because COPY . . copies all files before npm ci, changing a single line of app code invalidates the npm ci layer. Every code change means a full dependency install inside Docker.

The Fix — Step by Step

Step 1: Cache dependencies (biggest impact)

I started here because it required the least config change for the most improvement.

The cleanest approach is using the cache: 'npm' option in actions/setup-node. One line handles everything:

.github/workflows/ci.yml (with cache)
- uses: actions/setup-node@v4
  with:
    node-version: '20'
    cache: 'npm'        # manages ~/.npm cache automatically
 
- name: Install dependencies
  run: npm ci

On a cache hit, npm ci completes in around 30 seconds instead of 4–6 minutes.

What's actually being cached here: cache: 'npm' caches ~/.npm — the directory where npm stores downloaded package tarballs. It does not cache node_modules/. npm ci still deletes and recreates node_modules/ on every run. The speedup comes from skipping the download step, not the install itself.

"Can I skip npm ci entirely if package-lock.json hasn't changed?"

Yes — by caching node_modules/ directly and skipping npm ci on a cache hit:

.github/workflows/ci.yml (node_modules cache variant)
- uses: actions/cache@v4
  id: npm-cache
  with:
    path: node_modules
    key: ${{ runner.os }}-node-modules-${{ hashFiles('**/package-lock.json') }}
 
- run: npm ci
  if: steps.npm-cache.outputs.cache-hit != 'true'

On a cache hit, npm ci doesn't run at all — this is the fastest possible option. The trade-offs:

  • node_modules/ is much larger than ~/.npm, eating into the 10 GB cache limit faster
  • If the Node.js version changes, native modules can break (you'd need to include the Node version in the cache key)
  • Caches are OS-specific and can't be shared across platforms
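If you do take the node_modules/ route, one way to cover the Node.js version pitfall is to fold the installed version into the cache key. This is a sketch, assuming actions/setup-node's node-version output (the resolved version it installed) and the runner.arch context; a change to either produces a new key, so npm ci rebuilds native modules instead of restoring incompatible ones.

.github/workflows/ci.yml (node_modules cache keyed on Node version, sketch)
```yaml
- uses: actions/setup-node@v4
  id: setup-node
  with:
    node-version: '20'

- uses: actions/cache@v4
  id: npm-cache
  with:
    path: node_modules
    # OS, CPU architecture and exact Node.js version are all part of the key,
    # so a Node upgrade misses the cache instead of restoring stale native builds
    key: ${{ runner.os }}-${{ runner.arch }}-node-${{ steps.setup-node.outputs.node-version }}-modules-${{ hashFiles('**/package-lock.json') }}

# Only install when there was an exact key match miss
- run: npm ci
  if: steps.npm-cache.outputs.cache-hit != 'true'
```

There are no restore-keys here on purpose: a partially matching node_modules/ would be thrown away by npm ci anyway, so restoring it only costs time.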

For most projects, setup-node with cache: 'npm' and always running npm ci is the recommended path. The node_modules/ skip variant makes sense when you have no native modules and a pinned Node.js version.

A note on npm install vs npm ci

You'll sometimes hear that they produce the same result these days. There's some truth to it in a clean CI environment where node_modules/ doesn't exist — the installed packages often end up identical. But the behavioral difference matters:

  • npm ci never updates package-lock.json and fails fast on any mismatch
  • npm install may update the lockfile if package.json has changed

For CI, keep using npm ci. "Never modifies the lockfile" and "fails on version mismatch" are exactly the properties you want for reproducibility.

Pitfalls

  • When the cache breaks: If ~/.npm gets corrupted, npm ci fails with confusing errors. With setup-node, your escape hatch is manually deleting the cache via the GitHub Actions UI or temporarily removing cache: 'npm'. If you use actions/cache directly, add a version prefix to the key (v1-npm-...) so you can force-bust by bumping to v2-.
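For the actions/cache route, a minimal sketch of the version-prefixed key, caching ~/.npm the way cache: 'npm' does (setup-node is then used without its cache option). The v1- prefix is only a naming convention in the key string, not an actions/cache feature.

.github/workflows/ci.yml (manual ~/.npm cache with a bustable key, sketch)
```yaml
- uses: actions/setup-node@v4
  with:
    node-version: '20'      # no cache option: actions/cache below manages ~/.npm instead

- uses: actions/cache@v4
  with:
    path: ~/.npm
    # Bump v1 to v2 to force every run onto a fresh cache
    key: v1-npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    # A partial match is safe for ~/.npm: npm ci downloads whatever tarballs are missing
    restore-keys: |
      v1-npm-${{ runner.os }}-

- run: npm ci
```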

Step 2: Parallelize the tests

Split unit tests and integration tests into separate jobs and run them in parallel:

.github/workflows/ci.yml (parallelized)
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run test:unit
 
  integration-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run test:integration
 
  build:
    needs: [unit-tests, integration-tests]   # only runs after both test jobs pass
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Docker build steps go here

Unit tests (8 min) and integration tests (10 min) now run at the same time. Wall time drops from 18 minutes to 10 — the duration of the longer job.

Pitfalls

  • Startup cost is paid twice: Each parallel job checks out the repo, restores the cache, and runs npm ci independently. For short test suites (2–3 minutes each), this overhead can eat most of the parallelism benefit
  • Billing minutes go up, not down: GitHub-hosted runners are billed by total execution minutes across all jobs. Parallelizing makes things faster for humans but doesn't reduce the compute you're paying for — if anything, it increases slightly. Faster ≠ cheaper
  • Parallelism is harder to retrofit: If jobs need to share files later, you'll need upload-artifact / download-artifact plumbing. It's worth thinking about job boundaries upfront — "can this run independently?" — rather than untangling a monolithic job later
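The upload-artifact / download-artifact plumbing looks roughly like this. It's a sketch that assumes a hypothetical build-app job whose npm run build writes to dist/ and a hypothetical e2e job that needs that output; app-dist is an arbitrary artifact name.

.github/workflows/ci.yml (passing files between jobs, sketch)
```yaml
jobs:
  build-app:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run build
      # Hand the build output to downstream jobs, which start on fresh runners
      - uses: actions/upload-artifact@v4
        with:
          name: app-dist
          path: dist/

  e2e:
    needs: build-app
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Restore dist/ produced by build-app instead of rebuilding it
      - uses: actions/download-artifact@v4
        with:
          name: app-dist
          path: dist/
      # ... setup-node, npm ci, and tests against dist/ go here
```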

Step 3: Add path filtering

Skip CI entirely for changes that don't affect code:

.github/workflows/ci.yml (with path filtering)
on:
  push:
    paths-ignore:
      - '**.md'
      - 'docs/**'
      - '.github/CODEOWNERS'
  pull_request:
    paths-ignore:
      - '**.md'
      - 'docs/**'
      - '.github/CODEOWNERS'

When only ignored paths change, GitHub doesn't start the workflow at all. That's great for speed, but check it against branch protection: GitHub's documentation says that when a workflow is skipped by path filtering, its checks stay pending, and a pull request that requires them is blocked from merging.

For documentation-only commits, CI time drops from 32 minutes to zero. That said, this fix only applies to commits that touch nothing but ignored paths; for regular code changes, it has no effect on run time.

Note: paths (positive filter, run only when specific paths change) behaves the same way in this respect. A commit that doesn't match the filter creates no workflow run, so required checks from that workflow never report and stay pending. If you use branch protection, test this before shipping.
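If required checks do end up stuck in pending, one common workaround is to drop the workflow-level filter, let the workflow always run, and gate individual jobs instead: GitHub reports a job skipped by an if: condition as success. A sketch, assuming the third-party dorny/paths-filter action (not part of GitHub) and a hypothetical code filter. It's shown for pull_request only; on push events that action reads git history and needs actions/checkout first.

.github/workflows/ci.yml (job-level path gating, sketch)
```yaml
on:
  pull_request:   # no paths-ignore: the workflow always runs, so its checks always report

jobs:
  changes:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: read   # lets paths-filter list the PR's changed files via the API
    outputs:
      code: ${{ steps.filter.outputs.code }}
    steps:
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          # Hypothetical filter: anything outside these paths counts as non-code
          filters: |
            code:
              - 'src/**'
              - 'package.json'
              - 'package-lock.json'

  unit-tests:
    needs: changes
    # Skipped (and reported as success) when the PR touches no code paths
    if: needs.changes.outputs.code == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... setup-node, npm ci and npm run test:unit as before
```

The same if: goes on every job you want to skip; a job that needs a skipped job is skipped as well.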

Pitfalls

  • Mixed commits still run CI: If someone changes docs/ and a config file in the same commit, paths-ignore doesn't kick in, because a non-ignored file changed. The real risk runs the other way: a file that matches an ignore pattern but actually affects the build gets changed, and CI is skipped
  • **.md may be too broad: Some projects store code-generation config or templating logic in .md files. The safer path is being specific about which directories to ignore
  • Merges to main can still get skipped: Whether CI runs depends on which diff GitHub evaluates (the PR's changes for pull_request events, the pushed commits for push events). Verify that your branch protection rules hold up, so that "code that reaches main has passed CI" remains true
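Being specific might look like the list below. The file names are illustrative; substitute wherever your documentation actually lives.

.github/workflows/ci.yml (narrower paths-ignore, sketch)
```yaml
on:
  pull_request:
    paths-ignore:
      # Name documentation locations explicitly instead of matching every '**.md'
      - 'docs/**'
      - 'README.md'
      - 'CHANGELOG.md'
      - '.github/CODEOWNERS'
```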

Step 4: Fix the Docker layer order

Move dependency files ahead of application code:

Dockerfile (after)
FROM node:20
WORKDIR /app
# dependency files first
COPY package.json package-lock.json ./
# this layer is now cacheable
RUN npm ci
# app code changes only invalidate from here
COPY . .
RUN npm run build

Now changing application code doesn't invalidate the npm ci layer. Docker reuses it from cache. On a code-only change, Docker build time dropped from 7 minutes to around 1–2 minutes.

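To enable layer caching in GitHub Actions, set up a Buildx builder and use docker/build-push-action with cache-from and cache-to:

.github/workflows/ci.yml (Docker layer cache)
# type=gha needs a Buildx builder; the runner's default docker driver can't export this cache out of the box
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
 
- name: Build Docker image
  uses: docker/build-push-action@v5
  with:
    context: .
    push: false
    cache-from: type=gha
    cache-to: type=gha,mode=max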

type=gha stores cache in GitHub Actions Cache, shared across runner instances.

Pitfalls

  • The 10 GB limit bites large images: GitHub Actions Cache evicts old entries once a repository hits 10 GB. Projects with large images will see inconsistent cache hit rates — fast one day, slow the next. If that becomes a problem, an external registry (ECR, etc.) as the cache backend is more stable
  • Branch scoping is asymmetric: Caches from the default branch are accessible to feature branches, but feature branch caches are invisible to main. Also, PR caches live under refs/pull/N/merge — a separate scope that only the same PR can access. A brand new branch falls back to main's cache, which may be stale
  • mode=max uses more storage: It caches all intermediate layers, which is more effective but drains the 10 GB quota faster. Start with mode=min (final image only) and move to mode=max if you're still seeing too many cache misses
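For the external-registry option, a sketch that uses GitHub Container Registry purely as an example backend, with a placeholder image path (ghcr.io/your-org/your-app). ECR or another registry would swap in its own login step. The cache then lives in the registry as a separate tag, outside the 10 GB Actions cache quota and its branch scoping; the flip side is that every branch reads and writes the same buildcache tag unless you vary it.

.github/workflows/ci.yml (registry-backed layer cache, sketch)
```yaml
jobs:
  build-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write          # GITHUB_TOKEN needs this to push the cache to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: false
          # Placeholder ref; the cache is stored as its own tag, not in the Actions cache
          cache-from: type=registry,ref=ghcr.io/your-org/your-app:buildcache
          cache-to: type=registry,ref=ghcr.io/your-org/your-app:buildcache,mode=max
```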

What I'd Do Differently

Put cache config in the team workflow template from day one. Retrofitting caching into each workflow after the fact is busywork nobody should have to repeat. If every new workflow starts with it built in, the whole team benefits immediately without anyone having to remember.
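One way to bake this in, sketched here as a reusable workflow (on: workflow_call), which is a different mechanism from organization-level workflow templates but serves the same goal. The repository your-org/ci-templates, the file name node-ci.yml, and the test-script input are all hypothetical.

.github/workflows/node-ci.yml in your-org/ci-templates (reusable workflow, sketch)
```yaml
name: node-ci
on:
  workflow_call:
    inputs:
      node-version:
        type: string
        default: '20'
      test-script:
        type: string
        required: true

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
          cache: 'npm'        # every caller gets dependency caching without asking for it
      - run: npm ci
      - run: npm run ${{ inputs.test-script }}
```

A caller job then needs only uses: your-org/ci-templates/.github/workflows/node-ci.yml@main plus with: test-script: test:unit. For private repositories, the shared repo must be configured to allow access from the calling repos.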

Track CI duration as a metric. I didn't notice the problem until we hit 32 minutes. The GitHub Actions API exposes run_duration_ms for every workflow run — storing weekly P95 times would have surfaced the trend months earlier and let me fix it before it started affecting the team.
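A low-effort version of that tracking is a scheduled workflow that writes a P95 to the run's job summary. This sketch makes a few assumptions: the CI workflow file is ci.yml, the gh CLI preinstalled on GitHub-hosted runners is used for the API call, and duration is approximated as updated_at minus run_started_at from the list-runs endpoint (which does not itself return run_duration_ms), so re-runs can skew individual values.

.github/workflows/ci-duration.yml (weekly P95 report, sketch)
```yaml
name: ci-duration-report
on:
  schedule:
    - cron: '0 9 * * 1'    # Mondays 09:00 UTC
  workflow_dispatch:

permissions:
  actions: read

jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - name: P95 of the last 100 successful ci.yml runs
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          # Per-run duration in seconds: updated_at - run_started_at (an approximation).
          # Nearest-rank P95: the ceil(0.95 * n)-th smallest duration.
          gh api "repos/$GITHUB_REPOSITORY/actions/workflows/ci.yml/runs?status=success&per_page=100" \
            --jq '[.workflow_runs[]
                    | (.updated_at | fromdateiso8601) - (.run_started_at | fromdateiso8601)]
                  | sort
                  | if length == 0 then "No successful runs found"
                    else "CI P95: \(.[(length * 0.95 | ceil) - 1] / 60 | floor) min over \(length) runs"
                    end' \
            >> "$GITHUB_STEP_SUMMARY"
```

The job summary only shows the current rolling value; to see a trend, you'd still need to store each week's number somewhere.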

Design for parallelism when splitting jobs. Retrofitting parallel jobs means untangling dependencies, figuring out artifact passing, and redesigning cache strategy all at once. If you think about "can this job run independently?" when you first write the workflow, it's a 10-minute decision instead of a half-day refactor.

Key Takeaways

  • CI slowness almost always traces back to four things: missing cache, sequential jobs, no path filtering, and Docker layer order
  • Adding dependency caching is the fastest-payoff change you can make — do it first
  • Parallelizing jobs cuts wall time but doesn't reduce billing minutes; faster ≠ cheaper
  • paths-ignore lets you skip CI for documentation-only changes, but a workflow skipped by a path filter leaves its required checks pending; verify branch protection behavior before relying on it
  • In your Dockerfile, copy what changes least first — dependency files before application code
  • CI pipelines drift slow without anyone noticing. Measure regularly

FAQ

Q: What's the fastest way to speed up GitHub Actions CI?

A: Start with dependency caching. In the pipeline described here, npm install cost 4–6 of roughly 32 minutes on every run, and adding cache: 'npm' to actions/setup-node is a one-line change that pays off on the next cache hit.

Q: How do I track and measure CI run times in GitHub Actions?

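A: The GitHub Actions UI shows per-step timing when you open a job from a workflow run. For trend tracking, GET /repos/{owner}/{repo}/actions/runs/{run_id}/timing returns run_duration_ms for a single workflow run. The list endpoint GET /repos/{owner}/{repo}/actions/runs doesn't include that field, but its run_started_at and updated_at timestamps give a usable approximation. Logging P95 weekly makes slowdowns visible before they become painful.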

Q: Does path filtering break required status checks in branch protection?

A: It can. GitHub's documentation says that a workflow skipped by path filtering (paths or paths-ignore) leaves its checks pending, which blocks a pull request that requires them. Jobs skipped by a job-level if: condition report success instead, so a common pattern is to let the workflow always run and skip individual jobs when only non-code files changed. Test your specific branch protection setup before deploying.

Q: What are the limits of GitHub Actions Cache for Docker layers?

A: The cache is capped at 10 GB per repository. Cache entries from the default branch are accessible to feature branches, but not the reverse. PR caches live in an isolated scope (refs/pull/N/merge) and are only accessible within the same PR. For large images that regularly overflow 10 GB, an external registry like ECR is more predictable.

Q: Should I reduce the number of tests to speed up CI?

A: No. Fewer tests means more bugs get through. The right move is parallelizing or sharding the tests you have, not removing them. Jest's --shard flag (Jest 28 and later) splits a suite across multiple runners, and pytest-xdist spreads a Python suite across CPU cores on a single runner.
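Jest sharding pairs naturally with a job matrix. A sketch, assuming test:unit invokes Jest 28 or later and that three shards suit your suite; strategy.job-total is the number of jobs in the matrix, so the shard denominator stays in sync with the list. Each shard still pays the checkout and npm ci startup cost described in Step 2.

.github/workflows/ci.yml (sharded unit tests, sketch)
```yaml
unit-tests:
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false          # let every shard finish so you see all failures
    matrix:
      shard: [1, 2, 3]        # three runners; tune to suite size
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: '20'
        cache: 'npm'
    - run: npm ci
    # "--" forwards the flag through npm to jest, e.g. --shard=2/3
    - run: npm run test:unit -- --shard=${{ matrix.shard }}/${{ strategy.job-total }}
```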


Based on experience across multiple SRE engagements. Details that could identify specific companies or individuals have been omitted or generalized.