How We Built Place2Page's Selective CI/CD Pipeline

A grounded look at how Place2Page ties together change detection, selective tests, GHCR image builds, and production deploy automation.

The goal was straightforward:

  1. Test only the areas that changed
  2. Build only the images we actually need
  3. Restart only the services that should move in production

Instead of introducing a large deployment platform, we kept the stack simple: GitHub Actions, GHCR, and a self-hosted deploy runner.

The problem we wanted to eliminate

Once a monorepo starts shipping both API and Web from the same repository, a few inefficiencies show up quickly:

  • Web-only changes still trigger API work
  • API-only changes still rebuild Web artifacts
  • Production deploys drift toward manual server work
  • Failures are hard to localize quickly

The current pipeline is built specifically to reduce those issues.

1. Change detection comes first

The first job in the workflow is detect-changes.

It decides whether apps/api/**, apps/web/**, docker-compose.yml, or a manual dispatch input should drive the rest of the pipeline.

if [[ "$compose_changed" == "true" ]]; then
  deploy_services="all"
elif [[ "$api_changed" == "true" && "$web_changed" == "true" ]]; then
  deploy_services="api,web"
elif [[ "$api_changed" == "true" ]]; then
  deploy_services="api"
elif [[ "$web_changed" == "true" ]]; then
  deploy_services="web"
fi

That output becomes the contract for every later job.

The benefit is less about clever Bash and more about keeping the deploy intent centralized. The workflow decides scope once, then everything else consumes it.
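
To make that contract concrete, here is a minimal sketch of how the detect-changes job can publish the scope as a job output that every later job reads through needs. The job, step, and variable names are illustrative assumptions, not excerpts from the actual workflow:

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      deploy_services: ${{ steps.scope.outputs.deploy_services }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history so the diff against the previous commit works
      - id: scope
        run: |
          changed="$(git diff --name-only HEAD^ HEAD)"
          api_changed=false; web_changed=false; compose_changed=false
          if echo "$changed" | grep -q '^apps/api/'; then api_changed=true; fi
          if echo "$changed" | grep -q '^apps/web/'; then web_changed=true; fi
          if echo "$changed" | grep -q '^docker-compose\.yml$'; then compose_changed=true; fi
          deploy_services="none"
          # ...the same if/elif chain shown above fills in deploy_services...
          echo "deploy_services=$deploy_services" >> "$GITHUB_OUTPUT"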

2. Tests and image builds stay selective too

The test split is simple:

  • API tests: uv run pytest -q
  • Web tests: npm ci && npm test

Both run only when their area changes.

The same pattern applies to image builds:

  • api-image runs only when API changed
  • web-image runs only when Web changed

Another useful detail is that build and publish are not treated as the same thing.

Branches and pull requests can build images for validation, but deployable GHCR pushes only happen on main. That keeps review flows lighter while keeping production images tied to the default branch.
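
In workflow terms, that amounts to a handful of job-level if: conditions hanging off the detect-changes output, plus a push flag that only flips to true on main. A hedged sketch of the API side, with job names, conditions, and image tags as assumptions (GHCR login and the packages: write permission are omitted):

  test-api:
    needs: detect-changes
    if: contains(needs.detect-changes.outputs.deploy_services, 'api')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: uv run pytest -q            # assumes uv is already available in the job
        working-directory: apps/api

  api-image:
    needs: [detect-changes, test-api]
    if: contains(needs.detect-changes.outputs.deploy_services, 'api')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/build-push-action@v6
        with:
          context: apps/api
          # Branches and pull requests only build; main is the only ref that publishes.
          push: ${{ github.ref == 'refs/heads/main' }}
          tags: ghcr.io/OWNER/place2page-api:${{ github.sha }}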

3. Production deploys finish on a self-hosted deploy runner

The deploy runbook makes the main architectural choice explicit: instead of SSH-driven deploy steps, the deploy job runs on a self-hosted deploy runner.

The workflow expects:

  • runner labels: self-hosted, linux, x64
  • a configured deploy path such as $DEPLOY_PATH
  • Docker and Docker Compose available in the deploy environment
  • write access to docker-compose.yml in the deploy workspace

From there, the deploy sequence stays linear:

  1. Check out the repository
  2. Sync docker-compose.yml into the deploy path
  3. Validate the environment with docker compose config >/dev/null
  4. Pull only the changed services
  5. Restart only the changed services with --wait --wait-timeout 180

When docker-compose.yml itself changes, the workflow escalates to a full docker compose up -d --wait --wait-timeout 180. That is an intentional distinction between application code changes and service wiring changes.
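
Put together, the deploy job stays small. The sketch below assumes the deploy path arrives as a repository variable and the scope string via needs; both are assumptions about wiring rather than excerpts:

  deploy:
    # The real job also depends on whichever image jobs actually ran; elided here.
    needs: detect-changes
    if: github.ref == 'refs/heads/main' && needs.detect-changes.outputs.deploy_services != 'none'
    runs-on: [self-hosted, linux, x64]
    env:
      SERVICES: ${{ needs.detect-changes.outputs.deploy_services }}
    steps:
      - uses: actions/checkout@v4
      - name: Sync compose file and restart only the changed services
        working-directory: ${{ vars.DEPLOY_PATH }}
        run: |
          cp "$GITHUB_WORKSPACE/docker-compose.yml" .
          docker compose config >/dev/null           # fail fast on a broken compose file
          if [[ "$SERVICES" == "all" ]]; then
            docker compose pull
            docker compose up -d --wait --wait-timeout 180
          else
            IFS=',' read -ra services <<< "$SERVICES"
            docker compose pull "${services[@]}"
            docker compose up -d --wait --wait-timeout 180 "${services[@]}"
          fi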

4. The failure path matters as much as the success path

One of the strongest parts of the workflow is that it does not treat deploy failure as a black box.

The deploy step wraps docker compose pull in a bounded retry with exponential backoff:

# Retry `docker compose pull` for one service: up to 4 attempts,
# backing off 5s -> 10s -> 20s between attempts.
pull_service_with_retry() {
  local service="$1"
  local attempt=1
  local max_attempts=4
  local delay_seconds=5

  while true; do
    if docker compose pull "$service"; then
      return 0
    fi

    # Give up once the attempt budget is spent.
    if [[ "$attempt" -ge "$max_attempts" ]]; then
      return 1
    fi

    sleep "$delay_seconds"
    attempt=$((attempt + 1))
    delay_seconds=$((delay_seconds * 2))
  done
}

If the deploy still fails, the next step collects diagnostics rather than stopping at a red badge:

  • docker compose ps
  • targeted logs for the changed services
  • broader service diagnostics when the deploy scope is all

That keeps the workflow operationally useful, not just technically automated.
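
In workflow terms, that is just a failure-gated step at the end of the deploy job. A sketch of the shape, reusing the assumed SERVICES variable from above:

      - name: Collect diagnostics on failure
        if: failure()
        working-directory: ${{ vars.DEPLOY_PATH }}
        run: |
          docker compose ps
          if [[ "$SERVICES" == "all" ]]; then
            docker compose logs --tail=200
          else
            IFS=',' read -ra services <<< "$SERVICES"
            docker compose logs --tail=200 "${services[@]}"
          fi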

(Image: selective deploy rules and deploy step excerpt)

5. Deployment alerts stay operational, not ceremonial

The Slack payload includes the details that actually help with follow-up:

  • environment
  • service scope
  • commit SHA
  • actor
  • deploy result
  • workflow context

That gives the team a compact answer to “what deployed, for which environment, and with what result?”
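
The delivery mechanism matters less than that shape. A minimal sketch using a plain incoming-webhook call at the end of the deploy job; the secret name and message format are assumptions:

      - name: Send deployment alert to Slack
        if: always()
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          # Compact, operational payload: what deployed, where, by whom, with what result.
          payload=$(printf '{"text": "Deploy %s | env=production | services=%s | sha=%s | actor=%s | run=%s"}' \
            "${{ job.status }}" \
            "${{ needs.detect-changes.outputs.deploy_services }}" \
            "${GITHUB_SHA::7}" \
            "$GITHUB_ACTOR" \
            "$GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID")
          curl -sS -X POST -H 'Content-Type: application/json' --data "$payload" "$SLACK_WEBHOOK_URL"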

6. What stayed intentionally manual

The deployment runbook also makes it clear that not everything should be folded into CI/CD.

The one-time SQLite to PostgreSQL migration remains a manual cutover procedure. The deployment notes keep it separate from the main pipeline, which is a sensible boundary. A one-off data transition should not expand the failure surface of a routine deploy.

The other clear trade-off is the self-hosted runner itself. It adds an infrastructure dependency compared with GitHub-hosted runners, but it also keeps the deploy steps close to the runtime they manage.

Closing

The most useful thing about this pipeline is not novelty. It is clarity.

  • Change scope is decided once
  • Tests and builds follow that scope
  • Production deploys update only what changed
  • Failures trigger targeted diagnostics and deployment alerts from the same workflow

For a project like Place2Page, where API, Web, and deployment docs evolve together, that level of explicitness is worth a lot.

If we extend this further, the next obvious step would be post-deploy smoke checks and a small set of deploy metrics inside the same workflow.
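
If that lands, the smoke check could be as small as one more step after the restart; a purely hypothetical sketch, with a made-up health endpoint:

      - name: Post-deploy smoke check
        run: |
          # Hypothetical: poll a health endpoint before calling the deploy good.
          curl -fsS --retry 5 --retry-delay 5 --retry-all-errors https://place2page.example/healthz >/dev/null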
