Health Checks
Letting infrastructure know when your service is ready
Key Takeaways
- Liveness probes (/healthz) check if the process is alive; readiness probes (/readyz) check if it can serve traffic
- Readiness checks must verify actual dependency connectivity, not just return 200
- Run dependency checks in parallel with timeouts so one slow dependency does not block the entire check
- Return structured JSON with per-dependency status so operators can quickly identify which dependency failed
What are Health Checks?
Health checks are HTTP endpoints that report whether your service is functioning correctly. Infrastructure tools (load balancers, orchestrators, monitoring systems) poll these endpoints to make automated decisions about routing traffic, restarting containers, and triggering alerts.
There are two distinct types:
- Liveness (/healthz): "Is the process alive and not deadlocked?" If this fails, the process should be restarted.
- Readiness (/readyz): "Can this instance handle requests right now?" If this fails, stop sending traffic but don't restart.
Why It Matters
A health check that always returns 200 is worse than no health check at all — it tells the infrastructure everything is fine when it's not. Your database could be down, your Redis connection could be broken, and the health check keeps saying "ok." Traffic keeps flowing to a broken instance, and users see errors.
Good health checks are the foundation of self-healing infrastructure. They let Kubernetes automatically restart stuck pods, load balancers route around failed instances, and monitoring systems alert before users notice.
How It Works
Liveness Probe
Simple — just confirm the HTTP server is responding:
GET /healthz → 200 { "status": "ok" }
If the event loop is blocked or the process is deadlocked, this request will time out, and the orchestrator will restart the container.
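As a rough illustration, the liveness handler can be a function that does no I/O at all. This is a minimal sketch, not tied to any framework: the `ProbeResponse` type and the `handleLiveness` name are hypothetical, and the function would be wired into whatever HTTP server the service already uses.

```typescript
// Hypothetical response shape; adapt it to your HTTP framework.
interface ProbeResponse {
  statusCode: number;
  body: { status: string };
}

// Liveness deliberately does no I/O. If the process can run this function
// and answer the request, it is alive. A blocked event loop never reaches
// this code, so the orchestrator's probe times out instead.
function handleLiveness(): ProbeResponse {
  return { statusCode: 200, body: { status: "ok" } };
}

console.log(JSON.stringify(handleLiveness().body)); // {"status":"ok"}
```

Keeping dependency checks out of this handler means a database outage does not cause the orchestrator to restart an otherwise healthy process.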
Readiness Probe
More sophisticated — verify each dependency is actually reachable:
- Check PostgreSQL: run SELECT 1
- Check Redis: send PING, expect PONG
- Check any other dependencies
Return 200 only if ALL dependencies are healthy. Return 503 with details if any fail.
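The 200-versus-503 decision can be sketched as a small function over the per-dependency results. The type names and `buildReadinessResponse` are hypothetical, and the exact JSON layout is an assumption; the only rule taken from the text is that any failed dependency produces a 503 with details.

```typescript
// Hypothetical types, for illustration only.
type CheckStatus = "ok" | "fail";

interface CheckResult {
  status: CheckStatus;
  error?: string;
}

interface ReadinessResponse {
  statusCode: 200 | 503;
  body: { status: CheckStatus; checks: Record<string, CheckResult> };
}

// Returns 200 only when every dependency reported "ok". Otherwise returns
// 503 and includes the per-dependency results so the failure is visible.
function buildReadinessResponse(
  checks: Record<string, CheckResult>,
): ReadinessResponse {
  const allOk = Object.keys(checks).every((name) => checks[name].status === "ok");
  return {
    statusCode: allOk ? 200 : 503,
    body: { status: allOk ? "ok" : "fail", checks },
  };
}

const readiness = buildReadinessResponse({
  postgres: { status: "ok" },
  redis: { status: "fail", error: "connection refused" },
});
console.log(readiness.statusCode); // 503
```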
Key Design Principles
- Run checks in parallel: Use Promise.allSettled() so a slow database check doesn't block the Redis check
- Use timeouts: A health check that takes 30 seconds to fail is useless. Set 2-second connection timeouts.
- Return structured data: Include a checks object showing the status of each dependency, so operators can immediately see what's broken
- Use short-lived connections: Don't reuse the main connection pool for health checks. You want to verify you *can* connect, not just that an existing connection works
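The first three principles can be combined in one small runner. This is a sketch under stated assumptions: `Check`, `withTimeout`, and `runChecks` are hypothetical names, and the checks here are stubs. A real check would open a short-lived connection and run SELECT 1 or PING.

```typescript
// Hypothetical names throughout; each Check resolves when the dependency
// is healthy and rejects (or hangs) when it is not.
type Check = () => Promise<void>;

interface DependencyStatus {
  status: "ok" | "fail";
  error?: string;
}

// Rejects if the check has not settled within `ms` milliseconds.
function withTimeout(check: Check, ms: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms}ms`)),
      ms,
    );
    Promise.resolve()
      .then(check)
      .then(
        () => { clearTimeout(timer); resolve(); },
        (err) => { clearTimeout(timer); reject(err); },
      );
  });
}

// Starts every check at once, so the total time is bounded by the timeout
// rather than by the sum of the individual checks.
async function runChecks(
  checks: Record<string, Check>,
  timeoutMs = 2000,
): Promise<Record<string, DependencyStatus>> {
  const names = Object.keys(checks);
  const settled = await Promise.allSettled(
    names.map((name) => withTimeout(checks[name], timeoutMs)),
  );
  const report: Record<string, DependencyStatus> = {};
  settled.forEach((result, i) => {
    report[names[i]] =
      result.status === "fulfilled"
        ? { status: "ok" }
        : { status: "fail", error: String(result.reason?.message ?? result.reason) };
  });
  return report;
}

// Stub checks standing in for real dependencies.
const demoChecks: Record<string, Check> = {
  postgres: async () => {},                  // healthy
  redis: () => new Promise<void>(() => {}),  // hangs, so the timeout fires
};

runChecks(demoChecks, 50).then((report) => console.log(JSON.stringify(report)));
```

Promise.allSettled() is used rather than Promise.all() because it waits for every check and reports each outcome, instead of rejecting as soon as the first check fails.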
Common Mistakes
- Always returning 200: The health check should actually test dependencies, not just respond
- Sequential checks: Checking Postgres, then Redis, then another service one after another makes the total check time the sum of all check times; run in parallel, it is bounded by the slowest check
- No timeouts: A health check blocked on a hanging database connection can cause cascading failures