Rate Limiting
Protecting services from excessive traffic
Key Takeaways
- ✓ Rate limiting protects your service from abuse and ensures fair resource allocation across clients
- ✓ Use atomic Redis operations (INCR + EXPIRE or Lua scripts) to count requests without race conditions
- ✓ Return standard headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) so clients can self-throttle
- ✓ The 429 Too Many Requests status code with a Retry-After header tells clients exactly when to try again
What is Rate Limiting?
Rate limiting controls how many requests a client can make to your API within a time window. When a client exceeds the limit, the server rejects subsequent requests with a 429 status code until the window resets.
It's one of the most fundamental protection mechanisms in production APIs — without it, a single misbehaving client (or attacker) can overwhelm your service and degrade it for everyone.
Why It Matters
Without rate limiting, your API is vulnerable to:
- Denial of service: Intentional or accidental traffic spikes that exhaust server resources
- Unfair usage: One client consuming a disproportionate share of capacity
- Cost overruns: Uncontrolled API calls driving up infrastructure costs
- Cascading failures: Overloaded services failing and taking down dependent systems
Every major API (GitHub, Stripe, Twitter) implements rate limiting. It's expected behavior, not an inconvenience.
How It Works
The most common approach is the fixed window counter:
- Define a window (e.g., 60 seconds) and a limit (e.g., 10 requests)
- For each request, increment a counter keyed by the client identifier
- If the counter exceeds the limit, reject with 429
- When the window expires, the counter resets
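The steps above can be sketched as a small in-memory limiter. This is an illustration under stated assumptions, not a production implementation: the class name `FixedWindowLimiter`, its parameters, and the injectable `clock` are hypothetical, and windows are assumed to be aligned to multiples of the window length rather than starting at a client's first request.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """In-memory fixed window counter (hypothetical sketch, single process only)."""

    def __init__(self, limit: int = 10, window_seconds: int = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # client_id -> (window index, requests counted in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, client_id: str) -> bool:
        """Count one request and report whether it is within the limit."""
        window = int(self.clock() // self.window_seconds)
        stored_window, count = self._counters.get(client_id, (window, 0))
        if stored_window != window:
            count = 0  # the previous window expired, so the counter resets
        count += 1
        self._counters[client_id] = (window, count)
        # A False result is where the server would respond with 429.
        return count <= self.limit
```

Because this version keeps its counters in process memory and takes no lock, it is only suitable for a single-threaded, single-server setting.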
The Concurrency Problem
A naive implementation that reads the count and then increments it in a separate step has a race condition. With the counter at 9, ten concurrent requests can all read "count = 9" and all pass the check, so 19 requests get through a limit of 10.
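To make the arithmetic concrete, the sketch below replays that interleaving deterministically instead of using real threads: every concurrent request reads the counter before any of them increments it. The function name `simulate_race` and its parameters are hypothetical.

```python
def simulate_race(limit: int, already_counted: int, concurrent: int) -> int:
    """Return the total requests admitted when all reads happen before any write."""
    counter = already_counted
    admitted = already_counted
    # Step 1: every concurrent request reads the same, soon-to-be-stale value.
    stale_reads = [counter for _ in range(concurrent)]
    # Step 2: each request checks its stale read, then increments separately.
    for seen in stale_reads:
        if seen < limit:
            counter += 1
            admitted += 1
    return admitted


print(simulate_race(limit=10, already_counted=9, concurrent=10))  # 19
```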
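The fix: use atomic operations. In Redis, INCR atomically increments the counter and returns the new value in a single operation, so each concurrent request sees a distinct count. Set the expiry with EXPIRE when the count is 1 so the window resets automatically. INCR and EXPIRE are still two separate commands, and a client that fails between them can leave a counter with no expiry; running both inside a Lua script makes the pair atomic.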
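One possible shape for the Lua-script option is sketched below. The script runs INCR and, on the first request of a window, EXPIRE as a single atomic unit. The function name `is_allowed` and the `ratelimit:` key prefix are hypothetical, and `client` is assumed to be a redis-py `redis.Redis` instance (or anything exposing a compatible `eval(script, numkeys, *keys_and_args)` method).

```python
# Lua runs atomically inside Redis: no other command can interleave
# between the INCR and the EXPIRE.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], tonumber(ARGV[1]))
end
return count
"""


def is_allowed(client, client_id: str, limit: int = 10,
               window_seconds: int = 60) -> bool:
    """Count one request for client_id and report whether it is within the limit.

    `client` is assumed to behave like redis.Redis from redis-py.
    """
    key = f"ratelimit:{client_id}"  # hypothetical key naming scheme
    count = client.eval(FIXED_WINDOW_LUA, 1, key, window_seconds)
    return int(count) <= limit
```

With a real server this might be called as `is_allowed(redis.Redis(), "alice")`. In this variant the window starts at a client's first request rather than on a clock boundary.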
Response Headers
Standard rate limit headers help clients self-regulate:
- X-RateLimit-Limit: The maximum number of requests allowed per window
- X-RateLimit-Remaining: How many requests the client has left
- X-RateLimit-Reset: Unix timestamp when the window resets
- Retry-After: Seconds until the client should retry (on 429 responses)
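As a rough illustration, these headers could be derived from the counter state like this. The function name `rate_limit_headers` and its parameters are hypothetical; `count` is assumed to be the counter value after the current request was counted, and `reset_at` and `now` are Unix timestamps.

```python
import math
from typing import Dict


def rate_limit_headers(limit: int, count: int, reset_at: float,
                       now: float) -> Dict[str, str]:
    """Build rate limit response headers from the current counter state."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - count)),
        "X-RateLimit-Reset": str(int(reset_at)),
    }
    if count > limit:
        # Only rejected (429) responses carry Retry-After, in whole seconds.
        headers["Retry-After"] = str(max(0, math.ceil(reset_at - now)))
    return headers
```

Rounding Retry-After up rather than down is a design choice in this sketch: it keeps a client from retrying a fraction of a second before the window actually resets.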
Common Mistakes
- Non-atomic counting: Reading and incrementing in separate operations allows concurrent requests to bypass the limit.
- Missing headers: Without rate limit headers, clients can't adjust their behavior proactively.
- Per-server counting: If you store counts in memory, each server has its own counter. Use a shared store like Redis.
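The per-server counting mistake is easy to reproduce in a few lines. The sketch below assumes two servers behind a load balancer that alternates between them, each holding its own in-memory counter; all names are hypothetical.

```python
from collections import Counter
from itertools import cycle

LIMIT = 10
servers = [Counter(), Counter()]  # one private in-memory counter per server


def allow_on(server: Counter, client_id: str) -> bool:
    """Count the request on one server and check it against that server's count."""
    server[client_id] += 1
    return server[client_id] <= LIMIT


# The load balancer alternates servers, so each one sees half of the traffic.
balancer = cycle(servers)
admitted = sum(allow_on(next(balancer), "alice") for _ in range(30))
print(admitted)  # 20: each server admits 10, doubling the intended limit
```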