HTTP Status Codes for Rate Limiting
When API clients exceed rate limits, the server must communicate this clearly. HTTP provides specific status codes and headers for this purpose. Proper rate-limiting communication enables clients to:
- Understand why requests were rejected
- Determine when to retry
- Adjust behavior to stay within limits
- Build resilient systems
Primary Status Code: 429 Too Many Requests
HTTP 429: The Standard Response
Status: 429 Too Many Requests
Meaning: Client has sent too many requests in a given time window
RFC: Defined in RFC 6585 (Additional HTTP Status Codes)
When to use: Any time a client exceeds its rate limit
Example:
GET /api/search?q=query HTTP/1.1
Host: api.example.com
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1706779200
{
"error": "Rate limit exceeded",
"message": "You have exceeded 100 requests per minute",
"retry_after": 60
}
Why 429 is Better Than Alternatives
Wrong: Using 503 Service Unavailable
- 503 tells clients the server itself is broken
- Clients may stop retrying for long periods
- Suggests a widespread outage rather than client behavior
Wrong: Using 403 Forbidden
- 403 is for authorization/permission issues
- Rate limiting is temporary, not permanent
- A client might never retry
Correct: Using 429 Too Many Requests
- Clearly indicates rate limiting, not a server error
- The client knows it is temporarily throttled
- The client can intelligently retry after the specified time
- Doesn't suggest permanent access denial
Rate Limiting Response Headers
Essential Headers for Rate Limiting
Retry-After: When the client should retry; the value is either delta-seconds or an HTTP-date (standardized in RFC 9110)
Retry-After: 60
[Server will accept requests in 60 seconds]
or
Retry-After: Wed, 31 Dec 2025 23:59:59 GMT
[Server will accept requests after this timestamp]
X-RateLimit-Limit: Maximum requests allowed in the time window (the X-RateLimit-* headers are widely used conventions, not a formal standard)
X-RateLimit-Limit: 100
[Client allowed 100 requests per minute/hour/etc]
X-RateLimit-Remaining: Requests remaining in current window
X-RateLimit-Remaining: 42
[Client has 42 requests left before hitting limit]
X-RateLimit-Reset: When the limit window resets (Unix timestamp)
X-RateLimit-Reset: 1706779200
[Limit resets at this Unix timestamp]
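Retry-After is the one header here with two valid forms, so clients need to parse both. A minimal stdlib-only parser (the function name is illustrative):

```python
from email.utils import parsedate_to_datetime
from datetime import datetime, timezone

def parse_retry_after(value, now=None):
    """Return seconds to wait, given a Retry-After header value.

    Handles both forms: delta-seconds ("60") and an HTTP-date
    ("Wed, 31 Dec 2025 23:59:59 GMT")."""
    now = now or datetime.now(timezone.utc)
    try:
        return max(0, int(value))                    # delta-seconds form
    except ValueError:
        when = parsedate_to_datetime(value)          # HTTP-date form
        return max(0, int((when - now).total_seconds()))
```

`parse_retry_after("60")` returns 60; given an HTTP-date, it returns the number of seconds remaining until that time.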
Complete Rate Limit Headers Example
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1706779200
[Response body]
When limit exceeded:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1706779200
Retry-After: 45
{
"error": "Rate limit exceeded"
}
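Producing that header set is mechanical; a small helper sketch (the function name and argument order are illustrative, the header names are the conventions shown above):

```python
def rate_limit_headers(limit, remaining, reset_epoch, retry_after=None):
    """Build the conventional X-RateLimit-* header set; add Retry-After
    only when the client is actually being throttled."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if retry_after is not None:
        headers["Retry-After"] = str(retry_after)
    return headers
```

Attaching these to every response, not just 429s, lets well-behaved clients slow down before they ever hit the limit.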
Industry Standard Headers
GitHub API:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 59
X-RateLimit-Reset: 1372700873
Stripe API:
Retry-After: 2
Twitter API:
X-Rate-Limit-Limit: 15
X-Rate-Limit-Remaining: 14
X-Rate-Limit-Reset: 1420070400
AWS API:
x-amzn-RequestId: request-id
x-amzn-RateLimit-Limit: 3000
Rate Limiting Strategies and Status Codes
Strategy 1: Strict Rate Limiting (429 for Any Overage)
Approach: Reject any request exceeding limit
Window: 1 minute
Limit: 100 requests per minute
Request 1-100: Accept (200 OK)
Request 101: Reject (429 Too Many Requests)
When to use:
- Public APIs needing strict control
- Preventing abuse
- Resource constraints
- Fair use enforcement
Response:
429 Too Many Requests
Retry-After: 45
X-RateLimit-Remaining: 0
Strategy 2: Soft Limit (Warning Headers, No 429)
Approach: Allow overage but warn client with headers
Limit: 100 requests per minute
Soft threshold: 95 requests
Request 1-95: Accept (200 OK, normal rate-limit headers)
Request 96-105: Accept (200 OK + warning headers)
Request 106+: Reject (429 Too Many Requests)
Response for request 96:
200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 4
X-RateLimit-Warning: true
X-RateLimit-Warning-Threshold: 95
[Response body]
When to use:
- Premium customers
- Internal APIs
- Trusted partners
- Progressive degradation
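The soft-limit schedule above reduces to one decision function; a sketch using the illustrative thresholds from this strategy:

```python
def classify_request(count, limit=100, soft=95, grace=5):
    """Soft-limit decision for the Nth request in the window.

    Thresholds mirror the example above: warn past the soft threshold,
    reject only once a small grace band above the hard limit is used up."""
    if count <= soft:
        return "accept"                # 200 OK, normal headers
    if count <= limit + grace:
        return "accept_with_warning"   # 200 OK + X-RateLimit-Warning headers
    return "reject"                    # 429 Too Many Requests
```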
Strategy 3: Graduated Rate Limiting (Progressive Delays)
Approach: Progressively increase response time as limit approached
Request 1-80: Normal response (0ms delay)
Request 81-95: Slight delay (100ms added)
Request 96-100: Moderate delay (500ms added)
Request 101+: Reject (429 Too Many Requests)
Response:
200 OK
X-RateLimit-Remaining: 5
[Response delayed by 500ms before sending]
When to use:
- Natural traffic smoothing
- Encouraging less aggressive clients
- Protecting backend resources
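The graduated schedule can likewise be expressed as a lookup; a sketch using the tier boundaries above, where None signals a 429:

```python
def throttle_delay_ms(count):
    """Delay (in ms) to add before responding to the Nth request in the
    window; None means reject with 429. Tiers are the illustrative numbers
    from the schedule above."""
    if count <= 80:
        return 0      # normal response
    if count <= 95:
        return 100    # slight delay
    if count <= 100:
        return 500    # moderate delay
    return None       # over the limit: reject
```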
Strategy 4: Queue-Based Rate Limiting (202 Accepted)
Approach: Accept requests above limit but queue them
Limit: 100 requests per second
Request 1-100: Process immediately (200 OK)
Request 101-200: Queue for later (202 Accepted)
Request 201+: Reject or queue (202 or 429)
Response for queued request:
202 Accepted
Location: /queue/jobs/abc-123
Retry-After: 30
{
"status": "queued",
"job_id": "abc-123",
"queue_position": 45
}
When to use:
- Batch processing APIs
- Long-running operations
- Fair resource allocation
- User-facing APIs valuing reliability
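A queued request's 202 response can be assembled like this; a sketch in which the Location path format and the per-job pacing estimate are assumptions, not a fixed API:

```python
import uuid

def queued_response(queue_position, seconds_per_job=0.5):
    """Build a 202 Accepted response for a request queued above the limit.

    Retry-After is estimated from the queue depth and an assumed
    processing rate; the Location path is illustrative."""
    job_id = str(uuid.uuid4())
    retry_after = max(1, int(queue_position * seconds_per_job))
    headers = {
        "Location": f"/queue/jobs/{job_id}",
        "Retry-After": str(retry_after),
    }
    body = {"status": "queued", "job_id": job_id,
            "queue_position": queue_position}
    return 202, headers, body
```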
Implementing Rate Limiting with HTTP Status Codes
Implementation Pattern
client → request → rate_limiter → decision
if request_count <= limit:
├─ response status: 200 OK
├─ add rate limit headers
└─ process request
else if request_count > limit:
├─ if can_queue:
│ └─ status: 202 Accepted (if async job queue)
├─ else:
│ ├─ status: 429 Too Many Requests
│ ├─ Retry-After header
│ └─ rate limit headers
└─ reject request
Code Example
import time

from flask import Flask, request, jsonify
from redis import Redis

app = Flask(__name__)
redis = Redis()  # assumes a Redis instance is reachable on localhost

RATE_LIMIT_REQUESTS = 100
RATE_LIMIT_WINDOW = 60  # seconds

def check_rate_limit(client_id):
    """Count this request against the client's fixed window and report usage."""
    key = f"rate_limit:{client_id}"
    current_count = redis.incr(key)
    if current_count == 1:
        # First request in window, set expiration
        redis.expire(key, RATE_LIMIT_WINDOW)
    ttl = redis.ttl(key)
    limit_reset = int(time.time()) + max(ttl, 0)
    return {
        "count": current_count,
        "limit": RATE_LIMIT_REQUESTS,
        "reset": limit_reset,
        "remaining": max(0, RATE_LIMIT_REQUESTS - current_count)
    }

@app.route("/api/data")
def get_data():
    client_id = get_client_id(request)  # resolves the caller (e.g. API key); defined elsewhere
    rate_limit = check_rate_limit(client_id)
    # Add rate limit headers to every response, throttled or not
    headers = {
        "X-RateLimit-Limit": str(rate_limit["limit"]),
        "X-RateLimit-Remaining": str(rate_limit["remaining"]),
        "X-RateLimit-Reset": str(rate_limit["reset"])
    }
    # Check if exceeded
    if rate_limit["count"] > RATE_LIMIT_REQUESTS:
        retry_after = max(1, rate_limit["reset"] - int(time.time()))
        return jsonify({
            "error": "Rate limit exceeded",
            "retry_after": retry_after
        }), 429, {**headers, "Retry-After": str(retry_after)}
    # Request within limit
    return jsonify({"data": "value"}), 200, headers
Multiple Rate Limit Tiers
Tiered Rate Limiting
Many APIs use multiple limits (requests per second, minute, hour, day):
API Limits for user:
├─ Per second: 10 requests
├─ Per minute: 100 requests
├─ Per hour: 1000 requests
└─ Per day: 10,000 requests
Check in order (most restrictive first):
1. If per-second limit exceeded → 429
2. If per-minute limit exceeded → 429
3. If per-hour limit exceeded → 429
4. If per-day limit exceeded → 429
5. Otherwise → 200 OK
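The ordered tier check above is a short loop; a sketch in which the window names and limits mirror the example, returning which tier (if any) triggered the 429:

```python
def check_tiers(counts, limits):
    """Check usage against each tier, most restrictive window first.

    counts and limits map window name -> request count; returns
    (allowed, limiting_tier)."""
    for tier in ("second", "minute", "hour", "day"):
        if tier in limits and counts.get(tier, 0) >= limits[tier]:
            return False, tier   # this tier is exhausted -> 429
    return True, None            # all tiers have headroom -> 200 OK

LIMITS = {"second": 10, "minute": 100, "hour": 1000, "day": 10_000}
```

The first exhausted tier names the `limit_type` reported back to the client, as in the response below.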
Response for Tiered Limits
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1706779200
X-RateLimit-Limit-Type: minute
{
"error": "Rate limit exceeded",
"limit_type": "minute",
"limit": 100,
"retry_after": 45
}
Differentiated Rate Limits
Based on Client Type
Free tier:
├─ 10 requests per second
├─ 100 requests per minute
└─ 1000 requests per hour
Pro tier:
├─ 100 requests per second
├─ 1000 requests per minute
└─ 10,000 requests per hour
Enterprise tier:
├─ 1000 requests per second
├─ No minute limit
└─ No hour limit
Response varies:
Free tier exceeds limit:
429 Too Many Requests
Pro tier has higher limit:
Same API, but 429 only at higher threshold
Based on Resource Intensity
Simple queries: 1000 requests per minute
Complex queries: 10 requests per minute
Heavy computations: 1 request per minute
Rate limiting by resource cost:
Request /search?q=simple → Cost: 1 unit
Request /analytics/report → Cost: 100 units
Request /ml-model/predict → Cost: 500 units
Limit: 1000 units per minute
/search 1000 times: OK (1000 units)
/analytics 5 times + /search 500 times: OK (1000 units)
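Cost-based accounting is a small variation on counting requests; a sketch using the illustrative per-path costs above (unknown paths are assumed to cost 1 unit):

```python
COSTS = {"/search": 1, "/analytics/report": 100, "/ml-model/predict": 500}
BUDGET = 1000  # units per minute, as in the example above

def charge(used_units, path):
    """Cost-based limiting: return (new_used_units, status_code).

    A request is rejected with 429 when its cost would overrun the
    window's unit budget; otherwise its cost is consumed."""
    cost = COSTS.get(path, 1)
    if used_units + cost > BUDGET:
        return used_units, 429   # budget exhausted for this window
    return used_units + cost, 200
```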
Client-Friendly Rate Limit Communication
Error Response with Guidance
{
"error": "Rate limit exceeded",
"code": "RATE_LIMIT_EXCEEDED",
"message": "You have exceeded 100 requests per minute",
"limit": {
"requests": 100,
"window": "minute"
},
"current": {
"requests": 105,
"window_reset": "2025-01-31T10:05:00Z"
},
"retry": {
"after_seconds": 45,
"after_time": "2025-01-31T10:04:45Z"
},
"upgrade": {
"message": "Upgrade to Pro for higher limits",
"url": "https://api.example.com/pricing"
}
}
Helpful Response Codes
429 Too Many Requests ← Standard rate limit
Retry-After: 60 ← When to retry
X-RateLimit-Limit: 100 ← Your limit
X-RateLimit-Remaining: 0 ← How many left
X-RateLimit-Reset: timestamp ← When limit resets
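On the client side, a common pattern (not mandated by any spec) is to honor Retry-After when present and fall back to capped exponential backoff otherwise; a sketch:

```python
def backoff_seconds(attempt, retry_after=None, base=1.0, cap=60.0):
    """How long a client should wait before retry number `attempt`.

    Prefer the server's Retry-After value; without one, use exponential
    backoff capped at `cap` seconds. Defaults are illustrative."""
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt))
```

Adding random jitter to the fallback delay helps avoid many clients retrying in lockstep.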
Testing Rate Limiting
Test Case Matrix
Test | Condition | Expected Status | Expected Headers
-----|-----------|-----------------|------------------
1 | Under limit | 200 OK | Remaining > 0
2 | At limit | 200 OK | Remaining = 0
3 | Over limit | 429 | Retry-After set
4 | Window reset | 200 OK | Remaining reset
Rate Limit Test Example
import time

def test_rate_limiting():
    client = APIClient()  # test client for the API under test, defined elsewhere
    # Make requests up to the limit
    for i in range(100):
        response = client.get("/api/data")
        assert response.status_code == 200
        assert int(response.headers["X-RateLimit-Remaining"]) == 99 - i
    # Next request exceeds the limit
    response = client.get("/api/data")
    assert response.status_code == 429
    assert "Retry-After" in response.headers
    retry_after = int(response.headers["Retry-After"])
    assert retry_after > 0
    # Wait for the window to reset, then retry
    time.sleep(retry_after + 1)
    response = client.get("/api/data")
    assert response.status_code == 200  # Should work now
Best Practices Summary
DO:
- ✓ Use 429 for rate limit exceeded
- ✓ Include Retry-After header
- ✓ Provide X-RateLimit-* headers on all responses
- ✓ Be generous with rate limits initially
- ✓ Communicate limits clearly in documentation
- ✓ Offer upgrade paths for higher limits
- ✓ Document retry strategy
DON'T:
- ✗ Use 503 for rate limiting (server not broken)
- ✗ Use 403 (not permission-related)
- ✗ Omit Retry-After header
- ✗ Have unclear rate limiting rules
- ✗ Lock out clients permanently
- ✗ Change limits without notice
- ✗ Silently drop requests (always return an explicit status)
Conclusion
Proper HTTP status codes and headers for rate limiting enable clients to gracefully handle throttling and adjust their behavior. Using 429 Too Many Requests with appropriate headers like Retry-After and X-RateLimit-* provides clear, actionable feedback that clients can respond to intelligently.
Well-implemented rate limiting communicates clearly, provides guidance on when to retry, and enables a positive experience even when requests are temporarily throttled. This benefits both API providers (protecting resources) and clients (knowing exactly how to behave).

