Development

What HTTP status codes should I use for API rate limiting and throttling?

Learn proper HTTP status codes for rate limiting and throttling, including best practices for communicating limits to API clients.

By Inventive HQ Team

HTTP Status Codes for Rate Limiting

When API clients exceed rate limits, the server must communicate this clearly. HTTP provides specific status codes and headers for this purpose. Proper rate-limiting communication enables clients to:

  • Understand why requests were rejected
  • Determine when to retry
  • Adjust behavior to stay within limits
  • Build resilient systems

Primary Status Code: 429 Too Many Requests

HTTP 429: The Standard Response

Status: 429 Too Many Requests

Meaning: Client has sent too many requests in a given time window

RFC: Defined in RFC 6585 (Additional HTTP Status Codes)

When to use: Any time a client exceeds rate limits

Example:

GET /api/search?q=query HTTP/1.1
Host: api.example.com

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1706779200

{
  "error": "Rate limit exceeded",
  "message": "You have exceeded 100 requests per minute",
  "retry_after": 60
}

Why 429 is Better Than Alternatives

Wrong: Using 503 Service Unavailable

  • 503 suggests the server itself is broken, not that the client sent too many requests
  • Clients may stop retrying for long periods
  • Implies a widespread outage rather than a client-side issue

Wrong: Using 403 Forbidden

  • 403 is for authorization/permission issues
  • Rate limiting is temporary, not permanent
  • Client might never retry

Correct: Using 429 Too Many Requests

  • Clearly indicates rate limiting, not a server error
  • Client knows it is temporarily throttled
  • Client can intelligently retry after the specified time
  • Doesn't suggest permanent access denial
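A client can put these properties to work with a small retry-delay helper. A minimal sketch; the exponential fallback of 1s, 2s, 4s, ... is one reasonable policy when the server omits Retry-After, not a standard:

```python
def next_delay(headers, attempt):
    """Seconds to wait before retrying after a 429 response.

    Uses the server's Retry-After hint when present; otherwise falls
    back to exponential backoff (1s, 2s, 4s, ...).
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return int(retry_after)
    return 2 ** attempt
```

In a retry loop, the client would sleep for `next_delay(response.headers, attempt)` after each 429 and give up after a bounded number of attempts.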

Rate Limiting Response Headers

Essential Headers for Rate Limiting

Retry-After: When client should retry

Retry-After: 60
[Server will accept requests in 60 seconds]

or

Retry-After: Wed, 31 Dec 2025 23:59:59 GMT
[Server will accept requests after this timestamp]

X-RateLimit-Limit: Maximum requests allowed in time window

X-RateLimit-Limit: 100
[Client allowed 100 requests per minute/hour/etc]

X-RateLimit-Remaining: Requests remaining in current window

X-RateLimit-Remaining: 42
[Client has 42 requests left before hitting limit]

X-RateLimit-Reset: When the limit window resets (Unix timestamp)

X-RateLimit-Reset: 1706779200
[Limit resets at this Unix timestamp]
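Because Retry-After may carry either delta-seconds or an HTTP-date, clients should handle both forms. A stdlib-only sketch (the injectable `now` parameter exists only for testability):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value to seconds.

    Accepts either delta-seconds ("60") or an HTTP-date
    ("Wed, 31 Dec 2025 23:59:59 GMT").
    """
    try:
        return int(value)
    except ValueError:
        when = parsedate_to_datetime(value)
        now = now or datetime.now(timezone.utc)
        return max(0, int((when - now).total_seconds()))
```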

Complete Rate Limit Headers Example

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1706779200

[Response body]

When limit exceeded:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1706779200
Retry-After: 45

{
  "error": "Rate limit exceeded"
}

Industry Standard Headers

GitHub API:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 59
X-RateLimit-Reset: 1372700873

Stripe API:

Retry-After: 2

Twitter API:

X-Rate-Limit-Limit: 15
X-Rate-Limit-Remaining: 14
X-Rate-Limit-Reset: 1420070400

AWS API:

x-amzn-RequestId: request-id
x-amzn-RateLimit-Limit: 3000

Rate Limiting Strategies and Status Codes

Strategy 1: Strict Rate Limiting (429 for Any Overage)

Approach: Reject any request exceeding limit

Window: 1 minute
Limit: 100 requests per minute

Request 1-100: Accept (200 OK)
Request 101: Reject (429 Too Many Requests)

When to use:

  • Public APIs needing strict control
  • Preventing abuse
  • Resource constraints
  • Fair use enforcement

Response:

429 Too Many Requests
Retry-After: 45
X-RateLimit-Remaining: 0
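The fixed-window counting above can be sketched in a few lines. This in-memory version is illustrative only; a real deployment would typically back the counters with Redis or similar shared storage:

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window`-second window."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counts = {}  # (client_id, window_start) -> request count

    def allow(self, client_id, now=None):
        now = now if now is not None else time.time()
        # Align the window to clock boundaries (e.g. each whole minute).
        window_start = int(now // self.window) * self.window
        key = (client_id, window_start)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

When `allow` returns False, the server responds 429 with `Retry-After` set to the seconds remaining in the current window.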

Strategy 2: Soft Limit (Warning Headers, No 429)

Approach: Allow overage but warn client with headers

Limit: 100 requests per minute
Soft threshold: 95 requests

Request 1-95: Accept (200 OK + warning headers)
Request 96-105: Accept (200 OK + urgent warning headers)
Request 106+: Reject (429 Too Many Requests)

Response for request 96:

200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 4
X-RateLimit-Warning: true
X-RateLimit-Warning-Threshold: 95

[Response body]

When to use:

  • Premium customers
  • Internal APIs
  • Trusted partners
  • Progressive degradation
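The soft-limit policy above can be expressed as a small decision function. A sketch, assuming a hard cutoff 5 requests past the limit; the `X-RateLimit-Warning` headers are illustrative, not standardized:

```python
def soft_limit_response(count, limit=100, soft_threshold=95, hard_overage=5):
    """Decide status code and rate-limit headers under a soft-limit policy.

    Requests up to `limit + hard_overage` are accepted; those past
    `soft_threshold` carry warning headers.
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - count)),
    }
    if count > limit + hard_overage:
        return 429, headers
    if count > soft_threshold:
        headers["X-RateLimit-Warning"] = "true"
        headers["X-RateLimit-Warning-Threshold"] = str(soft_threshold)
    return 200, headers
```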

Strategy 3: Graduated Rate Limiting (Progressive Delays)

Approach: Progressively increase response time as limit approached

Request 1-80: Normal response (0ms delay)
Request 81-95: Slight delay (100ms added)
Request 96-100: Moderate delay (500ms added)
Request 101+: Reject (429 Too Many Requests)

Response:

200 OK
X-RateLimit-Remaining: 5
[Response delayed by 500ms before sending]

When to use:

  • Natural traffic smoothing
  • Encouraging less aggressive clients
  • Protecting backend resources
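The graduated tiers above map directly to a delay lookup. A sketch using the illustrative boundaries from the table, with `None` meaning reject with 429:

```python
def graduated_delay(count):
    """Added delay in milliseconds before responding, per request count."""
    if count <= 80:
        return 0      # normal response
    if count <= 95:
        return 100    # slight delay
    if count <= 100:
        return 500    # moderate delay
    return None       # over limit: reject with 429
```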

Strategy 4: Queue-Based Rate Limiting (202 Accepted)

Approach: Accept requests above limit but queue them

Limit: 100 requests per second
Request 1-100: Process immediately (200 OK)
Request 101-200: Queue for later (202 Accepted)
Request 201+: Reject or queue (202 or 429)

Response for queued request:

202 Accepted
Location: /queue/jobs/abc-123
Retry-After: 30

{
  "status": "queued",
  "job_id": "abc-123",
  "queue_position": 45
}

When to use:

  • Batch processing APIs
  • Long-running operations
  • Fair resource allocation
  • User-facing APIs valuing reliability
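The process/queue/reject decision can be sketched as a pure function; the `limit` and `queue_capacity` values here are illustrative:

```python
def route_request(in_flight, limit=100, queue_capacity=100):
    """Decide how to handle a request given the current in-flight count."""
    if in_flight < limit:
        return 200, "process"   # handle immediately
    if in_flight < limit + queue_capacity:
        return 202, "queue"     # accept, process asynchronously
    return 429, "reject"        # queue is full too
```

A 202 response would also carry a `Location` header pointing at the queued job, as in the example above.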

Implementing Rate Limiting with HTTP Status Codes

Implementation Pattern

client → request → rate_limiter → decision

if request_count <= limit:
    ├─ response status: 200 OK
    ├─ add rate limit headers
    └─ process request

else if request_count > limit:
    ├─ if can_queue:
    │  └─ status: 202 Accepted (if async job queue)
    ├─ else:
    │  ├─ status: 429 Too Many Requests
    │  ├─ Retry-After header
    │  └─ rate limit headers
    └─ reject request

Code Example

import time

import redis
from flask import Flask, request, jsonify

app = Flask(__name__)
redis_client = redis.Redis()

RATE_LIMIT_REQUESTS = 100
RATE_LIMIT_WINDOW = 60  # seconds

def check_rate_limit(client_id):
    """Check if client has exceeded rate limit"""
    key = f"rate_limit:{client_id}"
    current_count = redis_client.incr(key)

    if current_count == 1:
        # First request in window, set expiration
        redis_client.expire(key, RATE_LIMIT_WINDOW)

    ttl = redis_client.ttl(key)
    limit_reset = int(time.time()) + max(ttl, 0)

    return {
        "count": current_count,
        "limit": RATE_LIMIT_REQUESTS,
        "reset": limit_reset,
        "remaining": max(0, RATE_LIMIT_REQUESTS - current_count)
    }

@app.route("/api/data")
def get_data():
    client_id = get_client_id(request)  # e.g. API key or client IP (helper not shown)
    rate_limit = check_rate_limit(client_id)

    # Add rate limit headers
    headers = {
        "X-RateLimit-Limit": str(rate_limit["limit"]),
        "X-RateLimit-Remaining": str(rate_limit["remaining"]),
        "X-RateLimit-Reset": str(rate_limit["reset"])
    }

    # Check if exceeded
    if rate_limit["count"] > RATE_LIMIT_REQUESTS:
        retry_after = rate_limit["reset"] - int(time.time())
        return jsonify({
            "error": "Rate limit exceeded",
            "retry_after": retry_after
        }), 429, {
            **headers,
            "Retry-After": str(retry_after)
        }

    # Request within limit
    return jsonify({"data": "value"}), 200, headers

Multiple Rate Limit Tiers

Tiered Rate Limiting

Many APIs use multiple limits (requests per second, minute, hour, day):

API Limits for user:
├─ Per second: 10 requests
├─ Per minute: 100 requests
├─ Per hour: 1000 requests
└─ Per day: 10,000 requests

Check in order (most restrictive first):
1. If per-second limit exceeded → 429
2. If per-minute limit exceeded → 429
3. If per-hour limit exceeded → 429
4. If per-day limit exceeded → 429
5. Otherwise → 200 OK
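Checking tiers from most to least restrictive can be sketched as:

```python
def check_tiers(counts, limits):
    """Return the name of the first exceeded window, or None if within limits.

    `counts` and `limits` map window name -> current count / allowed maximum,
    checked from the most restrictive window outward.
    """
    for window in ("second", "minute", "hour", "day"):
        if counts.get(window, 0) > limits[window]:
            return window
    return None
```

The returned window name feeds the `X-RateLimit-Limit-Type` header and `limit_type` body field shown below.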

Response for Tiered Limits

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1706779200
X-RateLimit-Limit-Type: minute

{
  "error": "Rate limit exceeded",
  "limit_type": "minute",
  "limit": 100,
  "retry_after": 45
}

Differentiated Rate Limits

Based on Client Type

Free tier:
├─ 10 requests per second
├─ 100 requests per minute
└─ 1000 requests per hour

Pro tier:
├─ 100 requests per second
├─ 1000 requests per minute
└─ 10,000 requests per hour

Enterprise tier:
├─ 1000 requests per second
├─ No minute limit
└─ No hour limit

Response varies:

Free tier exceeds limit:
429 Too Many Requests

Pro tier has higher limit:
Same API, but 429 only at higher threshold

Based on Resource Intensity

Simple queries: 1000 requests per minute
Complex queries: 10 requests per minute
Heavy computations: 1 request per minute

Rate limiting by resource cost:

Request /search?q=simple → Cost: 1 unit
Request /analytics/report → Cost: 100 units
Request /ml-model/predict → Cost: 500 units

Limit: 1000 units per minute

/search 1000 times: OK (1000 units)
/analytics 5 times + /search 500 times: OK (1000 units)
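Cost-based limiting can be sketched as a per-window unit budget; the endpoint costs are the illustrative values above, and window reset is omitted for brevity:

```python
# Illustrative per-request costs in budget units.
COSTS = {"/search": 1, "/analytics/report": 100, "/ml-model/predict": 500}

class CostLimiter:
    """Budget of `budget` units per window; each endpoint debits its cost."""

    def __init__(self, budget=1000):
        self.budget = budget
        self.spent = 0

    def allow(self, path):
        cost = COSTS.get(path, 1)  # unknown endpoints default to 1 unit
        if self.spent + cost > self.budget:
            return False  # would exceed budget: respond 429
        self.spent += cost
        return True
```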

Client-Friendly Rate Limit Communication

Error Response with Guidance

{
  "error": "Rate limit exceeded",
  "code": "RATE_LIMIT_EXCEEDED",
  "message": "You have exceeded 100 requests per minute",
  "limit": {
    "requests": 100,
    "window": "minute"
  },
  "current": {
    "requests": 105,
    "window_reset": "2025-01-31T10:05:00Z"
  },
  "retry": {
    "after_seconds": 45,
    "after_time": "2025-01-31T10:04:45Z"
  },
  "upgrade": {
    "message": "Upgrade to Pro for higher limits",
    "url": "https://api.example.com/pricing"
  }
}

Helpful Response Codes

429 Too Many Requests        ← Standard rate limit
Retry-After: 60              ← When to retry
X-RateLimit-Limit: 100       ← Your limit
X-RateLimit-Remaining: 0     ← How many left
X-RateLimit-Reset: timestamp ← When limit resets

Testing Rate Limiting

Test Case Matrix

Test | Condition | Expected Status | Expected Headers
-----|-----------|-----------------|------------------
1    | Under limit | 200 OK        | Remaining > 0
2    | At limit   | 200 OK         | Remaining = 0
3    | Over limit | 429            | Retry-After set
4    | Window reset | 200 OK       | Remaining reset

Rate Limit Test Example

import time

def test_rate_limiting():
    # APIClient is a placeholder test client, e.g. requests.Session or Flask's test_client()
    client = APIClient()

    # Make requests up to limit
    for i in range(100):
        response = client.get("/api/data")
        assert response.status_code == 200
        assert int(response.headers["X-RateLimit-Remaining"]) == 99 - i

    # Next request exceeds limit
    response = client.get("/api/data")
    assert response.status_code == 429
    assert "Retry-After" in response.headers

    retry_after = int(response.headers["Retry-After"])
    assert retry_after > 0

    # Wait and retry
    time.sleep(retry_after + 1)
    response = client.get("/api/data")
    assert response.status_code == 200  # Should work now

Best Practices Summary

DO:

  • ✓ Use 429 for rate limit exceeded
  • ✓ Include Retry-After header
  • ✓ Provide X-RateLimit-* headers on all responses
  • ✓ Be generous with rate limits initially
  • ✓ Communicate limits clearly in documentation
  • ✓ Offer upgrade paths for higher limits
  • ✓ Document retry strategy

DON'T:

  • ✗ Use 503 for rate limiting (server not broken)
  • ✗ Use 403 (not permission-related)
  • ✗ Omit Retry-After header
  • ✗ Have unclear rate limiting rules
  • ✗ Lock out clients permanently
  • ✗ Change limits without notice
  • ✗ Silently drop requests (return 429 instead)

Conclusion

Proper HTTP status codes and headers for rate limiting enable clients to gracefully handle throttling and adjust their behavior. Using 429 Too Many Requests with appropriate headers like Retry-After and X-RateLimit-* provides clear, actionable feedback that clients can respond to intelligently.

Well-implemented rate limiting communicates clearly, provides guidance on when to retry, and enables a positive experience even when requests are temporarily throttled. This benefits both API providers (protecting resources) and clients (knowing exactly how to behave).
