Design a rate limiter service that restricts the number of requests a client can make within a given time period. This is essential for preventing abuse and ensuring fair resource usage.
Constraints
Functional: Limit requests per client per window, multiple strategies (fixed/sliding window, token bucket), per-user limits, whitelisting, rate limit headers in response
Non-functional: Low latency (< 1ms overhead), millions of requests/second, precise limits, works across multiple servers
Scale: 10M requests/s, 100M unique clients, ~100 bytes per client (~10 GB total), 1-minute window
Design Considerations
Think about: How to track request counts efficiently (in-memory vs distributed cache); which rate limiting algorithm to use (fixed window, sliding window, token bucket); how to handle distributed rate limiting across multiple servers; how to store and retrieve rate limit data quickly; how to handle race conditions in distributed systems; how to implement different rate limits for different endpoints or users; how to handle rate limit expiration and cleanup.
Good luck!