Nameet Rajore

Learning about rate limiting algorithms


If you don’t know what rate limiters are, you can refer to this blog: https://www.cloudflare.com/en-gb/learning/bots/what-is-rate-limiting/

Why Rate Limiting?

Rate limiting protects your services from:

  • DoS attacks - Malicious request floods that can take down your service
  • Resource exhaustion - Prevents CPU, memory, and database overload
  • Cost control - Essential for metered APIs and cloud services
  • Fair usage - Prevents one client from starving others

Common limits: 100 req/min (tier 1), 1000 req/min (tier 2), 10000 req/min (enterprise), etc.

Architecture

In a very simplified form, this is how a rate limiter works:

flowchart TD
  A[Client] --> B[API Gateway]

  B --> C{Rate Limit Check}

  C -->|Allowed| D[Service]
  C -->|Blocked| E[429 Too Many Requests]

  subgraph Store
    S[Counter / Token Bucket]
  end

  C --> S
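
To make this flow concrete, here is a minimal sketch of the gateway-side check. The RateLimiter interface and handleRequest function are hypothetical; each algorithm below implements a check along these lines.

// Hypothetical interface: every algorithm in this post boils down to
// a boolean "is this request allowed right now?" check
struct RateLimiter {
    virtual bool allowRequest() = 0;
    virtual ~RateLimiter() = default;
};

// The gateway-side decision from the diagram: consult the limiter's
// state, then forward the request or short-circuit with a 429
int handleRequest(RateLimiter& limiter) {
    if (limiter.allowRequest()) {
        // forwardToService(...); // hypothetical downstream call
        return 200; // OK
    }
    return 429; // Too Many Requests
}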

Token Bucket

A token bucket controls the rate of requests using a simple idea:

A “bucket” slowly fills up with tokens over time, and every request that is allowed must spend one token.

The bucket has a pre-defined maximum capacity, above which no more tokens are added. Tokens are added to the bucket at a constant rate, and this refill rate determines the rate limiter’s average throughput.

When a request comes in, it is allowed only if at least one token is available in the bucket. If not, the request is throttled.

The token bucket algorithm is ideal for services that experience frequent surges in requests, since it allows any number of requests at once as long as there are tokens available in the bucket.

Example:

sequenceDiagram
  participant C as Client
  participant G as API Gateway
  participant R as Rate Limiter
  participant S as Token Store

  Note over R,S: config: capacity=5, refill=1 token/sec
  C->>G: 5 requests at t=0 (burst)
  G->>R: Check (req1)
  R->>S: fetch state (tokens=5, last=0)
  S-->>R: tokens=5
  R-->>G: allow (consume -> tokens=4)
  G-->>C: 200 OK

  G->>R: Check (req2..req5)  %% condensed
  R->>S: tokens=4..1
  R-->>G: allow (consume)
  G-->>C: 200 OK (×4)

  Note over C,R: t=1s later
  C->>G: Request at t=1.0
  G->>R: Check (lazy refill)
  R->>S: fetch state (tokens=0, last=0)
  alt elapsed = 1s
    R->>R: tokens += 1*refill -> tokens=1 (capped to capacity)
  end
  R-->>G: allow (consume -> tokens=0)
  G-->>C: 200 OK

  C->>G: Request at t=1.2
  G->>R: Check
  R->>S: fetch state (tokens=0, last=1.0)
  R-->>G: reject (429)
  G-->>C: 429 Too Many Requests

Implementation

Most implementations don’t refill the bucket at regular intervals; instead they do something called a “lazy refill”: tokens are added to the bucket only when a request arrives. The number of tokens added matches exactly what would have accumulated had the bucket been refilling continuously at its constant rate.

flowchart TD
  A[Request arrives] --> B[Read state: tokens, last_timestamp]
  B --> C[elapsed = now - last_timestamp]
  C --> D[add = elapsed * refill_rate]
  D --> E["tokens = min(capacity, tokens + add)"]
  E --> F{tokens >= cost?}
  F -->|yes| G["consume cost (tokens -= cost); allow request"]
  F -->|no| H[reject / queue / throttle]
  G --> I[update state: tokens, last_timestamp=now]
  H --> I

Algorithm in C++

#include <ctime>     // time()
#include <algorithm> // std::min

class TokenBucket {
    double tokens;        // tokens currently in the bucket
    int capacity;         // maximum tokens the bucket can hold
    double refillRate;    // tokens added per second
    long long lastRefill; // timestamp of the last (lazy) refill
public:
    TokenBucket(int cap, double rate) : capacity(cap), refillRate(rate) {
        tokens = cap; // start with a full bucket
        lastRefill = time(0);
    }
    bool allowRequest(int cost = 1) {
        refill(); // lazy refill on every request
        if (tokens >= cost) {
            tokens -= cost; // spend tokens for this request
            return true;
        }
        return false; // not enough tokens: throttle
    }
    void refill() {
        long long now = time(0);
        double elapsed = now - lastRefill;
        // Add the tokens that would have accumulated since the last
        // refill, capped at the bucket's capacity
        tokens = std::min((double)capacity, tokens + elapsed * refillRate);
        lastRefill = now;
    }
};
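
To see the class in action, here is a minimal usage sketch (illustrative driver code, assuming the TokenBucket class above). It mirrors the sequence diagram: a burst of five requests drains the bucket, the sixth is throttled, and after a short wait the lazy refill lets one more through.

#include <chrono>
#include <iostream>
#include <thread>

int main() {
    TokenBucket bucket(5, 1.0); // capacity=5, refill=1 token/sec

    // Burst of 6 requests at t=0: the first 5 drain the bucket
    for (int i = 1; i <= 6; i++) {
        std::cout << "req" << i << ": "
                  << (bucket.allowRequest() ? "200 OK" : "429") << "\n";
    }

    // After ~2s the lazy refill has added tokens again
    std::this_thread::sleep_for(std::chrono::seconds(2));
    std::cout << "after wait: "
              << (bucket.allowRequest() ? "200 OK" : "429") << "\n";
}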

Complexity Analysis

  • Time: O(1) per request
  • Space: O(1) - only stores tokens, capacity, rate, timestamp

Pros

  • Allows legitimate traffic bursts
  • Simple and memory efficient
  • Flexible - different costs per request type
  • Used by major cloud providers (AWS, GCP)

Cons

  • Doesn’t smooth traffic
  • Large bursts can still overwhelm downstream services
  • Requires careful tuning of capacity vs refill rate

Real-World Usage

  • GitHub API: 5,000 requests/hour per user
  • Stripe API: Token bucket for payment endpoints
  • AWS API Gateway: Default rate limiting mechanism

Leaky Bucket

A leaky bucket algorithm sends requests to the service at a fixed rate no matter how fast they come in. It’s like pouring water into a bucket with a tiny hole at the bottom — you can pour as fast as you want, but the water will only drip out at one steady speed.

graph TB
    subgraph "Leaky Bucket Algorithm"
        A[Incoming Requests] --> B{Bucket}
        B --> C[Queue/Buffer]
        C --> D[Leak Rate Controller]
        D --> E[Outgoing Requests]

        B -.->|Bucket Full| F[Reject Request]
        D -.->|Fixed Rate| G[Constant Output Rate]
    end

Implementation

We usually implement a leaky bucket with the help of a queue that stores incoming requests up to a maximum capacity, combined with a background process that drains the queue at a fixed rate.

sequenceDiagram
    participant Client
    participant Bucket
    participant LeakController
    participant Server

    Client->>Bucket: Request 1 (t=0)
    Bucket->>Bucket: Add to queue

    Client->>Bucket: Request 2 (t=0.1)
    Bucket->>Bucket: Add to queue

    Client->>Bucket: Request 3 (t=0.2)
    Bucket->>Bucket: Add to queue

    loop Fixed Leak Rate
        LeakController->>Bucket: Process next request
        Bucket->>Server: Forward request
        Server-->>Bucket: Response
        Bucket-->>Client: Response
        Note over LeakController: Wait for fixed interval
    end

    Client->>Bucket: Request N (Bucket Full)
    Bucket-->>Client: 429 Too Many Requests

Algorithm in C++

#include <ctime> // time()
#include <queue>

class LeakyBucket {
    std::queue<int> q;  // pending requests (the "water" in the bucket)
    int capacity;       // maximum number of queued requests
    long long lastLeak; // timestamp of the last leak
    int leakRate;       // requests leaked (processed) per second
public:
    LeakyBucket(int cap, int rate) : capacity(cap), leakRate(rate) {
        lastLeak = time(0);
    }
    bool addRequest(int id) {
        // Leak first, so the queue reflects the current time
        leak();
        // Reject if the bucket is full
        if ((int)q.size() >= capacity) {
            return false;
        }
        q.push(id);
        return true; // Accept: will be processed at the leak rate
    }
    void leak() {
        long long now = time(0);
        long long elapsed = now - lastLeak;
        // Number of requests that should have drained since the last leak
        long long toRemove = elapsed * leakRate;
        while (toRemove-- > 0 && !q.empty()) {
            q.pop(); // in a real system, each popped request is forwarded
        }
        lastLeak = now;
    }
};
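
A minimal usage sketch (illustrative driver code, assuming the LeakyBucket class above): a burst of five requests fills the three-slot bucket, the overflow is rejected, and after two seconds the leak has freed space for a new request.

#include <chrono>
#include <iostream>
#include <thread>

int main() {
    LeakyBucket bucket(3, 1); // capacity=3, leak=1 request/sec

    // Burst of 5 requests: the first 3 are queued, the rest rejected
    for (int id = 1; id <= 5; id++) {
        std::cout << "req" << id << ": "
                  << (bucket.addRequest(id) ? "queued" : "429") << "\n";
    }

    // Two seconds later, ~2 requests have leaked out, freeing space
    std::this_thread::sleep_for(std::chrono::seconds(2));
    std::cout << "req6: " << (bucket.addRequest(6) ? "queued" : "429") << "\n";
}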

Complexity Analysis

  • Time: O(M) per request, where M = requests leaked since the last call (amortized O(1))
  • Space: O(N), where N = requests currently in the queue

Pros

  • Smooths out traffic bursts into steady stream
  • Predictable output rate
  • Good for network traffic shaping

Cons

  • Requires queue/buffer storage
  • Higher memory usage (O(N))
  • Requests may experience variable latency

Real-World Usage

  • Network QoS (Quality of Service)
  • Video streaming rate control
  • Traffic shaping in routers

Fixed Window Counter

This algorithm divides time into equal intervals, and each interval can serve at most a fixed number of requests, tracked with a counter. The counter resets whenever a new interval begins.

graph TB
    subgraph "Fixed Window Counter"
        A[Incoming Request] --> B{Check Current Time}
        B --> C{Window Expired?}
        C -->|Yes| D[Reset Counter to 0]
        C -->|No| E[Keep Current Counter]
        D --> F[Update Window Start Time]
        E --> G{Counter < Limit?}
        F --> G
        G -->|Yes| H[Increment Counter]
        G -->|No| I[Reject Request]
        H --> J[Accept Request]
    end

The Burst Problem

The major issue with fixed window counters is the 2x burst at window boundaries:

graph LR
    subgraph "Burst Problem Example"
        W1["Window 1<br/>10:00:00-10:00:59<br/>100 req at 10:00:30-59"]
        W2["Window 2<br/>10:01:00-10:01:59<br/>100 req at 10:01:00-29"]
        W1 --> W2
        Problem["⚠️ Result: 200 requests<br/>in 60 seconds<br/>(10:00:30-10:01:29)<br/>2x the limit!"]
    end

Implementation

sequenceDiagram
    participant Client
    participant RateLimiter
    participant Window as Window State

    Note over Window: Window 1: 10:00:00-10:00:59<br/>Counter=0, Limit=100

    Client->>RateLimiter: Request 1 (t=10:00:05)
    RateLimiter->>Window: Get current time
    Window-->>RateLimiter: 10:00:05
    RateLimiter->>Window: Check if window expired
    Window-->>RateLimiter: No (still in window)
    RateLimiter->>Window: Check counter (0 < 100?)
    Window-->>RateLimiter: Yes
    RateLimiter->>Window: Increment counter (0→1)
    RateLimiter-->>Client: 200 OK (Request Accepted)

    Note over Client,Window: ... more requests ...

    Client->>RateLimiter: Request 100 (t=10:00:55)
    RateLimiter->>Window: Check counter (99 < 100?)
    Window-->>RateLimiter: Yes
    RateLimiter->>Window: Increment counter (99→100)
    RateLimiter-->>Client: 200 OK (Request Accepted)

    Client->>RateLimiter: Request 101 (t=10:00:58)
    RateLimiter->>Window: Check if window expired
    Window-->>RateLimiter: No
    RateLimiter->>Window: Check counter (100 < 100?)
    Window-->>RateLimiter: No
    RateLimiter-->>Client: 429 Too Many Requests (REJECTED)

    Note over Window: Window expires at 10:01:00

    Client->>RateLimiter: Request 102 (t=10:01:02)
    RateLimiter->>Window: Get current time
    Window-->>RateLimiter: 10:01:02
    RateLimiter->>Window: Check if window expired
    Window-->>RateLimiter: Yes (10:01:02 >= 10:01:00)
    RateLimiter->>Window: Reset counter to 0
    RateLimiter->>Window: Update window start (10:01:02)
    Note over Window: Window 2: 10:01:02-10:02:01<br/>Counter=0
    RateLimiter->>Window: Check counter (0 < 100?)
    Window-->>RateLimiter: Yes
    RateLimiter->>Window: Increment counter (0→1)
    RateLimiter-->>Client: 200 OK (Request Accepted)

Algorithm in C++

#include <ctime> // time()

class FixedWindowCounter {
    int count;             // requests accepted in the current window
    long long windowStart; // timestamp when the current window began
    int limit;             // max requests per window
    int windowSize;        // window length in seconds
public:
    FixedWindowCounter(int lim, int window)
        : count(0), limit(lim), windowSize(window) {
        windowStart = time(0);
    }
    bool allowRequest() {
        long long now = time(0);
        // Start a fresh window if the current one has expired
        if (now >= windowStart + windowSize) {
            count = 0;
            windowStart = now;
        }
        // Reject once this window's quota is used up
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }
};
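
A minimal usage sketch (illustrative driver code, assuming the class above) that reproduces the boundary burst described earlier, shrunk to a 1-second window so it runs quickly: a full quota at the end of one window plus a full quota at the start of the next means 2x the limit gets through in just over a second.

#include <chrono>
#include <iostream>
#include <thread>

int main() {
    FixedWindowCounter counter(100, 1); // limit=100 per 1-second window

    // Exhaust the quota at the end of window 1...
    int burst1 = 0;
    for (int i = 0; i < 100; i++) burst1 += counter.allowRequest();

    // ...wait for the boundary, then exhaust window 2's quota immediately
    std::this_thread::sleep_for(std::chrono::milliseconds(1100));
    int burst2 = 0;
    for (int i = 0; i < 100; i++) burst2 += counter.allowRequest();

    // Prints 200: twice the per-window limit in ~1.1 seconds
    std::cout << "accepted: " << (burst1 + burst2) << "\n";
}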

Complexity Analysis

  • Time: O(1) per request
  • Space: O(1) - only stores counter, window start, limit

Pros

  • Extremely simple to implement
  • Memory efficient (O(1))
  • Easy to understand and debug

Cons

  • Major flaw: Allows 2x burst at window boundaries
  • Not suitable for strict SLAs
  • Doesn’t provide smooth rate limiting

Real-World Usage

  • Simple analytics and metrics counting
  • Non-critical rate limiting
  • Basic logging systems where precision isn’t critical

Sliding Window Log

A sliding window log tracks every request timestamp inside the current time window. On each incoming request, it removes all timestamps older than now - window_size and checks whether the remaining count is still below the limit. It’s accurate but memory-heavy.

graph TB
    subgraph "Sliding Window Log Architecture"
        A[Incoming Request] --> B[Get Current Timestamp]
        B --> C[Cleanup Phase]
        C --> D{Remove timestamps<br/>older than<br/>now - window_size}
        D -->|Has old entries| E[Pop from front of log]
        E --> D
        D -->|No old entries| F{Log size < Limit?}
        F -->|Yes| G[Add current timestamp<br/>to log]
        F -->|No| H[Reject Request]
        G --> I[Accept Request]
    end

Implementation

sequenceDiagram
    participant Client
    participant RateLimiter
    participant Log as Timestamp Log
    participant Clock

    Note over Log: Initially empty<br/>Limit: 3 req/10sec

    Client->>RateLimiter: Request 1 (t=0s)
    RateLimiter->>Clock: Get current time
    Clock-->>RateLimiter: t = 0
    RateLimiter->>Log: Remove timestamps < (0 - 10) = -10
    Log-->>RateLimiter: None to remove
    RateLimiter->>Log: Count entries
    Log-->>RateLimiter: 0 entries
    RateLimiter->>Log: Check (0 < 3?)
    Log-->>RateLimiter: Yes
    RateLimiter->>Log: Add timestamp: 0
    Note over Log: [0]
    RateLimiter-->>Client: 200 OK ✓

    Client->>RateLimiter: Request 2 (t=2s)
    RateLimiter->>Clock: Get current time
    Clock-->>RateLimiter: t = 2
    RateLimiter->>Log: Remove timestamps < (2 - 10) = -8
    Log-->>RateLimiter: None to remove
    RateLimiter->>Log: Count entries
    Log-->>RateLimiter: 1 entry
    RateLimiter->>Log: Check (1 < 3?)
    Log-->>RateLimiter: Yes
    RateLimiter->>Log: Add timestamp: 2
    Note over Log: [0, 2]
    RateLimiter-->>Client: 200 OK ✓

    Client->>RateLimiter: Request 3 (t=5s)
    RateLimiter->>Clock: Get current time
    Clock-->>RateLimiter: t = 5
    RateLimiter->>Log: Remove timestamps < (5 - 10) = -5
    Log-->>RateLimiter: None to remove
    RateLimiter->>Log: Count entries
    Log-->>RateLimiter: 2 entries
    RateLimiter->>Log: Check (2 < 3?)
    Log-->>RateLimiter: Yes
    RateLimiter->>Log: Add timestamp: 5
    Note over Log: [0, 2, 5]
    RateLimiter-->>Client: 200 OK ✓

    Client->>RateLimiter: Request 4 (t=7s)
    RateLimiter->>Clock: Get current time
    Clock-->>RateLimiter: t = 7
    RateLimiter->>Log: Remove timestamps < (7 - 10) = -3
    Log-->>RateLimiter: None to remove
    RateLimiter->>Log: Count entries
    Log-->>RateLimiter: 3 entries
    RateLimiter->>Log: Check (3 < 3?)
    Log-->>RateLimiter: No
    Note over Log: [0, 2, 5]<br/>(unchanged)
    RateLimiter-->>Client: 429 Too Many Requests ✗

    Note over Client,Log: Window slides forward...

    Client->>RateLimiter: Request 5 (t=11s)
    RateLimiter->>Clock: Get current time
    Clock-->>RateLimiter: t = 11
    RateLimiter->>Log: Remove timestamps < (11 - 10) = 1
    activate Log
    Log->>Log: Check: 0 <= 1? Yes, remove
    Note over Log: [2, 5]
    deactivate Log
    Log-->>RateLimiter: Removed 1 entry
    RateLimiter->>Log: Count entries
    Log-->>RateLimiter: 2 entries
    RateLimiter->>Log: Check (2 < 3?)
    Log-->>RateLimiter: Yes
    RateLimiter->>Log: Add timestamp: 11
    Note over Log: [2, 5, 11]
    RateLimiter-->>Client: 200 OK ✓

    Client->>RateLimiter: Request 6 (t=13s)
    RateLimiter->>Clock: Get current time
    Clock-->>RateLimiter: t = 13
    RateLimiter->>Log: Remove timestamps < (13 - 10) = 3
    activate Log
    Log->>Log: Check: 2 <= 3? Yes, remove
    Note over Log: [5, 11]
    deactivate Log
    Log-->>RateLimiter: Removed 1 entry
    RateLimiter->>Log: Count entries
    Log-->>RateLimiter: 2 entries
    RateLimiter->>Log: Check (2 < 3?)
    Log-->>RateLimiter: Yes
    RateLimiter->>Log: Add timestamp: 13
    Note over Log: [5, 11, 13]
    RateLimiter-->>Client: 200 OK ✓

Algorithm in C++

#include <ctime> // time()
#include <list>

class SlidingWindowLog {
    std::list<long long> log; // timestamps of accepted requests
    int limit;                // max requests per window
    int windowSize;           // window length in seconds
public:
    SlidingWindowLog(int lim, int window) : limit(lim), windowSize(window) {}
    bool allowRequest() {
        long long now = time(0);
        // Evict timestamps that have slid out of the window
        while (!log.empty() && log.front() <= now - windowSize) {
            log.pop_front();
        }
        // Reject if the window is already at its limit
        if ((int)log.size() >= limit) {
            return false;
        }
        // Record this request and accept it
        log.push_back(now);
        return true;
    }
};
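
A minimal usage sketch (illustrative driver code, assuming the class above), matching the sequence diagram’s configuration of 3 requests per 10-second window:

#include <iostream>

int main() {
    SlidingWindowLog limiter(3, 10); // 3 requests per 10-second window

    // Three requests fit in the window; the fourth is rejected
    for (int i = 1; i <= 4; i++) {
        std::cout << "req" << i << ": "
                  << (limiter.allowRequest() ? "200 OK" : "429") << "\n";
    }
}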

Complexity Analysis

  • Time: O(K) per request, where K = number of expired timestamps to remove (amortized close to O(1) for evenly spaced traffic)
  • Space: O(N), storing one timestamp per request in the window

Pros

  • Very accurate rate limiting
  • No burst problems like fixed windows
  • Easy to reason about
  • Handles irregular traffic patterns well

Cons

  • Needs memory proportional to all requests in the window
  • Slower at high QPS
  • Not ideal for large-scale distributed systems

Real-World Usage

  • Smaller services needing precise control
  • Security-sensitive endpoints
  • API endpoints with low-to-medium traffic where accuracy matters more than memory