Learning about rate limiting algorithms
If you don’t know what rate limiters are, you can refer to this Cloudflare primer: https://www.cloudflare.com/en-gb/learning/bots/what-is-rate-limiting/
Why Rate Limiting?
Rate limiting protects your services from:
- DoS attacks - Malicious request floods that can take down your service
- Resource exhaustion - Prevents CPU, memory, and database overload
- Cost control - Essential for metered APIs and cloud services
- Fair usage - Prevents one client from starving others
Common limits: 100 req/min (tier 1), 1000 req/min (tier 2), 10000 req/min (enterprise), etc.
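As a rough sketch of how such tiers might be wired up, a service could keep them in a small lookup table. The tier names and helper below are hypothetical, mirroring the limits above rather than any particular provider:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical tier table mirroring the limits above (requests per minute).
const std::unordered_map<std::string, int> kTierLimits = {
    {"tier1", 100},
    {"tier2", 1000},
    {"enterprise", 10000},
};

// Look up a client's per-minute limit, falling back to the lowest tier.
int limitFor(const std::string& tier) {
    auto it = kTierLimits.find(tier);
    return it != kTierLimits.end() ? it->second : 100;
}
```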
Architecture
In a very oversimplified way, this is how a rate limiter works:
```mermaid
flowchart TD
A[Client] --> B[API Gateway]
B --> C{Rate Limit Check}
C -->|Allowed| D[Service]
C -->|Blocked| E[429 Too Many Requests]
subgraph Store
S[Counter / Token Bucket]
end
C --> S
```
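To make the diagram concrete, here is a minimal sketch of the gateway-side check. All names here (`Request`, `RateLimiter`, the stubs) are illustrative placeholders, not a real framework; the actual `allowRequest` logic is what the rest of this post explores.

```cpp
#include <iostream>
#include <string>

// Illustrative placeholders only; not a real gateway framework.
struct Request { std::string clientId; };

struct RateLimiter {
    bool allowRequest(const std::string& clientId) {
        (void)clientId;  // real logic: see the algorithms below
        return true;     // stub: always allow
    }
};

void forwardToService(const Request&) { std::cout << "200 OK\n"; }
void reply429(const Request&) { std::cout << "429 Too Many Requests\n"; }

void handle(RateLimiter& limiter, const Request& req) {
    if (limiter.allowRequest(req.clientId)) {
        forwardToService(req);  // allowed: pass through to the service
    } else {
        reply429(req);          // blocked: ask the client to back off
    }
}
```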
Token Bucket
A token bucket controls the rate of requests using a simple idea: a “bucket” slowly fills with tokens over time, and every allowed request spends one token.
The bucket has a pre-defined maximum capacity, above which no more tokens are added. Tokens are added at a constant rate, and this rate determines the rate limiter’s average throughput.
When a request comes in, it is allowed only if at least one token is available in the bucket (or, for weighted requests, at least the request’s cost in tokens). If not, the request is throttled.
The token bucket algorithm is ideal for services that experience frequent surges in requests, because it allows bursts, up to the bucket’s capacity, as long as tokens remain available.
Example:
```mermaid
sequenceDiagram
participant C as Client
participant G as API Gateway
participant R as Rate Limiter
participant S as Token Store
Note over R,S: config: capacity=5, refill=1 token/sec
C->>G: 5 requests at t=0 (burst)
G->>R: Check (req1)
R->>S: fetch state (tokens=5, last=0)
S-->>R: tokens=5
R-->>G: allow (consume -> tokens=4)
G-->>C: 200 OK
G->>R: Check (req2..req5, condensed)
R->>S: tokens=4..1
R-->>G: allow (consume)
G-->>C: 200 OK (×4)
Note over C,R: t=1s later
C->>G: Request at t=1.0
G->>R: Check (lazy refill)
R->>S: fetch state (tokens=0, last=0)
alt elapsed = 1s
R->>R: tokens += 1*refill -> tokens=1 (capped to capacity)
end
R-->>G: allow (consume -> tokens=0)
G-->>C: 200 OK
C->>G: Request at t=1.2
G->>R: Check
R->>S: fetch state (tokens=0, last=1.0)
R-->>G: reject (429)
G-->>C: 429 Too Many Requests
```
Implementation
Most implementations don’t refill the bucket at regular intervals; instead they do something called a “lazy refill”: tokens are added only when a request arrives, and the number added is exactly what would have accumulated if the bucket had been refilling continuously at the configured rate.
```mermaid
flowchart TD
A[Request arrives] --> B[Read state: tokens, last_timestamp]
B --> C[elapsed = now - last_timestamp]
C --> D[add = elapsed * refill_rate]
D --> E["tokens = min(capacity, tokens + add)"]
E --> F{tokens >= cost?}
F -->|yes| G["consume cost (tokens -= cost); allow request"]
F -->|no| H[reject / queue / throttle]
G --> I[update state: tokens, last_timestamp=now]
H --> I
```
Algorithm in C++
```cpp
#include <algorithm>  // std::min
#include <ctime>      // time()

class TokenBucket {
    double tokens;
    int capacity;
    double refillRate;     // tokens per second
    long long lastRefill;  // unix timestamp of the last refill

public:
    TokenBucket(int cap, double rate) : capacity(cap), refillRate(rate) {
        tokens = cap;
        lastRefill = time(0);
    }

    bool allowRequest(int cost = 1) {
        refill();
        if (tokens >= cost) {
            tokens -= cost;
            return true;
        }
        return false;
    }

    void refill() {
        long long now = time(0);
        double elapsed = now - lastRefill;
        // Add tokens based on time elapsed, capped at capacity
        tokens = std::min((double)capacity, tokens + elapsed * refillRate);
        lastRefill = now;
    }
};
```
Complexity Analysis
- Time: O(1) per request
- Space: O(1) - only stores tokens, capacity, rate, timestamp
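A quick usage sketch of the class above. Since it reads the real clock, the exact outcome is timing-dependent; with capacity 5 and 1 token/sec, a burst of 6 back-to-back requests should see the first 5 allowed and the 6th rejected.

```cpp
#include <iostream>

int main() {
    TokenBucket bucket(5, 1.0);  // capacity=5, refill=1 token/sec

    // Burst: the first 5 requests drain the bucket; the 6th finds it empty.
    for (int i = 1; i <= 6; i++) {
        std::cout << "req " << i << ": "
                  << (bucket.allowRequest() ? "allowed" : "rejected") << "\n";
    }
    // A second later, one token has been lazily refilled,
    // so a single further request would be allowed again.
}
```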
Pros
- Allows legitimate traffic bursts
- Simple and memory efficient
- Flexible - different costs per request type
- Used by major cloud providers (AWS, GCP)
Cons
- Doesn’t smooth traffic
- Large bursts can still overwhelm downstream services
- Requires careful tuning of capacity vs refill rate
Real-World Usage
- GitHub API: 5,000 requests/hour per user
- Stripe API: Token bucket for payment endpoints
- AWS API Gateway: Default rate limiting mechanism
Leaky Bucket
A leaky bucket algorithm sends requests to the service at a fixed rate no matter how fast they come in. It’s like pouring water into a bucket with a tiny hole at the bottom — you can pour as fast as you want, but the water will only drip out at one steady speed.
```mermaid
graph TB
subgraph "Leaky Bucket Algorithm"
A[Incoming Requests] --> B{Bucket}
B --> C[Queue/Buffer]
C --> D[Leak Rate Controller]
D --> E[Outgoing Requests]
B -.->|Bucket Full| F[Reject Request]
D -.->|Fixed Rate| G[Constant Output Rate]
end
```
Implementation
We usually implement a leaky bucket with the help of a queue that stores incoming requests up to a maximum capacity, combined with a process that drains the queue at a fixed rate: either a dedicated background worker, or a lazy drain performed whenever a request arrives, as in the code below.
```mermaid
sequenceDiagram
participant Client
participant Bucket
participant LeakController
participant Server
Client->>Bucket: Request 1 (t=0)
Bucket->>Bucket: Add to queue
Client->>Bucket: Request 2 (t=0.1)
Bucket->>Bucket: Add to queue
Client->>Bucket: Request 3 (t=0.2)
Bucket->>Bucket: Add to queue
loop Fixed Leak Rate
LeakController->>Bucket: Process next request
Bucket->>Server: Forward request
Server-->>Bucket: Response
Bucket-->>Client: Response
Note over LeakController: Wait for fixed interval
end
Client->>Bucket: Request N (Bucket Full)
Bucket-->>Client: 429 Too Many Requests
```
Algorithm in C++
```cpp
#include <ctime>  // time()
#include <queue>

using std::queue;

class LeakyBucket {
    queue<int> q;        // pending request ids
    int capacity;        // max queued requests
    long long lastLeak;  // unix timestamp of the last leak
    int leakRate;        // requests per second

public:
    LeakyBucket(int cap, int rate) : capacity(cap), leakRate(rate) {
        lastLeak = time(0);
    }

    bool addRequest(int id) {
        // Leak first, so capacity reflects the current time
        leak();
        // Check capacity
        if ((int)q.size() >= capacity) {
            return false;  // Reject
        }
        q.push(id);
        return true;  // Accept
    }

    void leak() {
        long long now = time(0);
        long long elapsed = now - lastLeak;
        // Number of requests to leak based on time elapsed
        int toRemove = (int)(elapsed * leakRate);
        while (toRemove-- > 0 && !q.empty()) {
            q.pop();
        }
        lastLeak = now;
    }
};
```
Complexity Analysis
- Time: O(M) per request, where M = queued requests drained on arrival (amortized O(1), since each request is pushed and popped once)
- Space: O(N) where N = requests in queue
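The class above drains lazily on each call. To match the “background process” description literally, a worker thread can pop the queue at a fixed interval instead. The following is a minimal thread-safe sketch under that assumption (no shutdown handling, and the forwarding step is just a print):

```cpp
#include <chrono>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

// Minimal sketch: a dedicated worker drains the queue at a fixed rate.
class LeakyBucketWorker {
    std::queue<int> q;
    std::mutex m;
    size_t capacity;

public:
    explicit LeakyBucketWorker(size_t cap) : capacity(cap) {}

    bool addRequest(int id) {
        std::lock_guard<std::mutex> lock(m);
        if (q.size() >= capacity) return false;  // bucket full: reject
        q.push(id);
        return true;
    }

    // Forward at most one request per tick; runs forever in this sketch.
    void run(std::chrono::milliseconds interval) {
        for (;;) {
            std::this_thread::sleep_for(interval);
            std::lock_guard<std::mutex> lock(m);
            if (!q.empty()) {
                std::cout << "forwarding request " << q.front() << "\n";
                q.pop();
            }
        }
    }
};

// Usage: std::thread worker(&LeakyBucketWorker::run, &bucket,
//                           std::chrono::milliseconds(100));  // 10 req/sec
```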
Pros
- Smooths out traffic bursts into steady stream
- Predictable output rate
- Good for network traffic shaping
Cons
- Requires queue/buffer storage
- Higher memory usage (O(N))
- Requests may experience variable latency
Real-World Usage
- Network QoS (Quality of Service)
- Video streaming rate control
- Traffic shaping in routers
Fixed Window Counter
This algorithm divides time into equal intervals; each interval can serve at most a fixed number of requests, tracked with a counter that resets every time a new interval begins.
```mermaid
graph TB
subgraph "Fixed Window Counter"
A[Incoming Request] --> B{Check Current Time}
B --> C{Window Expired?}
C -->|Yes| D[Reset Counter to 0]
C -->|No| E[Keep Current Counter]
D --> F[Update Window Start Time]
E --> G{Counter < Limit?}
F --> G
G -->|Yes| H[Increment Counter]
G -->|No| I[Reject Request]
H --> J[Accept Request]
end
```
The Burst Problem
The major issue with fixed window counters is the 2x burst at window boundaries:
```mermaid
graph LR
    subgraph "Burst Problem Example"
    W1["Window 1<br/>10:00:00-10:00:59<br/>100 req at 10:00:30-59"]
    W2["Window 2<br/>10:01:00-10:01:59<br/>100 req at 10:01:00-29"]
    W1 --> W2
    Problem["⚠️ Result: 200 requests<br/>in 60 seconds<br/>(10:00:30-10:01:29)<br/>2x the limit!"]
    end
```
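To see the boundary burst without waiting on a real clock, here is a hypothetical, test-friendly variant of the counter where the caller supplies the timestamp; it accepts all 200 requests even though they fall within one 60-second span.

```cpp
#include <iostream>

// Hypothetical test-friendly variant: the caller supplies the timestamp.
class FixedWindowDemo {
    int count = 0;
    long long windowStart = 0;
    int limit;
    int windowSize;  // seconds

public:
    FixedWindowDemo(int lim, int winSec) : limit(lim), windowSize(winSec) {}

    bool allowRequest(long long now) {
        if (now >= windowStart + windowSize) {  // window expired: reset
            count = 0;
            windowStart = now;
        }
        if (count >= limit) return false;
        count++;
        return true;
    }
};

int main() {
    FixedWindowDemo rl(100, 60);  // limit: 100 req per 60-sec window

    int accepted = 0;
    for (int i = 0; i < 100; i++) accepted += rl.allowRequest(30);  // ~10:00:30
    for (int i = 0; i < 100; i++) accepted += rl.allowRequest(60);  // ~10:01:00
    // All 200 requests pass, although they fall inside one 60-second span.
    std::cout << accepted << " accepted\n";  // prints: 200 accepted
}
```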
Implementation
```mermaid
sequenceDiagram
participant Client
participant RateLimiter
participant Window as Window State
Note over Window: Window 1: 10:00:00-10:00:59<br/>Counter=0, Limit=100
Client->>RateLimiter: Request 1 (t=10:00:05)
RateLimiter->>Window: Get current time
Window-->>RateLimiter: 10:00:05
RateLimiter->>Window: Check if window expired
Window-->>RateLimiter: No (still in window)
RateLimiter->>Window: Check counter (0 < 100?)
Window-->>RateLimiter: Yes
RateLimiter->>Window: Increment counter (0→1)
RateLimiter-->>Client: 200 OK (Request Accepted)
Note over Client,Window: ... more requests ...
Client->>RateLimiter: Request 100 (t=10:00:55)
RateLimiter->>Window: Check counter (99 < 100?)
Window-->>RateLimiter: Yes
RateLimiter->>Window: Increment counter (99→100)
RateLimiter-->>Client: 200 OK (Request Accepted)
Client->>RateLimiter: Request 101 (t=10:00:58)
RateLimiter->>Window: Check if window expired
Window-->>RateLimiter: No
RateLimiter->>Window: Check counter (100 < 100?)
Window-->>RateLimiter: No
RateLimiter-->>Client: 429 Too Many Requests (REJECTED)
Note over Window: Window expires at 10:01:00
Client->>RateLimiter: Request 102 (t=10:01:02)
RateLimiter->>Window: Get current time
Window-->>RateLimiter: 10:01:02
RateLimiter->>Window: Check if window expired
Window-->>RateLimiter: Yes (10:01:02 >= 10:01:00)
RateLimiter->>Window: Reset counter to 0
RateLimiter->>Window: Update window start (10:01:02)
Note over Window: Window 2: 10:01:02-10:02:01<br/>Counter=0
RateLimiter->>Window: Check counter (0 < 100?)
Window-->>RateLimiter: Yes
RateLimiter->>Window: Increment counter (0→1)
RateLimiter-->>Client: 200 OK (Request Accepted)
```
Algorithm in C++
```cpp
#include <ctime>  // time()

class FixedWindowCounter {
    int count;
    long long windowStart;
    int limit;
    int windowSize;  // seconds

public:
    FixedWindowCounter(int lim, int winSec) : limit(lim), windowSize(winSec) {
        count = 0;
        windowStart = time(0);
    }

    bool allowRequest() {
        long long now = time(0);
        // Check if window expired
        if (now >= windowStart + windowSize) {
            count = 0;
            windowStart = now;
        }
        // Check limit
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }
};
```
Complexity Analysis
- Time: O(1) per request
- Space: O(1) - only stores counter, window start, limit
Pros
- Extremely simple to implement
- Memory efficient (O(1))
- Easy to understand and debug
Cons
- Major flaw: Allows 2x burst at window boundaries
- Not suitable for strict SLAs
- Doesn’t provide smooth rate limiting
Real-World Usage
- Simple analytics and metrics counting
- Non-critical rate limiting
- Basic logging systems where precision isn’t critical
Sliding Window Log
A sliding window log tracks every request timestamp inside the current time window. On each incoming request, it removes all timestamps older than now - window_size and checks whether the remaining count exceeds the limit. It’s accurate but memory-heavy.
```mermaid
graph TB
    subgraph "Sliding Window Log Architecture"
    A[Incoming Request] --> B[Get Current Timestamp]
    B --> C[Cleanup Phase]
    C --> D{"Remove timestamps<br/>older than<br/>now - window_size"}
    D -->|Has old entries| E[Pop from front of log]
    E --> D
    D -->|No old entries| F{"Log size < Limit?"}
    F -->|Yes| G["Add current timestamp<br/>to log"]
    F -->|No| H[Reject Request]
    G --> I[Accept Request]
    end
```
Implementation
```mermaid
sequenceDiagram
participant Client
participant RateLimiter
participant Log as Timestamp Log
participant Clock
Note over Log: Initially empty<br/>Limit: 3 req/10sec
Client->>RateLimiter: Request 1 (t=0s)
RateLimiter->>Clock: Get current time
Clock-->>RateLimiter: t = 0
RateLimiter->>Log: Remove timestamps < (0 - 10) = -10
Log-->>RateLimiter: None to remove
RateLimiter->>Log: Count entries
Log-->>RateLimiter: 0 entries
RateLimiter->>Log: Check (0 < 3?)
Log-->>RateLimiter: Yes
RateLimiter->>Log: Add timestamp: 0
Note over Log: [0]
RateLimiter-->>Client: 200 OK ✓
Client->>RateLimiter: Request 2 (t=2s)
RateLimiter->>Clock: Get current time
Clock-->>RateLimiter: t = 2
RateLimiter->>Log: Remove timestamps < (2 - 10) = -8
Log-->>RateLimiter: None to remove
RateLimiter->>Log: Count entries
Log-->>RateLimiter: 1 entry
RateLimiter->>Log: Check (1 < 3?)
Log-->>RateLimiter: Yes
RateLimiter->>Log: Add timestamp: 2
Note over Log: [0, 2]
RateLimiter-->>Client: 200 OK ✓
Client->>RateLimiter: Request 3 (t=5s)
RateLimiter->>Clock: Get current time
Clock-->>RateLimiter: t = 5
RateLimiter->>Log: Remove timestamps < (5 - 10) = -5
Log-->>RateLimiter: None to remove
RateLimiter->>Log: Count entries
Log-->>RateLimiter: 2 entries
RateLimiter->>Log: Check (2 < 3?)
Log-->>RateLimiter: Yes
RateLimiter->>Log: Add timestamp: 5
Note over Log: [0, 2, 5]
RateLimiter-->>Client: 200 OK ✓
Client->>RateLimiter: Request 4 (t=7s)
RateLimiter->>Clock: Get current time
Clock-->>RateLimiter: t = 7
RateLimiter->>Log: Remove timestamps < (7 - 10) = -3
Log-->>RateLimiter: None to remove
RateLimiter->>Log: Count entries
Log-->>RateLimiter: 3 entries
RateLimiter->>Log: Check (3 < 3?)
Log-->>RateLimiter: No
Note over Log: [0, 2, 5]<br/>(unchanged)
RateLimiter-->>Client: 429 Too Many Requests ✗
Note over Client,Log: Window slides forward...
Client->>RateLimiter: Request 5 (t=11s)
RateLimiter->>Clock: Get current time
Clock-->>RateLimiter: t = 11
RateLimiter->>Log: Remove timestamps < (11 - 10) = 1
activate Log
Log->>Log: Check: 0 <= 1? Yes, remove
Note over Log: [2, 5]
deactivate Log
Log-->>RateLimiter: Removed 1 entry
RateLimiter->>Log: Count entries
Log-->>RateLimiter: 2 entries
RateLimiter->>Log: Check (2 < 3?)
Log-->>RateLimiter: Yes
RateLimiter->>Log: Add timestamp: 11
Note over Log: [2, 5, 11]
RateLimiter-->>Client: 200 OK ✓
Client->>RateLimiter: Request 6 (t=13s)
RateLimiter->>Clock: Get current time
Clock-->>RateLimiter: t = 13
RateLimiter->>Log: Remove timestamps < (13 - 10) = 3
activate Log
Log->>Log: Check: 2 <= 3? Yes, remove
Note over Log: [5, 11]
deactivate Log
Log-->>RateLimiter: Removed 1 entry
RateLimiter->>Log: Count entries
Log-->>RateLimiter: 2 entries
RateLimiter->>Log: Check (2 < 3?)
Log-->>RateLimiter: Yes
RateLimiter->>Log: Add timestamp: 13
Note over Log: [5, 11, 13]
RateLimiter-->>Client: 200 OK ✓
```
Algorithm in C++
```cpp
#include <ctime>  // time()
#include <list>

using std::list;

class SlidingWindowLog {
    list<long long> log;  // timestamps of accepted requests
    int limit;
    int windowSize;  // seconds

public:
    SlidingWindowLog(int lim, int window) : limit(lim), windowSize(window) {}

    bool allowRequest() {
        long long now = time(0);
        // Remove old timestamps outside the window
        while (!log.empty() && log.front() <= now - windowSize) {
            log.pop_front();
        }
        // Check if limit exceeded
        if ((int)log.size() >= limit) {
            return false;  // Reject
        }
        // Add current timestamp
        log.push_back(now);
        return true;  // Accept
    }
};
```
Complexity Analysis
- Time: O(K) per request, where K = number of expired timestamps to remove (amortized close to O(1) for evenly spaced traffic)
- Space: O(N), storing one timestamp per request in the window
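A usage sketch matching the sequence diagram’s configuration (3 requests per 10 seconds). With a real clock, all four calls below land at roughly the same second, so the fourth is rejected:

```cpp
#include <iostream>

int main() {
    SlidingWindowLog rl(3, 10);  // limit=3, window=10 sec

    for (int i = 1; i <= 4; i++) {
        std::cout << "req " << i << ": "
                  << (rl.allowRequest() ? "allowed" : "rejected") << "\n";
    }
    // req 1-3: allowed; req 4: rejected (3 timestamps already in the window)
}
```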
Pros
- Very accurate rate limiting
- No burst problems like fixed windows
- Easy to reason about
- Handles irregular traffic patterns well
Cons
- Needs memory proportional to all requests in the window
- Slower at high QPS
- Not ideal for large-scale distributed systems
Real-World Usage
- Smaller services needing precise control
- Security-sensitive endpoints
- API endpoints with low-to-medium traffic where accuracy matters more than memory