Framework Deep Dives · Framework Series #4

Rate Limiting at Serverless Scale: Tiered Throttling with DynamoDB

How we built a multi-tier rate limiting system with three algorithms (fixed window, sliding window, token bucket), geographic rules, whitelist/blacklist support, distributed coordination across Lambda instances, and configurable failure modes.

June 10, 2026 · 13 min read

TCTF Editorials

TCTF Newsletter

Algorithms: 3

Limit Types: 3

Geographic Rules: Yes

Whitelist/Blacklist: Yes

Distributed Mode: Yes

Failure Modes: Open/Closed

Every public API needs rate limiting. Without it, one misbehaving client can exhaust your entire capacity. In serverless environments, this problem is harder than it first appears. Lambda functions are stateless — there is no shared counter sitting in process memory. Multiple instances serve requests concurrently, each unaware of the others. The rate limiter itself must remain fast enough that it does not become the bottleneck. At TCTF, we built a rate limiting service on DynamoDB that supports three algorithms, geographic rules, whitelist and blacklist management, distributed coordination across Lambda instances, and configurable failure modes. Here is how each piece works.

01 Why Rate Limiting Is Hard in Serverless

On a traditional server, rate limiting is straightforward: increment a counter in memory, check the limit, allow or deny. Every request for a given user hits the same process.

Serverless breaks that model. A single user's requests might be handled by ten separate Lambda instances within seconds. Each instance has its own memory space. If each one tracks its own counter, the user can get up to ten times the intended limit because no single instance sees the full picture.
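The multiplication effect is easy to reproduce. The following is a minimal simulation, not TCTF's code: `LocalCounterInstance` and `admittedWithLocalCounters` are hypothetical names, and requests are assumed to be spread round-robin across instances.

```typescript
// Hypothetical model: each Lambda instance keeps a private in-memory counter.
class LocalCounterInstance {
  private count = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Admits the request if this instance alone has seen fewer than `limit`.
  allow(): boolean {
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}

// Spreads one user's requests round-robin over the instances and returns
// how many were admitted in total.
function admittedWithLocalCounters(
  limit: number,
  instanceCount: number,
  totalRequests: number,
): number {
  const instances: LocalCounterInstance[] = [];
  for (let i = 0; i < instanceCount; i++) {
    instances.push(new LocalCounterInstance(limit));
  }

  let admitted = 0;
  for (let i = 0; i < totalRequests; i++) {
    const instance = instances[i % instanceCount];
    if (instance !== undefined && instance.allow()) admitted += 1;
  }
  return admitted;
}

console.log(admittedWithLocalCounters(100, 10, 5000)); // 1000: ten times the intended 100
console.log(admittedWithLocalCounters(100, 1, 5000)); // 100: a single process enforces the limit
```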

The counter must live externally — in DynamoDB or Redis — so that every instance reads from and writes to the same state. But external storage introduces latency. A DynamoDB read-increment-write cycle takes 5 to 10 milliseconds. Multiply that by every inbound request and the overhead becomes significant.

Our solution uses a layered approach: DynamoDB for durable storage, configurable algorithms that let you trade precision for speed, and a distributed mode that coordinates across instances without requiring a database round-trip on every single request.

⚡
In serverless, rate limit counters must be external. The challenge is keeping them fast enough that the limiter does not become the bottleneck it was designed to prevent.

02 Three Algorithms: Fixed Window, Sliding Window, Token Bucket

The service supports three algorithms. Each solves a different problem.

Fixed window divides time into intervals (for example, 60-second windows). It counts requests in the current interval and resets when the window expires. This is the simplest to implement and the cheapest to store — one counter per window. The downside is boundary spikes: a user can issue the maximum number of requests at the end of one window and again at the start of the next, briefly doubling their effective rate. For most endpoints, this is acceptable.
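As a rough sketch of the idea, here is a fixed window counter with an in-memory `Map` standing in for the external store. The class name `FixedWindowLimiter` and its method signature are assumptions for illustration, and old window keys are never cleaned up here (in a real table an expiry attribute would handle that).

```typescript
// Minimal fixed-window sketch. The Map is a stand-in for the external store.
class FixedWindowLimiter {
  private readonly counters = new Map<string, number>();
  private readonly limit: number;
  private readonly windowSizeMs: number;

  constructor(limit: number, windowSizeMs: number) {
    this.limit = limit;
    this.windowSizeMs = windowSizeMs;
  }

  allow(identifier: string, nowMs: number): boolean {
    // Every request in the same interval maps to the same counter key.
    const windowStart = Math.floor(nowMs / this.windowSizeMs) * this.windowSizeMs;
    const key = `${identifier}#${windowStart}`;
    const count = this.counters.get(key) ?? 0;
    if (count >= this.limit) return false;
    this.counters.set(key, count + 1);
    return true;
  }
}

// Boundary spike: 5 requests at the end of one window, 5 at the start of the next.
const fixedDemo = new FixedWindowLimiter(5, 60000);
let fixedAdmitted = 0;
for (let i = 0; i < 5; i++) if (fixedDemo.allow('user-1', 59000)) fixedAdmitted += 1;
for (let i = 0; i < 5; i++) if (fixedDemo.allow('user-1', 60000)) fixedAdmitted += 1;
console.log(fixedAdmitted); // 10 requests admitted within one second
```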

Sliding window eliminates that boundary problem. Instead of fixed intervals, it tracks individual request timestamps and counts how many fall within a rolling window ending at the current moment. The result is more accurate enforcement, but it requires storing more data per user and doing more computation per check.
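A minimal sketch of the timestamp-log approach follows, again with an in-memory `Map` in place of the external store. `SlidingWindowLimiter` is a hypothetical name, and whether denied requests should also be recorded is a design choice; this version records only admitted ones.

```typescript
// Minimal sliding-window log sketch. The Map is a stand-in for the external store.
class SlidingWindowLimiter {
  private readonly timestamps = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowSizeMs: number;

  constructor(limit: number, windowSizeMs: number) {
    this.limit = limit;
    this.windowSizeMs = windowSizeMs;
  }

  allow(identifier: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowSizeMs;
    // Keep only requests inside the rolling window (cutoff, nowMs].
    const recent = (this.timestamps.get(identifier) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.timestamps.set(identifier, recent);
    return allowed;
  }
}

// The same burst that slips past a fixed window is stopped here.
const slidingDemo = new SlidingWindowLimiter(5, 60000);
let slidingAdmitted = 0;
for (let i = 0; i < 5; i++) if (slidingDemo.allow('user-1', 59000)) slidingAdmitted += 1;
for (let i = 0; i < 5; i++) if (slidingDemo.allow('user-1', 60000)) slidingAdmitted += 1;
console.log(slidingAdmitted); // 5
```

The cost is visible in the data structure: one stored timestamp per admitted request, rather than one integer per window.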

Token bucket is designed for bursty traffic. The bucket starts full (say, 100 tokens). Each request consumes one token. Tokens refill at a constant rate (say, 10 per second). When the bucket empties, requests are denied until tokens accumulate. This allows short bursts up to the bucket capacity while enforcing a steady long-term average.
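The refill arithmetic can be sketched as follows. This is an illustrative in-memory version using the numbers from the paragraph above; `TokenBucketLimiter`, `tryConsume`, and the `cost` parameter are assumed names, and tokens are refilled lazily on each call rather than by a timer.

```typescript
interface BucketState {
  tokens: number;
  lastRefillMs: number;
}

// Minimal token-bucket sketch. The Map is a stand-in for the external store.
class TokenBucketLimiter {
  private readonly buckets = new Map<string, BucketState>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  tryConsume(identifier: string, nowMs: number, cost: number = 1): boolean {
    // A bucket that has never been seen starts full.
    const state = this.buckets.get(identifier) ?? { tokens: this.capacity, lastRefillMs: nowMs };
    // Lazy refill: add the tokens earned since the last call, capped at capacity.
    const elapsedSec = Math.max(0, nowMs - state.lastRefillMs) / 1000;
    const tokens = Math.min(this.capacity, state.tokens + elapsedSec * this.refillPerSecond);
    const allowed = tokens >= cost;
    this.buckets.set(identifier, {
      tokens: allowed ? tokens - cost : tokens,
      lastRefillMs: Math.max(nowMs, state.lastRefillMs),
    });
    return allowed;
  }
}

const bucketDemo = new TokenBucketLimiter(100, 10);
let burstAdmitted = 0;
for (let i = 0; i < 150; i++) if (bucketDemo.tryConsume('user-1', 0)) burstAdmitted += 1;
console.log(burstAdmitted); // 100: the full burst capacity
let laterAdmitted = 0;
for (let i = 0; i < 50; i++) if (bucketDemo.tryConsume('user-1', 1000)) laterAdmitted += 1;
console.log(laterAdmitted); // 10: one second of refill at 10 tokens per second
```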

We assign algorithms per action. Authentication endpoints use fixed window for low overhead. General API endpoints use sliding window for precision. File upload endpoints use token bucket so users can burst-upload several files without hitting a wall.

// Rate limit configuration per action
const rateLimitConfig: RateLimitConfig = {
  // Fixed window: simple counter, resets every interval
  'auth/login': {
    algorithm: 'fixed-window',
    limit: 5,
    windowSizeMs: 60 * 60 * 1000, // 1 hour
    blockDurationSec: 3600, // block the identifier for 1 hour once the limit is exceeded
  },

  // Sliding window: rolling timestamp tracking
  'api/users': {
    algorithm: 'sliding-window',
    limit: 100,
    windowSizeMs: 60 * 1000, // 1 minute
  },

  // Token bucket: allows bursts, steady refill
  'files/upload': {
    algorithm: 'token-bucket',
    bucketCapacity: 50,
    refillRate: 5, // tokens per second
    limitType: 'payload-size',
    maxPayloadMb: 200,
  },
};

🔧
Fixed window for simplicity. Sliding window for accuracy. Token bucket for burst tolerance. Each action gets the algorithm that matches its traffic pattern.

03 Three Limit Types: Requests, Bandwidth, Payload Size

Not every rate limit counts requests. Some count bytes.

Request count is the default. It limits how many requests a user can make in a time window — 100 requests per minute, 5 login attempts per hour.

Bandwidth limiting counts the total data transferred. A user might be allowed 10 MB of response data per minute. Each request consumes bandwidth proportional to its response size. This prevents a single user from saturating your network with many large responses.

Payload size limiting counts upload volume. A user might be allowed 50 MB of uploads per hour. Each upload costs tokens proportional to its file size. This prevents storage abuse without restricting the number of small requests.

The limit type is configured per action. Internally, a calculateTokensNeeded method determines how many tokens each request costs. For request-count limits, every request costs 1. For bandwidth limits, the cost equals the response size in bytes. For payload limits, the cost equals the upload size.
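A sketch of what such a cost function might look like is below. Only the method name and the `'payload-size'` value come from this article; the other two limit type strings, the `RequestSizes` shape, and the signature are assumptions made for illustration.

```typescript
// 'payload-size' appears in the configuration earlier; the other two
// string values are assumed names for this sketch.
type LimitType = 'request-count' | 'bandwidth' | 'payload-size';

// Hypothetical shape describing the sizes measured for one request.
interface RequestSizes {
  responseBytes: number;
  uploadBytes: number;
}

function calculateTokensNeeded(limitType: LimitType, sizes: RequestSizes): number {
  switch (limitType) {
    case 'request-count':
      return 1; // every request costs one token
    case 'bandwidth':
      return sizes.responseBytes; // cost equals the response size in bytes
    case 'payload-size':
      return sizes.uploadBytes; // cost equals the upload size in bytes
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new
      // limit type is added without a matching case.
      const unreachable: never = limitType;
      throw new Error(`Unknown limit type: ${String(unreachable)}`);
    }
  }
}

const sizes: RequestSizes = { responseBytes: 2048, uploadBytes: 5 * 1024 * 1024 };
console.log(calculateTokensNeeded('request-count', sizes)); // 1
console.log(calculateTokensNeeded('bandwidth', sizes)); // 2048
console.log(calculateTokensNeeded('payload-size', sizes)); // 5242880
```

The returned cost can then be passed to whichever algorithm the action uses, so the same counter logic serves all three limit types.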

04 Geographic Rules: Different Limits for Different Regions

Not all traffic carries the same risk profile. A country with a large, established user base warrants generous limits. A region with no registered users suddenly generating a burst of signup attempts deserves scrutiny.

The service supports geographic rules — per-country overrides that adjust the base rate limit up or down. Country codes arrive via the CloudFront-Viewer-Country header, which CloudFront populates at the edge before forwarding the request.

Geographic rules live in DynamoDB alongside the rate limit configuration. Because they are data rather than code, the operations team can tighten or relax limits for specific countries in real time without a deployment.

When a request arrives, the applyGeographicRules method checks whether the request's country matches any override. If it does, that override replaces the base limit. If no match exists, the base limit applies unchanged. A single API endpoint can therefore have different effective thresholds in different countries — stricter where risk is high, more generous where users are established.
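The override lookup could be sketched like this. The method name `applyGeographicRules` comes from the text above, but its signature, the `GeoRule` shape, and the `countryFromHeaders` helper are assumptions. The sketch relies on CloudFront-Viewer-Country carrying a two-letter ISO 3166-1 alpha-2 code and treats a missing header as "no override".

```typescript
// Hypothetical rule shape: one override per country code.
interface GeoRule {
  countryCode: string; // ISO 3166-1 alpha-2, for example 'US'
  limit: number; // replaces the base limit for that country
}

// Header names may arrive in any case, so compare case-insensitively.
function countryFromHeaders(headers: Record<string, string | undefined>): string | undefined {
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() === 'cloudfront-viewer-country') return headers[name];
  }
  return undefined;
}

function applyGeographicRules(
  baseLimit: number,
  countryCode: string | undefined,
  rules: GeoRule[],
): number {
  if (countryCode === undefined || countryCode === '') return baseLimit;
  const normalized = countryCode.toUpperCase();
  const match = rules.find((rule) => rule.countryCode.toUpperCase() === normalized);
  // An override replaces the base limit; no match leaves it unchanged.
  return match !== undefined ? match.limit : baseLimit;
}

// Illustrative rules only; the country codes and numbers are made up.
const geoRules: GeoRule[] = [
  { countryCode: 'US', limit: 200 },
  { countryCode: 'ZZ', limit: 10 },
];
const country = countryFromHeaders({ 'CloudFront-Viewer-Country': 'US' });
console.log(applyGeographicRules(100, country, geoRules)); // 200
console.log(applyGeographicRules(100, 'FR', geoRules)); // 100
```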

🌍
Geographic rules let you adjust limits by country without redeploying. Tighten limits in high-risk regions and relax them where your users are.

05 Whitelist, Blacklist, and Access Control

Some identifiers should bypass rate limits entirely. Internal service accounts, load testing clients, and trusted partners need unrestricted access. Whitelisted identifiers skip all rate limit checks.

Other identifiers should be blocked outright. Known attack IPs, compromised accounts, and confirmed abuse sources should never receive a successful response. Blacklisted identifiers are immediately denied with a 429 status, regardless of their actual request count.

Both lists are managed through the rate limiting API:

// Add a trusted partner to the whitelist
await rateLimiter.addToWhitelist({
  action: 'api/users',
  identifier: 'partner-service-account-xyz',
  reason: 'Trusted integration partner',
});

// Block a known attack source
await rateLimiter.addToBlacklist({
  action: 'auth/login',
  identifier: '203.0.113.42',
  reason: 'Brute-force attempt detected',
  expiresAt: Date.now() + 24 * 60 * 60 * 1000, // 24-hour block
});

// Bulk operations for incident response
await rateLimiter.bulkAddToBlacklist({
  action: 'auth/login',
  identifiers: suspiciousIps,
  reason: 'Coordinated attack — incident #4821',
});

// Check current status of any identifier
const status = await rateLimiter.checkAccess({
  action: 'api/users',
  identifier: 'user-abc-123',
});
// Returns: { status: 'allowed' | 'whitelisted' | 'blacklisted' }

Lists are stored persistently and cached in memory for fast lookups. Cache invalidation happens automatically when lists change. Lists are scoped per action — an identifier can be whitelisted for one endpoint and blacklisted for another.
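To make the per-action scoping and expiry concrete, here is a small in-memory sketch of the lookup side. It is synchronous and is not the service's API: the `AccessLists` class and its method signatures are hypothetical, and the rule that a blacklist entry wins over a whitelist entry for the same action is an assumption, since the article does not say which takes precedence.

```typescript
type AccessStatus = 'allowed' | 'whitelisted' | 'blacklisted';

interface ListEntry {
  reason: string;
  expiresAt?: number; // epoch milliseconds; omitted means no expiry
}

// In-memory stand-in for the persisted lists and their cache.
class AccessLists {
  private readonly whitelist = new Map<string, ListEntry>();
  private readonly blacklist = new Map<string, ListEntry>();

  // Lists are scoped per action, so the action is part of the key.
  private static key(action: string, identifier: string): string {
    return `${action}#${identifier}`;
  }

  addToWhitelist(action: string, identifier: string, entry: ListEntry): void {
    this.whitelist.set(AccessLists.key(action, identifier), entry);
  }

  addToBlacklist(action: string, identifier: string, entry: ListEntry): void {
    this.blacklist.set(AccessLists.key(action, identifier), entry);
  }

  private static isActive(entry: ListEntry | undefined, nowMs: number): boolean {
    if (entry === undefined) return false;
    return entry.expiresAt === undefined || nowMs < entry.expiresAt;
  }

  checkAccess(action: string, identifier: string, nowMs: number): AccessStatus {
    const key = AccessLists.key(action, identifier);
    // Assumption: a blacklist entry takes precedence over a whitelist entry.
    if (AccessLists.isActive(this.blacklist.get(key), nowMs)) return 'blacklisted';
    if (AccessLists.isActive(this.whitelist.get(key), nowMs)) return 'whitelisted';
    return 'allowed';
  }
}

const lists = new AccessLists();
const dayMs = 24 * 60 * 60 * 1000;
lists.addToBlacklist('auth/login', '203.0.113.42', { reason: 'Brute-force attempt', expiresAt: dayMs });
lists.addToWhitelist('api/users', '203.0.113.42', { reason: 'Partner egress IP' });

console.log(lists.checkAccess('auth/login', '203.0.113.42', 0)); // 'blacklisted'
console.log(lists.checkAccess('api/users', '203.0.113.42', 0)); // 'whitelisted'
console.log(lists.checkAccess('auth/login', '203.0.113.42', dayMs)); // 'allowed' once the block expires
```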

06 Distributed Mode: Coordinating Across Lambda Instances

Standard rate limiting algorithms assume all requests pass through a single counter. In serverless, requests are spread across many Lambda instances. Each one updates the counter independently, and because a read followed by a write is not atomic (and DynamoDB reads are eventually consistent by default), two instances might read the same value before either writes an update.

For most endpoints, this slight overshoot is acceptable. If the limit is 100 and two instances each read a count of 99, both admit one more request and the user gets 101 in total, which is close enough.

When precision matters — billing-related limits, security-critical endpoints — the distributed mode provides stronger coordination. Each Lambda instance maintains a local counter tagged with a unique instance ID. Periodically, these local counters are aggregated into a global count via the getGlobalCount method.

This design reduces DynamoDB writes because local counts are batched before flushing. The global count stays accurate, at the cost of slightly delayed enforcement: a burst might briefly exceed the limit before the next aggregation catches up. For security-critical actions, the sync interval is set to 1 to 2 seconds to keep that window small.
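One way to picture this is the sketch below, which is a guess at the shape of the mechanism rather than the actual implementation. `SharedCounterStore` is an in-memory stand-in for the DynamoDB table, `InstanceLimiter` and `sync` are hypothetical names, and only `getGlobalCount` is taken from the article. Window resets are left out to keep the focus on batching.

```typescript
// Stand-in for the shared table: one count per (key, instance ID) pair.
class SharedCounterStore {
  private readonly rows = new Map<string, Map<string, number>>();

  add(key: string, instanceId: string, delta: number): void {
    let row = this.rows.get(key);
    if (row === undefined) {
      row = new Map<string, number>();
      this.rows.set(key, row);
    }
    row.set(instanceId, (row.get(instanceId) ?? 0) + delta);
  }

  // Aggregates the per-instance counts into one global total.
  getGlobalCount(key: string): number {
    let total = 0;
    const row = this.rows.get(key);
    if (row !== undefined) row.forEach((count) => { total += count; });
    return total;
  }
}

class InstanceLimiter {
  private readonly pending = new Map<string, number>(); // admitted but not yet flushed
  private readonly cachedGlobal = new Map<string, number>(); // global totals as of last sync
  private lastSyncMs = Number.NEGATIVE_INFINITY;
  private readonly instanceId: string;
  private readonly store: SharedCounterStore;
  private readonly limit: number;
  private readonly syncIntervalMs: number;

  constructor(instanceId: string, store: SharedCounterStore, limit: number, syncIntervalMs: number) {
    this.instanceId = instanceId;
    this.store = store;
    this.limit = limit;
    this.syncIntervalMs = syncIntervalMs;
  }

  allow(key: string, nowMs: number): boolean {
    if (nowMs - this.lastSyncMs >= this.syncIntervalMs) this.sync(nowMs);
    // First sight of a key costs one read; later checks use the cache.
    if (!this.cachedGlobal.has(key)) this.cachedGlobal.set(key, this.store.getGlobalCount(key));
    const local = this.pending.get(key) ?? 0;
    if ((this.cachedGlobal.get(key) ?? 0) + local >= this.limit) return false;
    this.pending.set(key, local + 1);
    return true;
  }

  // Flushes batched local counts, then refreshes the cached global totals.
  sync(nowMs: number): void {
    this.pending.forEach((delta, key) => this.store.add(key, this.instanceId, delta));
    this.pending.clear();
    const keys: string[] = [];
    this.cachedGlobal.forEach((_count, key) => keys.push(key));
    for (const key of keys) this.cachedGlobal.set(key, this.store.getGlobalCount(key));
    this.lastSyncMs = nowMs;
  }
}
```

With a limit of 100 and two instances that each admit 60 requests before their first flush, the global total reaches 120 before either instance starts denying. That gap is the delayed enforcement described above, and it shrinks as the sync interval does.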

📊
Distributed mode: each Lambda instance tracks locally, aggregates globally. Fewer DynamoDB writes, accurate totals, configurable sync delay.

07 Failure Modes and Blocking

What happens when DynamoDB is unreachable? The counter cannot be read or written. Should the request proceed or be denied?

That depends on the endpoint. For a public API that serves content, failing open (allowing the request) is the safer choice — a brief period without rate limiting is better than a full outage. For a login endpoint, failing closed (denying the request) is safer — temporary denied logins are preferable to unlimited brute-force attempts.

Each action configures its own failure mode: open or closed. The default is closed because security is the more common priority. Services that value availability over strictness set their failure mode to open.
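In code, the failure mode amounts to deciding what a thrown storage error means. The wrapper below is a minimal sketch under that assumption; `checkWithFailureMode`, `Decision`, and the `reason` values are hypothetical names, and the `check` callback stands in for whatever performs the real counter lookup.

```typescript
type FailureMode = 'open' | 'closed';

interface Decision {
  allowed: boolean;
  reason: 'within-limit' | 'limit-exceeded' | 'store-unavailable';
}

// `check` resolves to true when the request is within its limit and
// rejects when the backing store cannot be reached.
async function checkWithFailureMode(
  check: () => Promise<boolean>,
  failureMode: FailureMode = 'closed', // closed by default, as described above
): Promise<Decision> {
  try {
    const allowed = await check();
    return { allowed, reason: allowed ? 'within-limit' : 'limit-exceeded' };
  } catch {
    // Store unreachable: fail open admits the request, fail closed denies it.
    return { allowed: failureMode === 'open', reason: 'store-unavailable' };
  }
}

const storeDown = (): Promise<boolean> => Promise.reject(new Error('store unreachable'));

checkWithFailureMode(storeDown, 'open').then((d) => console.log(d.allowed)); // true
checkWithFailureMode(storeDown, 'closed').then((d) => console.log(d.allowed)); // false
```

Returning a reason alongside the decision lets callers log or alert on `store-unavailable` separately from ordinary denials, which matters most when failing open.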

When a rate limit is exceeded, the service can optionally block the identifier for a configurable duration (blockDurationSec). Consider a login endpoint: once an IP exceeds its limit of 5 attempts per hour, it is blocked for one hour. The block is stored in DynamoDB with a TTL. Subsequent requests from that identifier are denied immediately, without rechecking the counter.
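The control flow can be sketched as follows, with an in-memory `Map` in place of the DynamoDB table. `BlockList` and `checkWithBlocking` are hypothetical names, and `countRequest` stands in for any of the counting algorithms. One detail worth carrying over to a real table: DynamoDB's TTL deletes expired items in the background rather than at the exact expiry time, so the reader should compare the stored expiry against the current time itself, as `isBlocked` does here.

```typescript
// In-memory stand-in for block records stored with an expiry time.
class BlockList {
  private readonly blockedUntilMs = new Map<string, number>();

  block(identifier: string, nowMs: number, blockDurationSec: number): void {
    this.blockedUntilMs.set(identifier, nowMs + blockDurationSec * 1000);
  }

  isBlocked(identifier: string, nowMs: number): boolean {
    const until = this.blockedUntilMs.get(identifier);
    if (until === undefined) return false;
    if (nowMs >= until) {
      // Expired: check the timestamp ourselves instead of relying on
      // the store to have removed the record already.
      this.blockedUntilMs.delete(identifier);
      return false;
    }
    return true;
  }
}

type BlockingOutcome = 'allowed' | 'denied' | 'blocked';

function checkWithBlocking(
  identifier: string,
  nowMs: number,
  blocks: BlockList,
  countRequest: () => boolean, // true while the identifier is within its limit
  blockDurationSec: number,
): BlockingOutcome {
  // An active block short-circuits: the counter is not consulted at all.
  if (blocks.isBlocked(identifier, nowMs)) return 'blocked';
  if (countRequest()) return 'allowed';
  blocks.block(identifier, nowMs, blockDurationSec);
  return 'denied';
}

// Five attempts allowed, then a one-hour block.
const blocks = new BlockList();
let attempts = 0;
const withinLimit = (): boolean => ++attempts <= 5;
const outcomes: BlockingOutcome[] = [];
for (let i = 0; i < 7; i++) {
  outcomes.push(checkWithBlocking('203.0.113.42', i * 1000, blocks, withinLimit, 3600));
}
console.log(outcomes.join(','));
// allowed,allowed,allowed,allowed,allowed,denied,blocked
```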

Blocking also blunts attackers who pace their requests to sit right at the limit. Without a block, they can resume as soon as the window resets or tokens refill; with one, crossing the threshold triggers a meaningful cooldown before any further attempts are accepted.

🛡️
Fail open for content APIs. Fail closed for auth endpoints. Configurable per action. Time-based blocking with TTL prevents sustained low-rate attacks.

Rate limiting works best when users never notice it. Their requests flow through without delay, and the system absorbs abuse silently. At TCTF, this single service protects every endpoint across 34 microservices with consistent behavior: three algorithms, geographic awareness, access control lists, distributed coordination, and configurable failure modes. One configuration surface, predictable protection everywhere.

Editor's Note: This is Framework Series #4 in the TCTF Newsletter. Next in the series: Caching Strategies for Serverless, Part 1 — from in-memory to DynamoDB-backed TTL caches.
