Rate Limiting at Serverless Scale: Tiered Throttling with DynamoDB
Framework Deep Dives · Framework Series #4

How we built a multi-tier rate limiting system with three algorithms (fixed window, sliding window, token bucket), geographic rules, whitelist/blacklist support, distributed coordination across Lambda instances, and configurable failure modes.

June 10, 2026 · 13 min read
TCTF Editorials
TCTF Newsletter

In This Edition

  • Why Rate Limiting Is Hard in Serverless
  • Three Algorithms: Fixed Window, Sliding Window, Token Bucket
  • Three Limit Types: Requests, Bandwidth, Payload Size
  • Geographic Rules: Different Limits for Different Regions
  • Whitelist, Blacklist, and Access Control
  • Distributed Mode: Coordinating Across Lambda Instances
  • Failure Modes and Blocking
3 Algorithms · 3 Limit Types · Geographic Rules: Yes · Whitelist/Blacklist: Yes · Distributed Mode: Yes · Failure Modes: Open/Closed

Every public API needs rate limiting. Without it, a single user can consume all your capacity — whether intentionally (an attack) or accidentally (a buggy client in a retry loop). In serverless, rate limiting is harder than it sounds. Lambda functions are stateless. There is no shared counter in memory. Multiple instances handle requests simultaneously. And the rate limiter itself must not become a bottleneck. At TCTF, we built a rate limiting service that handles per-user, per-endpoint, and per-IP throttling with three algorithms, geographic rules that adjust limits by country, whitelist and blacklist support, distributed coordination across Lambda instances, and configurable failure modes. This article explains how it works.

01 Why Rate Limiting Is Hard in Serverless

In a traditional server, rate limiting is a counter in memory. Request comes in, increment the counter, check the limit, allow or deny. The counter lives in the process. Every request hits the same counter. Simple.

In serverless, there is no shared process. A user's requests might hit 10 different Lambda instances in 10 seconds. Each instance has its own memory. If each instance keeps its own counter, the user can get up to 10x the intended limit, because each instance counts only the requests it happened to handle.

The counter must be external — stored in DynamoDB or Redis where every instance can read and write it atomically. But external counters add latency. A DynamoDB read-increment-write cycle takes 5-10ms. If rate limiting adds 10ms to every request, it becomes a significant overhead.
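One way to keep the external counter to a single round trip is an atomic update instead of a separate read and write. The sketch below builds the input for a DynamoDB UpdateItem call that uses an ADD update expression, which DynamoDB applies atomically to the item. The table name, key layout, attribute names, and the buildIncrementInput helper are assumptions made for illustration, not TCTF's actual schema. The input uses plain JavaScript values, as a document-client style call would accept.

```typescript
// Hypothetical input shape for a document-client style UpdateItem call.
interface CounterUpdateInput {
  TableName: string;
  Key: { pk: string; sk: string };
  UpdateExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, number>;
  ReturnValues: "UPDATED_NEW";
}

// Builds an update that atomically adds `cost` to the counter for the
// window containing `nowMs` and stamps an expiry time in epoch seconds.
function buildIncrementInput(
  identifier: string,
  action: string,
  windowSec: number,
  nowMs: number,
  cost: number = 1,
): CounterUpdateInput {
  const nowSec = Math.floor(nowMs / 1000);
  const windowStart = nowSec - (nowSec % windowSec);
  return {
    TableName: "rate-limits", // assumed table name
    Key: { pk: `${action}#${identifier}`, sk: `window#${windowStart}` },
    // ADD treats a missing numeric attribute as 0, then increments it.
    UpdateExpression: "ADD #count :cost SET #expiresAt = :expiresAt",
    ExpressionAttributeNames: { "#count": "requestCount", "#expiresAt": "expiresAt" },
    ExpressionAttributeValues: {
      ":cost": cost,
      // Keep the item for one extra window so late reads still find it.
      ":expiresAt": windowStart + 2 * windowSec,
    },
    ReturnValues: "UPDATED_NEW",
  };
}
```

With ReturnValues set to UPDATED_NEW, the response carries the post-increment count, so the caller can compare it against the limit without a second read.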

TCTF's rate limiting service solves this with a layered approach: DynamoDB for durable counter storage, configurable algorithms that trade accuracy for performance, and a distributed mode that coordinates across instances without requiring every instance to hit the database on every request.

⚡

In serverless, rate limit counters must be external. The challenge: make them fast enough that rate limiting does not become the bottleneck it is supposed to prevent.

02 Three Algorithms: Fixed Window, Sliding Window, Token Bucket

The rate limiting service supports three algorithms, each with different trade-offs.

Fixed window is the simplest. Divide time into fixed intervals (e.g., 60-second windows). Count requests in the current window. If the count exceeds the limit, deny. When the window expires, the counter resets. The trade-off: at window boundaries, a user can briefly make up to 2x the limit by sending the maximum number of requests at the end of one window and again at the start of the next. For most use cases, this is acceptable.
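A minimal sketch of the fixed window check, assuming an in-memory Map as a stand-in for the external counter store. The class and method names are hypothetical and chosen only for this example.

```typescript
// In-memory stand-in for the external counter store (an assumption for
// this sketch; a real deployment would keep the counters in DynamoDB).
class FixedWindowLimiter {
  private counts = new Map<string, number>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request at `nowMs` still fits in its window.
  allow(identifier: string, nowMs: number): boolean {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const key = `${identifier}#${windowStart}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= this.limit) return false;
    // Old window keys are never evicted here; a TTL would handle that.
    this.counts.set(key, used + 1);
    return true;
  }
}
```

With a limit of 2 per 60 seconds, two requests just before the 60-second mark and two just after it are all allowed, which is the boundary effect described above.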

Sliding window is more accurate. Instead of fixed intervals, it tracks individual request timestamps and counts requests within a rolling window. A request at time T checks how many requests occurred between T minus the window size and T. No boundary problem. The trade-off: it stores more data (individual timestamps instead of a single counter) and requires more computation per check.
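The rolling check can be sketched as a timestamp log per identifier. As before, the in-memory Map is a stand-in for external storage, and the names are hypothetical.

```typescript
// Sliding window log: stores one timestamp per allowed request.
class SlidingWindowLimiter {
  private timestamps = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(identifier: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only requests inside the rolling window (cutoff, nowMs].
    const recent = (this.timestamps.get(identifier) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.timestamps.set(identifier, recent);
    return allowed;
  }
}
```

Unlike the fixed window, a request right after a window boundary is still denied if the previous 60 seconds already contain the maximum number of requests.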

Token bucket allows bursts. The bucket starts full (e.g., 100 tokens). Each request consumes a token. Tokens refill at a constant rate (e.g., 10 per second). If the bucket is empty, the request is denied. This allows short bursts of traffic (up to the bucket capacity) while enforcing a long-term average rate. The trade-off: more complex state management (token count, last refill time).
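A minimal token bucket sketch that tracks the two pieces of state mentioned above, the token count and the last refill time. Refill is computed lazily on each check rather than by a timer. The names and the in-memory Map are assumptions for illustration.

```typescript
interface BucketState {
  tokens: number;
  lastRefillMs: number;
}

class TokenBucketLimiter {
  private buckets = new Map<string, BucketState>();
  private capacity: number;
  private refillPerSec: number;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  allow(identifier: string, nowMs: number, cost: number = 1): boolean {
    // A new identifier starts with a full bucket.
    const state = this.buckets.get(identifier) ?? { tokens: this.capacity, lastRefillMs: nowMs };
    const elapsedSec = Math.max(0, nowMs - state.lastRefillMs) / 1000;
    // Add the tokens earned since the last check, capped at capacity.
    const tokens = Math.min(this.capacity, state.tokens + elapsedSec * this.refillPerSec);
    const allowed = tokens >= cost;
    this.buckets.set(identifier, {
      tokens: allowed ? tokens - cost : tokens,
      lastRefillMs: nowMs,
    });
    return allowed;
  }
}
```

The optional cost parameter lets a single request consume more than one token, which is how byte-based limits can reuse the same bucket.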

The algorithm is configured per action. Authentication endpoints use fixed window (simple, low overhead). API endpoints use sliding window (accurate, prevents boundary abuse). File upload endpoints use token bucket (allows burst uploads while limiting sustained throughput).
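A per-action configuration along these lines might look like the sketch below. The action names, field names, and the fallback default are hypothetical. Only the pairing of endpoint type to algorithm and the example limits (5 login attempts per hour, 100 requests per minute) come from the article.

```typescript
type Algorithm = "fixed-window" | "sliding-window" | "token-bucket";

interface ActionConfig {
  algorithm: Algorithm;
  limit: number; // max requests per window, or bucket capacity
  windowSec: number; // window length in seconds
}

// Hypothetical action names mapped to the algorithm that fits their traffic.
const actionConfigs = new Map<string, ActionConfig>([
  ["auth.login", { algorithm: "fixed-window", limit: 5, windowSec: 3600 }],
  ["api.read", { algorithm: "sliding-window", limit: 100, windowSec: 60 }],
  ["files.upload", { algorithm: "token-bucket", limit: 20, windowSec: 60 }],
]);

// Assumed fallback for actions without an explicit entry.
const defaultConfig: ActionConfig = { algorithm: "fixed-window", limit: 60, windowSec: 60 };

function configFor(action: string): ActionConfig {
  return actionConfigs.get(action) ?? defaultConfig;
}
```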

🔧

Fixed window for simplicity. Sliding window for accuracy. Token bucket for burst tolerance. Each action gets the algorithm that fits its traffic pattern.

03 Three Limit Types: Requests, Bandwidth, Payload Size

Not all rate limits count requests. Some count bytes.

Request count is the default — limit the number of requests per time window. This is what most people think of when they hear rate limiting. 100 requests per minute. 5 login attempts per hour.

Bandwidth limiting counts the total data transferred. A user might be allowed 10MB of bandwidth per minute. Each request consumes bandwidth proportional to its response size. This prevents a single user from saturating the network by making many large requests.

Payload size limiting counts the total upload size. A user might be allowed 50MB of uploads per hour. Each upload consumes tokens proportional to its file size. This prevents storage abuse without limiting the number of small requests.

The limit type is configured per action. The calculateTokensNeeded method determines how many tokens each request consumes based on the type. For request count, every request costs 1 token. For bandwidth, the cost is proportional to the response size. For payload size, the cost is proportional to the upload size.
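The article names calculateTokensNeeded but does not show its signature, so the following is a sketch under assumptions: the parameter list, the RequestSizes shape, and the choice of one token per kilobyte are all illustrative.

```typescript
type LimitType = "requests" | "bandwidth" | "payload";

// Hypothetical view of the sizes relevant to a single request.
interface RequestSizes {
  responseBytes: number;
  uploadBytes: number;
}

function calculateTokensNeeded(
  type: LimitType,
  sizes: RequestSizes,
  bytesPerToken: number = 1024, // assumed granularity: 1 token per KiB
): number {
  switch (type) {
    case "requests":
      return 1; // every request costs the same
    case "bandwidth":
      // Cost grows with the response size; never less than one token.
      return Math.max(1, Math.ceil(sizes.responseBytes / bytesPerToken));
    case "payload":
      // Cost grows with the upload size; never less than one token.
      return Math.max(1, Math.ceil(sizes.uploadBytes / bytesPerToken));
  }
}
```

The returned cost can then be passed to whichever algorithm the action uses, so the same counter logic serves all three limit types.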

04 Geographic Rules: Different Limits for Different Regions

Not all traffic is equal. A country with a large, active user base should get generous rate limits. A country with no registered users generating a burst of signup attempts is suspicious.

The rate limiting service supports geographic rules — per-country and per-region overrides that adjust the base rate limit. The country code comes from the CloudFront-Viewer-Country header (set by CloudFront at the edge) or from the geolocation service.

Geographic rules are stored in DynamoDB alongside the rate limit configuration. They can be updated without redeploying any service — the operations team can respond to emerging threats by tightening limits for specific countries in real time.

The applyGeographicRules method checks if the request's country matches any geographic rule. If it does, the rule's limit overrides the base limit. If no rule matches, the base limit applies. This means a global API endpoint can have different effective limits in different countries — stricter in high-risk regions, more generous in regions with established user bases.
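The override logic can be sketched as follows. The function signature, the GeoRule shape, and the countryFromHeaders helper are assumptions. Only the method name applyGeographicRules and the CloudFront-Viewer-Country header come from the article. The sketch covers per-country rules only and leaves out per-region overrides and the geolocation service fallback.

```typescript
interface GeoRule {
  country: string; // two-letter country code, upper case
  limit: number; // limit that replaces the base limit
}

// Reads the country code from request headers, ignoring header-name case.
function countryFromHeaders(headers: Record<string, string | undefined>): string | undefined {
  const name = Object.keys(headers).find((k) => k.toLowerCase() === "cloudfront-viewer-country");
  const value = name === undefined ? undefined : headers[name];
  return value ? value.toUpperCase() : undefined;
}

// Returns the matching rule's limit, or the base limit if no rule matches.
function applyGeographicRules(
  baseLimit: number,
  country: string | undefined,
  rules: GeoRule[],
): number {
  if (country === undefined) return baseLimit;
  const rule = rules.find((r) => r.country === country);
  return rule === undefined ? baseLimit : rule.limit;
}
```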

🌍

Geographic rules adjust rate limits by country — stricter in high-risk regions, more generous where users are established. Updated in real time via DynamoDB, no redeployment needed.

05 Whitelist, Blacklist, and Access Control

Some identifiers should bypass rate limits entirely. Internal service accounts, load testing tools, and trusted partners need unrestricted access. The whitelist handles this — whitelisted identifiers skip all rate limit checks.

Some identifiers should be blocked entirely. Known attack IPs, compromised accounts, and abuse sources should never get through. The blacklist handles this — blacklisted identifiers are immediately denied with a rate limit error, regardless of their actual request count.

Both lists are managed through the rate limiting service API: addToWhitelist, removeFromWhitelist, addToBlacklist, removeFromBlacklist. Bulk operations (bulkAddToWhitelist, bulkAddToBlacklist) handle mass updates efficiently. The checkAccess method returns the current status of any identifier: whitelisted, blacklisted, or allowed.

Whitelists and blacklists are stored persistently via the config manager and cached in memory for fast lookups. Cache invalidation happens automatically when lists are modified. The lists are per-action — an identifier can be whitelisted for one action and blacklisted for another.
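A minimal in-memory sketch of per-action lists, using the method names mentioned above. The signatures, the AccessLists class, and the rule that the blacklist wins when an identifier is on both lists are assumptions. The article does not say how such a conflict is resolved, and the persistence and bulk operations are omitted.

```typescript
type AccessStatus = "whitelisted" | "blacklisted" | "allowed";

class AccessLists {
  private whitelist = new Set<string>();
  private blacklist = new Set<string>();

  // Lists are per action, so the action is part of the lookup key.
  private key(action: string, identifier: string): string {
    return `${action}#${identifier}`;
  }

  addToWhitelist(action: string, identifier: string): void {
    this.whitelist.add(this.key(action, identifier));
  }
  removeFromWhitelist(action: string, identifier: string): void {
    this.whitelist.delete(this.key(action, identifier));
  }
  addToBlacklist(action: string, identifier: string): void {
    this.blacklist.add(this.key(action, identifier));
  }
  removeFromBlacklist(action: string, identifier: string): void {
    this.blacklist.delete(this.key(action, identifier));
  }

  checkAccess(action: string, identifier: string): AccessStatus {
    const k = this.key(action, identifier);
    // Assumption: a blacklist entry takes precedence over a whitelist entry.
    if (this.blacklist.has(k)) return "blacklisted";
    if (this.whitelist.has(k)) return "whitelisted";
    return "allowed";
  }
}
```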

06 Distributed Mode: Coordinating Across Lambda Instances

The standard rate limiting algorithms work well when all requests flow through a single counter. But in serverless, requests are distributed across many Lambda instances. Each instance increments the counter independently, and with eventually consistent reads, or any separate read-then-write cycle, two instances might read the same counter value before either writes its increment.

For most use cases, the resulting slight over-admission is acceptable. If the limit is 100 requests per minute and two instances each see 99, allowing one more each gives 101, which is close enough.

For use cases where accuracy matters — billing-related limits, security-critical endpoints — the distributed mode provides stronger coordination. Each Lambda instance maintains a local counter identified by a unique instance ID. Periodically, the local counters are aggregated into a global count. The getGlobalCount method sums all instance counters within the time window.

This approach reduces DynamoDB writes (local counters batch before writing) while maintaining accurate global counts. The trade-off is slightly delayed enforcement — a burst might briefly exceed the limit before the global count catches up. For security-critical limits, the delay is configured to be very short (1-2 seconds).
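The local-then-global pattern can be sketched as below. Only the getGlobalCount name comes from the article. The store and counter classes, their other methods, and the key layout are hypothetical, and the in-memory Map stands in for the shared DynamoDB table.

```typescript
// Stand-in for the shared table: one row per (window, instance) pair.
class SharedCounterStore {
  private rows = new Map<string, number>();

  addInstanceCount(instanceId: string, windowStart: number, delta: number): void {
    const key = `${windowStart}#${instanceId}`;
    this.rows.set(key, (this.rows.get(key) ?? 0) + delta);
  }

  // Sums the counters of all instances for the given window.
  getGlobalCount(windowStart: number): number {
    let total = 0;
    this.rows.forEach((count, key) => {
      if (key.startsWith(`${windowStart}#`)) total += count;
    });
    return total;
  }
}

// Per-instance counter that batches increments until the next sync.
class InstanceCounter {
  private pending = 0;
  private instanceId: string;
  private store: SharedCounterStore;

  constructor(instanceId: string, store: SharedCounterStore) {
    this.instanceId = instanceId;
    this.store = store;
  }

  record(cost: number = 1): void {
    this.pending += cost;
  }

  // Flushes the batched count in one write; a no-op if nothing is pending.
  sync(windowStart: number): void {
    if (this.pending === 0) return;
    this.store.addInstanceCount(this.instanceId, windowStart, this.pending);
    this.pending = 0;
  }
}
```

Until an instance calls sync, its requests are invisible to the global count, which is the delayed enforcement described above.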

📊

Distributed mode: each Lambda instance tracks locally, aggregates globally. Reduces DynamoDB writes while maintaining accurate counts. Configurable sync delay for security-critical limits.

07 Failure Modes and Blocking

What happens when the rate limiting storage fails? DynamoDB is down. The counter cannot be read or written. Do you allow the request or deny it?

The answer depends on the endpoint. For a public API that serves content, failing open (allowing the request) is safer — a brief period without rate limiting is better than a complete outage. For a login endpoint, failing closed (denying the request) is safer — a brief period of denied logins is better than unlimited brute-force attempts.

The failure mode is configured per action: open or closed. The default is closed (deny on failure) because security is the more common concern. Services that prefer availability over security can set failureMode to open.
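A minimal sketch of how the configured mode can decide the outcome when the counter store throws. The wrapper function is hypothetical. It is written synchronously for brevity, whereas a real storage call would be asynchronous.

```typescript
type FailureMode = "open" | "closed";

// Runs the rate limit check and falls back to the configured failure mode
// if the check itself fails (for example, the counter store is unavailable).
function checkWithFailureMode(
  check: () => boolean,
  failureMode: FailureMode = "closed", // default per the article: deny on failure
): boolean {
  try {
    return check();
  } catch {
    // "open" allows the request; "closed" denies it.
    return failureMode === "open";
  }
}
```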

When a rate limit is exceeded, the service can optionally block the identifier for a configurable duration (blockDurationSec). This is useful for login endpoints — after 5 failed attempts, block the IP for 1 hour. The block is stored in DynamoDB with a TTL. Subsequent requests from the blocked identifier are immediately denied without checking the counter.
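The block check can be sketched as an expiry timestamp per identifier. The BlockList class and its methods are hypothetical, and the Map stands in for the DynamoDB item. Only blockDurationSec comes from the article.

```typescript
class BlockList {
  // identifier -> time (ms since epoch) at which the block expires
  private blockedUntil = new Map<string, number>();

  block(identifier: string, nowMs: number, blockDurationSec: number): void {
    this.blockedUntil.set(identifier, nowMs + blockDurationSec * 1000);
  }

  // Checked before the counter, so blocked identifiers are denied early.
  isBlocked(identifier: string, nowMs: number): boolean {
    const until = this.blockedUntil.get(identifier);
    if (until === undefined) return false;
    if (nowMs >= until) {
      // Expired entries are ignored and cleaned up on read.
      this.blockedUntil.delete(identifier);
      return false;
    }
    return true;
  }
}
```

The sketch compares the stored expiry against the current time rather than relying on the entry having been deleted. The same approach is worth considering with DynamoDB, where TTL deletion is a background process and expired items can remain readable for some time.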

The blocking mechanism blunts attacks that ride the limit, where an attacker sends requests at the maximum allowed rate and resumes as soon as each window resets. Once the limit is hit, the block enforces a meaningful cooldown period before the attacker can try again.

🛡️

Fail open for content APIs (availability over security). Fail closed for auth endpoints (security over availability). Configurable per action. Blocking with a TTL enforces a cooldown once a limit is hit.

Rate limiting is the invisible guardian of every API. Users never see it when it works — their requests flow through without delay. They only notice it when it activates — a 429 response with a Retry-After header telling them to slow down. At TCTF, the rate limiting service protects every endpoint across all 34 microservices with the same consistent behavior: three algorithms for different traffic patterns, geographic rules for regional intelligence, whitelist and blacklist for access control, distributed coordination for accuracy, and configurable failure modes for the right balance of security and availability. One service, one configuration, consistent protection everywhere.

Editor's Note: This is Framework Series #4 in the TCTF Newsletter. Next in the series: Caching Strategies for Serverless, Part 1 — from in-memory to DynamoDB-backed TTL caches.
