Session Management and WebSocket Push in a Serverless World
Framework Deep Dives · Framework Series #11

Managing user sessions, device trust, and real-time WebSocket connections when your compute layer is stateless — our approach with DynamoDB, Redis, API Gateway WebSockets, and the SessionCoordinator that ties it all together.

August 15, 2026 · 13 min read
TCTF Editorials
TCTF Newsletter

In This Edition

  • The Stateless Session Problem
  • Session Data Model
  • The Session Coordinator: Event-Driven Auth Flow
  • Session CRUD: Creator, Retriever, Updater, Deleter
  • WebSocket Connections: Real-Time in a Stateless World
  • Scaling Considerations
  • Lessons Learned

  • Session Cache: Pluggable
  • Max Sessions/User: 5
  • Login Session TTL: 5–10 min
  • MFA Session TTL: 5 min
  • WebSocket TTL: 24 hrs
  • Event-Driven: Yes

Serverless functions are stateless. They spin up, handle a request, and disappear. There is no memory between invocations, no persistent process, no in-memory session store. And yet, a platform like TCTF needs to track who is logged in, which devices they trust, when their sessions expire, and how to push real-time messages to their browser.

This article covers two challenges that every serverless platform faces: managing user sessions without a server, and delivering real-time updates without a persistent server process. Our solutions are a pluggable session system with swappable cache providers (DynamoDB-backed caching or Redis on ElastiCache, interchangeable behind the same interface) and WebSocket connections managed through API Gateway.

01 The Stateless Session Problem

In a traditional server application, sessions live in memory. A user logs in, the server creates a session object, stores it in a hash map, and attaches a session ID to a cookie. Every subsequent request includes the cookie, the server looks up the session, and the user is authenticated. Simple.

In serverless, there is no server to hold that hash map. Lambda functions are ephemeral — they may handle one request and never be invoked again, or they may handle thousands of requests across different instances. There is no shared memory between invocations. There is no guarantee that the same instance handles consecutive requests from the same user.

This means sessions must be stored externally — in a database or cache that every Lambda instance can access. The session lookup must be fast (every authenticated request needs it), reliable (a missed session means a logged-out user), and consistent (two Lambda instances must see the same session state).

At TCTF, we solve this with a pluggable cache architecture. The session system is built against a cache interface, not a specific provider. You can run it with DynamoDB as the cache backend — using TTL-based expiration and single-table design — or swap to Redis (ElastiCache) for sub-millisecond lookups. The switch is a configuration change, not a code change. Both providers implement the same interface, so the session logic does not know or care which one is behind it. In production, we use both: DynamoDB as the durable session store and Redis as the hot cache layer. But a service that does not need Redis can run entirely on DynamoDB caching with no code changes.

⚡

The session cache is pluggable — DynamoDB or Redis, swappable via configuration. Both implement the same interface. In production, we use both: DynamoDB for durability, Redis for speed.
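
To make the pluggable design concrete, here is a minimal sketch of what a provider-agnostic cache contract could look like. The names `SessionCache`, `InMemorySessionCache`, `createSessionCache`, and `rememberSession` are assumptions for illustration, not TCTF's actual interface, and an in-memory map stands in for both providers so the example runs without AWS. Real DynamoDB or Redis providers would be asynchronous.

```typescript
// Hypothetical cache contract; the article does not show the real interface.
interface SessionCache {
  get(key: string): string | undefined;
  set(key: string, value: string, ttlSeconds: number): void;
  delete(key: string): void;
}

// In-memory stand-in so the sketch runs without AWS. A real provider would
// wrap a DynamoDB or Redis client and would be asynchronous.
class InMemorySessionCache implements SessionCache {
  readonly providerName: string;
  private readonly now: () => number;
  private readonly entries = new Map<string, { value: string; expiresAtMs: number }>();

  constructor(providerName: string, now: () => number = () => Date.now()) {
    this.providerName = providerName;
    this.now = now;
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAtMs <= this.now()) {
      this.entries.delete(key); // drop expired entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAtMs: this.now() + ttlSeconds * 1000 });
  }

  delete(key: string): void {
    this.entries.delete(key);
  }
}

type CacheProvider = "dynamodb" | "redis";

// The only place that knows which provider is active; callers see SessionCache.
function createSessionCache(provider: CacheProvider): InMemorySessionCache {
  return new InMemorySessionCache(provider);
}

// Session logic depends on the interface, never on a concrete provider.
function rememberSession(cache: SessionCache, sessionId: string, json: string): void {
  cache.set(`session:${sessionId}`, json, 3600);
}
```

Because `rememberSession` only sees `SessionCache`, switching providers means changing the value passed to `createSessionCache`, which in a deployed service would typically come from configuration.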

02 Session Data Model

A session in TCTF is more than a token. The SessionStorage interface captures everything we need to know about an active session: the user's email, the refresh token for session renewal, a unique session ID, the expiration timestamp, device information (browser, OS, IP), whether the user chose to stay signed in, and whether the device is trusted.
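
As a rough sketch of that shape, the fields listed above could be typed as follows. Only the interface name `SessionStorage` comes from the article; every field name, the `DeviceInfo` type, and the `trustDevice` helper are assumptions chosen for illustration.

```typescript
// Hypothetical field names; the article lists the concepts, not the exact keys.
interface DeviceInfo {
  browser: string;
  os: string;
  ipAddress: string;
}

interface SessionStorage {
  sessionId: string;
  email: string;
  refreshToken: string;
  expiresAt: number; // Unix epoch seconds
  deviceInfo: DeviceInfo;
  staySignedIn: boolean;
  trustedDevice: boolean;
  trustedAt?: number; // epoch seconds, present once the device is trusted
}

// Returns a copy with the trust flag and timestamp set; the input is not mutated.
function trustDevice(session: SessionStorage, nowSeconds: number): SessionStorage {
  return { ...session, trustedDevice: true, trustedAt: nowSeconds };
}
```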

Device trust is a key feature. When a user marks a device as trusted, we store that flag and timestamp on the session. Trusted devices skip certain security checks on subsequent logins — reducing friction for users on their own machines while maintaining full security on unknown devices.

Each user is limited to 5 concurrent sessions. This prevents session accumulation, a common problem where users log in from multiple devices and never explicitly log out. When a user creates a sixth session, one existing session is evicted automatically using an LRU (Least Recently Used) strategy, which in practice is usually the oldest session. The eviction happens atomically within the same DynamoDB transaction that creates the new session.
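
The eviction rule can be sketched as a pure planning step that decides which writes belong in the transaction. The names `SessionRecord`, `WriteOp`, and `planSessionCreate`, and the `lastUsedAt` field, are hypothetical. In a DynamoDB-backed version, the returned operations would presumably be sent together in a single `TransactWriteItems` call so that they succeed or fail as a unit.

```typescript
const MAX_SESSIONS_PER_USER = 5;

// Hypothetical minimal record; lastUsedAt drives the LRU ordering.
interface SessionRecord {
  sessionId: string;
  lastUsedAt: number;
}

type WriteOp =
  | { kind: "delete"; sessionId: string }
  | { kind: "put"; session: SessionRecord };

// Plans the writes for one all-or-nothing transaction: evict just enough
// least-recently-used sessions to make room, then put the new session.
function planSessionCreate(existing: readonly SessionRecord[], incoming: SessionRecord): WriteOp[] {
  const ops: WriteOp[] = [];
  const overflow = existing.length + 1 - MAX_SESSIONS_PER_USER;
  if (overflow > 0) {
    const leastRecentFirst = existing.slice().sort((a, b) => a.lastUsedAt - b.lastUsedAt);
    for (const victim of leastRecentFirst.slice(0, overflow)) {
      ops.push({ kind: "delete", sessionId: victim.sessionId });
    }
  }
  ops.push({ kind: "put", session: incoming });
  return ops;
}
```

Keeping the plan separate from the write makes the limit easy to unit test without a database.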

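Session expiration uses DynamoDB TTL. Every session has an expiresAt timestamp, and DynamoDB removes expired items automatically: no cleanup jobs, no cron tasks, no Lambda functions scanning for stale sessions. TTL deletion runs in the background and is not instantaneous, so an expired item can remain readable for a while. Session lookups should therefore still compare expiresAt against the current time rather than rely on the item being gone.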
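
A small sketch of the timestamp arithmetic follows. DynamoDB TTL expects a Number attribute holding Unix epoch seconds, so the helper converts from milliseconds. The two lifetimes and the function names are placeholders, not TCTF's real values. The read-side `isExpired` check covers the window before DynamoDB's background deletion has run.

```typescript
// Placeholder lifetimes; the article only says stay-signed-in sessions last longer.
const DEFAULT_SESSION_SECONDS = 60 * 60 * 24; // 1 day (assumed)
const STAY_SIGNED_IN_SECONDS = 60 * 60 * 24 * 30; // 30 days (assumed)

// DynamoDB TTL attributes are epoch seconds, not milliseconds.
function computeExpiresAt(nowMs: number, staySignedIn: boolean): number {
  const lifetime = staySignedIn ? STAY_SIGNED_IN_SECONDS : DEFAULT_SESSION_SECONDS;
  return Math.floor(nowMs / 1000) + lifetime;
}

// Guard for reads: an item past its expiresAt may still be in the table.
function isExpired(expiresAt: number, nowMs: number): boolean {
  return expiresAt <= Math.floor(nowMs / 1000);
}
```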

🔒

Max 5 sessions per user. When a 6th session is created, the oldest is automatically evicted. DynamoDB TTL handles expiration — no cleanup jobs needed.

Fig. 1 — Session management architecture showing the pluggable cache layer (DynamoDB or Redis, swappable via configuration) and the SessionCoordinator event-driven authentication flow with lifecycle events.

03 The Session Coordinator: Event-Driven Auth Flow

Authentication is not a single step. A user enters their email and password. If MFA is enabled, they need to complete a second factor. If MFA succeeds, the login session is cleaned up and a user session is created. If anything fails, all temporary sessions need to be cleaned up.

The SessionCoordinator orchestrates this flow using an event-driven architecture. It extends Node.js EventEmitter and emits lifecycle events at each stage: LOGIN_STARTED when a login session is created, MFA_REQUIRED when the user needs to complete MFA, MFA_COMPLETED when authentication succeeds, and AUTH_FAILED when something goes wrong.

The coordinator manages parent-child relationships between sessions. A login session is the parent. An MFA session is the child, linked to the parent by loginSessionId. When MFA completes, both sessions are cleaned up atomically. When authentication fails, both sessions are cleaned up in the failure handler.

Login sessions have a TTL of 5 to 10 minutes, just long enough for the user to complete the auth flow. MFA sessions have a 5-minute TTL, so MFA must complete quickly or the session expires. Both are stored in Redis with a TTL, so expired sessions are removed automatically without any cleanup logic.

The event-driven design makes the coordinator extensible. Need to log every login attempt? Subscribe to LOGIN_STARTED. Need to trigger a security alert on failed auth? Subscribe to AUTH_FAILED. Need to update a last-login timestamp? Subscribe to MFA_COMPLETED. New behaviors are added by subscribing to events, not by modifying the coordinator.

📨

The SessionCoordinator emits lifecycle events: LOGIN_STARTED, MFA_REQUIRED, MFA_COMPLETED, AUTH_FAILED. New behaviors are added by subscribing to events, not modifying the coordinator.
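
The following is a minimal sketch of such a coordinator, built on Node.js `EventEmitter` as described above. The event names come from the article; the method names, payload shapes, and the in-memory `Set` and `Map` (standing in for the TTL-backed session stores) are assumptions for illustration.

```typescript
import { EventEmitter } from "node:events";

class SessionCoordinator extends EventEmitter {
  // Stand-ins for the TTL-backed stores (assumed, not the real storage).
  private readonly loginSessions = new Set<string>();
  // Child MFA session id -> parent login session id.
  private readonly mfaSessions = new Map<string, string>();

  startLogin(loginSessionId: string): void {
    this.loginSessions.add(loginSessionId);
    this.emit("LOGIN_STARTED", { loginSessionId });
  }

  requireMfa(loginSessionId: string, mfaSessionId: string): void {
    this.mfaSessions.set(mfaSessionId, loginSessionId);
    this.emit("MFA_REQUIRED", { loginSessionId, mfaSessionId });
  }

  completeMfa(mfaSessionId: string): void {
    const loginSessionId = this.mfaSessions.get(mfaSessionId);
    if (loginSessionId === undefined) {
      this.emit("AUTH_FAILED", { reason: "unknown MFA session" });
      return;
    }
    // Parent and child are removed together once MFA succeeds.
    this.mfaSessions.delete(mfaSessionId);
    this.loginSessions.delete(loginSessionId);
    this.emit("MFA_COMPLETED", { loginSessionId, mfaSessionId });
  }

  fail(loginSessionId: string, reason: string): void {
    // Remove every child MFA session linked to the failed login, then the parent.
    this.mfaSessions.forEach((parent, mfaId) => {
      if (parent === loginSessionId) this.mfaSessions.delete(mfaId);
    });
    this.loginSessions.delete(loginSessionId);
    this.emit("AUTH_FAILED", { loginSessionId, reason });
  }

  pendingCount(): number {
    return this.loginSessions.size + this.mfaSessions.size;
  }
}

// New behavior is added by subscribing, not by editing the coordinator.
const auditTrail: string[] = [];
const coordinator = new SessionCoordinator();
coordinator.on("LOGIN_STARTED", () => auditTrail.push("login attempt"));
coordinator.on("AUTH_FAILED", () => auditTrail.push("security alert"));
```

Note that `EventEmitter` calls listeners synchronously in registration order, so a slow or throwing subscriber affects the caller; a production coordinator would need to account for that.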

04 Session CRUD: Creator, Retriever, Updater, Deleter

The UserSessionService is the main interface for session operations. Under the hood, it delegates to four specialized classes: SessionCreator, SessionRetriever, SessionUpdater, and SessionDeleter. Each handles one concern.

SessionCreator validates the session data, calculates the expiration time (longer for stay-signed-in sessions), builds a DynamoDB transaction that creates the new session and evicts the oldest if the user is at the 5-session limit, caches the new session in Redis, and logs success metrics. The DynamoDB write is transactional: if any part of it fails, nothing is written. The Redis write happens after the transaction commits and sits outside it, so a cache failure at that point does not roll back the session.

SessionRetriever handles lookups by session ID, by refresh token, and by user (for listing all active sessions). It checks Redis first and falls back to DynamoDB on cache miss. Cache hits are logged as metrics so we can track the cache hit rate.
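
The cache-first lookup with fallback can be sketched as a read-through pattern. This is an illustrative version only: two in-memory maps stand in for Redis and DynamoDB, the calls are synchronous for brevity, and the class shape, counters, and `hitRate` method are assumptions rather than the actual SessionRetriever.

```typescript
interface Session {
  sessionId: string;
  email: string;
  expiresAt: number; // epoch seconds
}

// Read-through lookup: try the cache, fall back to the durable store on a
// miss, then repopulate the cache. Maps stand in for Redis and DynamoDB.
class SessionRetriever {
  cacheHits = 0;
  cacheMisses = 0;
  private readonly cache: Map<string, Session>;
  private readonly table: Map<string, Session>;

  constructor(cache: Map<string, Session>, table: Map<string, Session>) {
    this.cache = cache;
    this.table = table;
  }

  getById(sessionId: string, nowSeconds: number): Session | undefined {
    const cached = this.cache.get(sessionId);
    if (cached !== undefined && cached.expiresAt > nowSeconds) {
      this.cacheHits += 1;
      return cached;
    }
    this.cacheMisses += 1;
    const stored = this.table.get(sessionId);
    if (stored === undefined || stored.expiresAt <= nowSeconds) {
      this.cache.delete(sessionId); // never keep a missing or expired session cached
      return undefined;
    }
    this.cache.set(sessionId, stored);
    return stored;
  }

  hitRate(): number {
    const total = this.cacheHits + this.cacheMisses;
    return total === 0 ? 0 : this.cacheHits / total;
  }
}
```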

SessionUpdater handles session renewal (extending expiration), device trust changes, MFA state updates, and optimistic locking via lastUpdatedAt timestamps. Conditional updates prevent race conditions when two Lambda instances try to update the same session simultaneously.
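
To illustrate the optimistic-locking idea, here is a sketch in which an in-memory table mimics a conditional write: the update applies only if `lastUpdatedAt` still matches the value the caller read. The class and method names are hypothetical. With DynamoDB, the same check would normally be expressed as a `ConditionExpression` on the update, which fails with a `ConditionalCheckFailedException` when the condition does not hold.

```typescript
interface VersionedSession {
  sessionId: string;
  expiresAt: number;
  lastUpdatedAt: number; // doubles as the version for optimistic locking
}

// In-memory stand-in for a table that supports conditional writes.
class InMemorySessionTable {
  private readonly rows = new Map<string, VersionedSession>();

  put(row: VersionedSession): void {
    this.rows.set(row.sessionId, { ...row });
  }

  get(sessionId: string): VersionedSession | undefined {
    const row = this.rows.get(sessionId);
    return row === undefined ? undefined : { ...row };
  }

  // Applies the change only if nobody has updated the row since it was read.
  updateIfUnchanged(
    sessionId: string,
    expectedLastUpdatedAt: number,
    changes: { expiresAt?: number },
    nowMs: number,
  ): VersionedSession {
    const current = this.rows.get(sessionId);
    if (current === undefined || current.lastUpdatedAt !== expectedLastUpdatedAt) {
      throw new Error("conditional check failed: session changed since it was read");
    }
    const next: VersionedSession = {
      sessionId: current.sessionId,
      expiresAt: changes.expiresAt ?? current.expiresAt,
      // Always advance the version, even if the clock did not move.
      lastUpdatedAt: Math.max(nowMs, current.lastUpdatedAt + 1),
    };
    this.rows.set(sessionId, next);
    return { ...next };
  }
}
```

When two writers read the same version, the first update wins and the second is rejected, so the losing caller can re-read and retry instead of silently overwriting.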

SessionDeleter handles single session deletion (user logs out of one device), bulk deletion (user logs out of all devices), and the LRU eviction during session creation. Deletions are propagated to both DynamoDB and Redis to keep the cache consistent.

The UserSessionService also provides health checks, attempt status tracking for rate limiting login attempts, and MFA setup transaction support — completing MFA setup atomically with session updates.

Fig. 2 — WebSocket real-time push flow — from backend service through SQS, Lambda, DynamoDB connection lookup (GSI), API Gateway Management API, to the user's browser. Includes connection lifecycle: $connect, $default, $disconnect, and TTL backstop.

05 WebSocket Connections: Real-Time in a Stateless World

Sessions handle authentication. But how do you push a real-time notification to a user's browser when there is no persistent server process?

The answer is API Gateway WebSocket APIs. When a user opens Cometbid Social, the frontend establishes a WebSocket connection to API Gateway. API Gateway assigns a connectionId and invokes a Lambda function to handle the connection event. The Lambda function stores the connection metadata in DynamoDB — connectionId, userId, userEmail, connected timestamp, and a 24-hour TTL.

The WebSocketConnectionService manages this connection store. It uses single-table design with PK/SK keys: PK is CONNECTION#{connectionId}, SK is METADATA. A GSI maps users to their connections: GSI1PK is USER#{userId}, GSI1SK is CONNECTION#{connectionId}. This lets us quickly find all active connections for a user.
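
The key layout can be sketched as a small item builder. The PK, SK, GSI1PK, and GSI1SK patterns follow the description above; the remaining attribute names (`connectedAt`, `ttl`) and the function name are assumptions.

```typescript
const CONNECTION_TTL_SECONDS = 24 * 60 * 60; // the 24-hour TTL from the article

// Attribute names other than PK/SK/GSI1PK/GSI1SK are hypothetical.
interface ConnectionItem {
  PK: string;
  SK: string;
  GSI1PK: string;
  GSI1SK: string;
  connectionId: string;
  userId: string;
  userEmail: string;
  connectedAt: number; // epoch milliseconds
  ttl: number; // epoch seconds, as DynamoDB TTL expects
}

function buildConnectionItem(
  connectionId: string,
  userId: string,
  userEmail: string,
  nowMs: number,
): ConnectionItem {
  return {
    PK: `CONNECTION#${connectionId}`,
    SK: "METADATA",
    GSI1PK: `USER#${userId}`,
    GSI1SK: `CONNECTION#${connectionId}`,
    connectionId,
    userId,
    userEmail,
    connectedAt: nowMs,
    ttl: Math.floor(nowMs / 1000) + CONNECTION_TTL_SECONDS,
  };
}
```

Querying the index with `GSI1PK = USER#{userId}` then returns one item per open connection for that user.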

When a service needs to push a message — a new notification, a chat message, a project update — it queries the GSI to find the user's active connections, then uses the API Gateway Management API to post the message to each connectionId. If a connection is stale (the user closed their browser), the post fails and the connection is cleaned up.
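
The push-and-clean-up loop might look like the sketch below. It assumes an injected `post` function in place of the real API Gateway Management API client and an in-memory map in place of the GSI query; both are stand-ins, and the real calls would be asynchronous. The API reports a closed connection with a `GoneException` (HTTP 410), which the sketch detects by error name.

```typescript
// Stand-in for the error raised when posting to a closed connection.
function goneError(): Error {
  const err = new Error("connection is gone");
  err.name = "GoneException";
  return err;
}

// Hypothetical injected sender; the real client call would be asynchronous.
type PostToConnection = (connectionId: string, payload: string) => void;

interface PushResult {
  delivered: string[];
  removed: string[];
}

function pushToUser(
  userId: string,
  payload: string,
  connectionsByUser: Map<string, Set<string>>, // stand-in for the GSI lookup
  post: PostToConnection,
): PushResult {
  const result: PushResult = { delivered: [], removed: [] };
  const connections = connectionsByUser.get(userId);
  if (connections === undefined) return result;

  // Copy first so stale entries can be deleted while looping.
  for (const connectionId of Array.from(connections)) {
    try {
      post(connectionId, payload);
      result.delivered.push(connectionId);
    } catch (err) {
      if (err instanceof Error && err.name === "GoneException") {
        connections.delete(connectionId); // stale connection: clean it up
        result.removed.push(connectionId);
      } else {
        throw err; // anything else is a real failure and should surface
      }
    }
  }
  return result;
}
```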

The 24-hour TTL on connections ensures stale connections are eventually cleaned up even if the disconnect event is missed. This is important because WebSocket disconnects are not always reliable — network drops, browser crashes, and mobile app backgrounding can all cause silent disconnects.

🔌

WebSocket connections are stored in DynamoDB with a GSI mapping users to connections. When a service needs to push a message, it queries the GSI and posts to each active connectionId via API Gateway.

06 Scaling Considerations

The session system handles scale through the pluggable cache architecture. When Redis is configured, it absorbs the read load — session validation happens on every authenticated request, and Redis serves these at sub-millisecond latency. DynamoDB handles the write load and provides durability. If Redis goes down, the system falls back to DynamoDB caching with slightly higher latency but no data loss. Services that do not need Redis-level speed can run entirely on DynamoDB caching — same interface, same code, different configuration.

WebSocket connections scale differently. API Gateway WebSocket APIs handle connection management automatically: there is no load balancer to tune and no connection pool to size, although API Gateway service quotas (for example, on new connections per second) still apply. The bottleneck is the DynamoDB connection store and the API Gateway Management API calls for message delivery.

For high-fanout scenarios — sending a notification to thousands of users simultaneously — we use SQS to queue the delivery. A Lambda function reads from the queue, looks up connections, and delivers messages in batches. This prevents the sending service from being blocked by slow deliveries and provides automatic retry for failed sends.
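
One way to prepare a large fan-out for the queue is to split recipients into batches. The sketch below groups delivery jobs into sets of at most 10, which is the maximum number of entries SQS accepts in one `SendMessageBatch` request. The `Id` and `MessageBody` fields match that API's entry shape; the message body format and function names are assumptions.

```typescript
const SQS_MAX_BATCH_ENTRIES = 10; // SendMessageBatch accepts at most 10 entries

function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

interface BatchEntry {
  Id: string; // must be unique within a single batch
  MessageBody: string;
}

// Builds one list of entries per SendMessageBatch call.
function buildDeliveryBatches(userIds: readonly string[], payload: string): BatchEntry[][] {
  return chunk(userIds, SQS_MAX_BATCH_ENTRIES).map((group) =>
    group.map((userId, index) => ({
      Id: String(index),
      MessageBody: JSON.stringify({ userId, payload }),
    })),
  );
}
```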

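The combination of DynamoDB for state, Redis for speed, API Gateway for connections, and SQS for delivery gives us a real-time system that scales without any servers to manage. Every component is managed, and Lambda, API Gateway, and SQS are pay-per-use and cost nothing when idle. Redis is the exception when it runs on provisioned ElastiCache nodes, which are billed whether or not they serve traffic.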

07 Lessons Learned

Building session management and real-time push for serverless taught us several lessons.

First, Redis TTL is your friend. Login sessions and MFA sessions expire automatically. No cleanup jobs, no scheduled Lambda functions, no DynamoDB scans. Set the TTL when you create the session and forget about it.

Second, transactional session creation prevents race conditions. Creating a session and evicting the oldest session must happen atomically. Without transactions, two concurrent logins could both succeed and leave the user with 6 sessions.

Third, WebSocket disconnect events are unreliable. Always use TTL as a backstop. A 24-hour TTL on connections means stale connections are cleaned up within a day, even if the disconnect Lambda never fires.

Fourth, the event-driven coordinator pattern is worth the complexity. It adds a layer of abstraction, but it makes the auth flow testable, extensible, and observable. Every state transition emits a metric. Every failure triggers cleanup. New behaviors are added without touching existing code.

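Fifth, cache consistency matters. When a session is created, updated, or deleted, both DynamoDB and Redis must be updated. We update DynamoDB first (source of truth) and Redis second (cache). If the Redis write fails on a create, the next cache miss repopulates the entry from DynamoDB. A failed update or delete can leave a stale entry in Redis, and that entry is only corrected when it is rewritten or its Redis TTL expires. The cache is eventually consistent, and DynamoDB always holds the correct state.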
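
The write ordering can be sketched as follows. A map stands in for DynamoDB and a small `CacheClient` interface stands in for Redis; both names, the `saveSession` function, and the synchronous calls are assumptions for illustration. The durable write goes first and its failure propagates, while a cache failure is tolerated and followed by a best-effort eviction so a stale entry is less likely to linger.

```typescript
interface SessionRow {
  sessionId: string;
  email: string;
}

// Hypothetical minimal cache client; a real Redis client would be asynchronous.
interface CacheClient {
  set(key: string, value: SessionRow): void;
  delete(key: string): void;
}

function saveSession(
  table: Map<string, SessionRow>, // stand-in for DynamoDB, the source of truth
  cache: CacheClient,
  session: SessionRow,
): { cacheUpdated: boolean } {
  // 1. Durable write first. If this throws, the cache is never touched.
  table.set(session.sessionId, session);

  // 2. Cache second. A failure here is tolerated, not rolled back.
  try {
    cache.set(session.sessionId, session);
    return { cacheUpdated: true };
  } catch (setError) {
    try {
      cache.delete(session.sessionId); // best effort: evict any stale entry
    } catch (deleteError) {
      // Cache unreachable; a TTL on cache entries bounds how long stale data lives.
    }
    return { cacheUpdated: false };
  }
}
```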

📊

Key lessons: Redis TTL for automatic expiration, transactional session creation, TTL as a backstop for unreliable WebSocket disconnects, and DynamoDB-first writes for cache consistency.

Stateless does not mean sessionless. It means the state lives somewhere else — in DynamoDB for durability, in Redis for speed, in API Gateway for connections. The challenge is coordinating these pieces into a system that feels seamless to the user. They log in, they see their sessions, they trust their devices, they receive real-time notifications — and they never think about the Lambda functions, DynamoDB transactions, and WebSocket connections making it all work. That invisibility is the goal.

Editor's Note: This is Framework Series #11 in the TCTF Newsletter. Next in the series: DynamoDB Providers and the Repository Pattern — clean data access for serverless.
