
How we structured error handling across 34 microservices — custom error hierarchies, the withErrorHandling wrapper, priority-based error routing, the ErrorResponseBuilder, and the factory pattern for specialized handlers.
Every Lambda function can fail. The database times out. The user sends invalid input. The authentication token expires. The external API returns a 503. The question is not whether errors happen, but how they are handled when they do. At TCTF, error handling is not an afterthought bolted onto each Lambda function. It is a shared architecture — a hierarchy of custom error classes, a centralized handler that routes errors to the right response, and a wrapper that ensures every Lambda function handles errors consistently. This article explains how we built an error handling system that works across 34 microservices without duplicating a single line of error-handling code.
Without a shared error handling architecture, every Lambda function reinvents the wheel. One function catches errors and returns a 500 with a generic message. Another returns the raw error message (leaking internal details). A third forgets to catch errors entirely and lets the Lambda runtime return its default error response.
The result is inconsistent API responses. The frontend cannot reliably parse error responses because every endpoint formats them differently. The monitoring team cannot aggregate errors because there is no common error code scheme. And debugging is painful because error messages do not include correlation IDs, timestamps, or context.
Worse, some errors need special handling. A rate limit error should return a 429 with a Retry-After header. An authentication error should return a 401 and invalidate the session. A validation error should return a 400 with field-level details. A database error should return a 500 but never expose the raw DynamoDB error to the client.
Handling all of this correctly in every Lambda function is error-prone and tedious. The solution: handle it once, in a shared library, and wrap every Lambda function with it.
🎯 Without shared error handling, every Lambda function reinvents the wheel. Inconsistent responses, leaked internal details, missing correlation IDs. The solution: handle it once, wrap every function.
At the foundation is CustomError — a base class that extends JavaScript's native Error with three additional properties: errorCode (a machine-readable string like AUTHENTICATION_ERROR), statusCode (the HTTP status code to return), and additionalData (a key-value map of context).
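To make the shape concrete, here is a minimal sketch of such a base class. The three property names come from the description above; the constructor signature, the default values, and the example subclass body are assumptions for illustration rather than the platform's actual code.

```typescript
// Sketch of the base class described above. Property names follow the
// article; the constructor signature and defaults are assumptions.
class CustomError extends Error {
  readonly errorCode: string;
  readonly statusCode: number;
  readonly additionalData: Record<string, unknown>;

  constructor(
    message: string,
    errorCode: string,
    statusCode: number,
    additionalData: Record<string, unknown> = {},
  ) {
    super(message);
    // Keeps instanceof working even when the code is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
    this.errorCode = errorCode;
    this.statusCode = statusCode;
    this.additionalData = additionalData;
  }
}

// A domain error only has to pin down its code and status.
class AuthenticationError extends CustomError {
  constructor(
    message = "Authentication failed",
    additionalData: Record<string, unknown> = {},
  ) {
    super(message, "AUTHENTICATION_ERROR", 401, additionalData);
  }
}

const err = new AuthenticationError("Invalid credentials", { attempt: 3 });
```

Fixing the code and status inside each subclass means call sites only supply a message and context, which keeps the error code scheme uniform.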
Every error in the system extends CustomError. The hierarchy is organized by domain:
Authentication errors: AuthenticationError (401), AuthorizationError (403), UserExistsError (409), UserNotFoundError (404). These cover the auth flow — wrong credentials, insufficient permissions, duplicate signups, missing users.
Token errors: AccessTokenError, RefreshTokenError, MissingTokenError, TokenValidationError. These cover JWT lifecycle — expired tokens, invalid signatures, missing headers, malformed payloads.
Resource errors: ResourceNotFoundException (404) is the base, with domain-specific subclasses such as UserNotFoundError (listed above with the authentication errors because it surfaces in the auth flow). Any entity that can be looked up and not found gets its own error class.
System errors: TimeoutError (408), RateLimitError (429), DatabaseError (500), DynamodbError (500). These cover infrastructure-level failures — slow responses, throttling, database issues.
Validation errors: BadRequestException (400) with field-level details. Request body validation, query parameter validation, path parameter validation.
Crypto errors: EncryptionError, DecryptionError, KeyRotationError. These cover failures in the encryption service.
The hierarchy matters because the error handler uses instanceof checks to route errors. An AuthenticationError is handled differently from a DatabaseError, which is handled differently from a ValidationError. The class hierarchy makes this routing clean and extensible.
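The sketch below shows why instanceof routing benefits from a class hierarchy: a subclass automatically matches its parent's branch. The class names come from the list above; the error code strings, constructor shapes, and the categorize helper are hypothetical.

```typescript
// Compact stand-in for the base class; the constructor shape is an assumption.
class CustomError extends Error {
  constructor(
    message: string,
    readonly errorCode: string,
    readonly statusCode: number,
  ) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ResourceNotFoundException extends CustomError {
  constructor(message = "Resource not found", errorCode = "RESOURCE_NOT_FOUND") {
    super(message, errorCode, 404);
  }
}

// Domain-specific subclass: inherits the 404 and the not-found routing.
class UserNotFoundError extends ResourceNotFoundException {
  constructor(userId: string) {
    super(`User ${userId} not found`, "USER_NOT_FOUND");
  }
}

class DatabaseError extends CustomError {
  constructor(message = "Database operation failed") {
    super(message, "DATABASE_ERROR", 500);
  }
}

// Hypothetical router: one branch per family, most general check last.
function categorize(error: unknown): string {
  if (error instanceof ResourceNotFoundException) return "not-found";
  if (error instanceof DatabaseError) return "database";
  if (error instanceof CustomError) return "custom";
  return "unknown";
}
```

Adding a new not-found error for another entity requires no change to the router, because the new class is still a ResourceNotFoundException.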
🏗️ 15+ error classes organized by domain. Every error carries an errorCode, statusCode, and additionalData. The class hierarchy enables clean, extensible error routing.

Every Lambda handler at TCTF is wrapped with withErrorHandling. This is a higher-order function that takes a handler function and returns a new function with error handling built in.
The wrapper does three things. First, it generates a correlation ID for the request — a unique identifier that traces the request through every log entry, every error response, and every downstream service call. Second, it records the start time for performance tracking. Third, it wraps the handler in a try-catch that routes any thrown error to the centralized handleError function.
The handler function receives the API Gateway event, the correlation ID, and the start time as parameters. It does not need to worry about error handling — it just throws errors when something goes wrong, and the wrapper catches them.
The wrapper also supports an optional cleanup function that runs in the finally block — useful for releasing resources, closing connections, or flushing metrics regardless of whether the handler succeeded or failed.
This pattern means every Lambda function in the platform has consistent error handling, consistent correlation IDs, consistent performance tracking, and consistent response formatting — without any of that code being duplicated in the handler itself.
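A wrapper along these lines might look like the following sketch. The event and response interfaces are trimmed placeholders, the correlation ID generator is a stand-in (a UUID would be the more usual choice), and handleError is reduced to a stub; only the name withErrorHandling and the handler parameters (event, correlation ID, start time) come from the description above.

```typescript
// Minimal event/response shapes so the sketch stands alone (assumptions).
interface ApiEvent { path: string; body: string | null }
interface ApiResponse { statusCode: number; headers?: Record<string, string>; body: string }

type LambdaHandler = (
  event: ApiEvent,
  correlationId: string,
  startTime: number,
) => Promise<ApiResponse>;

interface WrapperOptions {
  cleanup?: () => void | Promise<void>;
}

// Placeholder ID generator; not cryptographically unique.
function newCorrelationId(): string {
  return `${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 10)}`;
}

// Stub for the centralized handler; the real one routes by error type.
function handleError(error: unknown, correlationId: string): ApiResponse {
  return {
    statusCode: 500,
    body: JSON.stringify({ errorCode: "INTERNAL_ERROR", correlationId }),
  };
}

function withErrorHandling(handler: LambdaHandler, options: WrapperOptions = {}) {
  return async (event: ApiEvent): Promise<ApiResponse> => {
    const correlationId = newCorrelationId();
    const startTime = Date.now();
    try {
      return await handler(event, correlationId, startTime);
    } catch (error) {
      return handleError(error, correlationId);
    } finally {
      // Runs whether the handler succeeded or threw.
      await options.cleanup?.();
    }
  };
}

// Usage: the handler holds only business logic and simply throws on failure.
let cleanedUp = 0;
const getUser = withErrorHandling(
  async (event) => {
    if (!event.body) throw new Error("missing body");
    return { statusCode: 200, body: event.body };
  },
  { cleanup: () => { cleanedUp += 1; } },
);
```

Using `return await` inside the try block matters here: without the await, a rejected promise from the handler would escape the catch.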
🔧 withErrorHandling wraps every Lambda handler. It generates correlation IDs, records timing, catches all errors, and routes them to the centralized handler. Zero error-handling code in the business logic.
The handleError function is the brain of the error handling system. It receives an error and routes it to the appropriate response builder based on a priority order.
The priority is deliberate. Validation errors are checked first because they are the most common — bad input from the client. Rate limit errors come next because they need a Retry-After header. Timeout errors follow because they indicate infrastructure stress. Authentication and authorization errors come next because they need security context (was the token invalid? was it missing?).
Resource not found errors, bad request errors, and database errors follow in decreasing frequency. Custom application errors with their own status codes are handled next. AWS service errors (from the SDK) are mapped using an error registry. And finally, unknown errors — anything that does not match any of the above — get a generic 500 response.
This priority order means the most common errors are handled with the fewest checks. A validation error matches on the first check. An unknown error falls through every check to the bottom. The system is optimized for the common case while still handling every edge case.
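The priority order can be sketched as an ordered chain of instanceof checks. This is a simplified illustration with only some of the categories; the class bodies, error code strings, and the Routed shape are assumptions, and the AWS error registry step is omitted.

```typescript
// Trimmed-down error classes so the sketch stands alone. Error codes and
// constructor shapes are assumptions, not the platform's definitions.
class CustomError extends Error {
  constructor(message: string, readonly errorCode: string, readonly statusCode: number) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof reliable
  }
}
class ValidationError extends CustomError {
  constructor(message: string) { super(message, "VALIDATION_ERROR", 400); }
}
class RateLimitError extends CustomError {
  constructor(readonly retryAfterSeconds: number) {
    super("Too many requests", "RATE_LIMIT_ERROR", 429);
  }
}
class TimeoutError extends CustomError {
  constructor(message: string) { super(message, "TIMEOUT_ERROR", 408); }
}
class AuthenticationError extends CustomError {
  constructor(message: string) { super(message, "AUTHENTICATION_ERROR", 401); }
}
class DatabaseError extends CustomError {
  constructor(message: string) { super(message, "DATABASE_ERROR", 500); }
}

interface Routed {
  category: string;
  statusCode: number;
  errorCode: string;
  headers: Record<string, string>;
}

function routed(category: string, e: CustomError, headers: Record<string, string> = {}): Routed {
  return { category, statusCode: e.statusCode, errorCode: e.errorCode, headers };
}

// Checks run top to bottom; the first match wins.
function handleError(error: unknown): Routed {
  if (error instanceof ValidationError) return routed("validation", error);
  if (error instanceof RateLimitError) {
    return routed("rate-limit", error, { "Retry-After": String(error.retryAfterSeconds) });
  }
  if (error instanceof TimeoutError) return routed("timeout", error);
  if (error instanceof AuthenticationError) return routed("security", error);
  if (error instanceof DatabaseError) return routed("database", error);
  // The generic check must come after every subclass, or it would shadow them.
  if (error instanceof CustomError) return routed("custom", error);
  return { category: "unknown", statusCode: 500, errorCode: "INTERNAL_ERROR", headers: {} };
}
```

Because every specialized class extends CustomError, the order is also a correctness concern: moving the generic CustomError check higher would swallow the more specific branches below it.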
Every error response in the platform follows the same JSON structure. The ErrorResponseBuilder ensures this consistency.
Every response includes: the HTTP status code, a machine-readable error code, a human-readable message (safe to show to users), the correlation ID (for support tickets and debugging), and a timestamp. Some responses include additional fields — a Retry-After header for rate limit errors, field-level details for validation errors, a security context for auth errors.
The builder has specialized methods for each error category: buildValidationErrorResponse, buildRateLimitErrorResponse, buildTimeoutErrorResponse, buildSecurityErrorResponse, buildResourceNotFoundErrorResponse, buildCustomErrorResponse, buildAWSServiceErrorResponse, and buildUnknownErrorResponse.
Critically, the builder never exposes internal error details to the client. A DynamoDB ConditionalCheckFailedException becomes a generic database error. A Cognito NotAuthorizedException becomes an authentication error. The raw error is logged server-side with the correlation ID, but the client only sees a safe, formatted response.
This separation — detailed logging server-side, safe responses client-side — is essential for security. Internal error messages can reveal database schemas, service names, and infrastructure details that an attacker could exploit.
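As a rough sketch of that boundary, the builder below produces the common response shape and keeps raw error text out of the client body. The method names are taken from the list above; their signatures, the field names in the JSON body, and the error code strings are assumptions.

```typescript
interface ErrorBody {
  statusCode: number;
  errorCode: string;
  message: string;
  correlationId: string;
  timestamp: string;
  details?: Record<string, string>;
}

interface SafeErrorResponse {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

class ErrorResponseBuilder {
  // Single place where the common shape is assembled.
  private static build(
    base: Omit<ErrorBody, "timestamp">,
    headers: Record<string, string> = {},
  ): SafeErrorResponse {
    const body: ErrorBody = { ...base, timestamp: new Date().toISOString() };
    return {
      statusCode: base.statusCode,
      headers: { "Content-Type": "application/json", ...headers },
      body: JSON.stringify(body),
    };
  }

  static buildValidationErrorResponse(
    details: Record<string, string>,
    correlationId: string,
  ): SafeErrorResponse {
    return ErrorResponseBuilder.build({
      statusCode: 400,
      errorCode: "VALIDATION_ERROR",
      message: "One or more fields are invalid.",
      correlationId,
      details,
    });
  }

  static buildRateLimitErrorResponse(
    retryAfterSeconds: number,
    correlationId: string,
  ): SafeErrorResponse {
    return ErrorResponseBuilder.build(
      { statusCode: 429, errorCode: "RATE_LIMIT_ERROR", message: "Too many requests.", correlationId },
      { "Retry-After": String(retryAfterSeconds) },
    );
  }

  static buildUnknownErrorResponse(error: unknown, correlationId: string): SafeErrorResponse {
    // The raw error goes to the server-side log only, keyed by correlation ID.
    console.error(JSON.stringify({ correlationId, rawError: String(error) }));
    return ErrorResponseBuilder.build({
      statusCode: 500,
      errorCode: "INTERNAL_ERROR",
      message: "An unexpected error occurred.",
      correlationId,
    });
  }
}

const leaked = new Error("ConditionalCheckFailedException: item users#42 failed condition");
const safe = ErrorResponseBuilder.buildUnknownErrorResponse(leaked, "corr-123");
```

The client-facing message is a fixed string chosen by the builder, never derived from the caught error, so a new internal exception type cannot leak by accident.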
🔒 The ErrorResponseBuilder never exposes internal details to the client. Raw errors are logged server-side with correlation IDs. The client sees safe, formatted responses. Security by design.
Some services need specialized error handling beyond the default. The error handler factory provides pre-built handlers for common scenarios.
createAPIErrorHandler wraps errors with API-specific context — the API name, the endpoint, the HTTP method. This context appears in logs and metrics, making it easy to identify which API endpoint is generating errors.
createAuthErrorHandler adds authentication-specific handling — logging failed login attempts, tracking suspicious patterns, and enriching error responses with security context.
createDatabaseErrorHandler adds database-specific handling — distinguishing between transient errors (throttling, timeout) that should be retried and permanent errors (validation, not found) that should not.
createValidationErrorHandler adds field-level validation context — which field failed, what the expected format was, what the actual value was (sanitized). This makes validation errors actionable for the frontend.
These factory handlers compose with the default handler. A service can use createAuthErrorHandler for its authentication endpoints and the default handler for everything else. The factory pattern keeps the specialization modular and reusable.
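One way such a factory could compose with the default handler is sketched below, using createDatabaseErrorHandler as the example. The transient and permanent name lists are an assumed classification of DynamoDB error names, and the HandledError shape, the retryable flag, and the per-route lookup are hypothetical.

```typescript
interface HandledError {
  statusCode: number;
  errorCode: string;
  retryable: boolean;
}

type ErrorHandlerFn = (error: unknown) => HandledError;

// Stand-in for the default handler described earlier.
const defaultErrorHandler: ErrorHandlerFn = () => ({
  statusCode: 500,
  errorCode: "INTERNAL_ERROR",
  retryable: false,
});

// Assumed classification; adjust to the errors a service actually sees.
const TRANSIENT_NAMES = [
  "ProvisionedThroughputExceededException",
  "ThrottlingException",
  "RequestLimitExceeded",
];
const PERMANENT_NAMES = [
  "ConditionalCheckFailedException",
  "ValidationException",
  "ResourceNotFoundException",
];

function createDatabaseErrorHandler(
  fallback: ErrorHandlerFn = defaultErrorHandler,
): ErrorHandlerFn {
  return (error) => {
    const errorName = error instanceof Error ? error.name : "";
    if (TRANSIENT_NAMES.indexOf(errorName) !== -1) {
      return { statusCode: 500, errorCode: "DATABASE_ERROR", retryable: true };
    }
    if (PERMANENT_NAMES.indexOf(errorName) !== -1) {
      return { statusCode: 500, errorCode: "DATABASE_ERROR", retryable: false };
    }
    // Anything unrecognized is delegated to the default handler.
    return fallback(error);
  };
}

// Specialized handler where it helps, default handler everywhere else.
const handlersByRoute: Record<string, ErrorHandlerFn> = {
  "/orders": createDatabaseErrorHandler(),
};

function handlerFor(route: string): ErrorHandlerFn {
  return handlersByRoute[route] ?? defaultErrorHandler;
}

const throttled = Object.assign(new Error("Rate exceeded"), { name: "ThrottlingException" });
const result = handlerFor("/orders")(throttled);
```

Passing the fallback in as a parameter keeps the specialized handler a thin layer: it only adds the database-specific decision and leaves everything else to the shared path.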
With this architecture in place, several things become possible that would be difficult without it.
Consistent API responses across all 34 services. The frontend team writes one error parser that works with every endpoint. No special cases, no per-service error formats.
Correlation ID tracing from request to response. When a user reports an error, the support team searches by correlation ID and sees the entire request path — the Lambda handler, the database query, the external service call, and the exact error that occurred.
Metrics aggregation by error type. CloudWatch dashboards show authentication errors, validation errors, rate limit errors, and database errors as separate metrics. Alerts trigger on error rate spikes per category, not just overall error rates.
Safe error responses by default. No developer can accidentally leak a DynamoDB error message or a Cognito exception to the client. The ErrorResponseBuilder enforces the safety boundary.
And zero error-handling boilerplate in business logic. Lambda handlers throw errors. The withErrorHandling wrapper catches them. The centralized handleError function routes them. The ErrorResponseBuilder formats them. The Lambda handler itself is pure business logic: clean, focused, and testable.
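To illustrate the first benefit, a single frontend parser for the shared error shape could look roughly like this. The field names mirror the response fields listed earlier (status code, error code, message, correlation ID, timestamp), but the exact JSON keys and both function names are assumptions.

```typescript
interface ApiErrorBody {
  statusCode: number;
  errorCode: string;
  message: string;
  correlationId: string;
  timestamp: string;
}

// Returns the parsed error, or undefined if the body does not match the shape.
function parseApiError(rawBody: string): ApiErrorBody | undefined {
  let value: unknown;
  try {
    value = JSON.parse(rawBody);
  } catch {
    return undefined;
  }
  if (typeof value !== "object" || value === null) return undefined;

  const { statusCode, errorCode, message, correlationId, timestamp } =
    value as Record<string, unknown>;
  if (
    typeof statusCode !== "number" ||
    typeof errorCode !== "string" ||
    typeof message !== "string" ||
    typeof correlationId !== "string" ||
    typeof timestamp !== "string"
  ) {
    return undefined;
  }
  return { statusCode, errorCode, message, correlationId, timestamp };
}

// One place to turn any endpoint's error into user-facing text.
function toUserMessage(rawBody: string): string {
  const parsed = parseApiError(rawBody);
  return parsed
    ? `${parsed.message} (reference: ${parsed.correlationId})`
    : "Something went wrong. Please try again.";
}
```

Surfacing the correlation ID next to the message gives users something concrete to quote in a support ticket.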
✅ Consistent responses across 34 services. Correlation ID tracing. Metrics by error type. Safe responses by default. Zero boilerplate in business logic. One architecture, used everywhere.
Error handling is the least glamorous part of any platform. Nobody writes blog posts about their error classes. Nobody tweets about their error response format. But when a user hits an error and sees a clear message with a correlation ID they can send to support — and the support team finds the exact issue in seconds — that is the architecture paying off. Every error, handled consistently, across every service, every time. That is what shared error handling gives you.