Dave Dennis (@0xdsqr)
Durable Lambdas


November 3, 2025 · 10 min · Blog
Tags: aws, typescript

  • Building Stateful Serverless: Durable Lambda
  • The Problem
  • The Architecture
  • 1. Distributed Locking Layer
  • 2. State Persistence with Versioning
  • 3. Message Ordering with SQS FIFO
  • 4. Event Bus for Signals and Timers
  • Invocation Modes
  • Synchronous Calls
  • Asynchronous Calls
  • The Runtime
  • Infrastructure with CDK
  • Key Technical Decisions
  • Type Safety Throughout
  • CLI Tool with Bun
  • What You Get
  • Performance Characteristics
  • Learning Outcomes

Building Stateful Serverless: Durable Lambda

I built Durable Lambda as a learning project to explore actor-based state management on AWS serverless infrastructure. It brings Cloudflare Durable Objects-style guarantees to Lambda—single-instance execution, distributed locking, and automatic persistence—using only DynamoDB, SQS, and EventBridge.

The Problem

Serverless scales horizontally, but stateful applications create headaches:

  • Race conditions in concurrent Lambda executions
  • Distributed state consistency issues
  • Loss of single-instance execution guarantees
  • Manual coordination of state mutations

I needed a system that could guarantee only one Lambda ever mutates an actor's state, while keeping everything distributed and scalable.

The Architecture

Durable Lambda uses a layered approach:

1. Distributed Locking Layer

At the core is DynamoDB-based distributed locking. Before any Lambda executes actor code, it must acquire a lock:

try {
  await ddb.send(
    new PutCommand({
      TableName: "DurableLocks",
      Item: {
        actorId,
        lockHolder: lambdaInstanceId,
        expiresAt: now + ttlSeconds * 1000,
        ttl: Math.floor((now + ttlSeconds * 1000) / 1000),
      },
      ConditionExpression:
        "attribute_not_exists(actorId) OR expiresAt < :now",
      ExpressionAttributeValues: { ":now": now },
    }),
  )
  return true
} catch (error) {
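  // The catch variable is `unknown` under strict TypeScript, so narrow it first
  if (error instanceof Error && error.name === "ConditionalCheckFailedException") return false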
  throw error
}

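The key is the conditional expression: the put succeeds only if no lock item exists for the actor or the existing lock has expired. Because DynamoDB evaluates the condition and performs the write atomically, only one of several competing Lambdas can win the lock, which prevents concurrent mutations.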
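To make the semantics of that condition concrete, here is a minimal in-memory sketch. It is not the library's code: `LockTable`, `acquire`, and `release` are hypothetical names, and a `Map` stands in for the DynamoDB table. The release check is an assumption about how a lock holder might safely give the lock back.

```typescript
// In-memory model of the conditional put used for lock acquisition.
// LockTable and its methods are hypothetical names for illustration only.
interface LockItem {
  lockHolder: string
  expiresAt: number // epoch milliseconds
}

class LockTable {
  private items = new Map<string, LockItem>()

  // Mirrors: attribute_not_exists(actorId) OR expiresAt < :now
  acquire(actorId: string, holder: string, now: number, ttlSeconds: number): boolean {
    const existing = this.items.get(actorId)
    if (existing !== undefined && existing.expiresAt >= now) return false
    this.items.set(actorId, { lockHolder: holder, expiresAt: now + ttlSeconds * 1000 })
    return true
  }

  // Release only if the caller still holds the lock, so a slow instance
  // cannot delete a lock that expired and was re-acquired by another one.
  release(actorId: string, holder: string): boolean {
    const existing = this.items.get(actorId)
    if (existing === undefined || existing.lockHolder !== holder) return false
    this.items.delete(actorId)
    return true
  }
}

const locks = new LockTable()
const t0 = 1000000
const first = locks.acquire("actor-1", "lambda-a", t0, 30) // lock is free
const second = locks.acquire("actor-1", "lambda-b", t0 + 5000, 30) // still held by lambda-a
const third = locks.acquire("actor-1", "lambda-b", t0 + 31000, 30) // expired, taken over
const staleRelease = locks.release("actor-1", "lambda-a") // lambda-a is no longer the holder
```

The second attempt fails because the lock is still live, while the third succeeds once the 30-second TTL has passed. The guarded release matters for the same reason the expiry check does: an instance that overran its TTL should not be able to remove a lock it no longer owns.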

2. State Persistence with Versioning

Actor state lives in DynamoDB with optimistic locking via versions:

async function save<T>(state: DurableState<T>) {
  const nextVersion = state.version + 1
  await ddb.send(
    new PutCommand({
      TableName: tableName,
      Item: {
        ...state,
        version: nextVersion,
        updatedAt: new Date().toISOString(),
      },
      ConditionExpression: "attribute_not_exists(version) OR version = :v",
      ExpressionAttributeValues: { ":v": state.version },
    }),
  )
  state.version = nextVersion
}

If the stored version doesn't match, DynamoDB rejects the write with a ConditionalCheckFailedException, so concurrent modifications from different Lambda instances cannot silently overwrite each other.
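A rejected write usually means the caller should reload and try again. The sketch below models that loop in memory; `VersionedStore` and `updateWithRetry` are hypothetical names, and returning `false` from `save` stands in for the conditional-check failure. It is one possible way to handle conflicts, not necessarily what the library does.

```typescript
// VersionedStore is a hypothetical in-memory stand-in for the state table.
interface Versioned<T> {
  data: T
  version: number
}

class VersionedStore<T> {
  private current: Versioned<T>

  constructor(initial: T) {
    this.current = { data: initial, version: 0 }
  }

  load(): Versioned<T> {
    return { ...this.current }
  }

  // Mirrors the condition `version = :v`: the write is rejected (false)
  // unless the caller's version is the one currently stored.
  save(next: Versioned<T>): boolean {
    if (next.version !== this.current.version) return false
    this.current = { data: next.data, version: next.version + 1 }
    return true
  }
}

// Reload and re-apply the mutation when another writer got there first.
// Returns the number of attempts it took.
function updateWithRetry<T>(
  store: VersionedStore<T>,
  mutate: (data: T) => T,
  maxAttempts = 3,
): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const snapshot = store.load()
    if (store.save({ data: mutate(snapshot.data), version: snapshot.version })) {
      return attempt
    }
  }
  throw new Error("gave up after repeated version conflicts")
}

const store = new VersionedStore({ count: 0 })
const staleSnapshot = store.load() // version 0
const freshWrite = store.save({ data: { count: 1 }, version: 0 }) // accepted
const staleWrite = store.save({ data: { count: 99 }, version: staleSnapshot.version }) // rejected
const attempts = updateWithRetry(store, (s) => ({ count: s.count + 1 }))
```

The stale write is rejected because the stored version has already moved on, which is exactly the case the condition expression guards against.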

3. Message Ordering with SQS FIFO

For async operations, messages queue in SQS FIFO with per-actor ordering:

await sqs.send(
  new SendMessageCommand({
    QueueUrl: queueUrl,
    MessageGroupId: actorId,  // Ensures ordering per actor
    MessageDeduplicationId: eventId,
    MessageBody: JSON.stringify({ actorId, eventId, payload }),
  }),
)

Messages are grouped by actorId (the MessageGroupId), so SQS FIFO delivers them in order within each actor. An event source mapping polls the queue and invokes the Lambda, which processes messages sequentially per actor.
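As an illustration of what per-group ordering means for a batch, the sketch below splits a mixed batch into per-actor lists while preserving arrival order. `QueueMessage` is a simplified, hypothetical shape rather than the real SQS record type, and `groupByActor` is not part of the library.

```typescript
// QueueMessage is a simplified, hypothetical shape used only for illustration.
interface QueueMessage {
  messageGroupId: string // the actorId used as MessageGroupId when sending
  body: string
}

// Split a batch into per-actor lists, keeping the order in which
// messages for the same actor appeared in the batch.
function groupByActor(messages: QueueMessage[]): Map<string, string[]> {
  const groups = new Map<string, string[]>()
  for (const message of messages) {
    const bodies = groups.get(message.messageGroupId) ?? []
    bodies.push(message.body)
    groups.set(message.messageGroupId, bodies)
  }
  return groups
}

const batch: QueueMessage[] = [
  { messageGroupId: "actor-a", body: "a1" },
  { messageGroupId: "actor-b", body: "b1" },
  { messageGroupId: "actor-a", body: "a2" },
]
const grouped = groupByActor(batch)
```

Ordering is only guaranteed within a group, so messages for different actors can interleave freely while each actor still sees its own messages in send order.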

4. Event Bus for Signals and Timers

EventBridge handles inter-actor communication and scheduled operations:

await bus.send(
  new PutEventsCommand({
    Entries: [
      {
        EventBusName: busName,
        Source: "durable.lambda",
        DetailType: "TimerFired",
        Detail: JSON.stringify({ actorId, delayMs }),
        Time: new Date(Date.now() + delayMs),
      },
    ],
  }),
)

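EventBridge rules forward timer events and signals back to the Lambda, triggering actor execution. Note that the Time field of PutEvents only sets the event's timestamp; it does not delay delivery, so a genuinely delayed timer needs a separate scheduling mechanism such as EventBridge Scheduler.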
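Rules decide which events reach the Lambda by matching an event pattern. The sketch below is a deliberately simplified matcher covering only the `source` and `detail-type` fields; `BusEvent`, `EventPattern`, `matchesPattern`, and `timerRule` are hypothetical names, and real EventBridge patterns support far more than this.

```typescript
// Simplified event and pattern shapes; only two top-level fields are modeled.
interface BusEvent {
  source: string
  "detail-type": string
  detail: Record<string, unknown>
}

interface EventPattern {
  source?: string[]
  "detail-type"?: string[]
}

// A field matches when the pattern omits it or lists the event's value.
function matchesPattern(event: BusEvent, pattern: EventPattern): boolean {
  const sourceOk =
    pattern.source === undefined || pattern.source.indexOf(event.source) !== -1
  const typeOk =
    pattern["detail-type"] === undefined ||
    pattern["detail-type"].indexOf(event["detail-type"]) !== -1
  return sourceOk && typeOk
}

// Pattern a timer rule might use, based on the Source and DetailType sent above.
const timerRule: EventPattern = {
  source: ["durable.lambda"],
  "detail-type": ["TimerFired"],
}

const timerEvent: BusEvent = {
  source: "durable.lambda",
  "detail-type": "TimerFired",
  detail: { actorId: "actor-1", delayMs: 5000 },
}
const isTimer = matchesPattern(timerEvent, timerRule)
```

Keeping timers and signals on distinct `detail-type` values lets each rule target one kind of event without inspecting the payload.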

Invocation Modes

Synchronous Calls

For sync calls, the handler loads state immediately, executes the user function, and returns the result:

if (sync) {
  const state = await runtime.state.load<T>(actorId)
  const ctx = {
    actorId,
    state: state.data,
    version: state.version,
    save: async (ns?: T) => {
      state.data = ns ?? ctx.state
      await runtime.state.save(state)
    },
  }
  const result = await fn(ctx, userPayload)
  // ?? (rather than ||) keeps legitimate falsy results such as 0 or false
  return result ?? { ok: true, state: ctx.state }
}

Latency: 50-100ms (lock acquisition + state load + execution + save).
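For a sense of what a user function looks like against that context, here is a hedged sketch of a counter actor. `ActorContext` mirrors the fields of the `ctx` object above, while `CounterState`, `counterActor`, and `makeContext` are hypothetical names; the in-memory context exists only so the actor can be exercised without AWS.

```typescript
// ActorContext mirrors the ctx object built in the sync path;
// counterActor and makeContext are hypothetical names for illustration.
interface ActorContext<T> {
  actorId: string
  state: T
  version: number
  save: (next?: T) => Promise<void>
}

interface CounterState {
  count: number
}

// User function: bump the counter, persist it, and return the new value.
async function counterActor(
  ctx: ActorContext<CounterState>,
  payload: { by: number },
): Promise<{ ok: boolean; count: number }> {
  ctx.state = { count: ctx.state.count + payload.by }
  await ctx.save()
  return { ok: true, count: ctx.state.count }
}

// In-memory context that records every saved state instead of writing to DynamoDB.
function makeContext(actorId: string, initial: CounterState) {
  const saved: CounterState[] = []
  const ctx: ActorContext<CounterState> = {
    actorId,
    state: initial,
    version: 0,
    save: async (next) => {
      ctx.state = next ?? ctx.state
      ctx.version += 1
      saved.push(ctx.state)
    },
  }
  return { ctx, saved }
}
```

A test can call `counterActor(ctx, { by: 2 })` on a context from `makeContext` and then inspect `saved` to confirm exactly one write happened.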

Asynchronous Calls

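For async calls, the message is queued to SQS and the caller returns immediately. The handler invoked for the queued message then runs the user function and acknowledges it: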

await fn(runtime as any, event)
return { ok: true, processed: true, actorId }

SQS triggers the Lambda later via event source mapping, processing messages in order per actor.

The Runtime

I implemented a minimal but complete runtime with core services:

  • StateService - Load/save state with versioning
  • LockService - DynamoDB-based distributed locks with TTL
  • QueueService - SQS message batching and coalescing
  • EventService - EventBridge timers and signals
  • WorkflowService - Multi-step workflows with state tracking

Each service is strictly typed with TypeScript interfaces to prevent type-unsafe operations.
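One benefit of putting services behind interfaces is that they compose and can be swapped for in-memory versions in tests. The sketch below assumes a minimal `LockService` shape, which is hypothetical and not the library's actual interface, and wraps it in a `withLock` helper that always releases the lock.

```typescript
// LockService here is a hypothetical minimal interface, not the library's actual one.
interface LockService {
  acquire(actorId: string): Promise<boolean>
  release(actorId: string): Promise<void>
}

// Run `work` only while holding the actor's lock.
// Resolves to undefined when another instance already holds it.
async function withLock<R>(
  locks: LockService,
  actorId: string,
  work: () => Promise<R>,
): Promise<R | undefined> {
  if (!(await locks.acquire(actorId))) return undefined
  try {
    return await work()
  } finally {
    await locks.release(actorId) // always release, even if work() throws
  }
}

// In-memory implementation, useful for unit tests.
function inMemoryLocks(): LockService {
  const held = new Set<string>()
  return {
    acquire: async (actorId) => {
      if (held.has(actorId)) return false
      held.add(actorId)
      return true
    },
    release: async (actorId) => {
      held.delete(actorId)
    },
  }
}
```

Because `withLock` depends only on the interface, the same helper would work unchanged with a DynamoDB-backed implementation.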

Infrastructure with CDK

The DurableFabric construct provisions everything:

const fabric = new DurableFabric(this, "DurableFabric", {
  lambdas: [handler1, handler2],
  prefix: "MyApp",
})

This creates:

  • 3 DynamoDB tables (State, Locks, Workflows)
  • 1 SQS FIFO queue with dead-letter queue
  • 1 EventBridge bus with rules for timers and signals
  • Proper IAM roles and permissions
  • Automatic environment variable injection

Key Technical Decisions

Why DynamoDB locks over Redis? No external service dependency. Conditional puts are atomic. TTL cleans up abandoned lock items automatically; because TTL deletion can lag well behind the expiry time, the acquire condition also checks expiresAt directly.

Why SQS FIFO over Kinesis? Built-in ordering per message group (MessageGroupId set to actorId). Lower cost. Dead-letter queue handling included.

Why EventBridge over SNS/custom scheduler? Native scheduling support. Rules-based filtering. Clean separation between sync and async paths.

Why version numbers over timestamps? Prevents the ABA problem. Deterministic conflict detection. Works even with clock skew.

Type Safety Throughout

I removed every any type from the core library and replaced it with explicit interfaces:

interface DurableEvent extends Record<string, unknown> {
  actorId: string
  payload: Record<string, unknown>
  sync?: boolean
}

export interface WorkflowService {
  create(): Promise<string>
  resolve(id: string, output: Record<string, unknown>): Promise<void>
  get(id: string): Promise<Record<string, unknown> | undefined>
}

This catches type errors at compile time instead of letting them surface at runtime.
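Compile-time types only help once untrusted input has been checked, since a Lambda payload arrives as arbitrary JSON. A type guard is one way to bridge that gap. In the sketch below, `DurableEvent` is repeated so the snippet stands alone, and `isRecord` and `isDurableEvent` are hypothetical helpers rather than part of the library.

```typescript
interface DurableEvent extends Record<string, unknown> {
  actorId: string
  payload: Record<string, unknown>
  sync?: boolean
}

// True for plain objects; rejects null and arrays.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value)
}

// isDurableEvent is a hypothetical helper that narrows unknown input
// to DurableEvent by checking each required field at runtime.
function isDurableEvent(value: unknown): value is DurableEvent {
  if (!isRecord(value)) return false
  if (typeof value.actorId !== "string") return false
  if (!isRecord(value.payload)) return false
  return value.sync === undefined || typeof value.sync === "boolean"
}

const parsed: unknown = JSON.parse('{"actorId":"actor-1","payload":{"by":2},"sync":true}')
const actorId = isDurableEvent(parsed) ? parsed.actorId : undefined
```

After the guard passes, the compiler treats `parsed` as a `DurableEvent`, so no cast is needed to read `actorId`.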

CLI Tool with Bun

I built the CLI using Bun's shell API:

import { $ } from "bun"

await $`git clone https://github.com/0xdsqr/durable-lambda.git --depth 1 ${projectPath}/.tmp`
await $`cp -r ${projectPath}/.tmp/examples/basic/. ${projectPath}/`

Bun's shell API runs commands from a tagged template and escapes interpolated values by default, so no separate subprocess library is needed.

What You Get

  • 6 Working Examples - Counter, rate limiter, leaderboard, order processing, distributed cron, circuit breaker
  • Nix Environment - Reproducible dev setup with Bun, TypeScript, AWS CDK, Biome
  • Monitoring Guide - CloudWatch log queries, DynamoDB metrics, SQS depth tracking
  • CDK Construct - Deploy with one line of code
  • Full Type Safety - Zero any types in core library

Performance Characteristics

| Operation | Latency |
| --- | --- |
| Lock acquisition | 5-10ms |
| State load/save | 10-20ms |
| Sync call end-to-end | 50-100ms |
| Async message processing | 100-200ms |
| Max state size | 400KB (DynamoDB limit) |

Learning Outcomes

Building this taught me:

  • How distributed locking prevents race conditions
  • Actor model semantics and guarantees
  • DynamoDB condition expressions for atomic operations
  • SQS FIFO ordering guarantees
  • Bun's performance and developer experience
  • CDK construct composition

This is v1.0.0—a learning project that actually works for stateful serverless applications.

Check out the GitHub repo for full documentation and deployment guide.