
Self-Hosted Milestone

The near-term adoption goal is a compute-first, self-hosted path to a durable coordinator that users can run, inspect, and operate directly.

The first reference is examples/restaurant-approval/self-host, which runs a single batteries-included coordinator process from the packaged monolith-svc artifact. By default it uses the local in-process transition worker, so users can exercise hosted-style submission, release activation, awaiting completion, and result inspection without starting a second service.

Current Proof

| Capability | Status |
|---|---|
| Batteries-included local coordinator demo | present in restaurant-approval |
| Coordinator and worker split-process proof | present in RestaurantApprovalHostedCoordinatorRestWorkerIT |
| Containerized HA reference | present in restaurant-approval/self-host/container |
| Release registration and activation | present |
| Execution pinning to active release version | present |
| Worker availability check before submit | present |
| Await pending query and completion | present |
| Accepted/declined terminal results | present |
| Failure/DLQ incident walkthrough | present in restaurant-approval/self-host |
| Single-execution operator re-drive | present for terminal DLQ and explicit FAILED executions |
| Operator walkthrough | present in restaurant-approval/self-host |
| Production-ish deployment recipe | present in Self-Hosted Deployment |
| Durable release metadata | Dynamo registry with immutable release records and append-only activation events |
| Shared/replicated artifact storage | local filesystem or S3-compatible blob store for coordinator-managed artifacts; OCI/Maven-style repositories remain preferred for native artifact forms |
| Minimal worker lifecycle | registration, heartbeat, drain, stale detection, and submit-time healthy-worker gate |
| Kafka await over stream | covered separately by csv-payments |
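
The release rows above (immutable release records, append-only activation events, execution pinning) can be illustrated with a minimal in-memory sketch. It assumes a single-process map in place of the Dynamo registry; `ReleaseRegistry`, its nested records, and `submit` are hypothetical names, not the framework's API.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for a durable release registry.
// Release records are write-once; activation is an append-only event log.
final class ReleaseRegistry {
    record ReleaseRecord(String version, String artifactUri) {}
    record ActivationEvent(long sequence, String version, Instant at) {}
    record Execution(String executionId, String pinnedVersion) {}

    private final Map<String, ReleaseRecord> releases = new LinkedHashMap<>();
    private final List<ActivationEvent> activations = new ArrayList<>();

    // Registration fails if the version already exists, so a record is never overwritten.
    synchronized void register(ReleaseRecord release) {
        if (releases.putIfAbsent(release.version(), release) != null) {
            throw new IllegalStateException("release already registered: " + release.version());
        }
    }

    // Activation appends an event instead of mutating a "current release" field.
    synchronized void activate(String version) {
        if (!releases.containsKey(version)) {
            throw new IllegalArgumentException("unknown release: " + version);
        }
        activations.add(new ActivationEvent(activations.size() + 1L, version, Instant.now()));
    }

    // The active release is whichever version the latest activation event names.
    synchronized Optional<String> activeVersion() {
        return activations.isEmpty()
                ? Optional.empty()
                : Optional.of(activations.get(activations.size() - 1).version());
    }

    synchronized List<ActivationEvent> history() {
        return List.copyOf(activations);
    }

    // A new execution records the version active at submit time;
    // later activations do not change it.
    synchronized Execution submit(String executionId) {
        String version = activeVersion()
                .orElseThrow(() -> new IllegalStateException("no active release"));
        return new Execution(executionId, version);
    }
}
```

Under this model, rolling back is just another activation that names an older version: nothing is overwritten, and executions already pinned to the newer version keep it.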

Remaining Gap

| Gap | Why it matters |
|---|---|
| Bulk DLQ replay campaigns | single-execution re-drive exists, but there is no DLQ-message consumer or batch replay surface |
| Append-only execution/await stores | existing Dynamo execution and await stores still use conditional updates for leases and state transitions |
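
The difference between the current conditional-update stores and an append-only store can be sketched in memory. This is an illustration, not the Dynamo implementation: `ExecutionLog`, its `State` values, and the expected-sequence check are assumptions. Each transition becomes a new event, and the current state is derived by replaying the log. A conditional check remains, but it guards the next log position rather than a mutable state field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only execution log: transitions are never updated in place.
final class ExecutionLog {
    enum State { PENDING, RUNNING, SUCCEEDED, FAILED }

    record Event(long sequence, State state, String note) {}

    private final List<Event> events = new ArrayList<>();

    // Append succeeds only if the caller saw the latest sequence number,
    // so two writers cannot both claim the same next position.
    synchronized boolean append(long expectedSequence, State state, String note) {
        if (expectedSequence != events.size()) {
            return false; // another writer appended first; caller must re-read
        }
        events.add(new Event(expectedSequence + 1, state, note));
        return true;
    }

    synchronized long lastSequence() {
        return events.size();
    }

    // Current state is derived by replaying the log; the full history stays available.
    synchronized State currentState() {
        State state = State.PENDING;
        for (Event e : events) {
            state = e.state();
        }
        return state;
    }

    synchronized List<Event> history() {
        return List.copyOf(events);
    }
}
```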

The self-host path is compute-first: a coordinator service owns durable execution state and dispatches work to local, REST, gRPC, or SQS workers. FUNCTION support currently covers serverless invocation and adapters; it is not a TPF-owned, durable HA coordinator.
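
A minimal sketch of the compute-first split: the coordinator sees every worker through one transport seam, and a registry applies the worker lifecycle (registration, heartbeat, drain, stale detection) as a gate at submit time. `WorkerTransport`, `LocalTransport`, `WorkerRegistry`, and the staleness window are hypothetical. REST, gRPC, and SQS transports would be further implementations of the same interface, and dispatch is synchronous here only to keep the sketch short.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical transport seam: local, REST, gRPC, or SQS workers would each implement this.
interface WorkerTransport {
    String dispatch(String executionId, String input);
}

// In-process transport: runs the transition inside the coordinator's own JVM.
final class LocalTransport implements WorkerTransport {
    private final Function<String, String> transition;

    LocalTransport(Function<String, String> transition) {
        this.transition = transition;
    }

    @Override
    public String dispatch(String executionId, String input) {
        return transition.apply(input);
    }
}

// Tracks registration, heartbeats, and drain; a worker whose last heartbeat
// is older than staleAfter, or that is draining, is not eligible for new work.
final class WorkerRegistry {
    private record Worker(WorkerTransport transport, Instant lastHeartbeat, boolean draining) {}

    private final Map<String, Worker> workers = new ConcurrentHashMap<>();
    private final Duration staleAfter;

    WorkerRegistry(Duration staleAfter) {
        this.staleAfter = staleAfter;
    }

    void register(String workerId, WorkerTransport transport, Instant now) {
        workers.put(workerId, new Worker(transport, now, false));
    }

    void heartbeat(String workerId, Instant now) {
        workers.computeIfPresent(workerId, (id, w) -> new Worker(w.transport(), now, w.draining()));
    }

    void drain(String workerId) {
        workers.computeIfPresent(workerId, (id, w) -> new Worker(w.transport(), w.lastHeartbeat(), true));
    }

    private boolean healthy(Worker w, Instant now) {
        return !w.draining() && w.lastHeartbeat().plus(staleAfter).isAfter(now);
    }

    // Submit-time gate: reject the submit unless at least one worker is healthy.
    String submit(String executionId, String input, Instant now) {
        Worker target = workers.values().stream()
                .filter(w -> healthy(w, now))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no healthy worker for " + executionId));
        return target.transport().dispatch(executionId, input);
    }
}
```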

The main remaining HA gaps are not the release metadata model, the minimal worker lifecycle gate, single-execution re-drive, or FUNCTION support. They are making execution/await state fully append-only and deciding whether bulk replay belongs in the framework or in operator/application runbooks.
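
If bulk replay stays in operator or application runbooks, one option is a loop over the existing single-execution re-drive. This sketch assumes a hypothetical `Redriver` interface standing in for that operator surface and a caller-supplied list of execution IDs (for example, exported from the DLQ); `ReplayCampaign`, the per-run cap, and the report shape are illustrative choices, not framework behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical seam for the existing single-execution re-drive operation.
interface Redriver {
    void redrive(String executionId) throws Exception;
}

// Operator-side batch replay: re-drives each execution independently and
// reports per-execution outcomes instead of aborting the whole campaign.
final class ReplayCampaign {
    record Report(List<String> redriven, Map<String, String> failed, List<String> deferred) {}

    static Report run(List<String> executionIds, Redriver redriver, int maxPerRun) {
        List<String> redriven = new ArrayList<>();
        Map<String, String> failed = new LinkedHashMap<>();
        List<String> deferred = new ArrayList<>();
        for (String id : executionIds) {
            if (redriven.size() + failed.size() >= maxPerRun) {
                deferred.add(id); // left for a later run to limit blast radius
                continue;
            }
            try {
                redriver.redrive(id);
                redriven.add(id);
            } catch (Exception e) {
                failed.put(id, String.valueOf(e.getMessage()));
            }
        }
        return new Report(redriven, failed, deferred);
    }
}
```

Recording failures per execution, rather than stopping the campaign, lets an operator retry or investigate individual items through the same single-execution path.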

All-serverless durable orchestration would be a separate design. It would need a coordinator loop backed by durable services such as DynamoDB, SQS, and EventBridge-style scheduling rather than relying on a Lambda/Azure/GCP function process to hold orchestration state.
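
In that design each function invocation is stateless: it loads execution state from a durable store, advances at most one step with a conditional write, and asks a scheduler to invoke it again. A minimal sketch, with in-memory stand-ins for DynamoDB and EventBridge-style scheduling; `StateStore`, `Scheduler`, `CoordinatorTick`, and the fixed three-step execution are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical durable-state seam (DynamoDB in the design above).
interface StateStore {
    record Snapshot(int step, long version) {}
    Optional<Snapshot> load(String executionId);
    boolean compareAndPut(String executionId, long expectedVersion, Snapshot next);
}

// Hypothetical scheduling seam (EventBridge-style in the design above).
interface Scheduler {
    void wakeLater(String executionId);
}

// One function invocation: no orchestration state survives between calls.
final class CoordinatorTick {
    static final int FINAL_STEP = 3;

    static void handle(String executionId, StateStore store, Scheduler scheduler) {
        StateStore.Snapshot current = store.load(executionId)
                .orElse(new StateStore.Snapshot(0, 0));
        if (current.step() >= FINAL_STEP) {
            return; // already complete; a duplicate wake-up is harmless
        }
        StateStore.Snapshot next = new StateStore.Snapshot(current.step() + 1, current.version() + 1);
        if (!store.compareAndPut(executionId, current.version(), next)) {
            return; // another invocation advanced it first and will reschedule
        }
        if (next.step() < FINAL_STEP) {
            scheduler.wakeLater(executionId);
        }
    }
}

// In-memory stand-ins so the sketch runs without cloud services.
final class MemoryStore implements StateStore {
    private final Map<String, Snapshot> rows = new ConcurrentHashMap<>();

    public Optional<Snapshot> load(String id) {
        return Optional.ofNullable(rows.get(id));
    }

    public synchronized boolean compareAndPut(String id, long expectedVersion, Snapshot next) {
        long actual = rows.containsKey(id) ? rows.get(id).version() : 0;
        if (actual != expectedVersion) {
            return false;
        }
        rows.put(id, next);
        return true;
    }
}

final class QueueScheduler implements Scheduler {
    final List<String> pending = new ArrayList<>();

    public void wakeLater(String id) {
        pending.add(id);
    }
}
```

Because the conditional write rejects a stale snapshot, duplicate or concurrent wake-ups cannot advance an execution twice, which is what lets the function process stay disposable.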