Orchestrator Runtime
The orchestrator runtime coordinates step execution for a pipeline and exposes generated transport entrypoints.
Runtime Modes
TPF supports two orchestrator runtime modes:
- `SYNC` (default): in-process request/response execution.
- `QUEUE_ASYNC`: queue-driven async job execution with durable execution state providers.
Set mode with:
```properties
pipeline.orchestrator.mode=SYNC
# or
pipeline.orchestrator.mode=QUEUE_ASYNC
```

Transport Surfaces
Generated orchestrator endpoints are transport-native:
- REST:
  - `POST /pipeline/run`
  - `POST /pipeline/run-async`
  - `GET /pipeline/executions/{executionId}`
  - `GET /pipeline/executions/{executionId}/result`
  - `POST /pipeline/ingest`
  - `GET /pipeline/subscribe`
- gRPC:
  - `Run`
  - `RunAsync`
  - `GetExecutionStatus`
  - `GetExecutionResult`
  - `Ingest`
  - `Subscribe`
- Function/Lambda:
  - `PipelineRunFunctionHandler`
  - `PipelineRunAsyncFunctionHandler`
  - `PipelineExecutionStatusFunctionHandler`
  - `PipelineExecutionResultFunctionHandler`
Checkpoint Publication
Reliable cross-pipeline handoff is orchestrator-owned and checkpoint-based.
Supported in this release:
- source: final pipeline checkpoint publication from a `QUEUE_ASYNC` orchestrator,
- target: downstream orchestrator async admission bound to a named logical publication,
- idempotency: preserve incoming dispatch metadata when present; otherwise derive a deterministic handoff key from configured checkpoint fields,
- ownership: downstream retry/DLQ remains orchestrator-owned after async admission.
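The idempotency rule above can be sketched as a small key-derivation helper. This is an illustrative sketch, not the framework's implementation: the `HandoffKeys` class, the field-joining scheme, and the SHA-256 choice are assumptions; only the idea of deriving a deterministic key from configured checkpoint fields comes from the text.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.Map;

// Illustrative sketch: derive a stable handoff key from configured
// checkpoint fields when no incoming dispatch metadata is present.
public final class HandoffKeys {

    // Joins the configured field names and values in declaration order and
    // hashes them, so the same checkpoint always yields the same key.
    public static String derive(Map<String, String> checkpoint, String... keyFields) {
        try {
            StringBuilder canonical = new StringBuilder();
            for (String field : keyFields) {
                // '\u0000' separator avoids collisions like ("ab","c") vs ("a","bc")
                canonical.append(field).append('\u0000')
                         .append(checkpoint.get(field)).append('\u0000');
            }
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, String> cp = Map.of(
                "orderId", "o-42", "customerId", "c-7", "readyAt", "2026-01-01T00:00:00Z");
        String k1 = derive(cp, "orderId", "customerId", "readyAt");
        String k2 = derive(cp, "orderId", "customerId", "readyAt");
        System.out.println(k1.equals(k2)); // deterministic: prints true
    }
}
```

Determinism matters here because duplicate checkpoint publications must map to the same downstream admission key.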
Declare reliable handoff in pipeline.yaml:
```yaml
input:
  subscription:
    publication: "checkout.orders.ready.v1"
    mapper: "com.example.pipeline.mapper.ReadyOrderMapper"

output:
  checkpoint:
    publication: "checkout.orders.dispatched.v1"
    idempotencyKeyFields: ["orderId", "customerId", "readyAt"]
```

Runtime behavior:
- build-time validation checks checkpoint boundary declarations and mapper compatibility,
- runtime endpoint bindings come from `pipeline.handoff.bindings.<publication>.targets.*`,
- publication is generated into existing orchestrator ownership; no separate connector runtime or deployment role is introduced,
- subscriber admission is handled by framework-owned HTTP and gRPC checkpoint publication endpoints instead of runtime subscription discovery,
- protobuf-over-HTTP and gRPC use the same framework-owned checkpoint protobuf envelope for transport-native admission,
- reliable handoff is supported only for `QUEUE_ASYNC` orchestrators and is rejected for `FUNCTION` pipelines,
- live `Subscribe` remains a weaker observer/tap surface and is not the reliable checkpoint handoff path.
Not yet supported:
- generic broker-message re-drive,
- a separate checkpoint-publication durable plane or publication-specific DLQ,
- dynamic runtime publication discovery or runtime subscription registration,
- broker-backed publication targets such as `SQS` or `KAFKA`.
Queue-Async Semantics
In QUEUE_ASYNC mode:
- committed execution state transitions are exactly-once (OCC/conditional-write guarded),
- dispatch and operator invocation are at-least-once,
- duplicate invocation can occur and must be handled with idempotency keys,
- streaming outputs are rejected for async execution in the current 26.2.x release line,
- persisted protobuf payload metadata stores `_tpf_message` as the protobuf schema full name.
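Because dispatch and operator invocation are at-least-once, step logic must tolerate duplicate deliveries. Below is a minimal in-process sketch of guarding a step with an idempotency key; the `IdempotentStep` class and its method names are illustrative, not framework API, and a production setup would record keys in the durable execution state store rather than a process-local map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative sketch of at-least-once tolerant step invocation: the first
// invocation for a key runs the step and records the result; redeliveries
// return the recorded result without re-running the step.
public final class IdempotentStep {

    private final Map<String, Object> processed = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public <T> T invokeOnce(String idempotencyKey, Supplier<T> step) {
        // computeIfAbsent runs the step at most once per key in this process.
        return (T) processed.computeIfAbsent(idempotencyKey, k -> step.get());
    }

    public static void main(String[] args) {
        IdempotentStep guard = new IdempotentStep();
        int[] invocations = {0};
        Supplier<String> step = () -> { invocations[0]++; return "shipped"; };

        guard.invokeOnce("order-42", step);  // first delivery: runs the step
        guard.invokeOnce("order-42", step);  // duplicate delivery: replays result
        System.out.println(invocations[0]);  // prints 1
    }
}
```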
Queue-Async Control Plane
Runtime provider choices:
- `ExecutionStateStore`: `memory` (dev), `dynamo` (durable).
- `WorkDispatcher`: `event` (in-process), `sqs` (durable queue).
- `DeadLetterPublisher`: `log` (built-in fallback), `sqs` (durable DLQ).
Failure channel split:
- Execution-level terminal failures use the orchestrator DLQ (`DeadLetterPublisher`).
- Step-level recover-and-continue failures use the Item Reject Sink (`pipeline.item-reject.*`, `rejectItem`/`rejectStream`).
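The split can be sketched as two distinct sinks fed by different failure scopes. Everything in this sketch is illustrative: the interfaces, class names, and the blank-item failure rule are assumptions; only the channel split itself (terminal failures to the DLQ, per-item failures to the reject sink) comes from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the two failure channels: execution-level terminal
// failures go to the orchestrator DLQ, while step-level recoverable item
// failures go to the item reject sink and the execution continues.
public final class FailureChannels {

    interface DeadLetterPublisher { void publish(String executionId, Exception cause); }
    interface ItemRejectSink { void rejectItem(String executionId, Object item, Exception cause); }

    static final class RecordingDlq implements DeadLetterPublisher {
        final List<String> published = new ArrayList<>();
        public void publish(String executionId, Exception cause) {
            published.add(executionId + ": " + cause.getMessage());
        }
    }

    static final class RecordingRejects implements ItemRejectSink {
        final List<Object> rejected = new ArrayList<>();
        public void rejectItem(String executionId, Object item, Exception cause) {
            rejected.add(item);
        }
    }

    /** Per-item failures are rejected and skipped; a fully poisoned batch is dead-lettered. */
    static int process(String executionId, List<String> items,
                       ItemRejectSink rejects, DeadLetterPublisher dlq) {
        int succeeded = 0;
        for (String item : items) {
            try {
                if (item.isBlank()) throw new IllegalArgumentException("blank item");
                succeeded++; // step-level success
            } catch (IllegalArgumentException e) {
                rejects.rejectItem(executionId, item, e); // recover and continue
            }
        }
        if (succeeded == 0) {
            // execution-level terminal failure: nothing could be processed
            dlq.publish(executionId, new IllegalStateException("all items rejected"));
        }
        return succeeded;
    }

    public static void main(String[] args) {
        RecordingRejects rejects = new RecordingRejects();
        RecordingDlq dlq = new RecordingDlq();
        int ok = process("exec-1", List.of("a", " ", "b"), rejects, dlq);
        System.out.println(ok + " ok, " + rejects.rejected.size() + " rejected, "
                + dlq.published.size() + " dead-lettered"); // 2 ok, 1 rejected, 0 dead-lettered
    }
}
```

The design point is that the two channels never mix: a rejected item never lands in the DLQ, and a dead-lettered execution is not retried item by item.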
Execution lifecycle (one transition per worker claim):

```
Submit (run-async)
  -> createOrGetExecution (dedupe key + execution row)
  -> enqueue work item
  -> worker claimLease (OCC + lease expiry)
  -> execute transition
  -> commit transition (markSucceeded / scheduleRetry / markTerminalFailure)
  -> enqueue next transition OR finalize terminal state
```

Recovery points:
- crash before commit: queue redelivery replays the transition.
- crash after commit before next enqueue: due sweeper re-dispatches.
- worker death while leased: lease expiry allows takeover.
These guarantees are deterministic for orchestrator state, not for external side effects; downstream step boundaries must accept at-least-once invocation.
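The claim step above combines OCC with lease expiry. Below is a minimal single-record sketch, assuming a versioned execution record and using `compareAndSet` as a stand-in for the durable store's conditional write; the `LeaseClaim` class and its shape are illustrative, not the framework's state-provider API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch of an OCC-guarded lease claim: a worker takes an
// execution only by atomically bumping its version, and only if the record
// is unleased or the previous lease has expired. A durable store (e.g.
// DynamoDB conditional writes) would enforce the same condition server-side.
public final class LeaseClaim {

    record ExecutionRecord(long version, String owner, Instant leaseExpiry) {}

    private final AtomicReference<ExecutionRecord> record =
            new AtomicReference<>(new ExecutionRecord(0, null, Instant.EPOCH));

    /** Returns true if this worker now holds the lease. */
    public boolean claim(String workerId, Instant now, Duration leaseTtl) {
        ExecutionRecord current = record.get();
        boolean leaseFree = current.owner() == null || !now.isBefore(current.leaseExpiry());
        if (!leaseFree) {
            return false; // another worker holds an unexpired lease
        }
        ExecutionRecord claimed =
                new ExecutionRecord(current.version() + 1, workerId, now.plus(leaseTtl));
        // compareAndSet plays the role of the store's conditional write:
        // it fails if a rival worker committed a newer version first.
        return record.compareAndSet(current, claimed);
    }

    public static void main(String[] args) {
        LeaseClaim claim = new LeaseClaim();
        Instant t0 = Instant.parse("2026-01-01T00:00:00Z");
        Duration ttl = Duration.ofSeconds(30);

        System.out.println(claim.claim("worker-a", t0, ttl));                 // true: first claim
        System.out.println(claim.claim("worker-b", t0.plusSeconds(10), ttl)); // false: lease held
        System.out.println(claim.claim("worker-b", t0.plusSeconds(40), ttl)); // true: expired, takeover
    }
}
```

The third call illustrates the "worker death while leased" recovery point: once the lease expires, another worker's claim succeeds with a higher version, which invalidates any late commit by the dead worker.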
Queue-Async HA Baseline
Use this as a minimum production baseline for queue-driven HA:
```properties
pipeline.orchestrator.mode=QUEUE_ASYNC
pipeline.orchestrator.state-provider=dynamo
pipeline.orchestrator.dispatcher-provider=sqs
pipeline.orchestrator.dlq-provider=sqs
pipeline.orchestrator.queue-url=https://sqs.eu-west-1.amazonaws.com/123456789012/tpf-work
pipeline.orchestrator.dlq-url=https://sqs.eu-west-1.amazonaws.com/123456789012/tpf-dlq
pipeline.orchestrator.idempotency-policy=CLIENT_KEY_REQUIRED
pipeline.orchestrator.strict-startup=true
```

Operational expectations for this baseline:
- state transitions remain OCC-guarded and lease-claimed,
- queue delivery and operator invocation remain at-least-once,
- terminal dead-letter events are durable, not process-local log-only.
CI confidence for this baseline:
- `SYNC` remains the default runtime mode and the fast baseline configuration.
- `QUEUE_ASYNC` remains opt-in and requires explicit durable provider configuration.
- the durable HA gate exercises the checkout `deliver-order` recovery path against `dynamo` + `sqs` semantics with DynamoDB Local + ElasticMQ.
- this gate covers:
  - worker kill takeover,
  - sweeper redispatch,
  - duplicate submit determinism,
  - durable DLQ publication.
Generated Structure
```
orchestrator-svc/
├── src/main/java/<base>/orchestrator/service/
│   ├── PipelineRunResource.java
│   ├── OrchestratorGrpcService.java
│   ├── PipelineRunFunctionHandler.java
│   ├── PipelineRunAsyncFunctionHandler.java
│   ├── PipelineExecutionStatusFunctionHandler.java
│   └── PipelineExecutionResultFunctionHandler.java
└── src/main/resources/application.properties
```