Protobuf-over-HTTP Dispatch Design (Orchestrator)
Scope
This document defines orchestrator-side dispatch changes required to make Protobuf-over-HTTP replay-safe under at-least-once delivery and multi-instance deployments.
Status in current GA track:
- Contract parity is being aligned across REST, gRPC, Function, and Protobuf-over-HTTP.
- Async control-plane semantics are shared across transports (same transition identity and retry model).
- Event-sourced dispatch journaling remains a future evolution path, not a GA blocker.
Dispatch Metadata Generation
For every outbound dispatch, orchestrator must emit:
- x-tpf-correlation-id
- x-tpf-execution-id
- x-tpf-idempotency-key
- x-tpf-retry-attempt
- x-tpf-deadline-epoch-ms (optional; when absent, receiver executes without deadline guard)
- x-tpf-dispatch-ts-epoch-ms
- x-tpf-parent-item-id (optional)
Idempotency keys must be deterministic from stable work identity (pipeline + step + business key + lineage index), never random UUIDs.
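As a minimal sketch of the rule above, the key can be derived by hashing the stable work identity, and the metadata can be assembled as a plain header map. The function names, the SHA-256 choice, and the length-prefixed encoding are assumptions made for illustration; only the header names and the four identity components come from this document.

```python
import hashlib
import time
from typing import Dict, Optional


def idempotency_key(pipeline: str, step: str, business_key: str, lineage_index: int) -> str:
    """Derive a key from stable work identity only (no randomness, no timestamps)."""
    parts = [pipeline, step, business_key, str(lineage_index)]
    # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
    canonical = "".join(f"{len(p)}:{p}" for p in parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dispatch_headers(
    correlation_id: str,
    execution_id: str,
    key: str,
    retry_attempt: int,
    deadline_epoch_ms: Optional[int] = None,
    parent_item_id: Optional[str] = None,
    now_ms: Optional[int] = None,
) -> Dict[str, str]:
    """Build the outbound metadata map; optional headers are omitted when unset."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    headers = {
        "x-tpf-correlation-id": correlation_id,
        "x-tpf-execution-id": execution_id,
        "x-tpf-idempotency-key": key,
        "x-tpf-retry-attempt": str(retry_attempt),
        "x-tpf-dispatch-ts-epoch-ms": str(now_ms),
    }
    if deadline_epoch_ms is not None:
        headers["x-tpf-deadline-epoch-ms"] = str(deadline_epoch_ms)
    if parent_item_id is not None:
        headers["x-tpf-parent-item-id"] = parent_item_id
    return headers
```

Because the key depends only on its inputs, a redelivery of the same work item after a crash or on another instance produces the same x-tpf-idempotency-key, which is what lets the receiver suppress duplicates.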
Durable Dispatch State
Current baseline in GA code:
- execution-level durable state is tracked in a shared store (ExecutionStateStore),
- work dispatch is durable through queue dispatcher providers (WorkDispatcher),
- per-dispatch item journaling is not yet the runtime engine of record.
Future hardening target:
- Track every outbound item in durable shared storage keyed by (executionId, stepId, idempotencyKey).
Required states:
- PENDING
- SENT
- ACKED
- FAILED_RETRYABLE
- FAILED_FINAL
Transitions:
- PENDING -> SENT when wire dispatch starts.
- SENT -> ACKED on successful response commit.
- SENT -> FAILED_RETRYABLE for retryable envelope codes.
- SENT -> FAILED_FINAL for non-retryable envelope codes.
- FAILED_RETRYABLE -> SENT on redelivery attempt (retry attempt incremented).
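The states and transitions above can be sketched as a small transition table. This is an illustration under assumptions, not the planned dispatch store: the DispatchItem type and the transition function are hypothetical names, and the only behavior encoded is what this section lists.

```python
from dataclasses import dataclass, replace
from enum import Enum


class DispatchState(Enum):
    PENDING = "PENDING"
    SENT = "SENT"
    ACKED = "ACKED"
    FAILED_RETRYABLE = "FAILED_RETRYABLE"
    FAILED_FINAL = "FAILED_FINAL"


S = DispatchState

# Allowed targets per source state; ACKED and FAILED_FINAL are terminal.
ALLOWED_TRANSITIONS = {
    S.PENDING: {S.SENT},
    S.SENT: {S.ACKED, S.FAILED_RETRYABLE, S.FAILED_FINAL},
    S.FAILED_RETRYABLE: {S.SENT},
    S.ACKED: set(),
    S.FAILED_FINAL: set(),
}


@dataclass(frozen=True)
class DispatchItem:  # hypothetical record shape
    idempotency_key: str
    state: DispatchState = DispatchState.PENDING
    retry_attempt: int = 0


def transition(item: DispatchItem, target: DispatchState) -> DispatchItem:
    """Return the item in its new state, rejecting transitions not listed above."""
    if target not in ALLOWED_TRANSITIONS[item.state]:
        raise ValueError(f"illegal transition {item.state.value} -> {target.value}")
    attempt = item.retry_attempt
    # Only a redelivery (FAILED_RETRYABLE -> SENT) increments the retry attempt.
    if item.state is S.FAILED_RETRYABLE and target is S.SENT:
        attempt += 1
    # The idempotency key is never touched by a transition.
    return replace(item, state=target, retry_attempt=attempt)
```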
Crash Recovery
On restart:
- Scan records not in ACKED/FAILED_FINAL.
- Re-enqueue deterministically honoring ordering constraints.
- Preserve idempotency key and correlation id.
- Increment retry attempt only for true redelivery.
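A rough sketch of the restart scan follows. It rests on two assumptions that this document does not specify: ordering is expressed by a hypothetical per-execution sequence field, and "true redelivery" means the record had already reached the wire (state SENT or FAILED_RETRYABLE), so a PENDING record keeps its retry attempt unchanged.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List

TERMINAL_STATES = {"ACKED", "FAILED_FINAL"}
ON_WIRE_STATES = {"SENT", "FAILED_RETRYABLE"}  # assumption: these imply a prior send


@dataclass(frozen=True)
class DispatchRecord:  # hypothetical record shape
    execution_id: str
    step_id: str
    idempotency_key: str
    correlation_id: str
    state: str
    retry_attempt: int
    sequence: int  # hypothetical ordering field within an execution


def recovery_plan(records: Iterable[DispatchRecord]) -> List[DispatchRecord]:
    """Return the non-terminal records to re-enqueue, in a deterministic order."""
    open_records = [r for r in records if r.state not in TERMINAL_STATES]
    # Sorting on stable fields yields the same plan on every restart and instance.
    open_records.sort(key=lambda r: (r.execution_id, r.sequence, r.idempotency_key))
    plan = []
    for r in open_records:
        bump = 1 if r.state in ON_WIRE_STATES else 0
        # Idempotency key and correlation id are carried over untouched.
        plan.append(replace(r, retry_attempt=r.retry_attempt + bump))
    return plan
```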
Multi-instance Coordination
- Use shared durable storage (database or consistent KV) for dispatch state.
- Leader election is optional if state transitions are CAS/transactional.
- Duplicate in-flight sends are tolerated because the operator side deduplicates on stable idempotency keys.
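To illustrate why leader election becomes optional, the sketch below simulates a compare-and-set transition with an in-memory store guarded by a lock. The class and method names are hypothetical; a real deployment would rely on the conditional-update or transaction primitive of the chosen database or KV store instead.

```python
import threading
from typing import Dict, Tuple

DispatchKey = Tuple[str, str, str]  # (executionId, stepId, idempotencyKey)


class InMemoryDispatchStore:
    """Stand-in for a shared durable store that offers an atomic compare-and-set."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._states: Dict[DispatchKey, str] = {}

    def put_if_absent(self, key: DispatchKey, state: str) -> bool:
        with self._lock:
            if key in self._states:
                return False
            self._states[key] = state
            return True

    def compare_and_set(self, key: DispatchKey, expected: str, new: str) -> bool:
        """Apply the transition only if the stored state still equals `expected`."""
        with self._lock:
            if self._states.get(key) != expected:
                return False
            self._states[key] = new
            return True

    def get(self, key: DispatchKey) -> str:
        with self._lock:
            return self._states[key]


def try_claim_for_send(store: InMemoryDispatchStore, key: DispatchKey) -> bool:
    """An instance may start the wire dispatch only if it wins PENDING -> SENT."""
    return store.compare_and_set(key, "PENDING", "SENT")
```

When two instances race on the same item, only one call to try_claim_for_send returns True. If a send is duplicated anyway, for example after a crash between the claim and the wire call, the receiver-side dedupe contract absorbs it.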
Planned follow-up implementation work
- Add a dispatch store SPI for pluggable per-item durability.
- Add orchestrator worker reconciliation loop for orphaned SENT records.
- Add durable dead-letter routing for FAILED_FINAL with operator-facing diagnostics.
Additional slice: dispatch parity across all transports
The dispatch metadata/state-machine contract must be enforced consistently beyond Protobuf-over-HTTP.
Target transports
- gRPC transport
- REST/JSON transport
- FUNCTION remote invoke path
- LOCAL transport (for parity testing and deterministic simulation)
Required parity outcomes
- Same canonical dispatch metadata keys and semantics.
- Same retry-attempt increment rules.
- Same deadline evaluation behavior (absolute deadline only).
- Same duplicate suppression contract (executionId + stepId + idempotencyKey).
- Same terminal classification into retryable/non-retryable outcomes.
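A transport-neutral admission check could look roughly like the following. The outcome labels, the function names, the choice to treat a deadline equal to the current time as expired, and the choice to evaluate the deadline before the duplicate check are all assumptions for illustration; the document fixes only the absolute-deadline rule and the three-part dedupe key.

```python
from typing import Mapping, Set, Tuple

DEADLINE_HEADER = "x-tpf-deadline-epoch-ms"
DedupeKey = Tuple[str, str, str]  # (executionId, stepId, idempotencyKey)


def deadline_expired(headers: Mapping[str, str], now_ms: int) -> bool:
    """Absolute deadline only: compare wall-clock epoch millis, no relative timeouts."""
    raw = headers.get(DEADLINE_HEADER)
    if raw is None:
        return False  # header absent: execute without a deadline guard
    return now_ms >= int(raw)  # assumption: the boundary instant counts as expired


def admit(
    seen: Set[DedupeKey],
    execution_id: str,
    step_id: str,
    idempotency_key: str,
    headers: Mapping[str, str],
    now_ms: int,
) -> str:
    """Classify an inbound dispatch the same way regardless of transport."""
    if deadline_expired(headers, now_ms):
        return "DEADLINE_EXPIRED"  # hypothetical outcome label
    key = (execution_id, step_id, idempotency_key)
    if key in seen:
        return "DUPLICATE"  # hypothetical outcome label
    seen.add(key)
    return "EXECUTE"  # hypothetical outcome label
```

Keeping this check free of transport types means the gRPC, REST/JSON, FUNCTION, and LOCAL paths can share it once each adapter has mapped its native metadata into the canonical header names.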
Parity test slice
- Add transport-parameterized tests asserting identical behavior for:
  - duplicate dispatch
  - replay after crash simulation
  - retry exhaustion handling
  - deadline-expired fast-fail