Field Materialization
Field materialization lets TPF keep large payload fields out of line while preserving the semantic message value. It is representation-level behavior: it is aspect-like in configuration and lifecycle, but it is not an ordinary plugin side effect because it may swap an inline field for a payload_ref sibling field.
Use it for claim-check style payloads such as parsed document text, byte blobs, large JSON fragments, or future protobuf/domain payloads that should not be carried through every runtime boundary.
Message Contract
Mark only fields that are safe to externalize. The inline field keeps its normal semantic type, and the reference is an explicit sibling field so wire compatibility stays visible.
messages:
ParsedDocument:
fields:
- number: 1
name: docId
type: string
- number: 2
name: text
type: string
optional: true
referenceable:
refField: textRef
- number: 3
name: textRef
type: payload_ref
optional: trueV1 materialization supports scalar string and bytes fields. Repeated fields, map fields, and nested paths are intentionally deferred.
Materialization Policy
Policies live under materialization.aspects, not business steps and not runtime mapping. This keeps the pipeline topology stable while allowing storage cost and replay policy to change.
materialization:
aspects:
- name: parsed-text-claim-check
enabled: true
scope: STEPS
position: AFTER_STEP
targetSteps: [Parse Document]
action: reference
message: ParsedDocument
fields: [text]
- name: chunker-needs-text
enabled: true
scope: STEPS
position: BEFORE_STEP
targetSteps: [Chunk Document]
action: dereference
message: ParsedDocument
fields: [text]reference stores the field in the configured repository provider, clears the inline value, and writes the sibling payload_ref. dereference loads the payload when the inline value is absent and the reference field is present.
Repository Providers
Add the repository provider dependency where materialization runs, then select a provider with runtime configuration.
<dependency>
<groupId>org.pipelineframework</groupId>
<artifactId>repository-plugin</artifactId>
<version>${pipelineframework.version}</version>
</dependency>For Gradle builds, add the equivalent org.pipelineframework:repository-plugin:${pipelineframeworkVersion} dependency.
pipeline.repository.provider=filesystem
pipeline.repository.filesystem.root=target/tpf-repository
pipeline.repository.verify-checksum=trueFor S3-compatible object storage:
pipeline.repository.provider=s3
pipeline.repository.s3.bucket=my-pipeline-payloads
pipeline.repository.s3.prefix=dev/search/
pipeline.repository.s3.region=eu-west-1
pipeline.repository.verify-checksum=trueUse pipeline.repository.s3.endpoint-override and pipeline.repository.s3.path-style=true for LocalStack or MinIO.
Validation
The compiler-facing YAML loader validates these rules early:
referenceable.refFieldmust point to an existing sibling field.- The sibling field must be optional and typed as
payload_ref. - Materialized fields must be scalar
stringorbytesin this first slice. - Materialization policies must name existing messages, fields, positions, actions, and target steps.
Ordinary aspects remain side-effect observations. Field materialization is a framework-owned representation transition that should be transparent to business operators when the policy says the operator receives hydrated data.