Field Materialization
Field materialization lets TPF keep large payload fields out of line while preserving the semantic message value. It is representation-level behavior: it is aspect-like in configuration and lifecycle, but it is not an ordinary plugin side effect because it may swap an inline field for a payload_ref sibling field.
Use it for claim-check style payloads such as parsed document text, byte blobs, large JSON fragments, or future protobuf/domain payloads that should not be carried through every runtime boundary.
Message Contract
Mark only fields that are safe to externalize. The inline field keeps its normal semantic type, and the reference is an explicit sibling field so wire compatibility stays visible.
messages:
  ParsedDocument:
    fields:
      - number: 1
        name: docId
        type: string
      - number: 2
        name: text
        type: string
        optional: true
        referenceable:
          refField: textRef
      - number: 3
        name: textRef
        type: payload_ref
        optional: true

V1 materialization supports scalar string and bytes fields. Repeated fields, map fields, and nested paths are intentionally deferred.
Materialization Policy
Policies live under materialization.aspects, not under business steps and not in the runtime mapping. This keeps the pipeline topology stable while allowing storage cost and replay policy to change.
materialization:
  aspects:
    - name: parsed-text-claim-check
      enabled: true
      scope: STEPS
      position: AFTER_STEP
      targetSteps: [Parse Document]
      action: reference
      message: ParsedDocument
      fields: [text]
    - name: chunker-needs-text
      enabled: true
      scope: STEPS
      position: BEFORE_STEP
      targetSteps: [Chunk Document]
      action: dereference
      message: ParsedDocument
      fields: [text]

The reference action stores the field in the configured repository provider, clears the inline value, and writes the sibling payload_ref. The dereference action loads the payload when the inline value is absent and the reference field is present.
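The two actions and their placement around steps can be modeled in a few lines. The following is a minimal, language-neutral sketch in Python, not TPF's implementation: InMemoryRepository, Policy, run_pipeline, and the use of a SHA-256 hex digest as the reference value are all assumptions made for illustration, and only string fields are handled.

```python
import hashlib
from dataclasses import dataclass


class InMemoryRepository:
    """Hypothetical stand-in for a repository provider (not TPF's API)."""

    def __init__(self):
        self._blobs = {}

    def put(self, payload: bytes) -> str:
        # Assumption: the reference is a content hash of the stored payload.
        key = hashlib.sha256(payload).hexdigest()
        self._blobs[key] = payload
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def reference(message, field, ref_field, repo):
    """Store the inline value, clear it, and write the sibling reference."""
    if message.get(field) is None:
        return message  # nothing inline to externalize
    out = dict(message)
    out[ref_field] = repo.put(message[field].encode("utf-8"))
    out[field] = None
    return out


def dereference(message, field, ref_field, repo):
    """Load the payload only when the inline value is absent and a reference exists."""
    if message.get(field) is not None or message.get(ref_field) is None:
        return message
    out = dict(message)
    out[field] = repo.get(message[ref_field]).decode("utf-8")
    # The sketch leaves the reference in place; it does not claim what TPF does here.
    return out


@dataclass(frozen=True)
class Policy:
    position: str      # "BEFORE_STEP" or "AFTER_STEP"
    target_step: str
    action: str        # "reference" or "dereference"
    field: str
    ref_field: str


ACTIONS = {"reference": reference, "dereference": dereference}


def apply_policies(message, step_name, position, policies, repo):
    for p in policies:
        if p.target_step == step_name and p.position == position:
            message = ACTIONS[p.action](message, p.field, p.ref_field, repo)
    return message


def run_pipeline(message, steps, policies, repo):
    """Run (name, function) steps, applying policies before and after each one."""
    for name, step in steps:
        message = apply_policies(message, name, "BEFORE_STEP", policies, repo)
        message = step(message)
        message = apply_policies(message, name, "AFTER_STEP", policies, repo)
    return message


policies = [
    Policy("AFTER_STEP", "Parse Document", "reference", "text", "textRef"),
    Policy("BEFORE_STEP", "Chunk Document", "dereference", "text", "textRef"),
]
```

In this model the parse step emits text inline, the message travels between steps with text cleared and textRef set, and the chunk step receives hydrated text again without knowing that externalization happened.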
Repository Providers
Add the repository provider dependency where materialization runs, then select a provider with runtime configuration.
<dependency>
  <groupId>org.pipelineframework</groupId>
  <artifactId>repository-plugin</artifactId>
  <version>${pipelineframework.version}</version>
</dependency>

For Gradle builds, add the equivalent org.pipelineframework:repository-plugin:${pipelineframeworkVersion} dependency.
pipeline.repository.provider=filesystem
pipeline.repository.filesystem.root=target/tpf-repository
pipeline.repository.verify-checksum=true

For S3-compatible object storage:
pipeline.repository.provider=s3
pipeline.repository.s3.bucket=my-pipeline-payloads
pipeline.repository.s3.prefix=dev/search/
pipeline.repository.s3.region=eu-west-1
pipeline.repository.verify-checksum=true

Use pipeline.repository.s3.endpoint-override and pipeline.repository.s3.path-style=true for LocalStack or MinIO.
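To make the verify-checksum setting concrete, here is a hedged sketch of what a checksum-verifying filesystem store could look like. It is a Python model, not the repository-plugin code: the FilesystemRepository class, the SHA-256 digest, and the shape of the returned reference (key, sha256, size) are assumptions, since this page does not specify the contents of a payload_ref or the checksum algorithm.

```python
import hashlib
from pathlib import Path


class ChecksumMismatch(Exception):
    """Raised when a loaded payload does not match the recorded digest."""


class FilesystemRepository:
    """Hypothetical filesystem-backed payload store rooted at a directory."""

    def __init__(self, root, verify_checksum=True):
        self.root = Path(root)
        self.verify_checksum = verify_checksum
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, payload: bytes) -> dict:
        # Assumption: payloads are stored under their SHA-256 hex digest.
        digest = hashlib.sha256(payload).hexdigest()
        (self.root / digest).write_bytes(payload)
        return {"key": digest, "sha256": digest, "size": len(payload)}

    def get(self, ref: dict) -> bytes:
        payload = (self.root / ref["key"]).read_bytes()
        if self.verify_checksum:
            actual = hashlib.sha256(payload).hexdigest()
            if actual != ref["sha256"]:
                raise ChecksumMismatch(
                    f"expected {ref['sha256']}, got {actual}"
                )
        return payload
```

With verification enabled, a payload that was truncated or altered on disk fails loudly at dereference time instead of flowing into a downstream step. Disabling it trades that guarantee for one less hash computation per load.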
Validation
The compiler-facing YAML loader validates these rules early:
- referenceable.refField must point to an existing sibling field.
- The sibling field must be optional and typed as payload_ref.
- Materialized fields must be scalar string or bytes in this first slice.
- Materialization policies must name existing messages, fields, positions, actions, and target steps.
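The message-level rules above can be expressed as a small checker. This is an illustrative Python sketch, not the actual YAML loader: validate_message and its error strings are hypothetical, and the input is assumed to be the fields list of one message after YAML parsing.

```python
SUPPORTED_TYPES = {"string", "bytes"}


def validate_message(message_name, fields):
    """Return a list of rule violations for one message's field definitions."""
    errors = []
    by_name = {f["name"]: f for f in fields}
    for f in fields:
        ref = f.get("referenceable")
        if ref is None:
            continue  # field is not marked for materialization
        where = f"{message_name}.{f['name']}"
        if f.get("type") not in SUPPORTED_TYPES:
            errors.append(f"{where}: only string and bytes fields can be materialized")
        sibling = by_name.get(ref.get("refField"))
        if sibling is None:
            errors.append(f"{where}: refField must point to an existing sibling field")
            continue
        if sibling.get("type") != "payload_ref":
            errors.append(f"{where}: sibling {sibling['name']} must be typed payload_ref")
        if not sibling.get("optional", False):
            errors.append(f"{where}: sibling {sibling['name']} must be optional")
    return errors


parsed_document = [
    {"number": 1, "name": "docId", "type": "string"},
    {"number": 2, "name": "text", "type": "string", "optional": True,
     "referenceable": {"refField": "textRef"}},
    {"number": 3, "name": "textRef", "type": "payload_ref", "optional": True},
]
```

Collecting every violation instead of failing on the first one is a design choice of the sketch. It lets a single validation pass report all contract problems in a message definition.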
Ordinary aspects remain side-effect observations. Field materialization is a framework-owned representation transition that should be transparent to business operators when the policy says the operator receives hydrated data.