Search Replay Walkthrough
This walkthrough uses the Search pipeline to show replay and versioned caching end to end.
Pipeline shape
The pipeline runs four stages in order: Crawl, Parse, Tokenize, Index. Crawl is the expensive stage, so the goal is to re-index with a new tokenizer without re-crawling.
Search use cases
- Normal production run (x-pipeline-cache-policy: prefer-cache): keeps throughput high and reuses stable outputs.
- Deterministic replay (x-pipeline-cache-policy: require-cache): ensures only cached entries are used.
- Forced rebuild (x-pipeline-cache-policy: cache-only): overwrites cached outputs without reads.
- Debug or verification (x-pipeline-cache-policy: bypass-cache): runs the pipeline without cache I/O.
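The four policies differ only in whether they read the cache, write to it, and fail on a miss. The sketch below models that with an in-memory map. `CachePolicySketch` and its fields are hypothetical names, not the framework's API. It also assumes that prefer-cache stores a freshly computed output after a miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of the four header values; not the framework's own API.
enum CachePolicySketch {
    PREFER_CACHE("prefer-cache", true, true, false),
    REQUIRE_CACHE("require-cache", true, false, true),
    CACHE_ONLY("cache-only", false, true, false),
    BYPASS_CACHE("bypass-cache", false, false, false);

    final String headerValue;
    final boolean readsCache;
    final boolean writesCache;
    final boolean failsOnMiss;

    CachePolicySketch(String headerValue, boolean readsCache, boolean writesCache, boolean failsOnMiss) {
        this.headerValue = headerValue;
        this.readsCache = readsCache;
        this.writesCache = writesCache;
        this.failsOnMiss = failsOnMiss;
    }

    // Maps an x-pipeline-cache-policy header value to a policy.
    static CachePolicySketch fromHeader(String value) {
        for (CachePolicySketch policy : values()) {
            if (policy.headerValue.equals(value)) {
                return policy;
            }
        }
        throw new IllegalArgumentException("Unknown cache policy: " + value);
    }

    // Runs one step against an in-memory cache under this policy.
    <T> T apply(Map<String, T> cache, String key, Supplier<T> compute) {
        if (readsCache) {
            T cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
            if (failsOnMiss) {
                throw new IllegalStateException("Cache miss for " + key);
            }
        }
        T value = compute.get();
        if (writesCache) {
            cache.put(key, value);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        String key = "v1:ParsedDocument:doc-1";
        try {
            REQUIRE_CACHE.apply(cache, key, () -> "parsed");
        } catch (IllegalStateException e) {
            System.out.println("cold cache: " + e.getMessage());
        }
        PREFER_CACHE.apply(cache, key, () -> "parsed"); // miss: computes and stores
        System.out.println(REQUIRE_CACHE.apply(cache, key, () -> "unused")); // warm: prints "parsed"
    }
}
```

In this model, require-cache is the only policy that can fail, which is why it suits deterministic replay: a run either reuses every cached entry or stops.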
Invalidation steps are reserved for targeted corrections (bug fixes, schema changes). The runtime only propagates x-pipeline-replay as metadata; replay-aware tooling or your custom invalidation logic, not the runtime, must read that header and perform cache invalidation or replay-specific actions.
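Because the runtime only forwards x-pipeline-replay, something else has to act on it. The sketch below shows one way a replay-aware component might do that. The class, the lowercase header map, and the choice to clear the whole version namespace are all assumptions made for illustration. A real deployment may invalidate more selectively.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replay-aware component; the core runtime only forwards the header.
final class ReplayInvalidatorSketch {
    static final String REPLAY_HEADER = "x-pipeline-replay";
    static final String VERSION_HEADER = "x-pipeline-version";

    // Removes every entry in the request's version namespace when replay is requested.
    // Returns the number of entries removed.
    static int invalidateIfReplay(Map<String, String> headers, Map<String, String> cache) {
        if (!Boolean.parseBoolean(headers.get(REPLAY_HEADER))) {
            return 0; // header absent or not "true": leave the cache alone
        }
        String version = headers.get(VERSION_HEADER);
        if (version == null) {
            return 0;
        }
        String prefix = version + ":";
        int before = cache.size();
        cache.keySet().removeIf(key -> key.startsWith(prefix));
        return before - cache.size();
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("v1:ParsedDocument:doc-1", "parsed");
        cache.put("v2:ParsedDocument:doc-1", "parsed");
        Map<String, String> headers = Map.of(VERSION_HEADER, "v1", REPLAY_HEADER, "true");
        System.out.println(invalidateIfReplay(headers, cache)); // prints 1
        System.out.println(cache.keySet());                     // prints [v2:ParsedDocument:doc-1]
    }
}
```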
Step 1: Choose cache keys
Define cache key strategies that emit stable keys for each output type:
import java.util.Optional;

public class ParsedDocumentKeyStrategy implements CacheKeyStrategy {
    @Override
    public Optional<String> resolveKey(Object item, PipelineContext context) {
        // Only ParsedDocument items that carry a docId get a key; everything else is skipped.
        if (!(item instanceof ParsedDocument doc) || doc.docId == null) {
            return Optional.empty();
        }
        // Stable key: fully qualified type name plus document id.
        return Optional.of(doc.getClass().getName() + ":" + doc.docId);
    }
}

Step 2: Run baseline (v1)
x-pipeline-version: v1
x-pipeline-cache-policy: cache-only

This caches every stage output under:
v1:{Type}:{docId}

Step 3: Recompute downstream while reusing cached upstream outputs
Change the tokenizer logic and reuse cached outputs from earlier steps by keeping the same version tag:
x-pipeline-version: v1
x-pipeline-cache-policy: prefer-cache

Now:
- Parse cache lookup hits v1:{Type}:{docId}.
- Tokenize runs with new logic.
- Index runs with new logic.
- Outputs are cached under v1:{Type}:{docId}.
The core runtime forwards x-pipeline-replay as a header and does nothing else with it. If your deployment adds replay-aware invalidation logic, document that component, not the runtime, as the interpreter of the header.
Caching happens in the cache plugin's side-effect steps, so the step services remain unchanged.
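The v1:{Type}:{docId} layout explains why keeping the version tag lets the Parse lookup hit. The sketch below assumes the cache layer prepends the version tag to the key returned by the key strategy. `VersionedKeySketch` is a hypothetical helper, and the short type name stands in for the fully qualified class name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: version tag + ":" + the key emitted by a CacheKeyStrategy.
final class VersionedKeySketch {
    static String versionedKey(String version, String strategyKey) {
        return version + ":" + strategyKey;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        // Baseline run: the Parse output is stored in the v1 namespace.
        cache.put(versionedKey("v1", "ParsedDocument:doc-1"), "parsed");
        // Same tag, same key, so the Parse lookup hits.
        System.out.println(cache.containsKey(versionedKey("v1", "ParsedDocument:doc-1"))); // prints true
        // A different tag yields a different key, so the lookup misses.
        System.out.println(cache.containsKey(versionedKey("v2", "ParsedDocument:doc-1"))); // prints false
    }
}
```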
Step 4: Fork a new version
If you want a clean namespace for a new run, bump the version tag:
x-pipeline-version: v2
x-pipeline-cache-policy: cache-only

Manual verification (curl + Redis)
Warm the cache:
curl -k -X POST https://localhost:8443/pipeline/run \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H 'x-pipeline-version: v4' \
  -H 'x-pipeline-cache-policy: prefer-cache' \
  -d '{"docId":"00000000-0000-0000-0000-000000000001","sourceUrl":"https://example.com"}'

Inspect Redis keys:
redis-cli --scan --pattern "pipeline-cache:v4:*"

Expected key shapes (Search example):
v4:org.pipelineframework.search.common.domain.RawDocument:https://example.com|method=GET|accept=text/html
v4:org.pipelineframework.search.common.domain.ParsedDocument:<rawContentHash>
v4:org.pipelineframework.search.common.domain.TokenBatch:<contentHash>:model=v1
v4:org.pipelineframework.search.common.domain.IndexAck:<tokensHash>:schema=v1
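When checking scan output by hand or in a script, it helps to split a key into version, type, and identifier. The identifier can itself contain colons (a URL, or a suffix such as :model=v1), so the split must stop after the second colon. The sketch below is a hypothetical helper, not part of the framework. It assumes that Redis keys may carry the pipeline-cache: prefix seen in the scan pattern.

```java
// Hypothetical helper for reading the key shapes listed above; not part of the framework.
record CacheKeyParts(String version, String type, String id) {
    // Assumption taken from the scan pattern: Redis keys may carry this prefix.
    static final String REDIS_PREFIX = "pipeline-cache:";

    static CacheKeyParts parse(String rawKey) {
        String key = rawKey.startsWith(REDIS_PREFIX)
            ? rawKey.substring(REDIS_PREFIX.length())
            : rawKey;
        // A limit of 3 keeps any colons inside the identifier intact.
        String[] parts = key.split(":", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected cache key: " + rawKey);
        }
        return new CacheKeyParts(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        CacheKeyParts parts = parse(
            "v4:org.pipelineframework.search.common.domain.RawDocument:https://example.com|method=GET|accept=text/html");
        System.out.println(parts.version()); // prints v4
        System.out.println(parts.id());      // prints https://example.com|method=GET|accept=text/html
    }
}
```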
Force deterministic replay (should fail on cold cache, succeed on warm cache):
curl -k -X POST https://localhost:8443/pipeline/run \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H 'x-pipeline-version: v4' \
  -H 'x-pipeline-cache-policy: require-cache' \
  -d '{"docId":"00000000-0000-0000-0000-000000000001","sourceUrl":"https://example.com"}'

Use invalidation only when replaying:
The runtime only forwards x-pipeline-replay; replay-aware tooling or custom invalidation logic must perform any invalidation triggered by this request.
curl -k -X POST https://localhost:8443/pipeline/run \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H 'x-pipeline-version: v4' \
  -H 'x-pipeline-cache-policy: prefer-cache' \
  -H 'x-pipeline-replay: true' \
  -d '{"docId":"00000000-0000-0000-0000-000000000001","sourceUrl":"https://example.com"}'

This will miss old cache entries and recompute the pipeline when replay-aware invalidation or equivalent external replay logic is configured.
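The same requests can be issued from a Java test harness. The sketch below builds, but does not send, the request with the JDK's java.net.http API. `ReplayRequestSketch` is a hypothetical name. Sending the request to a local endpoint with a self-signed certificate would also need an HttpClient configured with a suitable SSLContext, the counterpart of curl's -k flag, which is omitted here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that mirrors the curl commands above.
final class ReplayRequestSketch {
    static HttpRequest build(String version, String cachePolicy, boolean replay, String jsonBody) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create("https://localhost:8443/pipeline/run"))
            .header("Content-Type", "application/json")
            .header("Accept", "application/json")
            .header("x-pipeline-version", version)
            .header("x-pipeline-cache-policy", cachePolicy)
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody));
        if (replay) {
            // Only metadata for the runtime; replay-aware tooling must act on it.
            builder.header("x-pipeline-replay", "true");
        }
        return builder.build();
    }

    public static void main(String[] args) {
        String body = "{\"docId\":\"00000000-0000-0000-0000-000000000001\",\"sourceUrl\":\"https://example.com\"}";
        HttpRequest request = build("v4", "prefer-cache", true, body);
        System.out.println(request.method() + " " + request.uri());
        request.headers().map().forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```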
Replay flow diagram
Header propagation diagram
Outcome
- Old outputs remain under v1.
- New outputs land under v2.
- You can compare Index outputs across versions without re-crawling.
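Since both namespaces coexist, a comparison can pair entries by their {Type}:{docId} suffix. The sketch below does this over an in-memory map standing in for the cache. The class name, the short type names, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical comparison over an in-memory stand-in for the cache.
final class VersionCompareSketch {
    // Returns one version's entries keyed by "{Type}:{docId}" (version prefix stripped).
    static Map<String, String> entriesFor(Map<String, String> cache, String version) {
        String prefix = version + ":";
        Map<String, String> entries = new TreeMap<>();
        cache.forEach((key, value) -> {
            if (key.startsWith(prefix)) {
                entries.put(key.substring(prefix.length()), value);
            }
        });
        return entries;
    }

    // Lists the keys present in both versions whose values differ, in sorted order.
    static List<String> changedKeys(Map<String, String> cache, String oldVersion, String newVersion) {
        Map<String, String> oldEntries = entriesFor(cache, oldVersion);
        Map<String, String> newEntries = entriesFor(cache, newVersion);
        List<String> changed = new ArrayList<>();
        oldEntries.forEach((key, oldValue) -> {
            if (newEntries.containsKey(key) && !Objects.equals(oldValue, newEntries.get(key))) {
                changed.add(key);
            }
        });
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("v1:IndexAck:doc-1", "ack from old tokenizer");
        cache.put("v2:IndexAck:doc-1", "ack from new tokenizer");
        cache.put("v1:IndexAck:doc-2", "unchanged ack");
        cache.put("v2:IndexAck:doc-2", "unchanged ack");
        System.out.println(changedKeys(cache, "v1", "v2")); // prints [IndexAck:doc-1]
    }
}
```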