Heimdall plugins
All plugins are declared in the top-level plugins list of EndpointPickerConfig. Heimdall assigns each plugin to the appropriate extension point based on the interfaces it implements. A single plugin can implement multiple interfaces (for example, context-length-aware acts as both a Filter and a Scorer; active-request-scorer acts as both a Scorer and a response lifecycle hook).
- Profile handlers manage the outer scheduling loop: selecting which profiles to run and aggregating results. They are not referenced from `schedulingProfiles`.
- Deciders are helper plugins consumed by `disagg-profile-handler` (via its `deciders.*` parameters) or by the legacy `pd-profile-handler` (via its flat `deciderPluginName` parameter, prefill decider only). They are declared in the top-level `plugins` list but not referenced from `schedulingProfiles`.
- Filter, Scorer, and Picker run within a `SchedulingProfile`. Each profile executes them in order: Filters → Scorers → Picker. These are the only plugin types that can be referenced in `schedulingProfiles[].plugins[].pluginRef`.
- Pre-request handlers hook into the request path before the scheduler runs. They are activated automatically when declared in the top-level `plugins` list and are not referenced from `schedulingProfiles`.
- Prepare-data plugins enrich the request with derived data (for example tokenized prompts) that downstream plugins can consume. They are activated automatically when declared in the top-level `plugins` list, provided the `prepareDataPlugins` feature gate is enabled.
- Response plugins hook into the response lifecycle after scheduling. They are activated automatically when declared in the top-level `plugins` list and are not referenced from `schedulingProfiles`.
- Data layer plugins (sources and extractors) feed pod metrics into the scheduler. They are referenced from the top-level `data` field, not from `schedulingProfiles`, and require the `dataLayer` feature gate to be enabled.
- Store plugins manage multi-turn conversation state. They are activated automatically when declared in the top-level `plugins` list, provided the `prepareDataPlugins` feature gate is enabled.
Profile handlers​
single-profile-handler​
Handles a single profile, which is treated as the primary profile. Suitable when you only need one scheduling profile per request.
No parameters.
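As a rough sketch, a single-profile configuration could look like the fragment below. The profile name `default` is a placeholder chosen for illustration, and pairing `queue-scorer` with `max-score-picker` is just one possible combination of plugins documented on this page. Fields other than `pluginRef` are omitted because this page does not describe them.

```yaml
plugins:
  - type: single-profile-handler
  - type: queue-scorer
  - type: max-score-picker
schedulingProfiles:
  # "default" is a hypothetical profile name used only for this sketch.
  - name: default
    plugins:
      - pluginRef: queue-scorer
      - pluginRef: max-score-picker
```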
disagg-profile-handler​
Unified profile handler for disaggregated inference deployments. It orchestrates up to three stages — decode, prefill, and encode — and consults per-stage decider plugins to decide whether each stage should run for a given request.
Stage pipeline:
- Decode always runs first and selects the primary endpoint.
- Encode (optional) runs next when the `deciders.encode` plugin decides the request benefits from a dedicated encode stage (for example, multimodal inputs).
- Prefill (optional) runs last when the `deciders.prefill` plugin decides the request has enough uncached tokens to justify dedicated prefill.
When prefill or encode selects an endpoint, disagg-headers-handler (declared separately) writes the chosen endpoint(s) into request headers so the decode pod can reach them.
Parameters use a nested format (preferred). A legacy flat format is still accepted for backward compatibility.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `profiles.decode` | string | "decode" | Name of the SchedulingProfile to use for decode endpoints. |
| `profiles.prefill` | string | "prefill" | Name of the SchedulingProfile to use for prefill endpoints. |
| `profiles.encode` | string | "encode" | Name of the SchedulingProfile to use for encode endpoints. |
| `deciders.prefill` | string | (unset) | Name of the decider plugin that decides whether prefill runs. Unset disables the prefill stage. Must be registered on the plugin list. |
| `deciders.encode` | string | (unset) | Name of the decider plugin that decides whether encode runs. Unset disables the encode stage. Must be registered on the plugin list. |
Legacy flat parameters (deprecated, still accepted — each logs a deprecation warning and maps into the nested form):
| Legacy parameter | Maps to |
|---|---|
| `decodeProfile` | `profiles.decode` |
| `prefillProfile` | `profiles.prefill` |
| `encodeProfile` | `profiles.encode` |
| `prefillDeciderPluginName` | `deciders.prefill` |
| `encodeDeciderPluginName` | `deciders.encode` |
| `deciderPluginName` | `deciders.prefill` (lower priority than `prefillDeciderPluginName`) |
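To illustrate the mapping, the two alternatives below should describe the same handler according to the table above: the first uses the deprecated flat names, the second the nested form. This is a sketch derived only from the mapping table and has not been checked against a running instance.

```yaml
# Alternative A: legacy flat form (deprecated; each key logs a deprecation warning).
- type: disagg-profile-handler
  parameters:
    decodeProfile: decode
    prefillProfile: prefill
    prefillDeciderPluginName: prefix-based-pd-decider
---
# Alternative B: preferred nested form. Use one alternative, not both.
- type: disagg-profile-handler
  parameters:
    profiles:
      decode: decode
      prefill: prefill
    deciders:
      prefill: prefix-based-pd-decider
```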
When deciders.prefill or deciders.encode is set, disagg-profile-handler requires disagg-headers-handler to also be registered. The lookup happens at initialization, so both the referenced decider plugin and disagg-headers-handler must appear earlier in the top-level plugins list than disagg-profile-handler itself.
Minimal canonical example enabling prefill disaggregation with the prefix-based decider:
```yaml
plugins:
  - type: disagg-headers-handler
  - type: prefix-based-pd-decider
    parameters:
      nonCachedTokens: 16
  - type: prefill-filter
  - type: decode-filter
  - type: max-score-picker
  - type: disagg-profile-handler
    parameters:
      deciders:
        prefill: prefix-based-pd-decider
schedulingProfiles:
  - name: prefill
    plugins:
      - pluginRef: prefill-filter
      - pluginRef: max-score-picker
  - name: decode
    plugins:
      - pluginRef: decode-filter
      - pluginRef: max-score-picker
```
For a full walkthrough, see PD disaggregation.
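The minimal example above could be extended to also enable the encode stage, roughly as follows. This sketch assumes the default profile names (`decode`, `prefill`, `encode`) and combines only plugins documented on this page. It is an illustration of the wiring, not a tested deployment.

```yaml
plugins:
  - type: disagg-headers-handler
  - type: prefix-based-pd-decider
    parameters:
      nonCachedTokens: 16
  - type: always-disagg-multimodal-decider
  - type: prefill-filter
  - type: decode-filter
  - type: encode-filter
  - type: max-score-picker
  # Declared last so that both deciders and disagg-headers-handler
  # appear earlier in the list, as the handler requires.
  - type: disagg-profile-handler
    parameters:
      deciders:
        prefill: prefix-based-pd-decider
        encode: always-disagg-multimodal-decider
schedulingProfiles:
  - name: prefill
    plugins:
      - pluginRef: prefill-filter
      - pluginRef: max-score-picker
  - name: decode
    plugins:
      - pluginRef: decode-filter
      - pluginRef: max-score-picker
  - name: encode
    plugins:
      - pluginRef: encode-filter
      - pluginRef: max-score-picker
```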
pd-profile-handler (legacy)​
Separate profile handler for Prefill-Decode (PD) disaggregation, predating disagg-profile-handler. Still registered in moreh-v0.7.x with its own factory (parameter struct is flat, not nested).
| Parameter | Type | Default | Description |
|---|---|---|---|
| `decodeProfile` | string | "decode" | Name of the SchedulingProfile to use for decode endpoints. |
| `prefillProfile` | string | "prefill" | Name of the SchedulingProfile to use for prefill endpoints. |
| `prefixPluginType` | string | "prefix-cache-scorer" | Plugin type of the prefix cache scorer the decider reads from. Must be the registered type string. |
| `prefixPluginName` | string | (value of prefixPluginType) | Plugin name (instance name) of the prefix cache scorer. |
| `primaryPort` | int | 0 | When non-zero, rewrites the decode endpoint's port to this value (used with data parallelism). Must be between 1 and 65535 when set. |
| `deciderPluginName` | string | "prefix-based-pd-decider" | Name of the decider plugin. The referenced plugin must implement the PD decider interface. |
Like disagg-profile-handler, the decider plugin (and disagg-headers-handler) must appear earlier in the top-level plugins list than pd-profile-handler. New deployments should prefer disagg-profile-handler — pd-profile-handler is kept for backward compatibility with existing heimdall-values.yaml files.
Deciders​
Decider plugins are consumed by disagg-profile-handler through its nested deciders.* parameters, and by the legacy pd-profile-handler through its flat deciderPluginName parameter. pd-profile-handler only supports a prefill decider; encode deciders are exclusive to disagg-profile-handler. Each decider answers one of two questions:
- Prefill deciders: "should this request run prefill?" Consumed via `disagg-profile-handler.deciders.prefill` or `pd-profile-handler.deciderPluginName`.
- Encode deciders: "should this request run encode?" Consumed via `disagg-profile-handler.deciders.encode`.
Declare the decider in the top-level plugins list (before the profile handler) and reference it by name.
prefix-based-pd-decider​
Runs prefill only when the request has enough non-cached tokens, based on how many prefix tokens already hit the cache. Prefill decider.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `nonCachedTokens` | int | 0 | Minimum number of non-cached tokens required to trigger prefill. With the default 0, P/D disaggregation is disabled and prefill never runs; set a positive threshold to enable it. |
always-disagg-pd-decider​
Always requests prefill. Equivalent to "PD disaggregation enabled for every request." Prefill decider.
No parameters.
always-disagg-multimodal-decider​
Runs encode whenever the incoming request contains multimodal content (image, audio, or video blocks). Encode decider.
No parameters.
Filters​
by-label​
Filters pods by the value of a given label: pods whose value for that label is not listed in `validValues` are removed.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `label` | string | - | The label key to filter by. (Required) |
| `validValues` | []string | - | List of allowed values for the label. (Required unless allowsNoLabel is true) |
| `allowsNoLabel` | bool | false | Whether to allow pods that do not have the specified label. |
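A possible `by-label` declaration is sketched below. The label key `example.com/gpu-tier` and its values are hypothetical and stand in for whatever labels your pods actually carry.

```yaml
- type: by-label
  parameters:
    # Hypothetical label key and values, for illustration only.
    label: example.com/gpu-tier
    validValues:
      - premium
      - standard
    # Pods that lack the label entirely are rejected.
    allowsNoLabel: false
```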
by-label-selector​
Filters out pods that do not match the configured label selector criteria.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `matchLabels` | map[string]string | - | Key-value pairs of labels that must match. |
| `matchExpressions` | []LabelSelectorRequirement | - | List of label selector requirements (set-based matching). |
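The sketch below combines both parameters. It assumes that `matchExpressions` entries use the same `key`, `operator`, and `values` fields as the Kubernetes `LabelSelectorRequirement` type; this page does not spell out the field names, so treat them as an assumption. The label keys and values are hypothetical.

```yaml
- type: by-label-selector
  parameters:
    matchLabels:
      # Hypothetical label; the pod must carry exactly this key and value.
      example.com/pool: inference
    matchExpressions:
      # Assumed to follow the Kubernetes LabelSelectorRequirement shape.
      - key: example.com/gpu-tier
        operator: In
        values:
          - premium
          - standard
```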
prefill-filter​
Filters for pods designated with the prefill role. It retains pods whose label mif.moreh.io/role is set to prefill.
No parameters.
decode-filter​
Filters for pods designated with the decode role. It retains pods that satisfy one of the following conditions:
- The label `mif.moreh.io/role` is set to `decode` or `both`.
- The label `mif.moreh.io/role` is not set.
No parameters.
encode-filter​
Filters for pods designated with an encode role. It retains pods whose mif.moreh.io/role label value is one of encode, encode-prefill, or encode-prefill-decode. Pods without the role label are rejected.
No parameters.
context-length-aware​
Primarily a scorer, `context-length-aware` also functions as a filter when `enableFiltering` is set to true. In that mode, pods whose label-defined range does not cover the estimated token count of the request are removed. See `context-length-aware` under Scorers for parameters.
Scorers​
active-request-scorer​
Scores pods based on the number of active (in-flight) requests being served. Scores are normalized from 0 to 1. Also hooks the request/response lifecycle to maintain its in-flight counter.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `requestTimeout` | string | "2m" | Go duration string (for example "30s", "1m"). A request older than this is treated as dropped. |
| `idleThreshold` | int | 0 | Maximum active-request count for a pod to be treated as idle. Idle pods score 1.0. |
| `maxBusyScore` | float | 1.0 | Upper bound on the score assigned to busy pods (range 0.0-1.0). Lower values widen the gap between idle and busy. |
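As an illustration of how these parameters interact, the sketch below treats pods with at most two in-flight requests as idle (score 1.0) and caps every busier pod at 0.5. The specific values are arbitrary examples, not recommendations.

```yaml
- type: active-request-scorer
  parameters:
    # Requests older than 30 seconds are treated as dropped.
    requestTimeout: 30s
    # Pods with 0, 1, or 2 active requests count as idle.
    idleThreshold: 2
    # Busy pods never score above 0.5.
    maxBusyScore: 0.5
```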
load-aware-scorer​
Scores pods based on queue load. Pods with empty or lightly loaded queues receive higher scores.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `threshold` | int | 128 | Queue-size threshold used when normalizing load. |
no-hit-lru-scorer​
Favors pods that were least recently used for cold requests (requests that missed the prefix cache). Spreads cache growth across pods instead of piling it onto a single pod.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prefixPluginType` | string | "prefix-cache-scorer" | Plugin type of the prefix cache scorer whose hit/miss state is observed. |
| `prefixPluginName` | string | "prefix-cache-scorer" | Plugin name (instance name) of that prefix cache scorer. |
| `lruSize` | int | 1024 | Maximum number of endpoints tracked in the LRU window. |
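A minimal pairing with the prefix cache scorer might look like the fragment below. It assumes the prefix cache scorer is declared without a custom instance name, so that its name matches its type; the `lruSize` value is an arbitrary example.

```yaml
plugins:
  - type: prefix-cache-scorer
  - type: no-hit-lru-scorer
    parameters:
      # Both values point at the prefix-cache-scorer declared above
      # (assumption: its instance name equals its type).
      prefixPluginType: prefix-cache-scorer
      prefixPluginName: prefix-cache-scorer
      lruSize: 2048
```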
precise-prefix-cache-scorer​
Scores pods based on precise prefix-cache KV-block locality, computed from real-time KV-cache events published by each pod. Requires a tokenizer for the target model.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `tokenProcessorConfig` | Object | Library defaults (vllm scheme, block size 16). | Configuration for the token processor. |
| `indexerConfig` | Object | Library defaults + tokenizersPoolConfig.modelName must be set. | Configuration for the KV cache indexer. |
| `kvEventsConfig` | Object | Library defaults. | Configuration for KV events subscription. |
| `speculativeIndexing` | bool | false | When true, proactively inserts predicted cache entries into the index right after a routing decision, closing the short window between the decision and KV-event arrival. |
| `speculativeTTL` | string | "2s" | Go duration string. TTL for speculative entries before they are evicted. Ignored when speculativeIndexing is false. |
tokenProcessorConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `blockSize` | int | 16 | Number of tokens per block. Must match the InferenceService's --block-size (the value vLLM is started with on the inference pods). |
indexerConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `kvBlockIndexConfig` | Object | - | Configuration for the KV-block index backend. |
| `tokenizersPoolConfig` | Object | - | Configuration for the tokenizers pool. (Required; must set modelName.) |
kvBlockIndexConfig
Configure exactly one backend.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `inMemoryConfig` | Object | - | Configuration for in-memory index. |
| `redisConfig` | Object | - | Configuration for Redis index. |
| `valkeyConfig` | Object | - | Configuration for Valkey index. |
| `costAwareMemoryConfig` | Object | - | Configuration for cost-aware memory index. |
| `enableMetrics` | bool | false | Whether to enable metrics for the indexer. |
| `metricsLoggingInterval` | string | 0s | Interval for logging metrics (for example, "10s"). |
inMemoryConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `size` | int | 1e8 | Maximum number of keys in the index. |
| `podCacheSize` | int | 10 | Maximum number of pod entries per key. |
redisConfig / valkeyConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `address` | string | "redis://127.0.0.1:6379" | Address of the Redis/Valkey server. |
| `backendType` | string | "redis" | Backend type ("redis" or "valkey"). |
| `enableRDMA` | bool | false | Enable RDMA (experimental, Valkey only). |
costAwareMemoryConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `size` | string | "2GiB" | Maximum memory size (for example "2GiB", "500MiB"). |
tokenizersPoolConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `modelName` | string | - | Base model name for the tokenizer. (Required) |
| `workersCount` | int | 5 | Number of concurrent tokenizer workers. |
| `hf` | Object | - | Configuration for HuggingFace tokenizer. |
| `local` | Object | - | Configuration for local tokenizer. |
| `uds` | Object | - | Configuration for UDS-based tokenizer. |
hf (HuggingFace Tokenizer)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | true | Enable HuggingFace tokenizer. |
| `huggingFaceToken` | string | "" | HuggingFace API token. |
| `tokenizersCacheDir` | string | bin | Directory to cache downloaded tokenizers. |
| `tokenizer` | string | "" | Specific tokenizer to use (defaults to model name). |
| `tokenizerMode` | string | "auto" | Tokenizer mode. One of "auto", "hf", "slow", "mistral", "deepseek_v32". |
| `tokenizerRevision` | string | "" | Revision of the tokenizer. |
local (Local Tokenizer)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `autoDiscoveryDir` | string | /mnt/models | Directory to search for tokenizers. |
| `autoDiscoveryTokenizerFileName` | string | tokenizer.json | Filename to search for. |
| `modelTokenizerMap` | map[string]string | - | Manual mapping of model names to tokenizer paths. |
| `tokenizer` | string | "" | Specific tokenizer to use (defaults to model name). |
| `tokenizerMode` | string | "auto" | Tokenizer mode. One of "auto", "hf", "slow", "mistral", "deepseek_v32". |
| `tokenizerRevision` | string | "" | Revision of the tokenizer. |
uds (UDS Tokenizer)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `socketFile` | string | /tmp/tokenizer/tokenizer-uds.socket | Path to the UDS socket file. |
| `useTCP` | bool | false | Use TCP instead of Unix domain socket. |
| `modelTokenizerMap` | map[string]string | - | Manual mapping of model names to tokenizer paths. |
kvEventsConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `zmqEndpoint` | string | - | ZMQ endpoint to connect to (for example tcp://indexer:5557). |
| `topicFilter` | string | "kv@" | ZMQ topic filter subscription. |
| `concurrency` | int | 4 | Number of event processing workers. |
| `discoverPods` | bool | true | Enable automatic pod discovery. |
| `podDiscoveryConfig` | Object | - | Configuration for pod discovery. |
podDiscoveryConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `podNamespace` | string | "" | Namespace to watch pods in (empty = all). |
| `socketPort` | int | 5557 | Port where pods expose their ZMQ socket. |
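Putting the nested tables together, a `precise-prefix-cache-scorer` declaration with the in-memory index backend might be sketched as follows. The nesting is inferred from the tables above, most values simply restate the documented defaults, and the model name is borrowed from the tokenizer example on this page. Treat it as a starting point rather than a verified configuration.

```yaml
- type: precise-prefix-cache-scorer
  parameters:
    tokenProcessorConfig:
      # Must match the --block-size vLLM is started with.
      blockSize: 16
    indexerConfig:
      kvBlockIndexConfig:
        # Exactly one backend is configured: the in-memory index.
        inMemoryConfig:
          size: 100000000
          podCacheSize: 10
      tokenizersPoolConfig:
        # modelName is required.
        modelName: meta-llama/Llama-3.2-1B-Instruct
        workersCount: 5
    kvEventsConfig:
      topicFilter: "kv@"
      concurrency: 4
      discoverPods: true
      podDiscoveryConfig:
        socketPort: 5557
    # Optional: insert predicted entries right after each routing decision.
    speculativeIndexing: true
    speculativeTTL: 2s
```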
prefix-cache-scorer​
Scores pods based on the length of an approximate prefix match against recent requests, using an in-process LRU indexer. Lighter-weight than precise-prefix-cache-scorer because it does not need a tokenizer or KV-cache event subscription.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `autoTune` | bool | true | Automatically tunes blockSizeTokens, maxPrefixBlocksToMatch, and lruCapacityPerServer based on observed model server metrics. |
| `blockSizeTokens` | int | 16 | Number of tokens per hash block. Requests shorter than one block are ignored. |
| `blockSize` | int | 0 | Deprecated. Legacy block size expressed in characters. Setting only blockSize (with blockSizeTokens left unset) fails initialization. Prefer blockSizeTokens. |
| `maxPrefixBlocksToMatch` | int | 256 | Maximum number of prefix blocks to match. Longer prefixes are truncated at this limit. |
| `lruCapacityPerServer` | int | 31250 | LRU indexer capacity per model server (in blocks). |
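If you prefer fixed values over auto-tuning, a declaration along these lines is one option. This page does not say how explicitly set values interact with `autoTune: true`, so the sketch disables auto-tuning to avoid that question; the numbers restate the documented defaults.

```yaml
- type: prefix-cache-scorer
  parameters:
    # Disable auto-tuning so the values below are used as written
    # (assumption: explicit values apply when autoTune is false).
    autoTune: false
    blockSizeTokens: 16
    maxPrefixBlocksToMatch: 256
    lruCapacityPerServer: 31250
```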
session-affinity-scorer​
Routes subsequent requests in a session to the same pod as the first request. Relies on the x-session-token HTTP header to maintain affinity:
- Response: When a request is served, the plugin sets the `x-session-token` header on the response. The value is the Base64-encoded name of the serving pod.
- Request: For subsequent requests, the client includes this `x-session-token` header. The scorer decodes it to identify the target pod and assigns it a high score.
No parameters.
kv-cache-utilization-scorer​
Scores pods based on their KV cache utilization (lower utilization yields a higher score).
No parameters.
lora-affinity-scorer​
Scores pods based on LoRA adapter availability and capacity.
No parameters.
queue-scorer​
Scores pods based on their waiting queue size (smaller queue yields a higher score).
No parameters.
running-requests-size-scorer​
Scores pods based on the number of running requests.
No parameters.
context-length-aware​
Scores pods based on how well their context-length range matches the estimated token count of the request. Pods with a matching range receive higher scores. Also functions as a filter when enableFiltering is enabled.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `label` | string | "mif.moreh.io/context-length-range" | Pod label whose value specifies context length ranges (format: "min-max", comma-separated for multiple). |
| `enableFiltering` | bool | false | Whether to also filter out pods that do not match the request's context length. |
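A sketch of enabling both the scoring and the filtering behavior is shown below. The profile name `default` is hypothetical, and the sketch assumes that a single `pluginRef` entry is enough for Heimdall to register the plugin at both extension points. With this configuration, a pod labeled `mif.moreh.io/context-length-range: "0-8192"` (an example value in the documented "min-max" format) would be dropped for requests whose estimated token count falls outside that range.

```yaml
plugins:
  - type: context-length-aware
    parameters:
      label: mif.moreh.io/context-length-range
      # Also remove pods whose range does not cover the request.
      enableFiltering: true
  - type: max-score-picker
schedulingProfiles:
  # "default" is a hypothetical profile name.
  - name: default
    plugins:
      - pluginRef: context-length-aware
      - pluginRef: max-score-picker
```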
predicted-latency-scorer​
Advanced scorer that predicts TTFT (time-to-first-token) and TPOT (time-per-output-token) per pod using an online running-request model, then scores pods so the request is routed to the pod most likely to meet its latency SLO. Emits per-pod latency metrics.
This scorer has a large parameter surface (20+ fields covering sampling, headroom weights, affinity gates, and selection strategies). Most deployments should leave every field at its default. For the complete parameter list, refer to pkg/epp/framework/plugins/scheduling/scorer/predictedlatency/scorer.go in the moreh-dev/heimdall-inference-extension repository, and tune only after establishing a baseline.
Pickers​
max-score-picker​
Picks the pod(s) with the maximum score from the list of candidates.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `maxNumOfEndpoints` | int | 1 | Maximum number of endpoints to pick. |
random-picker​
Picks random pod(s) from the candidates.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `maxNumOfEndpoints` | int | 1 | Maximum number of endpoints to pick. |
weighted-random-picker​
Picks pod(s) based on weighted random sampling (A-Res algorithm) derived from their scores.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `maxNumOfEndpoints` | int | 1 | Maximum number of endpoints to pick. |
Pre-request handlers​
disagg-headers-handler​
Publishes the endpoints selected by disagg-profile-handler or the legacy pd-profile-handler as request headers, so the decode pod can reach prefill / encode pods:
- `mif-prefill-endpoint`: host:port of the prefill endpoint, when prefill ran.
- `mif-encode-endpoints`: comma-separated host:port list of the encode endpoints, when encode ran.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prefillProfile` | string | "prefill" | Name of the SchedulingProfile whose result provides the prefill endpoint. |
| `encodeProfile` | string | "encode" | Name of the SchedulingProfile whose result provides the encode endpoint list. |
prefill-header-handler is kept as a legacy alias that resolves to this same plugin (both names share DisaggHeadersHandlerFactory). Existing heimdall-values.yaml files that reference prefill-header-handler continue to work.
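When the scheduling profiles use non-default names, it seems reasonable to set the same names on both `disagg-headers-handler` and `disagg-profile-handler`. That pairing is an inference from the two parameter tables rather than something this page states outright. In the sketch below, `my-prefill` and `my-encode` are hypothetical profile names, and the matching `schedulingProfiles` entries are omitted for brevity.

```yaml
plugins:
  - type: disagg-headers-handler
    parameters:
      # Hypothetical profile names; they must exist under schedulingProfiles.
      prefillProfile: my-prefill
      encodeProfile: my-encode
  - type: prefix-based-pd-decider
    parameters:
      nonCachedTokens: 16
  - type: disagg-profile-handler
    parameters:
      profiles:
        prefill: my-prefill
        encode: my-encode
      deciders:
        prefill: prefix-based-pd-decider
```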
Prepare-data plugins​
tokenizer​
Runs a tokenizer on each incoming request and stores the tokenized prompt on the request so downstream plugins (for example precise-prefix-cache-scorer, disagg-profile-handler's deciders) can reuse it without re-tokenizing. Fails open: if tokenization errors, the request continues with no tokenized prompt attached.
This plugin only activates when the prepareDataPlugins feature gate is enabled. Add featureGates: [prepareDataPlugins] to the top of your EndpointPickerConfig; otherwise the plugin registration is silently skipped.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `modelName` | string | - | Base model name for the tokenizer. (Required) |
| `udsTokenizerConfig` | Object | (unset) | Unix domain socket tokenizer configuration. When unset, falls back to the in-process default tokenizer. |
udsTokenizerConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `socketFile` | string | /tmp/tokenizer/tokenizer-uds.socket | Path to the tokenizer UDS socket. |
Example:
```yaml
- type: tokenizer
  parameters:
    modelName: meta-llama/Llama-3.2-1B-Instruct
    udsTokenizerConfig:
      socketFile: /tmp/tokenizer/tokenizer-uds.socket
```
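For context, the feature gate and the plugin declaration might sit together in an `EndpointPickerConfig` roughly as sketched here. Other top-level fields (profile handler, scheduling profiles, and so on) are left out, and the model name is the same example value used above.

```yaml
# Without this gate, the tokenizer registration is silently skipped.
featureGates:
  - prepareDataPlugins
plugins:
  - type: tokenizer
    parameters:
      modelName: meta-llama/Llama-3.2-1B-Instruct
```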
Response plugins​
Response plugins hook into the response lifecycle. They are invoked by the request-control layer in the following order:
- ResponseReceived — Called when response headers arrive from the model server, indicating the beginning of response handling.
- ResponseStreaming — Called after each chunk of a streaming response is sent.
- ResponseComplete — Called when the request lifecycle terminates (response fully sent, or request failed/disconnected after a pod was scheduled). This is the final cleanup hook.
response-header-handler​
Adds serving-pod information to the response headers. Implements the ResponseReceived extension point.
- `x-decoder-host-port`: Always set to the address and port of the pod that handled the decode phase (the primary target).
- `x-prefiller-host-port`: Set to the address and port of the prefill pod, if a separate prefill pod was used (PD disaggregation).
No parameters.
When heimdall-proxy is deployed with --response-header, the proxy natively sets the same headers. In that case, this plugin is not needed.
Data layer plugins​
Data layer plugins feed pod-level signals (metrics, running model names, and so on) into the scheduler. They are declared in the top-level plugins list and wired together through the data field of EndpointPickerConfig: a DataLayerSource references a source plugin via pluginRef and attaches a list of extractor plugins.
models-data-source​
Polls each pod's /v1/models endpoint (or a configurable path) to discover which models are currently being served.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `scheme` | string | "http" | URL scheme used to reach the pod ("http" or "https"). |
| `path` | string | "/v1/models" | URL path of the models endpoint. |
| `insecureSkipVerify` | bool | true | Skip TLS certificate verification on the pod connection. |
model-server-protocol-models​
Extracts the list of running model identifiers from a models-data-source and publishes them on the pod's data-layer record, where downstream plugins can read them.
No parameters.
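The wiring between a source and its extractor could look something like the sketch below. This page only says that the `data` field references a source plugin via `pluginRef` and attaches a list of extractor plugins, so the `sources` and `extractors` key names used here are assumptions and should be checked against the actual `EndpointPickerConfig` schema before use.

```yaml
# The data layer requires this feature gate.
featureGates:
  - dataLayer
plugins:
  - type: models-data-source
    parameters:
      scheme: http
      path: /v1/models
  - type: model-server-protocol-models
data:
  # ASSUMPTION: "sources" and "extractors" are illustrative key names,
  # not confirmed by this page.
  sources:
    - pluginRef: models-data-source
      extractors:
        - pluginRef: model-server-protocol-models
```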
Store plugins​
responses-store​
Persists multi-turn conversation state for the OpenAI Responses API (/v1/responses with previous_response_id). Exposes PrepareDataPlugin (to look up prior responses on request), ResponseStreaming (to accumulate streamed chunks), and ResponseComplete (to commit the final response to the store).
This plugin only activates when the prepareDataPlugins feature gate is enabled. Add featureGates: [prepareDataPlugins] to the top of your EndpointPickerConfig; otherwise the plugin registration is silently skipped.
Supported backends: in-memory or a Redis/Valkey-based tier with optional MongoDB tier-2 sync. Omit storeConfig entirely to use the default in-memory backend (ttl: 24h). When storeConfig is set, configure at least one of storeConfig.inMemoryConfig or storeConfig.tieredConfig; if both are set, tieredConfig takes precedence (Redis is required inside the tiered backend; MongoDB is optional).
| Parameter | Type | Default | Description |
|---|---|---|---|
| `storeConfig` | Object | in-memory backend, ttl: 24h | Backend selection and configuration. When omitted, Heimdall uses an in-memory backend with a 24-hour TTL. |
Example (in-memory backend):
```yaml
- type: responses-store
  parameters:
    storeConfig:
      inMemoryConfig:
        ttl: 24h
        maxEntries: 10000
        maxEntryBytes: 1048576
```
Example (tiered Redis + MongoDB backend):
```yaml
- type: responses-store
  parameters:
    storeConfig:
      tieredConfig:
        redis:
          address: redis://redis.responses-store.svc:6379
          ttl: 24h
        mongo:
          uri: mongodb://mongo.responses-store.svc:27017
          database: heimdall
          collection: responses
          ttl: 720h
        stream:
          key: heimdall:responses:mongo_sync
          consumerGroup: mongo-sync
          batchSize: 100
          blockTimeout: 1s
          claimAge: 30s
```
storeConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `inMemoryConfig` | Object | - | Configuration for the in-memory backend. Used when tieredConfig is not set; if both are configured, tieredConfig takes precedence. |
| `tieredConfig` | Object | - | Configuration for the Redis/Valkey + optional MongoDB tiered backend. Takes precedence over inMemoryConfig when both are configured. At least one of the two backends must be set. |
inMemoryConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `ttl` | string | "24h" | Go duration string. TTL applied to entries. |
| `maxEntries` | int | 10000 | Maximum number of entries retained in memory. |
| `maxEntryBytes` | int | 1048576 | Maximum size in bytes for a single entry. |
tieredConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `redis` | Object | - | Required Redis/Valkey configuration. Used for tier-1 storage and stream coordination. |
| `mongo` | Object | - | Optional MongoDB configuration. When set, enables tier-2 sync via the Redis stream consumer. |
| `stream` | Object | - | Stream tuning for the Redis-to-Mongo sync goroutine. |
redis
| Parameter | Type | Default | Description |
|---|---|---|---|
| `address` | string | - | Standalone `redis://` or `valkey://` URL. Mutually exclusive with addresses. |
| `addresses` | []string | - | Host:port entries. Combine with masterName for Sentinel mode; multiple bare entries select Cluster. |
| `masterName` | string | "" | Sentinel master name. Required for Sentinel mode. |
| `username` | string | "" | Username used to authenticate to Redis/Valkey. |
| `password` | string | "" | Password used to authenticate to Redis/Valkey. |
| `db` | int | 0 | Database index. |
| `ttl` | string | "24h" | Go duration string. TTL applied to entries stored in Redis/Valkey. |
| `maxEntryBytes` | int | 1048576 | Maximum size in bytes for a single entry. |
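For Sentinel mode, the table above suggests combining `addresses` with `masterName`, which could be sketched as follows. The hostnames, port, and master name are hypothetical, and the optional `mongo` and `stream` blocks are omitted.

```yaml
- type: responses-store
  parameters:
    storeConfig:
      tieredConfig:
        redis:
          # Hypothetical Sentinel endpoints; address (singular) must not be set.
          addresses:
            - sentinel-0.example.svc:26379
            - sentinel-1.example.svc:26379
            - sentinel-2.example.svc:26379
          # Hypothetical Sentinel master name.
          masterName: mymaster
          ttl: 24h
```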
mongo
| Parameter | Type | Default | Description |
|---|---|---|---|
| `uri` | string | - | MongoDB connection URI. |
| `database` | string | "heimdall" | Database name. |
| `collection` | string | "responses" | Collection name. |
| `ttl` | string | "720h" | Go duration string. TTL applied to entries. |
| `timeout` | string | "500ms" | Go duration string. Per-operation timeout. |
stream
| Parameter | Type | Default | Description |
|---|---|---|---|
| `key` | string | "heimdall:responses:mongo_sync" | Redis stream key used to buffer MongoDB syncs. |
| `consumerGroup` | string | "mongo-sync" | Stream consumer group name. |
| `maxLen` | int64 | 1000000 | Maximum stream length retained. |
| `batchSize` | int | 100 | Number of entries claimed per batch. |
| `blockTimeout` | string | "1s" | Go duration string. Block timeout when reading. |
| `claimAge` | string | "30s" | Go duration string. Minimum age to re-claim entries. |