Version: Dev 🚧

Heimdall API Reference

inference.networking.k8s.io/v1

InferencePool

kubectl explain --api-version inference.networking.k8s.io/v1 inferencepools

| Field | Type | Description |
|---|---|---|
| apiVersion | string | APIVersion defines the versioned schema of this representation of an object. |
| kind | string | Kind is a string value representing the REST resource this object represents. |
| metadata | object | Standard object's metadata. |
| spec | InferencePoolSpec | Specification of the desired behavior of the InferencePool. |

InferencePoolSpec

kubectl explain --api-version inference.networking.k8s.io/v1 inferencepools.spec

| Field | Type | Description |
|---|---|---|
| endpointPickerRef | EndpointPickerRef | Reference to the EndpointPicker. |
| selector | LabelSelector | Selects the pods that belong to the inference pool. |
| targetPorts | []TargetPort | List of ports exposed by the inference pool. |

EndpointPickerRef

kubectl explain --api-version inference.networking.k8s.io/v1 inferencepools.spec.endpointPickerRef

| Field | Type | Description |
|---|---|---|
| failureMode | string | FailureMode configures how the parent handles the case when the Endpoint Picker extension is non-responsive. Defaults to "FailClose". |
| group | string | Group is the group of the referent API object. Defaults to "". |
| kind | string | Kind is the Kubernetes resource kind of the referent. Defaults to "Service". |
| name | string | Name is the name of the referent API object. Required. |
| port | Port | Port is the port of the Endpoint Picker extension service. |

Port

| Field | Type | Description |
|---|---|---|
| number | integer | Number defines the port number to access the selected model server Pods. |

LabelSelector

| Field | Type | Description |
|---|---|---|
| matchLabels | map[string]string | matchLabels is a map of {key,value} pairs. |

TargetPort

| Field | Type | Description |
|---|---|---|
| number | integer | Number of the port. |
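
As a minimal sketch, the v1 types above compose into a manifest like the one below. Only the field names come from the tables; every name, label, and port number is a hypothetical placeholder.

```yaml
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata:
  name: example-pool                 # hypothetical pool name
spec:
  selector:
    matchLabels:
      app: example-model-server      # hypothetical pod label
  targetPorts:
    - number: 8000                   # hypothetical model server port
  endpointPickerRef:
    name: example-epp                # hypothetical Service name (required field)
    kind: Service                    # the documented default, shown for clarity
    failureMode: FailClose           # the documented default, shown for clarity
    port:
      number: 9002                   # hypothetical Endpoint Picker port
```

The group field is left out because its documented default is the empty string.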

inference.networking.x-k8s.io/v1alpha1

EndpointPickerConfig

The Heimdall Endpoint Picker is configured with a YAML document that parses into the EndpointPickerConfig type. The configuration loader uses strict decoding, so unknown or misspelled fields cause startup to fail. Copy field names exactly as they appear in the tables below.

| Field | Type | Required | Description |
|---|---|---|---|
| plugins | []PluginSpec | Yes | List of plugins that will be instantiated. See Plugins for more details. |
| schedulingProfiles | []SchedulingProfile | Yes | Named SchedulingProfile entries created from the plugin list. |
| featureGates | []string | No | Experimental features to enable. Currently recognized values: "dataLayer", "prepareDataPlugins", "flowControl". See Feature gates for what each one unlocks. |
| saturationDetector | SaturationDetector | No | Configuration of the saturation detector. When omitted, default values are used. |
| data | DataLayerConfig | No | Configures the DataLayer. Required when the dataLayer feature gate is enabled. |
| flowControl | FlowControlConfig | No | Configures the Flow Control layer. Only respected when the flowControl feature gate is enabled. |
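
A minimal sketch of the smallest configuration the table above allows, with only the two required fields set. The apiVersion and kind lines are an assumption inferred from the section headings (they are not listed in the table), and the profile name is a hypothetical placeholder. A working profile will likely also need a picker plugin, which is left out here because this reference does not name one.

```yaml
# Assumed from the section headings; not listed in the field table.
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
  - type: queue-scorer          # plugin type taken from the PluginSpec examples
schedulingProfiles:
  - name: default               # hypothetical profile name
    plugins:
      - pluginRef: queue-scorer
```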

PluginSpec

For more details on available plugins, see Plugins.

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Plugin type string (for example "pd-profile-handler", "queue-scorer"). |
| name | string | No | Unique name for this plugin instance. When omitted, the value of type is used. Referenced by pluginRef elsewhere. |
| parameters | object | No | Parameters passed to the plugin's factory function. The schema is defined per plugin type. |

SchedulingProfile

A SchedulingProfile executes its plugins in order: Filters, Scorers, then Picker. Only plugins that implement the Filter, Scorer, or Picker interface can be referenced here. Profile handlers operate at the top level and are not part of individual profiles.

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name of this SchedulingProfile. |
| plugins | []SchedulingPlugin | Yes | Ordered list of plugins. Slots (filter / scorer / picker) are assigned by plugin type. |

SchedulingPlugin

| Field | Type | Required | Description |
|---|---|---|---|
| pluginRef | string | Yes | Name of a plugin from the top-level plugins list. The plugin must implement Filter, Scorer, or Picker. |
| weight | float | No | Weight for Scorer plugins, controlling relative influence when aggregating scores. Omit to use a weight of 1. Ignored for Filters and Pickers. |
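
To illustrate how pluginRef and weight fit together, here is a sketch of a profile with two scorers. The example-scorer type, the secondary-scorer instance name, and the profile name are hypothetical; substitute plugin types that actually exist in your deployment, since an unknown type would presumably fail at startup.

```yaml
plugins:
  - type: queue-scorer              # no name given, so its name is "queue-scorer"
  - type: example-scorer            # hypothetical plugin type
    name: secondary-scorer          # hypothetical instance name
schedulingProfiles:
  - name: default                   # hypothetical profile name
    plugins:
      - pluginRef: queue-scorer
        weight: 2                   # twice the default weight
      - pluginRef: secondary-scorer # weight omitted, so 1 is used
```

Note that pluginRef matches a plugin's name rather than its type, so the second entry must refer to secondary-scorer, not example-scorer.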

SaturationDetector

Saturation detectors judge whether a pod has capacity for more requests. Two types are supported, selected via type. Every field other than type applies only to the detector named in its Applies to column.

| Field | Type | Applies to | Description |
|---|---|---|---|
| type | string | All | Detector implementation: "utilization" (default) or "running-requests". |
| kvCacheUtilThreshold | float | utilization | KV cache utilization (0.0 to 1.0) above which a pod is considered saturated. |
| queueDepthThreshold | integer | utilization | Backend waiting-queue size above which a pod is considered saturated. |
| metricsStalenessThreshold | string | utilization | Go duration string. Pods whose metrics are older than this may be excluded from capacity decisions. |
| maxConcurrencyPerPod | integer | running-requests | Ideal per-pod request concurrency target. |
| cacheTTL | string | running-requests | Go duration string. How long to cache the computed saturation value before re-reading the Prometheus registry. |
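
A sketch of the two detector variants, assuming illustrative threshold values that are not recommendations or documented defaults. Because each field belongs to one detector type, a configuration would normally set only the fields for the type it selects; the running-requests variant is shown commented out.

```yaml
saturationDetector:
  type: utilization
  kvCacheUtilThreshold: 0.8         # hypothetical: saturated above 80% KV cache use
  queueDepthThreshold: 5            # hypothetical: saturated above 5 waiting requests
  metricsStalenessThreshold: 200ms  # hypothetical Go duration string

# Alternative, using the running-requests detector instead:
# saturationDetector:
#   type: running-requests
#   maxConcurrencyPerPod: 16        # hypothetical concurrency target
#   cacheTTL: 1s                    # hypothetical Go duration string
```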

DataLayerConfig

| Field | Type | Required | Description |
|---|---|---|---|
| sources | []DataLayerSource | Yes | List of sources to feed the DataLayer. |

DataLayerSource

| Field | Type | Required | Description |
|---|---|---|---|
| pluginRef | string | Yes | Name of the source plugin (from the top-level plugins list). |
| extractors | []DataLayerExtractor | Yes | Extractor plugins attached to this source. |

DataLayerExtractor

| Field | Type | Required | Description |
|---|---|---|---|
| pluginRef | string | Yes | Name of an extractor plugin (from the top-level plugins list). |
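
The three DataLayer types nest as sketched below. This fragment assumes that models-data-source acts as a source plugin and model-server-protocol-models as an extractor; the plugin names suggest this pairing, but this reference does not state it, so treat the roles as an assumption. Required top-level fields such as schedulingProfiles are omitted for brevity.

```yaml
featureGates:
  - dataLayer                       # needed for the data section to take effect
plugins:
  - type: models-data-source            # assumed to be a source plugin
  - type: model-server-protocol-models  # assumed to be an extractor plugin
data:
  sources:
    - pluginRef: models-data-source
      extractors:
        - pluginRef: model-server-protocol-models
```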

FlowControlConfig

Controls the Flow Control layer, which manages per-priority admission and global byte capacity. Only respected when the flowControl feature gate is enabled.

| Field | Type | Required | Description |
|---|---|---|---|
| maxBytes | integer | No | Global byte capacity across all priority bands. Requests over this limit are rejected even if their band has capacity. Omit or set to 0 for unlimited. |
| defaultRequestTTL | string | No | Go duration string. Fallback TTL for requests that do not specify their own deadline. Omit to let requests wait until the client cancels. |
| defaultPriorityBand | PriorityBandConfig | No | Template policy applied to priority levels that are not explicitly listed in priorityBands. Omit to use system defaults. |
| priorityBands | []PriorityBandConfig | No | Explicit policies per priority level. Priorities not listed fall back to defaultPriorityBand (or to system defaults when defaultPriorityBand is omitted). |

PriorityBandConfig

| Field | Type | Required | Description |
|---|---|---|---|
| priority | integer | Yes | Integer priority level. Higher values indicate higher priority. |
| maxBytes | integer | No | Byte capacity for this band. Omit (or set to 0) to use the system default of 1 GiB. |
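
A sketch of a Flow Control section with a global limit and two explicit bands. The priority levels, byte values, and TTL are hypothetical and chosen only to show the shape; defaultPriorityBand is omitted, so unlisted priorities use system defaults. Required top-level fields such as plugins and schedulingProfiles are left out for brevity.

```yaml
featureGates:
  - flowControl                 # without this gate the section below is ignored
flowControl:
  maxBytes: 4294967296          # hypothetical global cap: 4 GiB across all bands
  defaultRequestTTL: 30s        # hypothetical Go duration string
  priorityBands:
    - priority: 10              # hypothetical higher-priority band
      maxBytes: 2147483648      # 2 GiB
    - priority: 0               # hypothetical lower-priority band
      maxBytes: 536870912       # 512 MiB
```

In this sketch a request is rejected once the global 4 GiB is used up, even if its own band still has room, which follows from the maxBytes description in FlowControlConfig.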

Feature gates

Enable experimental features through the top-level featureGates list. Each gate unlocks a specific subsystem that is otherwise inert:

| Gate | What it enables |
|---|---|
| dataLayer | Activates the DataLayer subsystem. Required for the data: section to have any effect and for the models-data-source / model-server-protocol-models plugins to run. |
| prepareDataPlugins | Activates the PrepareDataPlugin interface. Required by the tokenizer and responses-store plugins. |
| flowControl | Activates the Flow Control subsystem. Required for the flowControl: section to have any effect. |
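
As a short sketch of pairing a gate with the plugin that depends on it, the fragment below enables prepareDataPlugins and instantiates the tokenizer plugin. Whether tokenizer needs parameters or must be referenced elsewhere is not covered by this reference, so the fragment shows only the gate and the plugin declaration.

```yaml
featureGates:
  - prepareDataPlugins          # required by the tokenizer plugin
plugins:
  - type: tokenizer             # name omitted, so it defaults to "tokenizer"
```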