inference.networking.k8s.io/v1

InferencePool

`kubectl explain --api-version inference.networking.k8s.io/v1 inferencepools`

| Field | Type | Description |
|---|---|---|
| apiVersion | string | APIVersion defines the versioned schema of this representation of an object. |
| kind | string | Kind is a string value representing the REST resource this object represents. |
| metadata | object | Standard object's metadata. |
| spec | InferencePoolSpec | Specification of the desired behavior of the InferencePool. |
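A minimal InferencePool manifest could look like the following sketch. Only the field names come from the tables in this reference; the pool name `my-pool`, the label `app: my-model-server`, the Service name `my-epp`, and both port numbers are hypothetical placeholders.

```yaml
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata:
  name: my-pool                 # hypothetical pool name
spec:
  selector:
    matchLabels:
      app: my-model-server      # hypothetical label carried by the model server Pods
  targetPorts:
  - number: 8000                # hypothetical port the model servers listen on
  endpointPickerRef:
    name: my-epp                # hypothetical Service fronting the Endpoint Picker
    port:
      number: 9002              # hypothetical Endpoint Picker port
```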
InferencePoolSpec

`kubectl explain --api-version inference.networking.k8s.io/v1 inferencepools.spec`

| Field | Type | Description |
|---|---|---|
| endpointPickerRef | EndpointPickerRef | Reference to the EndpointPicker. |
| selector | LabelSelector | Selects the pods that belong to the inference pool. |
| targetPorts | []TargetPort | List of ports exposed by the inference pool. |
EndpointPickerRef

`kubectl explain --api-version inference.networking.k8s.io/v1 inferencepools.spec.endpointPickerRef`

| Field | Type | Description |
|---|---|---|
| failureMode | string | FailureMode configures how the parent handles the case when the Endpoint Picker extension is non-responsive. Defaults to "FailClose". |
| group | string | Group is the group of the referent API object. Defaults to "". |
| kind | string | Kind is the Kubernetes resource kind of the referent. Defaults to "Service". |
| name | string | Name is the name of the referent API object. Required. |
| port | Port | Port is the port of the Endpoint Picker extension service. |
Port

| Field | Type | Description |
|---|---|---|
| number | integer | Number defines the port number to access the selected model server Pods. |
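Because group, kind, and failureMode all have documented defaults, the fragment below spells those defaults out explicitly; it should behave the same as an endpointPickerRef that sets only name and port. The Service name `my-epp` and the port number are hypothetical.

```yaml
endpointPickerRef:
  group: ""                 # default: the core API group
  kind: Service             # default referent kind
  name: my-epp              # hypothetical Service name (required)
  port:
    number: 9002            # hypothetical Endpoint Picker port
  failureMode: FailClose    # default behavior when the Endpoint Picker is non-responsive
```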
LabelSelector

| Field | Type | Description |
|---|---|---|
| matchLabels | map[string]string | matchLabels is a map of {key,value} pairs. |
TargetPort

| Field | Type | Description |
|---|---|---|
| number | integer | Number of the port. |
inference.networking.x-k8s.io/v1alpha1

EndpointPickerConfig

The Heimdall Endpoint Picker is configured with a YAML document that parses into the EndpointPickerConfig type. The configuration loader uses strict decoding, so an unknown or misspelled field causes startup to fail; copy field names exactly as they appear in the tables below.
| Field | Type | Required | Description |
|---|---|---|---|
| plugins | []PluginSpec | Yes | List of plugins to instantiate. See Plugins for more details. |
| schedulingProfiles | []SchedulingProfile | Yes | Named scheduling profiles, each assembled from plugins declared in the plugins list. |
| featureGates | []string | No | Experimental features to enable. Currently recognized values: "dataLayer", "prepareDataPlugins", "flowControl". See Feature gates for what each one unlocks. |
| saturationDetector | SaturationDetector | No | Configuration of the saturation detector. When omitted, default values are used. |
| data | DataLayerConfig | No | Configures the DataLayer. Required when the experimental dataLayer feature gate is enabled. |
| flowControl | FlowControlConfig | No | Configures the Flow Control layer. Only respected when the flowControl feature gate is enabled. |
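The sketch below shows a minimal configuration containing only the two required top-level fields. It assumes, as the section heading suggests, that the document carries the apiVersion and kind of the inference.networking.x-k8s.io/v1alpha1 group. `queue-scorer` is named in this reference; `max-score-picker` is an assumed picker plugin type, not confirmed here.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1   # assumed from the section heading
kind: EndpointPickerConfig
plugins:
- type: queue-scorer
- type: max-score-picker        # assumed picker type; substitute a picker your build provides
schedulingProfiles:
- name: default
  plugins:
  - pluginRef: queue-scorer
  - pluginRef: max-score-picker
```

Because decoding is strict, a typo such as `schedulingProfile:` (missing the final "s") in this document would stop the Endpoint Picker at startup rather than being silently ignored.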
PluginSpec

For more details on available plugins, see Plugins.

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Plugin type string (for example "pd-profile-handler", "queue-scorer"). |
| name | string | No | Unique name for this plugin instance. When omitted, the value of type is used. Referenced by pluginRef elsewhere. |
| parameters | object | No | Parameters passed to the plugin's factory function. The schema is defined per plugin type. |
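The fragment below illustrates the naming rule: an instance without a name is referenced by its type string, while a named instance is referenced by that name. The instance name `pd-handler` is hypothetical, and parameters are left out because their schema depends on the plugin type.

```yaml
plugins:
- type: queue-scorer            # no name: referenced elsewhere as "queue-scorer"
- type: pd-profile-handler
  name: pd-handler              # hypothetical instance name; referenced elsewhere as "pd-handler"
```

Giving an explicit name becomes necessary when two instances of the same type are declared, since names must be unique.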
SchedulingProfile

A SchedulingProfile executes its plugins in order: Filters, Scorers, then Picker. Only plugins that implement the Filter, Scorer, or Picker interface can be referenced here. Profile handlers operate at the top level and are not part of individual profiles.

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name of this SchedulingProfile. |
| plugins | []SchedulingPlugin | Yes | Ordered list of plugins. Slots (filter / scorer / picker) are assigned by plugin type. |
SchedulingPlugin

| Field | Type | Required | Description |
|---|---|---|---|
| pluginRef | string | Yes | Name of a plugin from the top-level plugins list. The plugin must implement Filter, Scorer, or Picker. |
| weight | float | No | Weight for Scorer plugins, controlling their relative influence when scores are aggregated. Omit to use the default weight of 1. Ignored for Filters and Pickers. |
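As a sketch of weighted scoring, the profile below gives one scorer twice the weight of another and leaves the picker unweighted. `queue-scorer` appears in this reference; `prefix-cache-scorer` and `max-score-picker` are assumed plugin types used only for illustration.

```yaml
plugins:
- type: queue-scorer
- type: prefix-cache-scorer        # assumed scorer type
- type: max-score-picker           # assumed picker type
schedulingProfiles:
- name: default
  plugins:
  - pluginRef: queue-scorer
    weight: 1                      # same as omitting weight
  - pluginRef: prefix-cache-scorer
    weight: 2                      # weighted 2 relative to queue-scorer's 1
  - pluginRef: max-score-picker    # weight would be ignored for a picker, so it is omitted
```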
SaturationDetector

Saturation detectors judge whether a pod has capacity for more requests. Two implementations are supported, selected via type. Apart from type itself, each field applies only to the detector named in its Applies to column.

| Field | Type | Applies to | Description |
|---|---|---|---|
| type | string | All | Detector implementation: "utilization" (default) or "running-requests". |
| kvCacheUtilThreshold | float | utilization | KV cache utilization (0.0 to 1.0) above which a pod is considered saturated. |
| queueDepthThreshold | integer | utilization | Backend waiting-queue size above which a pod is considered saturated. |
| metricsStalenessThreshold | string | utilization | Go duration string. Pods whose metrics are older than this may be excluded from capacity decisions. |
| maxConcurrencyPerPod | integer | running-requests | Ideal per-pod request concurrency target. |
| cacheTTL | string | running-requests | Go duration string. How long to cache the computed saturation value before re-reading the Prometheus registry. |
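The fragment below sketches both detector variants. All numeric values and durations are illustrative, not recommended defaults; the running-requests variant is commented out because a configuration holds a single saturationDetector.

```yaml
# Utilization detector (the default type); values are illustrative only.
saturationDetector:
  type: utilization
  kvCacheUtilThreshold: 0.8          # saturated above 80% KV cache utilization
  queueDepthThreshold: 5             # saturated above 5 waiting requests
  metricsStalenessThreshold: 200ms   # Go duration string

# Alternative: running-requests detector. Use it instead of the block above.
# saturationDetector:
#   type: running-requests
#   maxConcurrencyPerPod: 16         # per-pod concurrency target
#   cacheTTL: 1s                     # Go duration string
```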
DataLayerConfig

| Field | Type | Required | Description |
|---|---|---|---|
| sources | []DataLayerSource | Yes | List of sources to feed the DataLayer. |
DataLayerSource

| Field | Type | Required | Description |
|---|---|---|---|
| pluginRef | string | Yes | Name of the source plugin (from the top-level plugins list). |
| extractors | []DataLayerExtractor | Yes | Extractor plugins attached to this source. |
DataLayerExtractor

| Field | Type | Required | Description |
|---|---|---|---|
| pluginRef | string | Yes | Name of an extractor plugin (from the top-level plugins list). |
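A DataLayer configuration wires one or more extractors to each source, with every plugin declared in the top-level plugins list. The sketch below uses the two plugin types named under Feature gates; treating `models-data-source` as the source and `model-server-protocol-models` as its extractor is an assumption based on their names, not something this reference states.

```yaml
featureGates:
- dataLayer                              # data: has no effect without this gate
plugins:
- type: models-data-source               # assumed to act as a source
- type: model-server-protocol-models     # assumed to act as an extractor
data:
  sources:
  - pluginRef: models-data-source
    extractors:
    - pluginRef: model-server-protocol-models
```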
FlowControlConfig

Controls the Flow Control layer, which manages per-priority admission and global byte capacity. Only respected when the flowControl feature gate is enabled.

| Field | Type | Required | Description |
|---|---|---|---|
| maxBytes | integer | No | Global byte capacity across all priority bands. Requests over this limit are rejected even if their band has capacity. Omit or set to 0 for unlimited. |
| defaultRequestTTL | string | No | Go duration string. Fallback TTL for requests that do not specify their own deadline. Omit to let requests wait until the client cancels. |
| defaultPriorityBand | PriorityBandConfig | No | Template policy applied to priority levels that are not explicitly listed in priorityBands. Omit to use system defaults. |
| priorityBands | []PriorityBandConfig | No | Explicit policies per priority level. Priorities not listed fall back to defaultPriorityBand (or the system template). |
PriorityBandConfig

| Field | Type | Required | Description |
|---|---|---|---|
| priority | integer | Yes | Integer priority level. Higher values indicate higher priority. |
| maxBytes | integer | No | Byte capacity for this band. Omit (or set to 0) to use the system default of 1 GiB. |
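The sketch below combines a global byte cap with two explicit priority bands. The capacities, TTL, and priority levels are illustrative values chosen for the example. defaultPriorityBand is omitted, so any priority other than 10 or 0 falls back to the system template.

```yaml
featureGates:
- flowControl                   # flowControl: is ignored without this gate
flowControl:
  maxBytes: 8589934592          # 8 GiB global cap across all bands (illustrative)
  defaultRequestTTL: 30s        # requests without their own deadline expire after 30s
  priorityBands:
  - priority: 10                # higher value = higher priority
    maxBytes: 4294967296        # 4 GiB for this band
  - priority: 0                 # maxBytes omitted: uses the 1 GiB system default
```

With these values the two bands together can hold up to 5 GiB, below the 8 GiB global cap, so the global limit only starts rejecting requests once traffic in unlisted priorities adds to the total.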
Feature gates

Enable experimental features by listing them in the top-level featureGates field. Each gate unlocks a specific subsystem that otherwise stays inert:

| Gate | What it enables |
|---|---|
| dataLayer | Activates the DataLayer subsystem. Required for the data: section to have any effect and for the models-data-source / model-server-protocol-models plugins to run. |
| prepareDataPlugins | Activates the PrepareDataPlugin interface. Required by the tokenizer and responses-store plugins. |
| flowControl | Activates the Flow Control subsystem. Required for the flowControl: section to have any effect. |
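For reference, the fragment below enables all three currently recognized gates; in practice, list only the gates whose subsystems you configure.

```yaml
featureGates:
- dataLayer            # makes the data: section effective (data: then becomes required)
- prepareDataPlugins   # needed by the tokenizer and responses-store plugins
- flowControl          # makes the flowControl: section effective
```

Listing a gate only switches the subsystem on; the matching section (data: or flowControl:) still has to be filled in for it to do useful work.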