Heimdall API Reference
inference.networking.k8s.io/v1
InferencePool
kubectl explain --api-version inference.networking.k8s.io/v1 inferencepools
| Field | Type | Description |
|---|---|---|
| apiVersion | string | APIVersion defines the versioned schema of this representation of an object. |
| kind | string | Kind is a string value representing the REST resource this object represents. |
| metadata | object | Standard object's metadata. |
| spec | InferencePoolSpec | Specification of the desired behavior of the InferencePool. |
InferencePoolSpec
kubectl explain --api-version inference.networking.k8s.io/v1 inferencepools.spec
| Field | Type | Description |
|---|---|---|
| endpointPickerRef | EndpointPickerRef | Reference to the EndpointPicker. |
| selector | LabelSelector | Selects the pods that belong to the inference pool. |
| targetPorts | []TargetPort | List of ports exposed by the inference pool. |
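
The fields above combine into a manifest along the following lines. This is a minimal sketch: the resource name, the pod label, the Endpoint Picker Service name, and both port numbers are hypothetical placeholders, not values defined by this reference.

```yaml
# Illustrative InferencePool; every name and port number is a placeholder.
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata:
  name: example-pool              # hypothetical
spec:
  # Pods carrying this label are treated as members of the pool.
  selector:
    matchLabels:
      app: example-model-server   # hypothetical label
  # Port(s) exposed by the inference pool.
  targetPorts:
    - number: 8000
  # Reference to the Endpoint Picker extension.
  endpointPickerRef:
    name: example-epp             # required; hypothetical Service name
    port:
      number: 9002
    # group, kind, and failureMode are omitted, so they fall back to the
    # documented defaults: "", "Service", and "FailClose".
```

Leaving the defaulted fields out keeps the manifest short; set them explicitly only when the referent is not a core Service or a different failure mode is wanted.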
EndpointPickerRef
kubectl explain --api-version inference.networking.k8s.io/v1 inferencepools.spec.endpointPickerRef
| Field | Type | Description |
|---|---|---|
| failureMode | string | FailureMode configures how the parent handles the case when the Endpoint Picker extension is non-responsive. Defaults to "FailClose". |
| group | string | Group is the group of the referent API object. Defaults to "". |
| kind | string | Kind is the Kubernetes resource kind of the referent. Defaults to "Service". |
| name | string | Name is the name of the referent API object. Required. |
| port | Port | Port is the port of the Endpoint Picker extension service. |
Port
| Field | Type | Description |
|---|---|---|
| number | integer | Number defines the port number to access the selected model server Pods. |
LabelSelector
| Field | Type | Description |
|---|---|---|
| matchLabels | map[string]string | matchLabels is a map of {key,value} pairs. |
TargetPort
| Field | Type | Description |
|---|---|---|
| number | integer | Number of the port. |
inference.networking.x-k8s.io/v1alpha1
EndpointPickerConfig
| Field | Type | Description |
|---|---|---|
| data | DataLayerConfig | Data configures the DataLayer. It is required if the new DataLayer is enabled. |
| featureGates | []string | FeatureGates is a set of flags that enable various experimental features with the EPP. |
| plugins | []PluginSpec | Plugins is the list of plugins that will be instantiated. See Plugins for more details. |
| saturationDetector | SaturationDetector | SaturationDetector when present specifies the configuration of the Saturation detector. |
| schedulingProfiles | []SchedulingProfile | SchedulingProfiles is the list of named SchedulingProfiles that will be created. |
PluginSpec
For more details on available plugins, see Plugins.
| Field | Type | Description |
|---|---|---|
| name | string | Name provides a name for plugin entries to reference. If omitted, the value of the Plugin's Type field will be used. |
| parameters | object | Parameters are the set of parameters to be passed to the plugin's factory function. |
| type | string | Type specifies the plugin type to be instantiated. |
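
To illustrate how `name` falls back to `type`, here is a sketch of a `plugins` list. The plugin types (`example-picker`, `example-scorer`) and the `windowSize` parameter are hypothetical and stand in for whatever the Plugins page actually documents.

```yaml
# Fragment of an EndpointPickerConfig; plugin types and parameter keys
# are made up for illustration.
plugins:
  # No name given, so other entries refer to this plugin by its type,
  # "example-picker".
  - type: example-picker
  # Two instances of one type, told apart by explicit names.
  - name: scorer-short
    type: example-scorer
    parameters:
      windowSize: 8      # hypothetical parameter
  - name: scorer-long
    type: example-scorer
    parameters:
      windowSize: 64     # hypothetical parameter
```

Naming each instance explicitly is presumably the only way to reference two plugins of the same type separately, since both would otherwise share the type as their name.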
SchedulingProfile
A SchedulingProfile executes its plugins in order: Filters, Scorers, then Picker. Only plugins that implement the Filter, Scorer, or Picker interface can be referenced here. Profile handlers operate at the top level and are not part of individual profiles.
| Field | Type | Description |
|---|---|---|
| name | string | Name specifies the name of this SchedulingProfile. |
| plugins | []SchedulingPlugin | Plugins is the list of plugins for this SchedulingProfile. They are assigned to the appropriate "slots" based on their type. |
SchedulingPlugin
| Field | Type | Description |
|---|---|---|
| pluginRef | string | References a plugin from the top-level plugins list. The plugin must implement Filter, Scorer, or Picker. |
| weight | integer | Optional weight for Scorer plugins, controlling relative influence when aggregating scores. Defaults to 1 if omitted. Ignored for Filters and Pickers. |
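
A profile that wires one Filter, two Scorers, and a Picker together might look like the sketch below. All plugin types are hypothetical placeholders; only the field names come from the tables above.

```yaml
# Fragment of an EndpointPickerConfig; plugin types are placeholders.
plugins:
  - type: example-filter
  - type: example-queue-scorer
  - type: example-cache-scorer
  - type: example-picker
schedulingProfiles:
  - name: default
    plugins:
      - pluginRef: example-filter         # Filter: weight is ignored
      - pluginRef: example-queue-scorer   # Scorer: weight omitted, so 1
      - pluginRef: example-cache-scorer   # Scorer: weighted 3 against the default 1
        weight: 3
      - pluginRef: example-picker         # Picker: weight is ignored
```

The order of entries within the profile should not matter, because plugins are slotted by type and run as Filters, then Scorers, then Picker. If scores are combined as a weighted sum (an assumption; this reference only says weights control relative influence), a pod that scores 0.5 on the queue scorer and 0.8 on the cache scorer would total 1 × 0.5 + 3 × 0.8 = 2.9.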
SaturationDetector
| Field | Type | Description |
|---|---|---|
| kvCacheUtilThreshold | float | KVCacheUtilThreshold defines the KV cache utilization (0.0 to 1.0) above which a pod is considered to have insufficient capacity. |
| metricsStalenessThreshold | string | MetricsStalenessThreshold defines how old a pod's metrics can be before they are considered stale. |
| queueDepthThreshold | integer | QueueDepthThreshold defines the backend waiting queue size above which a pod is considered to have insufficient capacity for new requests. |
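
As a sketch, the three thresholds could be set as follows. The values are illustrative only and are not stated defaults, and the duration syntax for `metricsStalenessThreshold` (a Go-style string such as `"200ms"`) is an assumption based on the field being typed as a string.

```yaml
# Fragment of an EndpointPickerConfig; values are examples, not defaults.
saturationDetector:
  # More than 5 waiting requests: insufficient capacity for new requests.
  queueDepthThreshold: 5
  # KV cache utilization above 80%: insufficient capacity.
  kvCacheUtilThreshold: 0.8
  # Maximum age of a pod's metrics; string format assumed.
  metricsStalenessThreshold: "200ms"
```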
DataLayerConfig
| Field | Type | Description |
|---|---|---|
| sources | []DataLayerSource | Sources is the list of sources to define for the DataLayer. |
DataLayerSource
| Field | Type | Description |
|---|---|---|
| extractors | []DataLayerExtractor | Extractors specifies the list of Plugin instances to be associated with this Source. |
| pluginRef | string | PluginRef specifies a particular Plugin instance to be associated with this Source. |
DataLayerExtractor
| Field | Type | Description |
|---|---|---|
| pluginRef | string | PluginRef specifies a particular Plugin instance to be associated with this Extractor. |
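
Putting DataLayerConfig, DataLayerSource, and DataLayerExtractor together, a `data` section might be sketched as below. The plugin types `example-metrics-source` and `example-metrics-extractor` are hypothetical. The feature gate that enables the DataLayer is not shown, because its name is not given in this reference.

```yaml
# Fragment of an EndpointPickerConfig; plugin types are placeholders.
plugins:
  - type: example-metrics-source
  - type: example-metrics-extractor
data:
  sources:
    # Each source points at one plugin instance from the top-level list...
    - pluginRef: example-metrics-source
      # ...and lists the extractor plugin instances associated with it.
      extractors:
        - pluginRef: example-metrics-extractor
```

Both `pluginRef` values resolve against the top-level `plugins` list by name, which here falls back to the plugin type because no `name` is set.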