Version: v0.4.0

# Odin API Reference

## odin.moreh.io/v1alpha1

### InferenceService

```shell
kubectl explain --api-version odin.moreh.io/v1alpha1 inferenceservice
```

| Field | Type | Description |
| --- | --- | --- |
| `apiVersion` | string | APIVersion defines the versioned schema of this representation of an object. |
| `kind` | string | Kind is a string value representing the REST resource this object represents. |
| `metadata` | object | Standard object's metadata. |
| `spec` | InferenceServiceSpec | Specification of the desired behavior of the InferenceService. |
| `status` | InferenceServiceStatus | Most recently observed status of the InferenceService. |

### InferenceServiceSpec

```shell
kubectl explain --api-version odin.moreh.io/v1alpha1 inferenceservice.spec
```

| Field | Type | Description |
| --- | --- | --- |
| `framework` | string | Framework specifies the inference engine. Enum: `vllm`, `sglang`. |
| `inferencePoolRefs` | []LocalObjectReference | InferencePoolRefs is a list of references to InferencePools. |
| `model` | ModelSpec | Model identifies the model to serve. |
| `parallelism` | Parallelism | Parallelism defines the parallelism parameters for distributed inference. |
| `replicas` | integer | Number of replicas for Deployments or LeaderWorkerSets. Default is 1. |
| `service` | ServiceSpec | Service defines configuration for the Kubernetes Service associated with the InferenceService. |
| `template` | PodTemplateSpec | Template describes the pod template for the Deployment or the LeaderWorkerSet leader. |
| `templateRefs` | []TemplateReference | TemplateRefs is a list of references to InferenceServiceTemplates. |
| `workerTemplate` | PodTemplateSpec | WorkerTemplate describes the pod template for LeaderWorkerSet workers. |
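A minimal manifest tying these fields together might look like the following. This is an illustrative sketch, not from the reference itself: the resource name, model ID, and container image are placeholders.

```yaml
apiVersion: odin.moreh.io/v1alpha1
kind: InferenceService
metadata:
  name: llama-3-8b                 # placeholder name
spec:
  framework: vllm                  # enum: vllm, sglang
  model:
    name: meta-llama/Meta-Llama-3-8B-Instruct   # HuggingFace model ID (placeholder)
  replicas: 2                      # defaults to 1 if omitted
  parallelism:
    tensor: 2
  template:                        # standard Kubernetes PodTemplateSpec
    spec:
      containers:
      - name: inference
        image: example.com/vllm:latest          # placeholder image
```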

### LocalObjectReference

| Field | Type | Description |
| --- | --- | --- |
| `name` | string | Name is the name of the referent. Required. |

### TemplateReference

```shell
kubectl explain --api-version odin.moreh.io/v1alpha1 inferenceservice.spec.templateRefs
```

| Field | Type | Description |
| --- | --- | --- |
| `group` | string | Group is the group of the referent. |
| `kind` | string | Kind is the kind of the referent. |
| `name` | string | Name is the name of the referent. Required. |

### Parallelism

```shell
kubectl explain --api-version odin.moreh.io/v1alpha1 inferenceservice.spec.parallelism
```

| Field | Type | Description |
| --- | --- | --- |
| `data` | integer | Data parallelism size. |
| `dataLocal` | integer | DataLocal is the node-local data parallelism size. |
| `dataRPCPort` | integer | DataRPCPort is the data parallelism RPC port. |
| `expert` | boolean | Expert enables expert parallelism. |
| `pipeline` | integer | Pipeline parallelism size. |
| `tensor` | integer | Tensor parallelism size. |
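A hypothetical multi-node configuration combining these knobs is sketched below. The comments interpret each field based on the brief descriptions above and common parallelism conventions; they are assumptions, not guarantees from this reference.

```yaml
parallelism:
  tensor: 8          # shard each layer across 8 devices (tensor parallelism)
  pipeline: 2        # split the model into 2 pipeline stages
  data: 4            # 4 data-parallel groups
  dataLocal: 2       # data-parallel ranks per node (assumed meaning)
  dataRPCPort: 5555  # RPC port used for data-parallel coordination
  expert: true       # enable expert parallelism, e.g. for MoE models
```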

### ModelSpec

```shell
kubectl explain --api-version odin.moreh.io/v1alpha1 inferenceservice.spec.model
```

| Field | Type | Description |
| --- | --- | --- |
| `name` | string | Name is the model identifier (e.g. a HuggingFace model ID). Required. |

### ServiceSpec

```shell
kubectl explain --api-version odin.moreh.io/v1alpha1 inferenceservice.spec.service
```

| Field | Type | Description |
| --- | --- | --- |
| `target` | string | Specifies which pods receive the `mif.moreh.io/inferenceservice` label for Service endpoint selection. For LeaderWorkerSet: `leader`, `workers`, `all`, or `auto`. For Deployment: `target` is ignored. Default: `auto`. |
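For example, to route Service traffic only to leader pods of a LeaderWorkerSet (a sketch; whether this fits your topology depends on how your engine exposes its serving endpoint):

```yaml
service:
  target: leader   # only leader pods get the mif.moreh.io/inferenceservice
                   # label, so only they are selected as Service endpoints
```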

### InferenceServiceStatus

```shell
kubectl explain --api-version odin.moreh.io/v1alpha1 inferenceservice.status
```

| Field | Type | Description |
| --- | --- | --- |
| `replicas` | integer | Total number of groups that have been created (updated or not, ready or not). |
| `updatedReplicas` | integer | Number of groups that have been updated (ready or not). |
| `readyReplicas` | integer | Number of groups that are in a ready state (updated or not). |
| `hpaPodSelector` | string | Pod selector string for HPA to identify pods belonging to this InferenceService. |
| `renderedSpec` | InferenceServiceSpec | Fully resolved InferenceServiceSpec after merging templateRefs and template rendering. |
| `conditions` | []Condition | Conditions represent the latest available observations of an object's state. |

### InferenceServiceTemplate

```shell
kubectl explain --api-version odin.moreh.io/v1alpha1 inferenceservicetemplate
```

| Field | Type | Description |
| --- | --- | --- |
| `apiVersion` | string | APIVersion defines the versioned schema of this representation of an object. |
| `kind` | string | Kind is a string value representing the REST resource this object represents. |
| `metadata` | object | Standard object's metadata. |
| `spec` | InferenceServiceTemplateSpec | Specification of the desired behavior of the InferenceServiceTemplate. |

### InferenceServiceTemplateSpec

```shell
kubectl explain --api-version odin.moreh.io/v1alpha1 inferenceservicetemplate.spec
```

| Field | Type | Description |
| --- | --- | --- |
| `framework` | string | Framework specifies the inference engine. Enum: `vllm`, `sglang`. |
| `model` | ModelSpec | Model identifies the model to serve. |
| `parallelism` | Parallelism | Parallelism defines the parallelism parameters for distributed inference. |
| `service` | ServiceSpec | Service defines configuration for the Kubernetes Service associated with the InferenceService. |
| `template` | PodTemplateSpec | Template describes the pod template for the Deployment or the LeaderWorkerSet leader. |
| `workerTemplate` | PodTemplateSpec | WorkerTemplate describes the pod template for LeaderWorkerSet workers. |
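Since an InferenceService can reference templates via `spec.templateRefs` (with the result surfaced in `status.renderedSpec`), shared defaults can be factored into an InferenceServiceTemplate. The sketch below uses placeholder names and model IDs; the exact merge precedence between a template and the referencing service is not specified in this reference.

```yaml
apiVersion: odin.moreh.io/v1alpha1
kind: InferenceServiceTemplate
metadata:
  name: vllm-defaults            # placeholder template name
spec:
  framework: vllm
  parallelism:
    tensor: 4
---
apiVersion: odin.moreh.io/v1alpha1
kind: InferenceService
metadata:
  name: mistral-7b               # placeholder name
spec:
  templateRefs:
  - name: vllm-defaults          # only name is required; group/kind are optional
  model:
    name: mistralai/Mistral-7B-Instruct-v0.2   # placeholder model ID
```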