
Getting Started

Set up your environment and run your first inference with MIF.

📄️ Overview

MoAI Inference Framework (MIF) is designed to enable efficient, automated distributed inference on cluster systems and Kubernetes environments. It supports a wide range of distributed inference techniques, such as prefill-decode disaggregation, expert parallelism, and prefix-cache-aware routing. Leveraging its cost model, it automatically identifies, applies, and dynamically adjusts the optimal way to utilize the available accelerators to meet the defined service level objectives (SLOs). All of these capabilities are seamlessly integrated not only for NVIDIA GPUs but also for other accelerators, especially AMD GPUs.
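To make one of these techniques concrete, the sketch below illustrates the idea behind prefix-cache-aware routing: send each request to the replica whose KV cache already holds the longest matching prefix of the prompt, so cached attention states can be reused. This is a simplified, hypothetical illustration of the concept only; the function names and replica bookkeeping are invented here and are not MIF's actual API.

```python
# Hypothetical sketch of prefix-cache-aware routing (not MIF's API).
# Each replica tracks the prompts it has recently served; a new request
# is routed to the replica with the longest shared prompt prefix, since
# that replica can reuse the most cached KV entries.

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(prompt: str, replicas: dict[str, list[str]]) -> str:
    """Pick the replica whose cached prompts best match this prompt.

    `replicas` maps replica name -> list of recently served prompts.
    """
    best, best_len = None, -1
    for name, cached in replicas.items():
        overlap = max((shared_prefix_len(prompt, c) for c in cached), default=0)
        if overlap > best_len:
            best, best_len = name, overlap
    return best

replicas = {
    "replica-a": ["You are a helpful assistant. Summarize:"],
    "replica-b": ["Translate the following text to French:"],
}
print(route("You are a helpful assistant. Summarize: today's news", replicas))
# → replica-a (it shares the long system-prompt prefix)
```

A production router would match at token-block granularity and combine the prefix score with load and SLO signals, which is where a cost model such as the one described above comes in.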