<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://test-docs.moreh.io/blog/</id>
    <title>Moreh Blog</title>
    <updated>2025-11-11T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://test-docs.moreh.io/blog/"/>
    <subtitle>Moreh Blog</subtitle>
    <icon>https://test-docs.moreh.io/moreh-icon.png</icon>
    <entry>
        <title type="html"><![CDATA[DeepSeek R1 671B on AMD MI300X GPUs: Maximum throughput]]></title>
        <id>https://test-docs.moreh.io/blog/2025/11/11/deepseek-r1-671b-on-amd-mi300x-gpus-maximum-throughput/</id>
        <link href="https://test-docs.moreh.io/blog/2025/11/11/deepseek-r1-671b-on-amd-mi300x-gpus-maximum-throughput/"/>
        <updated>2025-11-11T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[This article presents the performance evaluation method and results of DeepSeek R1 671B inference on 5x AMD MI300X servers (40 GPUs in total).]]></summary>
        <content type="html"><![CDATA[<p>This article presents the performance evaluation method and results of <strong>DeepSeek R1 671B</strong> inference on 5x AMD MI300X servers (40 GPUs in total).</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="overview">Overview<a href="https://test-docs.moreh.io/blog/2025/11/11/deepseek-r1-671b-on-amd-mi300x-gpus-maximum-throughput/#overview" class="hash-link" aria-label="Direct link to Overview" title="Direct link to Overview" translate="no">​</a></h2>
<p>This benchmark measures the maximum throughput (output tokens/sec) achievable when running distributed inference of the DeepSeek R1 671B model on a 5-node AMD MI300X GPU cluster. This metric directly determines the cost efficiency of an inference service (tokens/$). The benchmark demonstrates three key points:</p>
<ul>
<li class="">The benchmarking evaluates a distributed inference system operating at the AMD GPU cluster level <strong>in real deployments</strong>, which efficiently handles high-concurrency requests via prefill-decode disaggregation and expert parallelism.</li>
<li class="">MoAI Inference Framework delivers industry-leading throughput on AMD MI300X GPU clusters, which enables lower cost-per-token ($/token) configurations.</li>
<li class="">MoAI Inference Framework achieves throughput on AMD MI300X GPU clusters that is on par with what is attainable on NVIDIA H100 GPU clusters.</li>
</ul>
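<p>To make the link between throughput and cost efficiency concrete, the following is a minimal sketch that converts a sustained output throughput into tokens per dollar. The function name <code>tokens_per_dollar</code> and both input values are hypothetical placeholders chosen for illustration; they are not measurements from this benchmark.</p>

```shell
#!/usr/bin/env bash
# Convert sustained output throughput into tokens per dollar.
# Both inputs are hypothetical placeholders, not measured results.
tokens_per_dollar() {
  local tokens_per_sec="$1"   # cluster-wide output tokens/sec
  local usd_per_hour="$2"     # total hourly cost of the cluster in USD
  awk -v t="$tokens_per_sec" -v c="$usd_per_hour" \
    'BEGIN { printf "%.0f\n", t * 3600 / c }'
}

# Example with made-up numbers: 10,000 tokens/sec on a cluster costing $100/hour.
tokens_per_dollar 10000 100
```

<p>With these placeholder inputs the script prints <code>360000</code>, that is, 10,000 tokens/sec × 3,600 sec ÷ $100. At a fixed hourly cost, higher throughput translates directly into more tokens per dollar.</p>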
<p>The experimental methodology largely follows the report below from the SGLang team, which measures the performance of PD disaggregation and expert parallelism on an NVIDIA H100 GPU cluster. The key difference is that the SGLang team measures prefill-only and decode-only performance separately, whereas this benchmark integrates prefill and decode instances and measures performance in an end-to-end inference environment, which more accurately reflects the performance achievable in practice.</p>
<ul>
<li class="">Reference: <a href="https://lmsys.org/blog/2025-05-05-large-scale-ep/" target="_blank" rel="noopener noreferrer" class="">Deploying DeepSeek with PD Disaggregation and Large-Scale Expert Parallelism on 96 H100 GPUs</a></li>
</ul>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="target-environment-and-configuration">Target environment and configuration<a href="https://test-docs.moreh.io/blog/2025/11/11/deepseek-r1-671b-on-amd-mi300x-gpus-maximum-throughput/#target-environment-and-configuration" class="hash-link" aria-label="Direct link to Target environment and configuration" title="Direct link to Target environment and configuration" translate="no">​</a></h2>
<table><thead><tr><th>Item</th><th>Description</th></tr></thead><tbody><tr><td>GPU servers</td><td>5x servers, each equipped with 8x AMD MI300X GPUs</td></tr><tr><td>Networking</td><td>InfiniBand HDR</td></tr><tr><td>Inference engine</td><td>Moreh vLLM (0.11.0rc2.moreh20251212)</td></tr><tr><td>Model</td><td><code>deepseek-ai/DeepSeek-R1</code></td></tr><tr><td>PD disaggregation</td><td>2x prefill, 3x decode instances</td></tr><tr><td>Parallelization</td><td>EP=8 + DP=8</td></tr></tbody></table>
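<p>As a quick sanity check on the table above, the sketch below confirms that the instance layout accounts for every GPU in the cluster. The server and instance counts come from the table; the variable and function names are illustrative, and the script assumes that each prefill or decode instance occupies one full 8-GPU server, which the table implies but does not state outright.</p>

```shell
#!/usr/bin/env bash
# Values taken from the configuration table above.
servers=5
gpus_per_server=8
prefill_instances=2
decode_instances=3
# Assumption: each instance occupies one full 8-GPU server (EP=8, DP=8).
gpus_per_instance=8

# Total GPUs physically present in the cluster.
gpus_available() { echo $(( servers * gpus_per_server )); }
# Total GPUs consumed by all prefill and decode instances.
gpus_required() { echo $(( (prefill_instances + decode_instances) * gpus_per_instance )); }

if [ "$(gpus_required)" -eq "$(gpus_available)" ]; then
  echo "layout fits: $(gpus_required) of $(gpus_available) GPUs used"
else
  echo "layout mismatch: need $(gpus_required), have $(gpus_available)"
fi
```

<p>Under that assumption, the two prefill and three decode instances use all 40 GPUs, so changing the prefill-to-decode ratio means reassigning whole servers.</p>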
<p>The specifications of each GPU server are as follows:</p>
<ul>
<li class="">CPU: 2x AMD EPYC 9474F 48-core 3.6 GHz</li>
<li class="">Main memory: 2,304 GB</li>
<li class="">GPU: 8x AMD Instinct MI300X OAM GPU 192 GB</li>
<li class="">Server: Gigabyte G593-ZX1-AAX1</li>
<li class="">Operating system: Ubuntu 22.04.4 LTS</li>
<li class="">ROCm version: 6.4.1</li>
</ul>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="deployment">Deployment<a href="https://test-docs.moreh.io/blog/2025/11/11/deepseek-r1-671b-on-amd-mi300x-gpus-maximum-throughput/#deployment" class="hash-link" aria-label="Direct link to Deployment" title="Direct link to Deployment" translate="no">​</a></h2>
<p>Make sure to install all <a class="" href="https://test-docs.moreh.io/docs/getting-started/prerequisites/">prerequisites</a> before starting this benchmark. Also refer to the <a class="" href="https://test-docs.moreh.io/docs/getting-started/quickstart/">quickstart</a> to understand how to run the MoAI Inference Framework.</p>
<p>This benchmark requires three components: the <strong>Istio</strong> gateway; the <strong>Heimdall</strong> scheduler, configured with the basic routing strategy for PD disaggregation; and the <strong>Odin</strong> inference service, configured to run two prefill instances and three decode instances across five GPU servers using the <code>quickstart</code> preset for DeepSeek R1 with DP8+EP8 on AMD MI300X.</p>
<p>First, create a namespace for deploying and running the components of the MoAI Inference Framework. This guide assumes the namespace is named <code>deepseek-r1-benchmark</code>.</p>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">kubectl create namespace deepseek-r1-benchmark</span><br></div></code></pre></div></div>
<p><strong>AWS credentials must be configured in this namespace to allow the container images of the MoAI Inference Framework to be downloaded</strong>. For details, refer to the "Amazon ECR token for Moreh's container image repository" section in the <a class="" href="https://test-docs.moreh.io/docs/getting-started/prerequisites/">prerequisites</a>.</p>
<p>Next, use the following configuration files for the components; click each one to view its contents. <strong>You must store the DeepSeek-R1 model checkpoint on the host of every worker node and specify its path in the <code>deepseek-r1-decode.yaml</code> and <code>deepseek-r1-prefill.yaml</code> files</strong>. This path is mounted at <code>/app/model/DeepSeek-R1</code> inside the pod and used to run the Moreh vLLM server.</p>
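<p>Before applying the files below, it may help to confirm that the checkpoint directory is actually in place on each worker node. The following is a minimal sketch to run on a node: the helper <code>check_model_dir</code> and the path <code>/path/to/DeepSeek-R1</code> are hypothetical, and looking for <code>config.json</code> is only a heuristic for a Hugging Face style checkpoint, not a requirement stated by the framework.</p>

```shell
#!/usr/bin/env bash
# Return 0 if the directory looks like a Hugging Face checkpoint, 1 otherwise.
# The config.json check is a heuristic assumption, not a Moreh requirement.
check_model_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir"
    return 1
  fi
  if [ ! -f "$dir/config.json" ]; then
    echo "no config.json in: $dir"
    return 1
  fi
  echo "ok: $dir"
}

# Hypothetical host path; replace it with the path used in your YAML files.
MODEL_DIR="${MODEL_DIR:-/path/to/DeepSeek-R1}"
check_model_dir "$MODEL_DIR" || echo "fix the path before deploying"
```

<p>Running the same check on all five servers catches a missing or misplaced checkpoint before a pod fails to start.</p>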
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Istio gateway configuration (gateway.yaml)</summary><div><div class="collapsibleContent_i85q"><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockTitle_OeMC">gateway.yaml</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token key atrule">apiVersion</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> v1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">kind</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> ConfigMap</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">metadata</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> mif</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">gateway</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">infrastructure</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">data</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div 
class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">service</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">|</span><span class="token scalar string" style="color:rgb(206, 145, 120)"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">    spec:</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">      type: ClusterIP</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">deployment</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">|</span><span class="token scalar string" style="color:rgb(206, 145, 120)"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">    spec:</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">      template:</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">        metadata:</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">          annotations:</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">            proxy.istio.io/config: |</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">              accessLogFile: 
/dev/stdout</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">              accessLogEncoding: JSON</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">        spec:</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">          containers:</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">            - name: istio-proxy</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">              resources:</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token scalar string" style="color:rgb(206, 145, 120)">                limits: null</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token punctuation" style="color:rgb(212, 212, 212)">---</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">apiVersion</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> gateway.networking.k8s.io/v1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">kind</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> Gateway</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">metadata</span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> mif</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">spec</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">gatewayClassName</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> istio</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">infrastructure</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token key atrule">parametersRef</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">group</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">""</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">kind</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> ConfigMap</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      
</span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> mif</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">gateway</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">infrastructure</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">listeners</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> http</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">protocol</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> HTTP</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">port</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">80</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">allowedRoutes</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token key atrule">namespaces</span><span 
class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">from</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> All</span><br></div></code></pre></div></div></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Heimdall scheduler configuration (heimdall-values.yaml)</summary><div><div class="collapsibleContent_i85q"><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockTitle_OeMC">heimdall-values.yaml</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token key atrule">global</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">imagePullSecrets</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> moreh</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">registry</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">config</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key 
atrule">apiVersion</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> inference.networking.x</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">k8s.io/v1alpha1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">kind</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> EndpointPickerConfig</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">plugins</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">type</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> pd</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">profile</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">handler</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">type</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> prefill</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">filter</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span 
class="token plain"> </span><span class="token key atrule">type</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> decode</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">filter</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">type</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> active</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">request</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">scorer</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">parameters</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token key atrule">requestTimeout</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"20m"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">type</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> max</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">score</span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">-</span><span class="token plain">picker</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">schedulingProfiles</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> prefill</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">plugins</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">pluginRef</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> prefill</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">filter</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">pluginRef</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> active</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">request</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token 
plain">scorer</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">weight</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">1</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">pluginRef</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> max</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">score</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">picker</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> decode</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">plugins</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">pluginRef</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> decode</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span 
class="token plain">filter</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">pluginRef</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> active</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">request</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">scorer</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">weight</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">1</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">pluginRef</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> max</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">score</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">picker</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">gateway</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  
</span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> mif</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">gatewayClassName</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> istio</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">inferencePool</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">targetPorts</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">number</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">8000</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">number</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">8001</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span 
class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">number</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">8002</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">number</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">8003</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">number</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">8004</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">number</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">8005</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key 
atrule">number</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">8006</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">number</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">8007</span><br></div></code></pre></div></div></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Odin decode inference service configuration (deepseek-r1-decode.yaml)</summary><div><div class="collapsibleContent_i85q"><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockTitle_OeMC">deepseek-r1-decode.yaml</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token key atrule">apiVersion</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> odin.moreh.io/v1alpha1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">kind</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> InferenceService</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">metadata</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> deepseek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">r1</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">decode</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">spec</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span 
class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">replicas</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">3</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">inferencePoolRefs</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> heimdall</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">templateRefs</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> vllm</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">decode</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">dp</span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> quickstart</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">vllm</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">deepseek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">ai</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">deepseek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">r1</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">decode</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">amd</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">mi300x</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">dp8</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">moe</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">ep8</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">workerTemplate</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token key atrule">spec</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span 
class="token plain">      </span><span class="token key atrule">containers</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> main</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">env</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> ISVC_MODEL_PATH</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">              </span><span class="token key atrule">value</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> /app/model/DeepSeek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">R1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> HF_HUB_OFFLINE</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">     
         </span><span class="token key atrule">value</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"1"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">resources</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token key atrule">limits</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">              </span><span class="token key atrule">mellanox/hca</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"1"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token key atrule">requests</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">              </span><span class="token key atrule">mellanox/hca</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"1"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">volumeMounts</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token 
plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> dsr1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">              </span><span class="token key atrule">mountPath</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> /app/model/DeepSeek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">R1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">              </span><span class="token key atrule">readOnly</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token boolean important">false</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">volumes</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> dsr1</span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">hostPath</span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token key atrule">path</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> /path/to/DeepSeek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">R1</span><br></div></code></pre></div></div></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Odin prefill inference service configuration (deepseek-r1-prefill.yaml)</summary><div><div class="collapsibleContent_i85q"><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockTitle_OeMC">deepseek-r1-prefill.yaml</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token key atrule">apiVersion</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> odin.moreh.io/v1alpha1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">kind</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> InferenceService</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">metadata</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> deepseek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">r1</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">prefill</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">spec</span><span class="token punctuation" style="color:rgb(212, 212, 
212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">replicas</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">2</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">inferencePoolRefs</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> heimdall</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">templateRefs</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> vllm</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">prefill</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">dp</span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">    </span><span 
class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> quickstart</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">vllm</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">deepseek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">ai</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">deepseek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">r1</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">prefill</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">amd</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">mi300x</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">dp8</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">moe</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">ep8</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">workerTemplate</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token key atrule">spec</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" 
style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">containers</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> main</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">env</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> ISVC_MODEL_PATH</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">              </span><span class="token key atrule">value</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> /app/model/DeepSeek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">R1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> HF_HUB_OFFLINE</span><br></div><div class="token-line" 
style="color:#9CDCFE"><span class="token plain">              </span><span class="token key atrule">value</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"1"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">resources</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token key atrule">limits</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">              </span><span class="token key atrule">mellanox/hca</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"1"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token key atrule">requests</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">              </span><span class="token key atrule">mellanox/hca</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"1"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">volumeMounts</span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> dsr1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">              </span><span class="token key atrule">mountPath</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> /app/model/DeepSeek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">R1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">              </span><span class="token key atrule">readOnly</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token boolean important">false</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">volumes</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> dsr1</span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key 
atrule">hostPath</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token key atrule">path</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> /path/to/DeepSeek</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">R1</span><br></div></code></pre></div></div></div></div></details>
<p>Run the following commands to deploy each component and confirm that its pods are running.</p>
<p><strong>Istio gateway:</strong></p>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">kubectl apply </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-n</span><span class="token plain"> deepseek-r1-benchmark </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-f</span><span class="token plain"> gateway.yaml</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">kubectl get pod </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-n</span><span class="token plain"> deepseek-r1-benchmark </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-l</span><span class="token plain"> gateway.networking.k8s.io/gateway-name</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain">mif</span><br></div></code></pre></div></div>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockTitle_OeMC">Expected output</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">NAME                         READY   STATUS    RESTARTS   AGE</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">mif-istio-584474ddd9-rt9p9   </span><span class="token number" style="color:rgb(181, 206, 168)">1</span><span class="token plain">/1     Running   </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain">          163m</span><br></div></code></pre></div></div>
<p><strong>Heimdall scheduler:</strong></p>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">helm upgrade </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-i</span><span class="token plain"> heimdall moreh/heimdall </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--version</span><span class="token plain"> v0.7.1 </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-n</span><span class="token plain"> deepseek-r1-benchmark </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-f</span><span class="token plain"> heimdall-values.yaml</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">kubectl get all </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-n</span><span class="token plain"> deepseek-r1-benchmark </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-l</span><span class="token plain"> app.kubernetes.io/instance</span><span class="token operator" style="color:rgb(212, 212, 
212)">=</span><span class="token plain">heimdall</span><br></div></code></pre></div></div>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockTitle_OeMC">Expected output</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">NAME                            READY   STATUS    RESTARTS   AGE</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">pod/heimdall-5576d4f48b-bgn4c   </span><span class="token number" style="color:rgb(181, 206, 168)">1</span><span class="token plain">/1     Running   </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain">          3d1h</span><br></div></code></pre></div></div>
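<p>Before applying the inference services, it can help to block until the gateway and scheduler pods report <code>Ready</code> instead of polling <code>kubectl get</code> by hand. The following is a minimal sketch, not part of the original procedure: it reuses the namespace and label selectors from the commands above, and the 300-second timeout is an illustrative assumption. Note that <code>kubectl wait</code> returns an error if no matching pods exist yet, so run it only after the pods have been created.</p>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain"># Assumption: 300s is an illustrative timeout; adjust it for your cluster.</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"># Wait for the Istio gateway pod (same label selector as the check above).</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">kubectl wait pod -n deepseek-r1-benchmark \</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    -l gateway.networking.k8s.io/gateway-name=mif \</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    --for=condition=Ready --timeout=300s</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"># Wait for the Heimdall scheduler pod (same label selector as the check above).</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">kubectl wait pod -n deepseek-r1-benchmark \</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    -l app.kubernetes.io/instance=heimdall \</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    --for=condition=Ready --timeout=300s</span><br></div></code></pre></div></div>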
<p><strong>Odin inference services (decode and prefill):</strong></p>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">kubectl apply </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-n</span><span class="token plain"> deepseek-r1-benchmark </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-f</span><span class="token plain"> deepseek-r1-decode.yaml</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">kubectl apply </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-n</span><span class="token plain"> deepseek-r1-benchmark </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-f</span><span class="token plain"> deepseek-r1-prefill.yaml</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">kubectl get pod </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-n</span><span class="token plain"> deepseek-r1-benchmark </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-l</span><span class="token plain"> app.kubernetes.io/name</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain">deepseek-r1-decode</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">kubectl get pod </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-n</span><span class="token plain"> deepseek-r1-benchmark </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-l</span><span class="token plain"> app.kubernetes.io/name</span><span class="token operator" 
style="color:rgb(212, 212, 212)">=</span><span class="token plain">deepseek-r1-prefill</span><br></div></code></pre></div></div>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockTitle_OeMC">Expected output</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">NAME                       READY   STATUS    RESTARTS   AGE</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">deepseek-r1-decode-0       </span><span class="token number" style="color:rgb(181, 206, 168)">2</span><span class="token plain">/2     Running   </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain">          5m</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">deepseek-r1-decode-1       </span><span class="token number" style="color:rgb(181, 206, 168)">2</span><span class="token plain">/2     Running   </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain">          5m</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">deepseek-r1-decode-2       </span><span class="token number" style="color:rgb(181, 206, 168)">2</span><span class="token plain">/2     Running   </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain">          5m</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">deepseek-r1-prefill-0      </span><span class="token number" style="color:rgb(181, 206, 168)">1</span><span class="token plain">/1     Running   </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain">          5m</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token 
plain">deepseek-r1-prefill-1      </span><span class="token number" style="color:rgb(181, 206, 168)">1</span><span class="token plain">/1     Running   </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain">          5m</span><br></div></code></pre></div></div>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="benchmarking-method">Benchmarking method<a href="https://test-docs.moreh.io/blog/2025/11/11/deepseek-r1-671b-on-amd-mi300x-gpus-maximum-throughput/#benchmarking-method" class="hash-link" aria-label="Direct link to Benchmarking method" title="Direct link to Benchmarking method" translate="no">​</a></h2>
<p>This benchmarking follows a commonly used approach for measuring the computational performance of inference servers. Multiple concurrent users send requests at a fixed request-per-second (RPS) rate, each with a fixed input sequence length and output sequence length. The concurrency and RPS are set empirically to the highest values that fit within GPU memory capacity and do not let requests accumulate in the request queues of the vLLM instances. The response times of these requests are measured and used to compute output tokens per second, total tokens per second, time to first token (TTFT), and inter-token latency (ITL, also known as time per output token).</p>
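<p>As a rough illustration of how these four metrics relate to per-request timings, the following Python sketch aggregates them from a list of records. The <code>RequestRecord</code> type and the <code>summarize</code> function are hypothetical names introduced here for illustration; they are not the data structures or the exact formulas used by vLLM bench serve.</p>

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestRecord:
    """Hypothetical per-request timing record (not the tool's real schema)."""
    start: float              # time the request was sent, in seconds
    token_times: list[float]  # arrival time of each output token, in seconds
    input_len: int            # number of prompt tokens


def summarize(records: list[RequestRecord]) -> dict[str, float]:
    """Aggregate throughput and latency metrics over one benchmark run."""
    begin = min(r.start for r in records)
    end = max(r.token_times[-1] for r in records)
    duration = end - begin

    output_tokens = sum(len(r.token_times) for r in records)
    input_tokens = sum(r.input_len for r in records)

    # Time to first token: delay between sending a request and its first token.
    ttfts = [r.token_times[0] - r.start for r in records]
    # Inter-token latency: gap between consecutive output tokens of a request.
    itls = [
        later - earlier
        for r in records
        for earlier, later in zip(r.token_times, r.token_times[1:])
    ]

    return {
        "output_tokens_per_sec": output_tokens / duration,
        "total_tokens_per_sec": (input_tokens + output_tokens) / duration,
        "mean_ttft_sec": mean(ttfts),
        "mean_itl_sec": mean(itls),
    }
```

<p>Note that when every request produces a fixed number of output tokens and no requests queue up, the steady-state output throughput cannot exceed the request rate multiplied by the output sequence length.</p>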
<p>The experiments use the <a href="https://docs.vllm.ai/en/latest/cli/bench/serve/" target="_blank" rel="noopener noreferrer" class="">vLLM bench serve</a> tool for these measurements. However, this tool was originally designed to measure the performance of a single-GPU server, and several aspects of it are insufficient at the throughput levels observed in these experiments (tens of thousands of tokens per second). Therefore, three additional features were implemented in the vLLM bench serve tool bundled with Moreh vLLM to correctly measure performance in a very high-throughput distributed inference environment. The modified version is available in the <a href="https://github.com/moreh-dev/vllm/tree/main/vllm/benchmarks" target="_blank" rel="noopener noreferrer" class="">Moreh vLLM repository</a>.</p>
<ul>
<li class=""><code>--warmup-time</code>, <code>--cooldown-time</code>: At the beginning of the experiment, before enough requests have accumulated, and near the end of the experiment, as computation winds down, the GPUs are not fully utilized. To reliably measure the maximum throughput achievable by the inference system, the tool was extended to exclude requests from the initial (warm-up) and final (cool-down) phases from the performance measurement.</li>
<li class=""><code>--max-connections-per-worker</code>: The tool was modified so that the response times of individual requests are recorded across multiple threads; otherwise, information for some requests may be lost.</li>
<li class=""><code>--sharegpt-input-len</code>, <code>--sharegpt-output-len</code>, <code>--gutenberg-input-len</code>, <code>--gutenberg-output-len</code>: To accurately measure the effect of EP load balancing, substrings of meaningful text from a real dataset, cut to the desired input sequence length, are used as prompts rather than meaningless random strings.</li>
</ul>
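<p>The warm-up and cool-down exclusion described in the list above can be sketched in a few lines of Python. This is a minimal sketch under an assumed rule (a request is kept only if it starts after the warm-up phase and finishes before the cool-down phase); the function name <code>steady_state_requests</code> is hypothetical, and the modified tool may apply a different criterion.</p>

```python
def steady_state_requests(requests, warmup_time, cooldown_time):
    """Keep only requests that fall entirely inside the steady-state window.

    requests: list of (start, end) timestamps in seconds (hypothetical format).
    warmup_time / cooldown_time: seconds to drop from the beginning and the
    end of the run, mirroring --warmup-time and --cooldown-time.
    """
    run_start = min(start for start, _ in requests)
    run_end = max(end for _, end in requests)

    window_start = run_start + warmup_time
    window_end = run_end - cooldown_time
    if not window_end > window_start:
        raise ValueError("warm-up plus cool-down covers the whole run")

    # Assumed rule: a request counts only if it both starts and finishes
    # inside the window, while the GPUs are expected to be fully utilized.
    return [
        (start, end)
        for start, end in requests
        if start >= window_start and window_end >= end
    ]
```

<p>For example, with a run spanning 0 to 40 seconds, a warm-up time of 8 seconds, and a cool-down time of 10 seconds, only requests that start at or after 8 seconds and finish by 30 seconds contribute to the measurement.</p>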
<p>This benchmarking evaluates three input/output sequence lengths (512/512, 1000/1000, and 2000/2000) and two datasets (<a href="https://www.kaggle.com/datasets/roschildrui/sharegpt-v3-unfiltered-cleaned-split" target="_blank" rel="noopener noreferrer" class="">ShareGPT</a> and <a href="https://huggingface.co/datasets/manu/project_gutenberg" target="_blank" rel="noopener noreferrer" class="">Gutenberg</a>), for six experiments in total. To launch a new Moreh vLLM pod in the Kubernetes cluster to act as the benchmarking client, first create a <code>benchmarking-client.yaml</code> file as follows. <strong>Please modify the following items to match your system.</strong></p>
<ul>
<li class=""><strong>Specify the name of the Kubernetes worker node on which the benchmarking pod will run.</strong></li>
<li class=""><strong>Store the <code>ShareGPT_V3_unfiltered_cleaned_split.json</code> file and the <code>project_gutenberg</code> directory on the host filesystem of that node, and specify their paths.</strong></li>
</ul>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>benchmarking-client.yaml</summary><div><div class="collapsibleContent_i85q"><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockTitle_OeMC">benchmarking-client.yaml</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token key atrule">apiVersion</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> v1</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">kind</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> Pod</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">metadata</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> vllm</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">benchmark</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">spec</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">containers</span><span class="token 
punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> benchmark</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">image</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> 255250787067.dkr.ecr.ap</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">northeast</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">2.amazonaws.com/quickstart/moreh</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">vllm</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain">vllm_251212</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">imagePullPolicy</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> IfNotPresent</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">command</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> sleep</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">   
   </span><span class="token key atrule">args</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> infinity</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">volumeMounts</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> sharegpt</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">dataset</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">mountPath</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"/app/dataset/ShareGPT_V3_unfiltered_cleaned_split.json"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> gutenberg</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">dataset</span><br></div><div 
class="token-line" style="color:#9CDCFE"><span class="token plain">          </span><span class="token key atrule">mountPath</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"/app/dataset/project_gutenberg"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">securityContext</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token key atrule">privileged</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token boolean important">true</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">nodeSelector</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token key atrule">kubernetes.io/hostname</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> &lt;clientHostname</span><span class="token punctuation" style="color:rgb(212, 212, 212)">&gt;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">volumes</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> sharegpt</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">dataset</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">hostPath</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token key atrule">path</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> /path/to/ShareGPT_V3_unfiltered_cleaned_split.json</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> gutenberg</span><span class="token punctuation" style="color:rgb(212, 212, 212)">-</span><span class="token plain">dataset</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">hostPath</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token key atrule">path</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> 
/path/to/project_gutenberg</span><br></div></code></pre></div></div></div></div></details>
<p>Run the following command to start the pod.</p>
<div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">kubectl </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-n</span><span class="token plain"> deepseek-r1-benchmark apply </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-f</span><span class="token plain"> benchmarking-client.yaml</span><br></div></code></pre></div></div>
<p>Inside the pod, you can run <code>vllm bench serve</code>. <strong>You may need to modify the <code>--host</code> value depending on your Istio gateway address</strong>. The following are the actual commands used to run each experiment. The warm-up and cool-down times were tuned separately for each experiment.</p>
<div class="theme-tabs-container tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">512, 512, ShareGPT</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">512, 512, Gutenberg</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">1000, 1000, ShareGPT</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">1000, 1000, Gutenberg</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">2000, 2000, ShareGPT</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">2000, 2000, Gutenberg</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">vllm bench serve </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--backend</span><span class="token plain"> vllm </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--model</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 
120)">"deepseek-ai/DeepSeek-R1"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --metric-percentiles </span><span class="token string" style="color:rgb(206, 145, 120)">"1,10,25,50,75,90"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --percentile-metrics </span><span class="token string" style="color:rgb(206, 145, 120)">"itl,tps,ttft"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--host</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"mif-istio.deepseek-r1-benchmark.svc.cluster.local"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--port</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">80</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --num-prompts </span><span class="token number" style="color:rgb(181, 206, 168)">32400</span><span class="token 
plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-concurrency </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --request-rate </span><span class="token number" style="color:rgb(181, 206, 168)">140</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --ignore-eos </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --ready-check-timeout-sec </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-connections-per-worker </span><span class="token number" style="color:rgb(181, 206, 168)">1296</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --warmup-time </span><span class="token number" style="color:rgb(181, 206, 168)">120.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" 
style="color:#9CDCFE"><span class="token plain">  --cooldown-time </span><span class="token number" style="color:rgb(181, 206, 168)">70.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --dataset-name sharegpt </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --dataset-path /app/dataset/ShareGPT_V3_unfiltered_cleaned_split.json </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --sharegpt-input-len </span><span class="token number" style="color:rgb(181, 206, 168)">512</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --sharegpt-output-len </span><span class="token number" style="color:rgb(181, 206, 168)">512</span><br></div></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">vllm bench serve </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span 
class="token parameter variable" style="color:rgb(156, 220, 254)">--backend</span><span class="token plain"> vllm </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--model</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"deepseek-ai/DeepSeek-R1"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --metric-percentiles </span><span class="token string" style="color:rgb(206, 145, 120)">"1,10,25,50,75,90"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --percentile-metrics </span><span class="token string" style="color:rgb(206, 145, 120)">"itl,tps,ttft"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--host</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"mif-istio.deepseek-r1-benchmark.svc.cluster.local"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter 
variable" style="color:rgb(156, 220, 254)">--port</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">80</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --num-prompts </span><span class="token number" style="color:rgb(181, 206, 168)">32400</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-concurrency </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --request-rate </span><span class="token number" style="color:rgb(181, 206, 168)">140</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --ignore-eos </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --ready-check-timeout-sec </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-connections-per-worker </span><span class="token number" style="color:rgb(181, 206, 168)">1296</span><span class="token 
plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --warmup-time </span><span class="token number" style="color:rgb(181, 206, 168)">130.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --cooldown-time </span><span class="token number" style="color:rgb(181, 206, 168)">70.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --dataset-name gutenberg </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --dataset-path /app/dataset/project_gutenberg </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --gutenberg-input-len </span><span class="token number" style="color:rgb(181, 206, 168)">512</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --gutenberg-output-len </span><span class="token number" style="color:rgb(181, 206, 168)">512</span><br></div></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre 
tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">vllm bench serve </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--backend</span><span class="token plain"> vllm </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--model</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"deepseek-ai/DeepSeek-R1"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --metric-percentiles </span><span class="token string" style="color:rgb(206, 145, 120)">"1,10,25,50,75,90"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --percentile-metrics </span><span class="token string" style="color:rgb(206, 145, 120)">"itl,tps,ttft"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 
254)">--host</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"mif-istio.deepseek-r1-benchmark.svc.cluster.local"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--port</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">80</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --num-prompts </span><span class="token number" style="color:rgb(181, 206, 168)">32400</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-concurrency </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --request-rate </span><span class="token number" style="color:rgb(181, 206, 168)">80</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --ignore-eos </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span 
class="token plain">  --ready-check-timeout-sec </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-connections-per-worker </span><span class="token number" style="color:rgb(181, 206, 168)">1296</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --warmup-time </span><span class="token number" style="color:rgb(181, 206, 168)">140.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --cooldown-time </span><span class="token number" style="color:rgb(181, 206, 168)">110.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --dataset-name sharegpt </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --dataset-path /app/dataset/ShareGPT_V3_unfiltered_cleaned_split.json </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --sharegpt-input-len </span><span class="token number" style="color:rgb(181, 206, 168)">1000</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 
212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --sharegpt-output-len </span><span class="token number" style="color:rgb(181, 206, 168)">1000</span><br></div></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">vllm bench serve </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--backend</span><span class="token plain"> vllm </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--model</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"deepseek-ai/DeepSeek-R1"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --metric-percentiles </span><span class="token string" style="color:rgb(206, 145, 120)">"10,25,50,75,90"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" 
style="color:#9CDCFE"><span class="token plain">  --percentile-metrics </span><span class="token string" style="color:rgb(206, 145, 120)">"itl,tps,ttft"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--host</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"mif-istio.deepseek-r1-benchmark.svc.cluster.local"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--port</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">80</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --num-prompts </span><span class="token number" style="color:rgb(181, 206, 168)">32400</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-concurrency </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --request-rate 
</span><span class="token number" style="color:rgb(181, 206, 168)">80</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --ignore-eos </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --ready-check-timeout-sec </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-connections-per-worker </span><span class="token number" style="color:rgb(181, 206, 168)">1296</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --warmup-time </span><span class="token number" style="color:rgb(181, 206, 168)">150.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --cooldown-time </span><span class="token number" style="color:rgb(181, 206, 168)">120.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --dataset-name gutenberg </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" 
style="color:#9CDCFE"><span class="token plain">  --dataset-path /app/dataset/project_gutenberg </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --gutenberg-input-len </span><span class="token number" style="color:rgb(181, 206, 168)">1000</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --gutenberg-output-len </span><span class="token number" style="color:rgb(181, 206, 168)">1000</span><br></div></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">vllm bench serve </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--backend</span><span class="token plain"> vllm </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--model</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"deepseek-ai/DeepSeek-R1"</span><span class="token plain"> 
</span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --metric-percentiles </span><span class="token string" style="color:rgb(206, 145, 120)">"1,10,25,50,75,90"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --percentile-metrics </span><span class="token string" style="color:rgb(206, 145, 120)">"itl,tps,ttft"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--host</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"mif-istio.deepseek-r1-benchmark.svc.cluster.local"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--port</span><span class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">80</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --num-prompts </span><span class="token number" style="color:rgb(181, 206, 168)">32400</span><span class="token plain"> </span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-concurrency </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --request-rate </span><span class="token number" style="color:rgb(181, 206, 168)">48</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --ignore-eos </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --ready-check-timeout-sec </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-connections-per-worker </span><span class="token number" style="color:rgb(181, 206, 168)">1296</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --warmup-time </span><span class="token number" style="color:rgb(181, 206, 168)">300.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --cooldown-time 
</span><span class="token number" style="color:rgb(181, 206, 168)">230.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --dataset-name sharegpt </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --dataset-path /app/dataset/ShareGPT_V3_unfiltered_cleaned_split.json </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --sharegpt-input-len </span><span class="token number" style="color:rgb(181, 206, 168)">2000</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --sharegpt-output-len </span><span class="token number" style="color:rgb(181, 206, 168)">2000</span><br></div></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token plain">vllm bench serve </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 
254)">--backend</span><span class="token plain"> vllm </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--model</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"deepseek-ai/DeepSeek-R1"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --metric-percentiles </span><span class="token string" style="color:rgb(206, 145, 120)">"1,10,25,50,75,90"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --percentile-metrics </span><span class="token string" style="color:rgb(206, 145, 120)">"itl,tps,ttft"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--host</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"mif-istio.deepseek-r1-benchmark.svc.cluster.local"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line theme-code-block-highlighted-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--port</span><span 
class="token plain"> </span><span class="token number" style="color:rgb(181, 206, 168)">80</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --num-prompts </span><span class="token number" style="color:rgb(181, 206, 168)">32400</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-concurrency </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --request-rate </span><span class="token number" style="color:rgb(181, 206, 168)">60</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --ignore-eos </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --ready-check-timeout-sec </span><span class="token number" style="color:rgb(181, 206, 168)">0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --max-connections-per-worker </span><span class="token number" style="color:rgb(181, 206, 168)">1296</span><span class="token plain"> </span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --warmup-time </span><span class="token number" style="color:rgb(181, 206, 168)">260.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --cooldown-time </span><span class="token number" style="color:rgb(181, 206, 168)">240.0</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --dataset-name gutenberg </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --dataset-path /app/dataset/project_gutenberg </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --gutenberg-input-len </span><span class="token number" style="color:rgb(181, 206, 168)">2000</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  --gutenberg-output-len </span><span class="token number" style="color:rgb(181, 206, 168)">2000</span><br></div></code></pre></div></div></div></div></div>
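<p>The six commands above share most of their flags and differ only in request rate, warm-up and cool-down times, dataset, and sequence lengths. As a minimal sketch, the varying part could be generated from a small lookup table. The values are copied from the commands, the results table, and the first raw log in this article; the names <code>RUNS</code>, <code>DATASET_PATHS</code>, and <code>build_args</code> are hypothetical helpers for illustration, not part of the benchmarking tool.</p>

```python
# Per-run settings copied from the commands, results table, and raw log in this article.
# Key: (sequence length, dataset name); value: (request rate, warm-up s, cool-down s).
RUNS = {
    (512, "sharegpt"): (140, 120.0, 70.0),
    (512, "gutenberg"): (140, 130.0, 70.0),
    (1000, "sharegpt"): (80, 140.0, 110.0),
    (1000, "gutenberg"): (80, 150.0, 120.0),
    (2000, "sharegpt"): (48, 300.0, 230.0),
    (2000, "gutenberg"): (60, 260.0, 240.0),
}

# Dataset paths as they appear in the commands above.
DATASET_PATHS = {
    "sharegpt": "/app/dataset/ShareGPT_V3_unfiltered_cleaned_split.json",
    "gutenberg": "/app/dataset/project_gutenberg",
}


def build_args(seq_len, dataset):
    """Return only the flags that vary between runs (hypothetical helper)."""
    rate, warmup, cooldown = RUNS[(seq_len, dataset)]
    return [
        "--request-rate", str(rate),
        "--warmup-time", str(warmup),
        "--cooldown-time", str(cooldown),
        "--dataset-name", dataset,
        "--dataset-path", DATASET_PATHS[dataset],
        f"--{dataset}-input-len", str(seq_len),
        f"--{dataset}-output-len", str(seq_len),
    ]


if __name__ == "__main__":
    # Print the varying flags for every configuration.
    for seq_len, dataset in RUNS:
        print(seq_len, dataset, " ".join(build_args(seq_len, dataset)))
```

<p>The shared flags (model, host, port, concurrency, and so on) would still need to be prepended exactly as shown in the commands above.</p>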
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="experimental-results">Experimental results<a href="https://test-docs.moreh.io/blog/2025/11/11/deepseek-r1-671b-on-amd-mi300x-gpus-maximum-throughput/#experimental-results" class="hash-link" aria-label="Direct link to Experimental results" title="Direct link to Experimental results" translate="no">​</a></h2>
<p>The table below summarizes the results. As mentioned earlier, the concurrency and RPS values were determined empirically and may vary depending on the system scale (the number of GPU nodes). Across the six configurations, we achieved 50,892–66,194 output tokens/sec, which corresponds to roughly <strong>17,000–22,000 output tokens/sec per decode node</strong>.</p>
<div style="font-size:0.7em"><table><thead><tr><th style="text-align:center">Input sequence length</th><th style="text-align:center">Output sequence length</th><th style="text-align:center">Dataset</th><th style="text-align:center">(Concurrency, RPS)</th><th style="text-align:right">Output tokens/sec</th><th style="text-align:right">Output tokens/sec per decode node</th><th style="text-align:right">Total tokens/sec</th><th style="text-align:right">Mean TTFT (ms)</th><th style="text-align:right">Mean ITL (ms)</th></tr></thead><tbody><tr><td style="text-align:center">512</td><td style="text-align:center">512</td><td style="text-align:center">ShareGPT</td><td style="text-align:center">(10800, 140)</td><td style="text-align:right">66,194.80</td><td style="text-align:right"><strong>22,064.93</strong></td><td style="text-align:right">85,347.35</td><td style="text-align:right">1,677.87</td><td style="text-align:right">160.33</td></tr><tr><td style="text-align:center">512</td><td style="text-align:center">512</td><td style="text-align:center">Gutenberg</td><td style="text-align:center">(10800, 140)</td><td style="text-align:right">64,695.10</td><td style="text-align:right"><strong>21,565.03</strong></td><td style="text-align:right">79,432.24</td><td style="text-align:right">1,774.90</td><td style="text-align:right">164.32</td></tr><tr><td style="text-align:center">1000</td><td style="text-align:center">1000</td><td style="text-align:center">ShareGPT</td><td style="text-align:center">(10800, 80)</td><td style="text-align:right">61,828.90</td><td style="text-align:right"><strong>20,609.63</strong></td><td style="text-align:right">94,103.87</td><td style="text-align:right">1,802.87</td><td style="text-align:right">172.16</td></tr><tr><td style="text-align:center">1000</td><td style="text-align:center">1000</td><td style="text-align:center">Gutenberg</td><td style="text-align:center">(10800, 80)</td><td style="text-align:right">61,418.55</td><td 
style="text-align:right"><strong>20,472.85</strong></td><td style="text-align:right">92,353.63</td><td style="text-align:right">2,149.80</td><td style="text-align:right">173.63</td></tr><tr><td style="text-align:center">2000</td><td style="text-align:center">2000</td><td style="text-align:center">ShareGPT</td><td style="text-align:center">(10800, 48)</td><td style="text-align:right">51,187.87</td><td style="text-align:right"><strong>17,062.62</strong></td><td style="text-align:right">77,775.33</td><td style="text-align:right">2,567.87</td><td style="text-align:right">208.59</td></tr><tr><td style="text-align:center">2000</td><td style="text-align:center">2000</td><td style="text-align:center">Gutenberg</td><td style="text-align:center">(10800, 60)</td><td style="text-align:right">50,892.65</td><td style="text-align:right"><strong>16,964.22</strong></td><td style="text-align:right">76,739.86</td><td style="text-align:right">5,586.34</td><td style="text-align:right">208.76</td></tr></tbody></table></div>
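<p>The per-decode-node column can be reproduced from the output throughput column. The sketch below assumes three decode nodes; that count is inferred from the ratio of the two columns in the table rather than stated in this section, and the names <code>OUTPUT_TPS</code> and <code>per_decode_node</code> are hypothetical.</p>

```python
# Output tokens/sec per configuration, copied from the results table.
OUTPUT_TPS = {
    (512, "ShareGPT"): 66194.80,
    (512, "Gutenberg"): 64695.10,
    (1000, "ShareGPT"): 61828.90,
    (1000, "Gutenberg"): 61418.55,
    (2000, "ShareGPT"): 51187.87,
    (2000, "Gutenberg"): 50892.65,
}

# Assumption: 3 decode nodes, inferred from the ratio of the two table columns.
DECODE_NODES = 3


def per_decode_node(output_tps, nodes=DECODE_NODES):
    """Divide cluster-wide output throughput evenly across decode nodes."""
    return round(output_tps / nodes, 2)


if __name__ == "__main__":
    # Print the derived per-decode-node throughput for each configuration.
    for (seq_len, dataset), tps in OUTPUT_TPS.items():
        print(f"{seq_len:5d} {dataset:10s} {per_decode_node(tps):>12,.2f}")
```

<p>Under that assumption, the printed values should match the bolded column of the table to within rounding.</p>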
<p>Select a tab to view the raw benchmarking log for each configuration.</p>
<div class="theme-tabs-container tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">512, 512, ShareGPT</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">512, 512, Gutenberg</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">1000, 1000, ShareGPT</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">1000, 1000, Gutenberg</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">2000, 2000, ShareGPT</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">2000, 2000, Gutenberg</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain">Serving Benchmark </span><span class="token assign-left variable" style="color:rgb(156, 220, 254)">Result</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 
212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Number of worker processes:              </span><span class="token number" style="color:rgb(181, 206, 168)">25</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Successful requests:                     </span><span class="token number" style="color:rgb(181, 206, 168)">4333</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Maximum request concurrency:             </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Request rate configured </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">RPS</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:           </span><span class="token number" style="color:rgb(181, 206, 168)">140.00</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Warm-up Time:                            </span><span class="token number" style="color:rgb(181, 206, 168)">120.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Cool-down Time:                          </span><span class="token number" style="color:rgb(181, 206, 168)">70.0</span><span 
class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Benchmark duration </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                  </span><span class="token number" style="color:rgb(181, 206, 168)">115.83</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total input tokens:                      </span><span class="token number" style="color:rgb(181, 206, 168)">2218496</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total generated tokens:                  </span><span class="token number" style="color:rgb(181, 206, 168)">7667539</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Output token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:         </span><span class="token number" style="color:rgb(181, 206, 168)">66194.80</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total Token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:          </span><span class="token number" style="color:rgb(181, 206, 168)">85347.35</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Time to First Token----------------</span><br></div><div 
class="token-line" style="color:#9CDCFE"><span class="token plain">Mean TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                          </span><span class="token number" style="color:rgb(181, 206, 168)">1677.87</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                        </span><span class="token number" style="color:rgb(181, 206, 168)">1720.09</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P1 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">673.72</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">944.53</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token 
punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1177.09</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1720.09</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">2117.51</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P90 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">2387.32</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Inter-token Latency----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token 
plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">160.33</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                         </span><span class="token number" style="color:rgb(181, 206, 168)">158.98</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P1 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                             </span><span class="token number" style="color:rgb(181, 206, 168)">105.28</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">139.59</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">151.42</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span 
class="token plain">P50 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">158.98</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">169.65</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P90 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">184.02</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 
212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><br></div></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token 
operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain">Serving Benchmark </span><span class="token assign-left variable" style="color:rgb(156, 220, 254)">Result</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Number of worker processes:              </span><span class="token number" style="color:rgb(181, 206, 168)">25</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Successful requests:                     </span><span class="token number" style="color:rgb(181, 206, 168)">3186</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Maximum request concurrency:             </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Request rate configured </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">RPS</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:           </span><span class="token number" style="color:rgb(181, 206, 168)">140.00</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span 
class="token plain">Warm-up Time:                            </span><span class="token number" style="color:rgb(181, 206, 168)">130.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Cool-down Time:                          </span><span class="token number" style="color:rgb(181, 206, 168)">70.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Benchmark duration </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                  </span><span class="token number" style="color:rgb(181, 206, 168)">110.69</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total input tokens:                      </span><span class="token number" style="color:rgb(181, 206, 168)">1631232</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total generated tokens:                  </span><span class="token number" style="color:rgb(181, 206, 168)">7161008</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Output token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:         </span><span class="token number" style="color:rgb(181, 206, 168)">64695.10</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total Token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token 
plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:          </span><span class="token number" style="color:rgb(181, 206, 168)">79432.24</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Time to First Token----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                          </span><span class="token number" style="color:rgb(181, 206, 168)">1774.90</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                        </span><span class="token number" style="color:rgb(181, 206, 168)">1795.76</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P1 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">775.35</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 
212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">877.82</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1124.76</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1795.76</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">2296.75</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P90 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">2685.10</span><span class="token plain"></span><br></div><div 
class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Inter-token Latency----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">164.32</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                         </span><span class="token number" style="color:rgb(181, 206, 168)">162.24</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P1 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                             </span><span class="token number" style="color:rgb(181, 206, 168)">106.99</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">142.68</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 ITL 
</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">154.22</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">162.24</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">174.62</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P90 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">189.84</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 
212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><br></div></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code 
class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain">Serving Benchmark </span><span class="token assign-left variable" style="color:rgb(156, 220, 254)">Result</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Number of worker processes:              </span><span class="token number" style="color:rgb(181, 206, 168)">25</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Successful requests:                     </span><span class="token number" style="color:rgb(181, 206, 168)">11856</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Maximum request concurrency:             </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"></span><br></div><div class="token-line" 
style="color:#9CDCFE"><span class="token plain">Request rate configured </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">RPS</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:           </span><span class="token number" style="color:rgb(181, 206, 168)">80.00</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Warm-up Time:                            </span><span class="token number" style="color:rgb(181, 206, 168)">140.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Cool-down Time:                          </span><span class="token number" style="color:rgb(181, 206, 168)">110.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Benchmark duration </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                  </span><span class="token number" style="color:rgb(181, 206, 168)">367.34</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total input tokens:                      </span><span class="token number" style="color:rgb(181, 206, 168)">11856000</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total generated tokens:                  </span><span class="token number" style="color:rgb(181, 206, 168)">22712445</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Output token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span 
class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:         </span><span class="token number" style="color:rgb(181, 206, 168)">61828.90</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total Token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:          </span><span class="token number" style="color:rgb(181, 206, 168)">94103.87</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Time to First Token----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                          </span><span class="token number" style="color:rgb(181, 206, 168)">1802.87</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                        </span><span class="token number" style="color:rgb(181, 206, 168)">1411.86</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P1 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 
212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">724.34</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">987.73</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1059.29</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1411.86</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">2221.47</span><span class="token plain"></span><br></div><div 
class="token-line" style="color:#9CDCFE"><span class="token plain">P90 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">3434.72</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Inter-token Latency----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">172.16</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                         </span><span class="token number" style="color:rgb(181, 206, 168)">169.69</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P1 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                             </span><span class="token number" style="color:rgb(181, 206, 168)">120.22</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 ITL 
</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">157.36</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">164.97</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">169.69</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">177.89</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P90 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">: 
                           </span><span class="token number" style="color:rgb(181, 206, 168)">192.85</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 
212)">==</span><br></div></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain">Serving Benchmark </span><span class="token assign-left variable" style="color:rgb(156, 220, 254)">Result</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Number of worker processes:              </span><span class="token number" style="color:rgb(181, 206, 168)">25</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Successful requests:                   
  </span><span class="token number" style="color:rgb(181, 206, 168)">10931</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Maximum request concurrency:             </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Request rate configured </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">RPS</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:           </span><span class="token number" style="color:rgb(181, 206, 168)">80.00</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Warm-up Time:                            </span><span class="token number" style="color:rgb(181, 206, 168)">150.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Cool-down Time:                          </span><span class="token number" style="color:rgb(181, 206, 168)">120.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Benchmark duration </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                  </span><span class="token number" style="color:rgb(181, 206, 168)">353.35</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total input tokens:                      </span><span class="token number" style="color:rgb(181, 206, 168)">10931000</span><span class="token plain"></span><br></div><div class="token-line" 
style="color:#9CDCFE"><span class="token plain">Total generated tokens:                  </span><span class="token number" style="color:rgb(181, 206, 168)">21702425</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Output token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:         </span><span class="token number" style="color:rgb(181, 206, 168)">61418.55</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total Token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:          </span><span class="token number" style="color:rgb(181, 206, 168)">92353.63</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Time to First Token----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                          </span><span class="token number" style="color:rgb(181, 206, 168)">2149.80</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:     
                   </span><span class="token number" style="color:rgb(181, 206, 168)">1910.21</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1040.06</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1374.56</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1910.21</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">2759.64</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token 
plain">P90 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">3502.56</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Inter-token Latency----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">173.63</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                         </span><span class="token number" style="color:rgb(181, 206, 168)">171.10</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">155.99</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 ITL </span><span class="token punctuation" style="color:rgb(212, 
212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">165.99</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">171.10</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">180.85</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P90 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">195.95</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 
212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><br></div></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token 
operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain">Serving Benchmark </span><span class="token assign-left variable" style="color:rgb(156, 220, 254)">Result</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Number of worker processes:              </span><span class="token number" style="color:rgb(181, 206, 168)">25</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Successful requests:                     </span><span class="token number" style="color:rgb(181, 206, 168)">11906</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Maximum request concurrency:             </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Request rate configured </span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">(</span><span class="token plain">RPS</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:           </span><span class="token number" style="color:rgb(181, 206, 168)">48.00</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Warm-up Time:                            </span><span class="token number" style="color:rgb(181, 206, 168)">300.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Cool-down Time:                          </span><span class="token number" style="color:rgb(181, 206, 168)">230.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Benchmark duration </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                  </span><span class="token number" style="color:rgb(181, 206, 168)">895.61</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total input tokens:                      </span><span class="token number" style="color:rgb(181, 206, 168)">23812000</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total generated tokens:                  </span><span class="token number" style="color:rgb(181, 206, 168)">45844389</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Output token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span 
class="token plain">:         </span><span class="token number" style="color:rgb(181, 206, 168)">51187.87</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total Token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:          </span><span class="token number" style="color:rgb(181, 206, 168)">77775.33</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Time to First Token----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                          </span><span class="token number" style="color:rgb(181, 206, 168)">2567.87</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                        </span><span class="token number" style="color:rgb(181, 206, 168)">2538.34</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P1 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" 
style="color:rgb(181, 206, 168)">971.14</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1213.06</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1622.42</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">2538.34</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">3267.10</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P90 TTFT </span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">4126.80</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Inter-token Latency----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">208.59</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                         </span><span class="token number" style="color:rgb(181, 206, 168)">201.50</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P1 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                             </span><span class="token number" style="color:rgb(181, 206, 168)">140.72</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token 
plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">186.91</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">195.65</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">201.50</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">217.87</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P90 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 
168)">243.87</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><br></div></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" 
hidden=""><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain">Serving Benchmark </span><span class="token assign-left variable" style="color:rgb(156, 220, 254)">Result</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Number of worker processes:              </span><span class="token number" style="color:rgb(181, 206, 168)">25</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Successful requests:                     </span><span class="token number" style="color:rgb(181, 206, 168)">12254</span><span 
class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Maximum request concurrency:             </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Request rate configured </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">RPS</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:           </span><span class="token number" style="color:rgb(181, 206, 168)">60.00</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Warm-up Time:                            </span><span class="token number" style="color:rgb(181, 206, 168)">260.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Cool-down Time:                          </span><span class="token number" style="color:rgb(181, 206, 168)">240.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Benchmark duration </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                  </span><span class="token number" style="color:rgb(181, 206, 168)">948.19</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total input tokens:                      </span><span class="token number" style="color:rgb(181, 206, 168)">24508000</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total generated tokens:                  </span><span 
class="token number" style="color:rgb(181, 206, 168)">48255768</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Output token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:         </span><span class="token number" style="color:rgb(181, 206, 168)">50892.65</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total Token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:          </span><span class="token number" style="color:rgb(181, 206, 168)">76739.86</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Time to First Token----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                          </span><span class="token number" style="color:rgb(181, 206, 168)">5586.34</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                        </span><span class="token number" style="color:rgb(181, 206, 
168)">5313.19</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P1 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">1017.56</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1745.51</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">2823.02</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">5313.19</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 
212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">7612.50</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P90 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">10096.06</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Inter-token Latency----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">208.76</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                         </span><span class="token number" style="color:rgb(181, 206, 168)">201.13</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P1 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token 
punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                             </span><span class="token number" style="color:rgb(181, 206, 168)">139.90</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">187.16</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">195.34</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">201.13</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">214.94</span><span class="token 
plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P90 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">245.98</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" 
style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><br></div></code></pre></div></div></div></div></div>
<p>The following are some publicly available performance numbers for comparison.</p>
<ul>
<li class="">The SGLang team reported that, on a cluster of 12x H100 nodes (96 GPUs), with 3 nodes used for prefill and 9 nodes for decode, they achieved a throughput of <strong>22,300 output tokens/sec</strong> per decode node with an input sequence length of 2,000 and an output sequence length of 100. Note that this number does not represent end-to-end performance with PD disaggregation actually applied; rather, it measures partial performance under decoding-only execution. (<a href="https://lmsys.org/blog/2025-05-05-large-scale-ep/" target="_blank" rel="noopener noreferrer" class="">Link</a>)</li>
<li class="">DeepSeek reported achieving <strong>14,800 output tokens/sec</strong> per H800 decode node by applying PD disaggregation and expert parallelism. (<a href="https://github.com/deepseek-ai/open-infra-index/blob/main/202502OpenSourceWeek/day_6_one_more_thing_deepseekV3R1_inference_system_overview.md" target="_blank" rel="noopener noreferrer" class="">Link</a>)</li>
<li class="">AMD reported achieving <strong>up to 14,300 output tokens/sec</strong> per MI300X decode node. This result was also measured under decoding-only execution. (<a href="https://rocm.blogs.amd.com/software-tools-optimization/wide-ep-deepseek/README.html" target="_blank" rel="noopener noreferrer" class="">Link</a>)</li>
</ul>
<p>In real production deployments, an appropriate trade-off between throughput and latency (inter-token latency and time to first token) must be chosen according to the service-level objectives (SLOs). Tighter latency targets inevitably reduce the achievable throughput. Nevertheless, measuring and comparing the maximum achievable throughput before applying SLO constraints is an important step in evaluating infrastructure efficiency. Our next benchmark will examine how throughput varies across different ITL targets.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="appendix">Appendix<a href="https://test-docs.moreh.io/blog/2025/11/11/deepseek-r1-671b-on-amd-mi300x-gpus-maximum-throughput/#appendix" class="hash-link" aria-label="Direct link to Appendix" title="Direct link to Appendix" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="experimental-results-for-isl2000-and-osl100">Experimental results for ISL=2,000 and OSL=100<a href="https://test-docs.moreh.io/blog/2025/11/11/deepseek-r1-671b-on-amd-mi300x-gpus-maximum-throughput/#experimental-results-for-isl2000-and-osl100" class="hash-link" aria-label="Direct link to Experimental results for ISL=2,000 and OSL=100" title="Direct link to Experimental results for ISL=2,000 and OSL=100" translate="no">​</a></h3>
<p>An input sequence length of 2,000 and an output sequence length of 100 were first used by the SGLang team for their PD+EP performance evaluation. Since then, this configuration has been widely adopted to evaluate PD+EP performance of DeepSeek R1.</p>
<p>Note first that this configuration was proposed to measure prefill and decode throughput separately. Under the assumption that the input is always 20x longer than the output, a real inference system would require ~10x more prefill instances than decode instances. (In practice, real usage patterns differ from this assumption, and the number of decode instances typically exceeds that of prefill instances.) In small clusters, prefill therefore becomes the overall performance bottleneck, making it impossible to accurately measure the output tokens/sec that the GPU servers can actually deliver.</p>
<p>Even so, by enabling prefix caching and drawing input sequences from a fixed set of shared prompts, we can construct a scenario in which the prefill workload is significantly reduced, and then measure the resulting output tokens/sec. In this scenario, we achieved <strong>~18,000 output tokens/sec per decode node</strong>.</p>
<div style="font-size:0.85em"><table><thead><tr><th style="text-align:center">Input sequence length</th><th style="text-align:center">Output sequence length</th><th style="text-align:center">Dataset</th><th style="text-align:center">(Concurrency, RPS)</th><th style="text-align:right">Output tokens/sec</th><th style="text-align:right">Output tokens/sec per decode node</th><th style="text-align:right">Mean ITL (ms)</th></tr></thead><tbody><tr><td style="text-align:center">2000</td><td style="text-align:center">100</td><td style="text-align:center">Gutenberg</td><td style="text-align:center">(10800, 1500)</td><td style="text-align:right">53,776.62</td><td style="text-align:right"><strong>17,925.54</strong></td><td style="text-align:right">191.71</td></tr></tbody></table></div>
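<p>The relationship between the two throughput columns in the table above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the benchmark tooling: the variable and function names are hypothetical, and the decode node count is an assumption inferred from the ratio of the two reported columns rather than a value stated in this section.</p>

```python
# Values copied from the results table above (ISL=2,000, OSL=100, Gutenberg).
total_output_tok_per_sec = 53_776.62
reported_per_decode_node = 17_925.54

# Assumption: the number of decode nodes is inferred from the ratio of the
# two reported columns; it is not read from any cluster configuration.
implied_decode_nodes = round(total_output_tok_per_sec / reported_per_decode_node)


def per_decode_node_throughput(total_tok_per_sec: float, decode_nodes: int) -> float:
    """Divide cluster-wide output throughput evenly across decode nodes."""
    return total_tok_per_sec / decode_nodes


per_node = per_decode_node_throughput(total_output_tok_per_sec, implied_decode_nodes)
print(f"implied decode nodes: {implied_decode_nodes}")
print(f"output tokens/sec per decode node: {per_node:,.2f}")
```

<p>Dividing by the decode node count alone, rather than by all nodes in the cluster, keeps the figure comparable with the per-decode-node numbers quoted from other reports.</p>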
<p>Click to view the raw benchmarking log.</p>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>(2000, 100, Gutenberg)</summary><div><div class="collapsibleContent_i85q"><div class="language-shell codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-shell codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain">Serving Benchmark </span><span class="token assign-left variable" style="color:rgb(156, 220, 254)">Result</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Number of worker processes:              </span><span class="token number" style="color:rgb(181, 206, 168)">30</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span 
class="token plain">Successful requests:                     </span><span class="token number" style="color:rgb(181, 206, 168)">185577</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Maximum request concurrency:             </span><span class="token number" style="color:rgb(181, 206, 168)">10800</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Request rate configured </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">RPS</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:           </span><span class="token number" style="color:rgb(181, 206, 168)">1500.00</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Warm-up Time:                            </span><span class="token number" style="color:rgb(181, 206, 168)">30.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Cool-down Time:                          </span><span class="token number" style="color:rgb(181, 206, 168)">20.0</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Benchmark duration </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                  </span><span class="token number" style="color:rgb(181, 206, 168)">367.68</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total input tokens:                      </span><span class="token number" style="color:rgb(181, 206, 168)">371154000</span><span class="token 
plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total generated tokens:                  </span><span class="token number" style="color:rgb(181, 206, 168)">19772555</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Output token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:         </span><span class="token number" style="color:rgb(181, 206, 168)">53776.62</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Total Token throughput </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">tok/s</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:          </span><span class="token number" style="color:rgb(181, 206, 168)">1063226.66</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Time to First Token----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                          </span><span class="token number" style="color:rgb(181, 206, 168)">1079.36</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 
212, 212)">)</span><span class="token plain">:                        </span><span class="token number" style="color:rgb(181, 206, 168)">979.51</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">832.05</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">897.71</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">979.51</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1079.35</span><span class="token plain"></span><br></div><div 
class="token-line" style="color:#9CDCFE"><span class="token plain">P90 TTFT </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">1223.28</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">---------------Inter-token Latency----------------</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Mean ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                           </span><span class="token number" style="color:rgb(181, 206, 168)">191.71</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">Median ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                         </span><span class="token number" style="color:rgb(181, 206, 168)">181.35</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P10 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">166.87</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P25 ITL 
</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">175.00</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P50 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">181.35</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P75 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">191.88</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">P90 ITL </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain">:                            </span><span class="token number" style="color:rgb(181, 206, 168)">259.46</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 
212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><span class="token operator" style="color:rgb(212, 212, 212)">==</span><br></div></code></pre></div></div></div></div></details>
<p>We also measured decoding-only execution under the same configuration (ISL=2,000, OSL=100) and reported the results in a <a href="https://moreh.io/technical-report/21k-output-tokens-per-second-deepseek-inference-on-amd-instinct-mi300x-gpus-with-expert-parallelism-251113/" target="_blank" rel="noopener noreferrer" class="">technical report</a>. The maximum throughput achieved in that setting was 21,224 tokens/sec per decode node. Compared with the 17,925.54 tokens/sec per decode node measured end to end above, this indicates that MoAI Inference Framework achieves <strong>~85% of the peak decode performance</strong> in an end-to-end environment.</p>]]></content>
    </entry>
</feed>