Version: Dev 🚧

Log Collection (Loki + Vector)

This document explains how to enable centralized log collection for the MoAI Inference Framework using Loki (log aggregation) and Vector (log collection agent).

Overview


Architecture details

Loki

| Property | Value |
| --- | --- |
| Helm chart | grafana/loki v6.30.0 |
| App version | 3.5.1 |
| Storage backend | S3 (MinIO), TSDB index |
| Retention | 90 days (2160 h) |
| Ingestion limit | 30 MB/s, 60 MB burst |
| Max entries/query | 50 000 |
| Deployment | Distributed (gateway / read / write / backend) |

Vector

| Property | Value |
| --- | --- |
| Helm chart | vector/vector v0.39.0 |
| Deployment | DaemonSet (Agent mode, one pod per node) |
| Log source | Pods labelled mif.moreh.io/log.collect=true, plus AIGateway pods (app.kubernetes.io/name=aigateway) collected automatically (kubernetes_logs) |
| Log format | JSON parsing applied to pods labelled mif.moreh.io/log.format=json, and always to AIGateway pods |
| Tolerations | unschedulable, compute, amd.com/gpu |

MinIO

| Property | Value |
| --- | --- |
| Helm chart | minio/minio v5.4.0 |
| Mode | Standalone |
| Bucket | loki (created via post-install Job on startup) |
| Loki credentials | Dedicated loki user with S3 policy scoped to loki bucket |
| Resources | 2 Gi memory (requests) |
| Persistence | emptyDir (ephemeral by default) |
| Deployment | Single pod |

Component naming

Service names are derived from the Helm release name. With the default release name mif:

| Service | Name (same-namespace access) |
| --- | --- |
| MinIO | mif-minio |
| Loki gateway | mif-loki-gateway |
| Loki read | mif-loki-read |
| Loki write | mif-loki-write |

Vector connects to Loki using the release-prefixed service name since all components are co-located in the same namespace.
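The naming rule can be sketched in a few lines of Python. This only illustrates the `<release>-<component>` pattern from the table above. The helper name `service_names` is hypothetical and is not part of the chart.

```python
def service_names(release="mif"):
    """Return the same-namespace service names for a Helm release name."""
    # Suffixes are taken from the table above; the release name is
    # prepended with a hyphen.
    suffixes = {
        "MinIO": "minio",
        "Loki gateway": "loki-gateway",
        "Loki read": "loki-read",
        "Loki write": "loki-write",
    }
    return {component: f"{release}-{suffix}" for component, suffix in suffixes.items()}


if __name__ == "__main__":
    for component, name in service_names("mif").items():
        print(f"{component}: {name}")
```

Installing the chart under a different release name changes every service name accordingly, which matters when anything outside the chart points at Loki or MinIO.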


Prerequisites

  • The moai-inference-framework Helm chart installed (or being installed).

Info: MinIO, Loki, and Vector are all enabled by default in the moai-inference-framework chart. No additional configuration is required to get started.


Installation

Log collection is installed as part of the moai-inference-framework Helm chart. See Prerequisites for the required values and install command.


Verifying the installation

Check that all Loki components are running.

kubectl get pods -n mif -l app.kubernetes.io/name=loki

Expected output (all pods Running):

NAME                           READY   STATUS    RESTARTS   AGE
loki-backend-0                 1/1     Running   0          2m
loki-gateway-xxxxxxxxx-xxxxx   1/1     Running   0          2m
loki-read-xxxxxxxxx-xxxxx      1/1     Running   0          2m
loki-write-0                   1/1     Running   0          2m

Check that Vector is running on all nodes.

kubectl get pods -n mif -l app.kubernetes.io/name=vector

Expected output (one pod per node, all Running):

NAME           READY   STATUS    RESTARTS   AGE
vector-xxxxx   1/1     Running   0          2m
vector-yyyyy   1/1     Running   0          2m

Check Vector logs to confirm it is shipping to Loki without errors.

kubectl logs -n mif -l app.kubernetes.io/name=vector --tail=50

Enabling log collection for a pod

Log collection is opt-in for most pods and is controlled by the two pod labels below: one enables collection, the other selects JSON parsing. First-class components such as AIGateway are collected automatically (see Automatically collected components).

Opt-in label

Add the mif.moreh.io/log.collect=true label to a pod to include its logs in Vector's collection. Pods without this label are ignored, except for components collected automatically (see Automatically collected components).

metadata:
  labels:
    mif.moreh.io/log.collect: "true"

Log format label

Add the mif.moreh.io/log.format=json label to enable structured JSON log parsing for a pod. When set, Vector parses each log line as JSON and promotes the following fields:

| JSON field | Mapped to |
| --- | --- |
| msg or message | message |
| time or timestamp | timestamp |
| level | level (Loki label) |
| others | merged into the event |

Both common conventions are accepted: Go components emit msg/time (for example, the Heimdall scheduler), while Rust components emit message/timestamp (for example, AIGateway).

Without this label, the log line is forwarded as-is without any JSON parsing.

metadata:
  labels:
    mif.moreh.io/log.collect: "true"
    mif.moreh.io/log.format: "json"

Info: The level Loki label is only populated for JSON-parsed pods (those labelled mif.moreh.io/log.format=json, plus AIGateway). For plain-text pods, level remains empty.
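To make the field mapping concrete, here is a small Python model of it. It is not Vector's actual configuration (Vector applies its own transforms), and the function name `parse_log_line` is hypothetical. Two behaviours are assumptions rather than documented facts: which key wins when both spellings are present, and what happens to a line that is not valid JSON.

```python
import json


def parse_log_line(line, json_format):
    """Model how a raw log line becomes an event, per the mapping table."""
    if not json_format:
        # No log.format=json label: the line is forwarded unchanged.
        return {"message": line}
    try:
        fields = json.loads(line)
    except json.JSONDecodeError:
        # Assumption: a line that is not valid JSON is kept as plain text.
        return {"message": line}
    if not isinstance(fields, dict):
        # Assumption: only JSON objects carry fields worth promoting.
        return {"message": line}

    event = {}
    for target, candidates in (
        ("message", ("msg", "message")),
        ("timestamp", ("time", "timestamp")),
    ):
        # Remove every accepted spelling from the remaining fields.
        values = [fields.pop(key) for key in candidates if key in fields]
        if values:
            # Assumption: if both spellings are present, the first one wins.
            event[target] = values[0]
    # Everything else, including level, is merged into the event as-is.
    event.update(fields)
    return event
```

A Go-style line such as `{"msg": "ready", "time": "...", "level": "info"}` and a Rust-style line using `message`/`timestamp` both end up with the same `message` and `timestamp` keys, which is what lets one query work across components.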


Automatically collected components

AIGateway pods are collected automatically — no opt-in label is required. The Heimdall controller stamps the immutable label app.kubernetes.io/name=aigateway on every AIGateway pod and exposes no field to set mif.moreh.io/log.collect, so Vector selects these pods by that label and always parses their JSON output. This mirrors AIGateway metrics, which the controller exposes through an auto-created ServiceMonitor and PodMonitor with no per-pod configuration.

Query AIGateway logs in Grafana with the app="aigateway" selector (see Searching logs in Grafana).
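Taken together, the opt-in label, the format label, and the AIGateway exception amount to two small predicates. The Python below is only a model of those rules as this page states them; the real selection happens inside Vector, and the function names are hypothetical.

```python
COLLECT_LABEL = "mif.moreh.io/log.collect"
FORMAT_LABEL = "mif.moreh.io/log.format"
NAME_LABEL = "app.kubernetes.io/name"


def is_aigateway(labels):
    """AIGateway pods are identified by the label the controller stamps."""
    return labels.get(NAME_LABEL) == "aigateway"


def is_collected(labels):
    """A pod is collected if it opted in or if it is an AIGateway pod."""
    return labels.get(COLLECT_LABEL) == "true" or is_aigateway(labels)


def is_json_parsed(labels):
    """JSON parsing applies to collected pods labelled json, and always to AIGateway."""
    wants_json = labels.get(FORMAT_LABEL) == "json" or is_aigateway(labels)
    return is_collected(labels) and wants_json
```

Note that in this model the format label alone does nothing: a pod labelled only `mif.moreh.io/log.format=json` is never collected, so there is nothing to parse.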


Searching logs in Grafana

Accessing Grafana

If you have not yet accessed Grafana, follow the Accessing Grafana guide to retrieve admin credentials, set up port forwarding, and log in.

Opening the Explore view

After logging in to Grafana, click the Explore icon (compass) in the left sidebar. The Explore view opens with a query editor:

[Figure: Grafana Explore view]

Selecting the Loki datasource

If the datasource is not already set to Loki, click the datasource dropdown at the top of the page and select Loki:

[Figure: Selecting the Loki datasource]

Switching to Code mode

The query editor defaults to Builder mode, which provides a visual query builder. To write LogQL queries directly, click the Code button to switch to Code mode:

[Figure: Switching to Code mode]

Running a log query

Enter a LogQL query in the query editor and click Run query (or press Shift+Enter). For example, {namespace="default"} returns all logs from the default namespace. The screenshot below shows the results, which include both plain-text and JSON-formatted logs collected from different pods:

[Figure: Log search results in Grafana]

Vector enriches every log entry with the following labels, which can be used as LogQL selectors:

| Label | Source | Example value |
| --- | --- | --- |
| namespace | kubernetes.pod_namespace | default |
| inference_service | pod label app.kubernetes.io/instance | llama-3-2-1b |
| pool_name | pod label mif.moreh.io/pool | heimdall-inference-scheduler |
| role | pod label mif.moreh.io/role | prefill, decode |
| app | pod label app.kubernetes.io/name | vllm, aigateway |
| node_name | VECTOR_SELF_NODE_NAME env var (injected by Vector) | gpu-node-01 |
| level | parsed from JSON log field level (JSON-parsed pods only) | info, warn, error |

Query examples

Filter by a single label:

{namespace="default"}
{inference_service="llama-3-2-1b"}
{pool_name="heimdall-inference-scheduler"}
{role="decode"}

Combine multiple labels and search for a keyword in the log line:

{namespace="default", inference_service="llama-3-2-1b", role="prefill"} |= "error"
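If queries like the one above are generated by a script rather than typed into Grafana, a small helper keeps the quoting consistent. The sketch below is an illustration only: `build_logql` is a hypothetical name, and it covers just equality matchers plus an optional `|=` line filter.

```python
def build_logql(labels, contains=""):
    """Build a LogQL stream selector with an optional |= line filter."""
    if not labels:
        # A stream selector needs at least one label matcher.
        raise ValueError("at least one label is required")

    def quote(value):
        # Escape backslashes and double quotes for a double-quoted LogQL string.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    matchers = ", ".join(f"{name}={quote(value)}" for name, value in labels.items())
    query = "{" + matchers + "}"
    if contains:
        query += f" |= {quote(contains)}"
    return query


if __name__ == "__main__":
    print(build_logql(
        {"namespace": "default", "inference_service": "llama-3-2-1b", "role": "prefill"},
        contains="error",
    ))
```

Run as a script, this prints the same combined query shown above.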

Filter by log level (available only for JSON-formatted pods):

{namespace="default", level="error"}

Info: The level label is only available for JSON-parsed pods. To filter plain-text logs by level, use a line filter instead. Line filters are case-sensitive, so match the casing the application actually prints:

{namespace="default"} |= "ERROR"

AIGateway logs

AIGateway logs are collected automatically and always parsed as JSON. Select them with the app label, and use | json to expose fields such as target, request_id, and trace_id:

{app="aigateway"}
{app="aigateway"} | json
{app="aigateway"} | json | request_id="<requestId>"

AIGateway emits uppercase levels (INFO, DEBUG, WARN, ERROR), so filter by level with the uppercase value:

{app="aigateway", level="DEBUG"}

Each parsed line includes a trace_id field for correlating a log with its trace in Tempo. Automatic log-to-trace links require a derivedFields entry on the Loki datasource, which this chart does not configure by default.


Using an external MinIO

If MinIO is already deployed outside this chart, set minio.enabled: false and configure lokiBucket with the host and credentials of a MinIO user that has read/write access to the loki bucket.

Same namespace — if the existing MinIO service name matches <release>-minio, only credentials are required:

moai-inference-framework-values.yaml
minio:
  enabled: false
lokiBucket:
  accessKey: <accessKey>
  secretKey: <secretKey>

Different namespace — set lokiBucket.host to the FQDN so that Loki can resolve it cross-namespace:

moai-inference-framework-values.yaml
minio:
  enabled: false
lokiBucket:
  host: <minio.minio.svc.cluster.local>
  accessKey: <accessKey>
  secretKey: <secretKey>
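The placeholder host follows the standard Kubernetes Service DNS form, `<service>.<namespace>.svc.<cluster-domain>`. As a minimal sketch, the hypothetical helper below builds that name. It assumes the cluster domain is `cluster.local`, which is the usual default but can differ per cluster.

```python
def minio_host(service, namespace, cluster_domain="cluster.local"):
    """Return the cluster-internal FQDN of a Kubernetes Service."""
    # Assumption: cluster_domain defaults to cluster.local; check your cluster.
    return f"{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # A MinIO Service named "minio" in the "minio" namespace.
    print(minio_host("minio", "minio"))
```

For a Service named `minio` in the `minio` namespace, this yields the host shown in the placeholder above.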

Disabling log collection

To turn off log collection entirely, disable all three components:

moai-inference-framework-values.yaml
minio:
  enabled: false
loki:
  enabled: false
vector:
  enabled: false