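13 · Containerized Deployment

Chapter goals

  • Run VictoriaMetrics as a single Docker container and with Docker Compose
  • Understand the deployment options available on Kubernetes
  • Stand up a cluster quickly with the Helm charts
  • Apply production-grade container configuration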

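13.1 Docker Basics

13.1.1 Images

Image                               Description
victoriametrics/victoria-metrics    Single-node version
victoriametrics/vminsert            Cluster write (ingestion) layer
victoriametrics/vmselect            Cluster query layer
victoriametrics/vmstorage           Cluster storage layer
victoriametrics/vmagent             Scraping agent
victoriametrics/vmalert             Alerting engine
victoriametrics/vmauth              Authentication proxy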

13.1.2 Basic Usage

# Single node
docker run -d \
    --name victoria-metrics \
    -p 8428:8428 \
    -v vm-data:/victoria-metrics-data \
    victoriametrics/victoria-metrics:v1.106.0 \
    -storageDataPath=/victoria-metrics-data \
    -retentionPeriod=90d

# Verify that the server is up
curl http://localhost:8428/health
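
A container started with `docker run -d` may need a few seconds before it answers, so a single `curl` right after startup can fail even though nothing is wrong. The sketch below polls the health endpoint a bounded number of times. The function name `wait_for_health` and its defaults (30 attempts, 2 seconds apart) are illustrative assumptions, not part of VictoriaMetrics.

```shell
#!/bin/sh
# Poll a health endpoint until it answers or the attempts run out.
# Arguments: URL, max attempts (default 30), delay in seconds (default 2).
# wait_for_health is a hypothetical helper name used for illustration.
wait_for_health() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: exit non-zero on HTTP errors; -s: no progress output.
        if curl -fs -o /dev/null "$url"; then
            echo "healthy after $i attempt(s): $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "not healthy after $attempts attempt(s): $url" >&2
    return 1
}
```

For the container above, a call would look like `wait_for_health http://localhost:8428/health 30 2`.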

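13.1.3 Configuration via Environment Variables

# Pass parameters as environment variables.
# -envflag.enable must be given on the command line. Each variable is the flag
# name with "." replaced by "_", prefixed with the value of -envflag.prefix.
docker run -d \
    --name victoria-metrics \
    -p 8428:8428 \
    -v vm-data:/victoria-metrics-data \
    -e VM_storageDataPath=/victoria-metrics-data \
    -e VM_retentionPeriod=90d \
    -e VM_httpListenAddr=:8428 \
    -e VM_memory_allowedPercent=60 \
    victoriametrics/victoria-metrics:v1.106.0 \
    -envflag.enable \
    -envflag.prefix=VM_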
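
As far as the VictoriaMetrics documentation describes it, a flag is read from the environment only when `-envflag.enable` is set. The variable name is the flag name with every `.` replaced by `_`, optionally preceded by the value of `-envflag.prefix`. The helper below is a small sketch of that naming rule; `flag_to_env` is a hypothetical name introduced here for illustration.

```shell
#!/bin/sh
# Derive the environment variable name for a command-line flag.
# Arguments: prefix (may be empty), flag name (a leading "-" is ignored).
# flag_to_env is a hypothetical helper, not a VictoriaMetrics tool.
flag_to_env() {
    prefix=$1
    flag=${2#-}    # drop one leading dash if present
    printf '%s%s\n' "$prefix" "$(printf '%s' "$flag" | tr '.' '_')"
}

flag_to_env VM_ -memory.allowedPercent   # prints VM_memory_allowedPercent
```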

13.2 A Complete Monitoring Stack with Docker Compose

13.2.1 Project Layout

vm-monitoring/
├── docker-compose.yml
├── .env
├── config/
│   ├── prometheus.yml
│   ├── alertmanager.yml
│   ├── vmalert-rules.yml
│   └── vmauth.yml
├── grafana/
│   └── provisioning/
│       ├── datasources/
│       │   └── victoriametrics.yml
│       └── dashboards/
│           └── dashboard.yml
└── data/                  # persistent data
    ├── vm-data/
    ├── vmagent-data/
    ├── grafana-data/
    └── alertmanager-data/

13.2.2 Environment Variables

# .env
VM_VERSION=v1.106.0
GRAFANA_VERSION=11.0.0
PROMETHEUS_VERSION=v2.50.0

# Grafana admin password (example value; change it before exposing Grafana)
GF_SECURITY_ADMIN_PASSWORD=admin123

# Data retention period
VM_RETENTION_PERIOD=90d

# Resource limits
VM_MEMORY_LIMIT=4G
VM_CPU_LIMIT=2
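
Compose substitutes an empty string for any `${VAR}` that is missing from `.env`, which would produce an image tag such as `victoriametrics/victoria-metrics:` and a confusing pull error. A pre-flight check along the following lines can catch that early. `check_env_file` is a hypothetical helper name, and the matching is deliberately simple (plain `VAR=value` lines only).

```shell
#!/bin/sh
# Report variables that are missing or empty in a Compose .env file.
# Usage: check_env_file FILE VAR...
# check_env_file is a hypothetical helper used for illustration.
check_env_file() {
    file=$1
    shift
    missing=0
    for var in "$@"; do
        # Require "VAR=" at the start of a line followed by at least one
        # character; commented-out lines therefore do not count.
        if ! grep -Eq "^${var}=.+" "$file"; then
            echo "missing or empty: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

One possible use is `check_env_file .env VM_VERSION GRAFANA_VERSION GF_SECURITY_ADMIN_PASSWORD VM_RETENTION_PERIOD VM_MEMORY_LIMIT VM_CPU_LIMIT && docker compose config -q`, where `docker compose config -q` validates the resolved file without printing it.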

13.2.3 docker-compose.yml

version: '3.8'

services:
  # VictoriaMetrics single node
  victoria-metrics:
    image: victoriametrics/victoria-metrics:${VM_VERSION}
    container_name: victoria-metrics
    restart: unless-stopped
    ports:
      - "8428:8428"
    volumes:
      - ./data/vm-data:/victoria-metrics-data
    command:
      - '-storageDataPath=/victoria-metrics-data'
      - '-retentionPeriod=${VM_RETENTION_PERIOD}'
      - '-httpListenAddr=:8428'
      - '-memory.allowedPercent=60'
      - '-dedup.minScrapeInterval=15s'
      - '-envflag.enable=true'
    deploy:
      resources:
        limits:
          memory: ${VM_MEMORY_LIMIT}
          cpus: '${VM_CPU_LIMIT}'
    healthcheck:
      test: ["CMD", "/usr/bin/wget", "--spider", "-q", "http://localhost:8428/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s
    networks:
      - monitoring

  # vmagent: lightweight scraping agent
  vmagent:
    image: victoriametrics/vmagent:${VM_VERSION}
    container_name: vmagent
    restart: unless-stopped
    ports:
      - "8429:8429"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./data/vmagent-data:/vmagent-remotewrite-data
    command:
      - '-promscrape.config=/etc/prometheus/prometheus.yml'
      - '-remoteWrite.url=http://victoria-metrics:8428/api/v1/write'
      - '-remoteWrite.tmpDataPath=/vmagent-remotewrite-data'
      - '-envflag.enable=true'
    depends_on:
      victoria-metrics:
        condition: service_healthy
    networks:
      - monitoring

  # vmalert: alerting engine
  vmalert:
    image: victoriametrics/vmalert:${VM_VERSION}
    container_name: vmalert
    restart: unless-stopped
    ports:
      - "8880:8880"
    volumes:
      - ./config/vmalert-rules.yml:/etc/vmalert/rules.yml:ro
    command:
      - '-rule=/etc/vmalert/rules.yml'
      - '-datasource.url=http://victoria-metrics:8428'
      - '-notifier.url=http://alertmanager:9093'
      - '-external.label=env=prod'
      - '-evaluationInterval=30s'
      - '-httpListenAddr=:8880'
      - '-envflag.enable=true'
    depends_on:
      victoria-metrics:
        condition: service_healthy
    networks:
      - monitoring

  # Alertmanager
  alertmanager:
    image: prom/alertmanager:v0.27.0
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./config/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - ./data/alertmanager-data:/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    networks:
      - monitoring

  # Grafana
  grafana:
    image: grafana/grafana:${GRAFANA_VERSION}
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - ./data/grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD}
      - GF_INSTALL_PLUGINS=victoriametrics-metrics-datasource
      - GF_USERS_ALLOW_SIGN_UP=false
    depends_on:
      victoria-metrics:
        condition: service_healthy
    networks:
      - monitoring

  # vmauth: authentication gateway (optional)
  vmauth:
    image: victoriametrics/vmauth:${VM_VERSION}
    container_name: vmauth
    restart: unless-stopped
    ports:
      - "8427:8427"
    volumes:
      - ./config/vmauth.yml:/etc/vmauth/auth.yml:ro
    command:
      - '-auth.config=/etc/vmauth/auth.yml'
      - '-httpListenAddr=:8427'
      - '-envflag.enable=true'
    depends_on:
      victoria-metrics:
        condition: service_healthy
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
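
The Compose file bind-mounts several directories under `./data`. If they do not exist, Docker creates them owned by root, and a container that runs as a non-root user may then fail to write to its volume. The sketch below creates the directories up front. It assumes the official Grafana image runs as UID 472 with GID 0, so verify that against the image version you deploy. `prepare_data_dirs` is a hypothetical helper name.

```shell
#!/bin/sh
# Create the bind-mount directories used by the Compose file.
# Argument: base directory (default ./data).
# prepare_data_dirs is a hypothetical helper used for illustration.
prepare_data_dirs() {
    base=${1:-./data}
    for dir in vm-data vmagent-data grafana-data alertmanager-data; do
        mkdir -p "$base/$dir" || return 1
    done
    # Assumption: the Grafana container runs as UID 472, GID 0.
    # Changing ownership needs root, so skip it for unprivileged users.
    if [ "$(id -u)" -eq 0 ]; then
        chown -R 472:0 "$base/grafana-data" ||
            echo "warning: could not change owner of $base/grafana-data" >&2
    else
        echo "note: not root, ownership of $base/grafana-data left unchanged" >&2
    fi
    return 0
}
```

Running `prepare_data_dirs ./data` once before the first `docker compose up -d` is enough, because the directories persist across restarts.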

13.2.4 Prometheus Scrape Configuration

# config/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'victoria-metrics'
    static_configs:
      - targets: ['victoria-metrics:8428']

  - job_name: 'vmagent'
    static_configs:
      - targets: ['vmagent:8429']

  - job_name: 'vmalert'
    static_configs:
      - targets: ['vmalert:8880']

  - job_name: 'alertmanager'
    static_configs:
      - targets: ['alertmanager:9093']

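  # Note: the Compose file above does not define a node-exporter service;
  # add one or remove this job, otherwise the target stays down.
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']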

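13.2.5 Starting and Managing the Stack

# Start all services
docker compose up -d

# Show status
docker compose ps

# Follow logs
docker compose logs -f victoria-metrics
docker compose logs -f vmagent

# Stop all services
docker compose down

# Stop and remove named volumes (the bind-mounted ./data directories are kept)
docker compose down -v

# Recreate a single service
docker compose up -d --force-recreate victoria-metrics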
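
`docker compose ps` shows container state, but not whether each service actually answers HTTP. A quick smoke check over the published ports could look like the sketch below. The ports come from the Compose file in 13.2.3, and the paths are the health endpoints each project exposes (`/health` for the VictoriaMetrics components, `/-/healthy` for Alertmanager, `/api/health` for Grafana). The names `probe` and `smoke_check` are hypothetical.

```shell
#!/bin/sh
# Probe one URL and print "ok <name>" or "FAIL <name>".
# probe and smoke_check are hypothetical helpers used for illustration.
probe() {
    name=$1
    url=$2
    if curl -fs -o /dev/null --max-time 5 "$url"; then
        echo "ok   $name"
    else
        echo "FAIL $name"
        return 1
    fi
}

# Check every service of the stack; returns non-zero if any probe failed.
smoke_check() {
    host=${1:-localhost}
    failed=0
    probe victoria-metrics "http://$host:8428/health"     || failed=1
    probe vmagent          "http://$host:8429/health"     || failed=1
    probe vmalert          "http://$host:8880/health"     || failed=1
    probe alertmanager     "http://$host:9093/-/healthy"  || failed=1
    probe grafana          "http://$host:3000/api/health" || failed=1
    probe vmauth           "http://$host:8427/health"     || failed=1
    return "$failed"
}
```

Calling `smoke_check` with no argument targets `localhost`. If the optional vmauth service is left out of the stack, its probe line should be removed as well.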

13.3 Deploying on Kubernetes

13.3.1 Using the Helm Charts (Recommended)

# Add the Helm repository
helm repo add vm https://victoriametrics.github.io/helm-charts/
helm repo update

# Search for available charts
helm search repo vm/

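13.3.2 Installing the Single-Node Version with Helm

# Install the single-node version (the chart is named victoria-metrics-single)
helm install victoria-metrics vm/victoria-metrics-single \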
    -n monitoring \
    --create-namespace \
    --set server.retentionPeriod=90d \
    --set server.resources.limits.memory=4Gi \
    --set server.resources.limits.cpu=2 \
    --set server.persistentVolume.size=50Gi \
    --set server.scrape.enabled=true

13.3.3 Installing the Cluster Version with Helm

# Install the cluster version
helm install victoria-metrics-cluster vm/victoria-metrics-cluster \
    -n monitoring \
    --create-namespace \
    --set vminsert.replicaCount=2 \
    --set vmselect.replicaCount=2 \
    --set vmstorage.replicaCount=3 \
    --set vmstorage.retentionPeriod=90d \
    --set vmstorage.persistentVolume.size=100Gi \
    --set vmstorage.resources.limits.memory=16Gi \
    --set vmstorage.resources.limits.cpu=4

13.3.4 Custom values.yaml

# values-cluster.yaml
vminsert:
  replicaCount: 2
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi
  ingress:
    enabled: true
    className: nginx
    hosts:
      - vminsert.example.com
  extraArgs:
    maxConcurrentInserts: "64"
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/component
                  operator: In
                  values:
                    - vminsert
            topologyKey: kubernetes.io/hostname

vmselect:
  replicaCount: 2
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: "4"
      memory: 8Gi
  ingress:
    enabled: true
    className: nginx
    hosts:
      - vmselect.example.com
  extraArgs:
    search.maxConcurrentRequests: "16"
    search.maxQueryDuration: "30s"
  cacheMountPath: /cache
  persistentVolume:
    size: 10Gi

vmstorage:
  replicaCount: 3
  retentionPeriod: 90d
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "8"
      memory: 32Gi
  persistentVolume:
    storageClass: gp3
    size: 100Gi
  extraArgs:
    dedup.minScrapeInterval: "15s"
    memory.allowedPercent: "60"
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/component
                operator: In
                values:
                  - vmstorage
          topologyKey: kubernetes.io/hostname

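# vmagent
# Note: the vmagent and vmalert blocks below take effect only if the chart in
# use defines these values; otherwise install the separate
# victoria-metrics-agent and victoria-metrics-alert charts.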
vmagent:
  enabled: true
  replicaCount: 1
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi
  extraArgs:
    promscrape.maxScrapeSize: "64MB"

# vmalert
vmalert:
  enabled: true
  replicaCount: 1
  extraArgs:
    evaluationInterval: "30s"

# Install with the custom values file
helm install victoria-metrics-cluster vm/victoria-metrics-cluster \
    -n monitoring \
    -f values-cluster.yaml

13.3.5 Upgrade and Rollback

# Upgrade
helm upgrade victoria-metrics-cluster vm/victoria-metrics-cluster \
    -n monitoring \
    -f values-cluster.yaml \
    --set vmstorage.replicaCount=5

# Show release history
helm history victoria-metrics-cluster -n monitoring

# Roll back to revision 1
helm rollback victoria-metrics-cluster 1 -n monitoring
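
In scripted deployments it is often convenient to combine install and upgrade into one idempotent step and to roll back automatically when the rollout does not become ready. The sketch below uses only standard Helm 3 options (`upgrade --install`, `--wait`, `--timeout`, and `rollback` without a revision, which returns to the previous release). The function names and the 10 minute timeout are assumptions made for illustration.

```shell
#!/bin/sh
# Install the release if absent, upgrade it otherwise, and wait for readiness.
# Arguments: release name, chart, values file.
# deploy_release and deploy_or_rollback are hypothetical helper names.
deploy_release() {
    release=$1
    chart=$2
    values=$3
    helm upgrade --install "$release" "$chart" \
        --namespace monitoring \
        --create-namespace \
        --values "$values" \
        --wait \
        --timeout 10m
}

# Same as deploy_release, but roll back to the previous revision on failure.
# On a failed first install there is no earlier revision to return to.
deploy_or_rollback() {
    if ! deploy_release "$@"; then
        echo "upgrade failed, rolling back $1" >&2
        helm rollback "$1" --namespace monitoring --wait
        return 1
    fi
}
```

With the files from this chapter, a call would be `deploy_or_rollback victoria-metrics-cluster vm/victoria-metrics-cluster values-cluster.yaml`.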

13.4 Deploying with Plain Kubernetes YAML

13.4.1 Namespace

# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    app.kubernetes.io/part-of: victoria-metrics

13.4.2 vmstorage StatefulSet

# vmstorage.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vmstorage
  namespace: monitoring
spec:
  serviceName: vmstorage
  replicas: 3
  selector:
    matchLabels:
      app: vmstorage
  template:
    metadata:
      labels:
        app: vmstorage
        app.kubernetes.io/component: vmstorage
    spec:
      containers:
        - name: vmstorage
          image: victoriametrics/vmstorage:v1.106.0
          ports:
            - containerPort: 8482
              name: http
            - containerPort: 8400
              name: insert
            - containerPort: 8401
              name: select
          args:
            - '-storageDataPath=/storage'
            - '-retentionPeriod=90d'
            - '-vminsertAddr=:8400'
            - '-vmselectAddr=:8401'
            - '-httpListenAddr=:8482'
            - '-memory.allowedPercent=60'
            - '-dedup.minScrapeInterval=15s'
          resources:
            requests:
              cpu: "2"
              memory: 8Gi
            limits:
              cpu: "8"
              memory: 32Gi
          volumeMounts:
            - name: vmstorage-data
              mountPath: /storage
          livenessProbe:
            httpGet:
              path: /health
              port: 8482
            initialDelaySeconds: 15
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8482
            initialDelaySeconds: 5
            periodSeconds: 10
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - vmstorage
              topologyKey: kubernetes.io/hostname
  volumeClaimTemplates:
    - metadata:
        name: vmstorage-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 100Gi
---
apiVersion: v1
kind: Service
metadata:
  name: vmstorage
  namespace: monitoring
spec:
  clusterIP: None
  ports:
    - port: 8482
      name: http
    - port: 8400
      name: insert
    - port: 8401
      name: select
  selector:
    app: vmstorage
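
Because the Service is headless (`clusterIP: None`), each StatefulSet pod gets a stable DNS name of the form `<pod>.<service>`, and components that write to vmstorage need those names as one comma-separated list. Typing the list by hand gets error-prone once the replica count changes, so it can be generated. `storage_nodes` below is a hypothetical helper, and its defaults mirror the names and insert port used in the manifest above.

```shell
#!/bin/sh
# Print a comma-separated list of per-pod addresses for a StatefulSet.
# Arguments: replicas, StatefulSet name, headless Service name, port.
# storage_nodes is a hypothetical helper used for illustration.
storage_nodes() {
    replicas=$1
    statefulset=${2:-vmstorage}
    service=${3:-vmstorage}
    port=${4:-8400}
    list=""
    i=0
    while [ "$i" -lt "$replicas" ]; do
        # Append with a comma only when the list is already non-empty.
        list="${list:+$list,}$statefulset-$i.$service:$port"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

storage_nodes 3
# prints vmstorage-0.vmstorage:8400,vmstorage-1.vmstorage:8400,vmstorage-2.vmstorage:8400
```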

13.4.3 vminsert Deployment

# vminsert.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vminsert
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vminsert
  template:
    metadata:
      labels:
        app: vminsert
        app.kubernetes.io/component: vminsert
    spec:
      containers:
        - name: vminsert
          image: victoriametrics/vminsert:v1.106.0
          ports:
            - containerPort: 8480
              name: http
          args:
            - '-httpListenAddr=:8480'
            - '-storageNode=vmstorage-0.vmstorage:8400,vmstorage-1.vmstorage:8400,vmstorage-2.vmstorage:8400'
            - '-replicationFactor=2'
            # Deduplication (-dedup.minScrapeInterval) is configured on vmstorage and vmselect, not on vminsert
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 4Gi
          livenessProbe:
            httpGet:
              path: /health
              port: 8480
            initialDelaySeconds: 15
            periodSeconds: 30
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - vminsert
                topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: Service
metadata:
  name: vminsert
  namespace: monitoring
spec:
  type: ClusterIP
  ports:
    - port: 8480
      name: http
  selector:
    app: vminsert
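
The manifests depend on each other: the namespace must exist first, and vminsert can only connect once the vmstorage pods are reachable. One way to apply them in order is sketched below, waiting for each rollout before moving on. The file names are the ones given in the comments at the top of each manifest, while `apply_cluster` and the timeouts are assumptions made for illustration.

```shell
#!/bin/sh
# Apply the manifests in dependency order and stop at the first failure.
# apply_cluster is a hypothetical helper; timeouts are illustrative.
apply_cluster() {
    kubectl apply -f namespace.yaml &&
    kubectl apply -f vmstorage.yaml &&
    # Wait until all vmstorage replicas are ready.
    kubectl -n monitoring rollout status statefulset/vmstorage --timeout=10m &&
    kubectl apply -f vminsert.yaml &&
    kubectl -n monitoring rollout status deployment/vminsert --timeout=5m
}
```

Run `apply_cluster` from the directory that holds the three files. A non-zero exit status means one of the steps failed and the later ones were skipped.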

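13.5 Container Best Practices

13.5.1 Resource Limits

# Recommended resource settings
resources:
  requests:
    cpu: "2"         # request
    memory: 8Gi
  limits:
    cpu: "4"         # limit (about 2x the request is a reasonable ceiling)
    memory: 8Gi      # limit (kept equal to the request to reduce the risk of OOM kills)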
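
The memory limit interacts with `-memory.allowedPercent`, which this chapter sets to 60. As I understand the flag, it caps the share of available memory that the caches may use, and inside a container the available memory is taken from the cgroup limit. The arithmetic below is a rough sketch of that budget for a hypothetical container limited to 8 GiB. `cache_budget_mib` is an illustrative helper, not a VictoriaMetrics tool.

```shell
#!/bin/sh
# Estimate the cache budget in MiB.
# Arguments: memory limit in MiB, allowed percent (default 60).
# cache_budget_mib is a hypothetical helper; integer division rounds down.
cache_budget_mib() {
    limit_mib=$1
    percent=${2:-60}
    echo $(( limit_mib * percent / 100 ))
}

cache_budget_mib 8192 60   # prints 4915
```

The remainder is left for query processing, ingestion buffers, and the operating system page cache, which is why the limit should not be sized to the cache budget alone.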

13.5.2 Storage

# An SSD-backed storage class is recommended
volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: gp3     # AWS SSD
      # storageClassName: premium-rwo  # GCP SSD
      # storageClassName: managed-premium  # Azure SSD
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi

13.5.3 Graceful Shutdown

# Graceful shutdown settings
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: victoria-metrics
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]

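Chapter summary

Deployment method           Typical use                 Complexity
Docker single container     Development and testing
Docker Compose              Small-scale production
Helm chart (single node)    Small to medium scale
Helm chart (cluster)        Large-scale production
Plain YAML                  Custom requirements

Further reading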