
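Chapter 13: Monitoring and Visualization

13.1 Monitoring Overview

Effective monitoring is key to keeping Squid running reliably. This chapter covers several monitoring options, from Squid's built-in tools to an enterprise-grade monitoring stack.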

┌────────────────────────────────────────────────────────┐
│                Monitoring architecture                 │
│                                                        │
│  ┌───────────┐  ┌───────────────┐  ┌──────────────┐    │
│  │  Squid    │  │  Cache Mgr    │  │  Prometheus  │    │
│  │  Server   │──│  (built-in)   │──│  Exporter    │    │
│  └───────────┘  └───────────────┘  └──────┬───────┘    │
│        │                                  │            │
│        ▼                                  ▼            │
│  ┌───────────┐  ┌───────────────┐  ┌──────────────┐    │
│  │ Log files │  │  SNMP         │  │  Prometheus  │    │
│  │ access.log│  │  (protocol)   │  │  Server      │    │
│  └───────────┘  └───────────────┘  └──────┬───────┘    │
│        │                                  │            │
│        ▼                                  ▼            │
│  ┌──────────────────────────────────────────────────┐  │
│  │         Grafana visualization dashboard          │  │
│  │ Hit rate | Traffic | Latency | Errors | Capacity │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘

13.2 Cache Manager (cachemgr)

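Cache Manager is Squid's built-in management interface and exposes detailed runtime information.

13.2.1 Enabling Cache Manager

# Cache Manager ACLs
# Squid 3.2 and later predefine the "manager" and "localhost" ACLs,
# so define them only on older releases.
# acl manager proto cache_object
# acl localhost src 127.0.0.1/32
acl management src 192.168.1.0/24

# Allow access from localhost and the management network, deny everyone else
http_access allow manager localhost
http_access allow manager management
http_access deny manager

# Optional: protect Cache Manager actions with a password
# cachemgr_passwd secret_password all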

13.2.2 Querying with squidclient

# General information
squidclient -h localhost mgr:info

# 5-minute statistics
squidclient -h localhost mgr:5min

# Memory usage
squidclient -h localhost mgr:mem

# Cache directories
squidclient -h localhost mgr:storedir

# Active requests
squidclient -h localhost mgr:active_requests

# Client list
squidclient -h localhost mgr:client_list

# Peer status (the action is registered as server_list; run mgr:menu to confirm)
squidclient -h localhost mgr:server_list

# IP cache
squidclient -h localhost mgr:ipcache

# FQDN cache
squidclient -h localhost mgr:fqdncache

# Delay pool statistics
squidclient -h localhost mgr:delay

# Query with the password set by cachemgr_passwd (the user name is a placeholder)
squidclient -h localhost -U cachemgr -W secret_password mgr:info
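
The queries above return free-form text, so scripts usually need to pull out a single value. The following is a minimal sketch of one way to do that, assuming the `Label:<whitespace>value` layout that `mgr:info` typically prints. The function names `mgr_info_field` and `sample_info` are hypothetical, and the sample lines are illustrative rather than captured from a real server.

```shell
#!/bin/sh
# Print the value of one labelled line from `mgr:info` output read on stdin.
# Assumption: each line looks like "<indent>Label:<whitespace>value".
mgr_info_field() {
    awk -v label="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)              # drop leading indentation
            prefix = label ":"
            if (index(line, prefix) == 1) {       # line starts with "Label:"
                value = substr(line, length(prefix) + 1)
                sub(/^[ \t]+/, "", value)         # drop padding before the value
                print value
                exit
            }
        }'
}

# Illustrative sample only; real output contains many more lines.
sample_info() {
    printf 'Connection information for squid:\n'
    printf '\tNumber of clients accessing cache:\t12\n'
    printf '\tNumber of HTTP requests received:\t34567\n'
}

sample_info | mgr_info_field "Number of HTTP requests received"
```

Against a live server the same function would be fed from squidclient, for example `squidclient -h localhost mgr:info | mgr_info_field "Number of HTTP requests received"`.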

13.2.3 Access over HTTP

# Query directly from a browser or with curl
curl "http://localhost:3128/squid-internal-mgr/info"
curl "http://localhost:3128/squid-internal-mgr/5min"

# With a password
curl "http://cachemgr:password@localhost:3128/squid-internal-mgr/info"

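13.2.4 Cache Manager Pages

Page             | Description
-----------------|-----------------------------
info             | General runtime information
5min             | 5-minute statistics
60min            | 60-minute statistics
objects          | Cached objects
vm_objects       | In-memory objects
storedir         | Cache directories
mem              | Memory usage
cbdata           | Callback data
events           | Event queue
server_list      | Peer list
peer_select      | Peer selection
client_list      | Client list
active_requests  | Active requests
ipcache          | IP cache
fqdncache        | FQDN cache
idns             | DNS status
delay            | Delay pools
forward          | Forwarding statistics
redirector       | URL rewriter status
utilization      | Utilization
config           | Current configuration
shutdown         | Shut down Squid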

13.3 SNMP Monitoring

13.3.1 Enabling SNMP

# SNMP configuration (squid.conf; requires a Squid build with SNMP support)
snmp_port 3401
acl snmppublic snmp_community public
snmp_access allow snmppublic localhost
snmp_access allow snmppublic management
snmp_access deny all

# Test SNMP queries (shell)
sudo apt install -y snmp

# Walk the Squid MIB (enterprise OID 1.3.6.1.4.1.3495)
snmpwalk -v2c -c public localhost:3401 .1.3.6.1.4.1.3495.1

# Common OID subtrees
# .1.3.6.1.4.1.3495.1.1: cacheSystem
# .1.3.6.1.4.1.3495.1.2: cacheConfig
# .1.3.6.1.4.1.3495.1.3: cachePerf (contains cacheProtoStats at .1.3.2)
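
Numeric OIDs are easy to mistype, so it can help to derive them from the group names. The sketch below maps the top-level groups of SQUID-MIB to their OIDs under the Squid enterprise prefix `1.3.6.1.4.1.3495`. The helper name `squid_oid` is hypothetical.

```shell
#!/bin/sh
# Base of the Squid MIB: enterprises (1.3.6.1.4.1) + 3495 (Squid) + 1 (cache).
SQUID_MIB_BASE=".1.3.6.1.4.1.3495.1"

# Map a top-level SQUID-MIB group name to its numeric OID.
squid_oid() {
    case "$1" in
        cacheSystem)  echo "${SQUID_MIB_BASE}.1" ;;
        cacheConfig)  echo "${SQUID_MIB_BASE}.2" ;;
        cachePerf)    echo "${SQUID_MIB_BASE}.3" ;;
        cacheNetwork) echo "${SQUID_MIB_BASE}.4" ;;
        cacheMesh)    echo "${SQUID_MIB_BASE}.5" ;;
        *) echo "unknown group: $1" >&2; return 1 ;;
    esac
}

squid_oid cachePerf
```

A walk of one subtree then reads `snmpwalk -v2c -c public localhost:3401 "$(squid_oid cachePerf)"`.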

13.4 Prometheus + Grafana

13.4.1 Installing Squid Exporter

# Download squid-exporter
wget https://github.com/boynux/squid-exporter/releases/latest/download/squid-exporter-linux-amd64
chmod +x squid-exporter-linux-amd64
sudo mv squid-exporter-linux-amd64 /usr/local/bin/squid-exporter

# Create a systemd service
sudo tee /etc/systemd/system/squid-exporter.service <<'EOF'
[Unit]
Description=Squid Prometheus Exporter
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/squid-exporter \
    -squid-hostname localhost \
    -squid-port 3128 \
    -listen :9301
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl start squid-exporter
sudo systemctl enable squid-exporter

# Verify that the exporter is serving metrics
curl http://localhost:9301/metrics
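
The `/metrics` endpoint returns the Prometheus text format, which is long to read by eye. A small filter such as the sketch below can pick out one value. The function names are hypothetical, the sample lines are illustrative, and the sketch assumes the metric is exposed without labels.

```shell
#!/bin/sh
# Print the value of an unlabelled metric from Prometheus text-format input.
# Comment lines ("# HELP", "# TYPE") never match because their first field is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Illustrative sample only; the real endpoint exposes many more series.
sample_metrics() {
    printf '# HELP squid_up Was the last query of squid successful?\n'
    printf '# TYPE squid_up gauge\n'
    printf 'squid_up 1\n'
    printf 'squid_client_http_requests_total 1500\n'
}

sample_metrics | metric_value squid_up
```

With the exporter running, `curl -s http://localhost:9301/metrics | metric_value squid_up` gives a quick indication of whether the exporter can reach Squid.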

13.4.2 Prometheus Configuration

# prometheus.yml
scrape_configs:
  - job_name: 'squid'
    static_configs:
      - targets: ['localhost:9301']
    scrape_interval: 15s

13.4.3 Grafana Dashboard

{
  "dashboard": {
    "title": "Squid Proxy Dashboard",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [{
          "expr": "rate(squid_client_http_requests_total[5m])",
          "legendFormat": "Requests/sec"
        }]
      },
      {
        "title": "Cache Hit Rate",
        "type": "gauge",
        "targets": [{
          "expr": "rate(squid_client_http_hits_total[5m]) / rate(squid_client_http_requests_total[5m]) * 100",
          "legendFormat": "Hit Rate %"
        }]
      },
      {
        "title": "Bandwidth",
        "type": "graph",
        "targets": [{
          "expr": "rate(squid_client_http_kbytes_out_total[5m])",
          "legendFormat": "KB/s"
        }]
      }
    ]
  }
}

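13.4.4 Common PromQL Queries

# Request rate (requests per second)
rate(squid_client_http_requests_total[5m])

# Cache hit ratio (%)
rate(squid_client_http_hits_total[5m]) / rate(squid_client_http_requests_total[5m]) * 100

# Bandwidth (KB/s sent to clients)
rate(squid_client_http_kbytes_out_total[5m])

# Active connections
squid_client_http_clients

# Memory usage
squid_mem_alloc_bytes

# Metric names can differ between exporter versions; confirm them against the /metrics output.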
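
The hit ratio query divides one `rate()` by another. Since both rates cover the same window, the window length cancels and the result is simply the increase in hits divided by the increase in requests. The sketch below works through that arithmetic on two made-up counter samples. The function name `hit_ratio` is hypothetical.

```shell
#!/bin/sh
# Hit ratio (%) between two counter samples taken at t0 and t1.
# $1,$2: hits counter at t0,t1    $3,$4: requests counter at t0,t1
hit_ratio() {
    awk -v h0="$1" -v h1="$2" -v r0="$3" -v r1="$4" 'BEGIN {
        dr = r1 - r0
        if (dr <= 0) { print "n/a"; exit }     # no new requests in the window
        printf "%.1f\n", (h1 - h0) * 100 / dr
    }'
}

# 450 new hits out of 1000 new requests
hit_ratio 1000 1450 4000 5000   # prints 45.0
```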

13.5 ELK Integration

13.5.1 Filebeat Configuration

# /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/squid/access.log
    fields:
      log_type: squid
    multiline.pattern: '^\d{10}\.\d{3}'
    multiline.negate: true
    multiline.match: after

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
  index: "squid-%{+yyyy.MM.dd}"

setup.template:
  name: squid
  pattern: squid-*

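13.5.2 Logstash Filtering

# /etc/logstash/conf.d/squid.conf
# This filter takes effect only when Filebeat ships events to Logstash
# (output.logstash) rather than directly to Elasticsearch.
filter {
  if [fields][log_type] == "squid" {
    grok {
      match => {
        # \s+ absorbs the padding Squid puts before the duration column;
        # NOTSPACE also matches CONNECT targets such as example.com:443
        "message" => "%{NUMBER:timestamp}\s+%{NUMBER:duration} %{IP:client} %{WORD:squid_status}/%{NUMBER:http_status} %{NUMBER:bytes} %{WORD:method} %{NOTSPACE:url} %{GREEDYDATA:extra}"
      }
    }
    date {
      match => ["timestamp", "UNIX"]
    }
    geoip {
      source => "client"
    }
  }
}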
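
To see which column each grok capture corresponds to, it helps to split one log line by hand. The sketch below does this with awk for a line in Squid's native access.log format. The sample line and the function name `parse_access_line` are illustrative assumptions, and the output field names simply mirror the grok captures.

```shell
#!/bin/sh
# Split one native-format access.log line into named fields.
# Columns: 1 timestamp, 2 duration (ms), 3 client, 4 result/status,
#          5 bytes, 6 method, 7 URL
parse_access_line() {
    awk '{
        split($4, status, "/")    # e.g. "TCP_MISS/200" -> TCP_MISS and 200
        printf "timestamp=%s duration=%s client=%s squid_status=%s http_status=%s bytes=%s method=%s url=%s\n",
            $1, $2, $3, status[1], status[2], $5, $6, $7
    }'
}

# Illustrative sample line; note the padding before the duration column.
sample_line='1700000000.123    360 192.168.1.10 TCP_MISS/200 4512 GET http://example.com/index.html - HIER_DIRECT/93.184.216.34 text/html'

printf '%s\n' "$sample_line" | parse_access_line
```

awk splits on runs of whitespace by default, which is why the padded duration column needs no special handling here, whereas a grok pattern with a single literal space would not match it.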

13.6 Custom Monitoring Scripts

13.6.1 Health Check Script

#!/bin/bash
# squid-health-check.sh
# Exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL

LOGFILE="/var/log/squid/access.log"
CACHELOG="/var/log/squid/cache.log"

# Check that the Squid process is running
if ! pgrep squid > /dev/null; then
    echo "CRITICAL: Squid is not running"
    exit 2
fi

# Check that the proxy port is listening
if ! ss -tlnp | grep -q ":3128"; then
    echo "CRITICAL: Squid port 3128 not listening"
    exit 2
fi

# Check that a request through the proxy succeeds (give up after 10 seconds)
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 -x http://localhost:3128 http://example.com)
if [ "$RESPONSE" != "200" ]; then
    echo "WARNING: Proxy returned $RESPONSE"
    exit 1
fi

# Check the cache hit ratio over the last 1000 requests
TOTAL=$(tail -n 1000 "$LOGFILE" | wc -l)
HITS=$(tail -n 1000 "$LOGFILE" | grep -cE "TCP_(MEM_)?HIT")
if [ "$TOTAL" -gt 0 ]; then
    HIT_RATIO=$((HITS * 100 / TOTAL))
    if [ "$HIT_RATIO" -lt 30 ]; then
        echo "WARNING: Cache hit ratio is ${HIT_RATIO}%"
        exit 1
    fi
fi

# Check for recent errors in cache.log
ERRORS=$(tail -n 100 "$CACHELOG" | grep -ciE "error|critical")
if [ "$ERRORS" -gt 10 ]; then
    echo "WARNING: $ERRORS errors in recent cache.log"
    exit 1
fi

echo "OK: Squid is healthy"
exit 0
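
The script reports its result through exit codes 0, 1 and 2, which matches the convention used by Nagios-style checks. The sketch below shows one possible wrapper that turns those codes into a status word for a log line. Both function names are hypothetical, and the wrapper prepends its own status word, so it fits best around checks that print only a message.

```shell
#!/bin/sh
# Translate a check's exit code into a status word.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run any check command, print "<STATUS>: <output>", and pass the exit code on.
run_check() {
    output=$("$@" 2>&1)
    code=$?
    echo "$(status_name "$code"): $output"
    return "$code"
}
```

A cron entry or supervisor could then call `run_check` with the check command as its arguments and append the resulting line to a log file.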

13.6.2 Bandwidth Monitoring Script

#!/bin/bash
# squid-bandwidth-monitor.sh

LOGFILE="/var/log/squid/access.log"
INTERVAL=60  # seconds

while true; do
    # Current timestamp and the start of the measurement window
    NOW=$(date +%s)
    PAST=$((NOW - INTERVAL))

    # Sum the bytes column (field 5 in the native log format) for the last INTERVAL seconds
    BYTES=$(awk -v past="$PAST" '$1 > past {sum += $5} END {printf "%d", sum}' "$LOGFILE")
    BYTES=${BYTES:-0}
    MBPS=$(echo "scale=2; $BYTES * 8 / $INTERVAL / 1000000" | bc)

    echo "$(date): Bandwidth: ${MBPS} Mbps ($((BYTES / 1024)) KB)"

    sleep $INTERVAL
done
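
The calculation in the loop can be checked without a live log by feeding it a few known lines. The sketch below repeats the two steps as functions: sum the bytes column for entries newer than a cutoff, then convert bytes per interval to megabits per second. The function names and the sample entries are illustrative, and the sketch assumes the native access.log format, where the fifth field is the reply size in bytes.

```shell
#!/bin/sh
# Sum the bytes column (field 5) of entries whose timestamp is newer than $1.
bytes_since() {
    awk -v past="$1" '$1 > past { sum += $5 } END { printf "%d\n", sum }'
}

# Convert a byte count ($1) over an interval in seconds ($2) to Mbps.
to_mbps() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b * 8 / s / 1000000 }'
}

# Illustrative entries: two inside the 60-second window, one older.
sample_log() {
    printf '%s\n' \
        '1700000000.100    120 192.168.1.10 TCP_MISS/200 3000000 GET http://example.com/a - HIER_DIRECT/203.0.113.5 text/html' \
        '1700000030.200     15 192.168.1.11 TCP_HIT/200 4500000 GET http://example.com/b - HIER_NONE/- image/png' \
        '1699999000.000     10 192.168.1.12 TCP_MISS/200 999 GET http://example.com/c - HIER_DIRECT/203.0.113.5 text/html'
}

total=$(sample_log | bytes_since 1699999940)
to_mbps "$total" 60   # 7500000 bytes in 60 s -> prints 1.00
```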

13.7 Alerting Configuration

13.7.1 Prometheus Alertmanager

# alertmanager.yml
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email-alerts'

receivers:
  - name: 'email-alerts'
    email_configs:
      - to: 'admin@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'

# alert-rules.yml (a separate file, loaded by Prometheus through rule_files in prometheus.yml)
groups:
  - name: squid_alerts
    rules:
      - alert: SquidDown
        expr: up{job="squid"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Squid is down"

      - alert: HighErrorRate
        expr: rate(squid_client_http_errors_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"

      - alert: LowCacheHitRate
        expr: rate(squid_client_http_hits_total[5m]) / rate(squid_client_http_requests_total[5m]) * 100 < 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 30%"

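13.8 Summary of Monitoring Metrics

Category     | Metrics                             | Alert threshold
-------------|-------------------------------------|--------------------
Availability | Process status, listening port      | Process not running
Performance  | Request rate, response time         | > 500ms
Cache        | Hit ratio, cache size               | < 30%
Resources    | CPU, memory, file descriptors       | > 80%
Network      | Bandwidth, connection count         | > 80% of capacity
Errors       | 5xx error rate                      | > 5%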
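
As an example of applying one of these thresholds, file descriptor usage can be computed from `mgr:info` and compared with the 80% limit. The sketch below assumes the two label strings that `mgr:info` normally prints in its file descriptor section. The function names are hypothetical and the sample figures are made up.

```shell
#!/bin/sh
# Compute file descriptor usage (%) from `mgr:info` output read on stdin.
# Assumption: the labels below match the lines printed by your Squid version.
fd_usage_percent() {
    awk -F':[ \t]*' '
        /Maximum number of file descriptors/   { max = $2 }
        /Number of file desc currently in use/ { used = $2 }
        END {
            if (max > 0) printf "%d\n", used * 100 / max
            else print "n/a"
        }'
}

# Illustrative sample of the relevant section.
sample_fd_info() {
    printf 'File descriptor usage for squid:\n'
    printf '\tMaximum number of file descriptors:   1024\n'
    printf '\tLargest file desc currently in use:    900\n'
    printf '\tNumber of file desc currently in use:  850\n'
}

pct=$(sample_fd_info | fd_usage_percent)
if [ "$pct" -gt 80 ]; then
    echo "WARNING: file descriptor usage at ${pct}%"
fi
```

On a live system the input would come from `squidclient -h localhost mgr:info` instead of the sample function.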

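13.9 Chapter Summary

Method               | Use case                           | Complexity
---------------------|------------------------------------|-----------
Cache Manager        | Quick diagnostics                  | ★
SNMP                 | Integration with network management| ★★
Prometheus + Grafana | Enterprise-grade monitoring        | ★★★
ELK Stack            | Log analysis                       | ★★★★
Custom scripts       | Site-specific needs                | ★★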
