Chapter 11: Troubleshooting
11.1 Troubleshooting workflow overview
Network problem
│
├─ 1. Confirm the NM service state
│     systemctl status NetworkManager
│
├─ 2. Check the device state
│     nmcli device status
│
├─ 3. Check the connection state
│     nmcli connection show --active
│
├─ 4. Check the IP configuration
│     ip addr show
│     ip route
│
├─ 5. Check DNS
│     resolvectl status
│     nslookup example.com
│
├─ 6. Test connectivity
│     ping <gateway>
│     ping <DNS server>
│     ping 8.8.8.8
│     ping example.com
│
└─ 7. Read the logs
      journalctl -u NetworkManager -f
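The workflow above can be scripted so that it stops at the first failing layer. The following is a minimal sketch rather than part of the original procedure: `run_step` and `triage` are hypothetical helper names, the gateway address is supplied by the caller, and `getent hosts` stands in for the `nslookup` check.

```shell
# Hypothetical helpers; the names are illustrative and not part of NetworkManager.

# run_step LABEL CMD [ARGS...]: run one check quietly and report the result.
run_step() {
    local label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'OK    %s\n' "$label"
    else
        printf 'FAIL  %s\n' "$label"
        return 1
    fi
}

# triage GATEWAY: walk the layers in order and stop at the first failure.
# The return code (1-4) identifies the step that failed.
triage() {
    local gw=$1
    run_step "NetworkManager is active" systemctl is-active --quiet NetworkManager || return 1
    run_step "gateway $gw answers ping" ping -c 1 -W 2 "$gw" || return 2
    run_step "8.8.8.8 answers ping" ping -c 1 -W 2 8.8.8.8 || return 3
    run_step "example.com resolves" getent hosts example.com || return 4
}
```

A call such as `triage 192.168.1.1` prints one line per step. A failure at step 3 with step 2 passing points at routing or the upstream link, while a failure only at step 4 points at DNS.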
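11.2 Connection failures
Problem: no IP address is obtained (DHCP failure)
# Symptom
nmcli device status
# eth0 ethernet disconnected --
# Step 1: check the physical link
ip link show eth0
# Confirm "state UP" and the LOWER_UP flag (carrier present)
# Step 2: check whether NM manages the device
nmcli -f GENERAL.NM-MANAGED device show eth0
# GENERAL.NM-MANAGED: yes
# If the device is unmanaged
sudo nmcli device set eth0 managed yes
# Step 3: check the DHCP log
journalctl -u NetworkManager | grep -i "dhcp"
# Step 4: trigger DHCP by hand (this bypasses NM; release the lease afterwards with "sudo dhclient -r eth0")
sudo dhclient -v eth0
# Step 5: check that a DHCP server answers on this segment
sudo nmap --script broadcast-dhcp-discover -e eth0
# Step 6: fall back to a static IP
nmcli connection add type ethernet con-name "temp-static" \
  ifname eth0 ipv4.method manual ipv4.addresses "192.168.1.100/24" \
  ipv4.gateway "192.168.1.1"
nmcli connection up "temp-static"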
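When several interfaces are involved, the symptom check can be reduced to a filter over nmcli's terse output. This is a small sketch assuming the input comes from `nmcli -t -f DEVICE,TYPE,STATE device status`; the function name `list_problem_devices` is hypothetical.

```shell
# list_problem_devices: read "DEVICE:TYPE:STATE" lines on stdin and print
# every Ethernet or WiFi device whose state does not start with "connected".
list_problem_devices() {
    awk -F: '($2 == "ethernet" || $2 == "wifi") && $3 !~ /^connected/ {
        print $1 ": " $3
    }'
}

# Usage on a live system:
#   nmcli -t -f DEVICE,TYPE,STATE device status | list_problem_devices
```

Loopback, bridge, and other virtual devices are skipped on purpose, since they rarely explain a DHCP failure.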
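Problem: the connection drops frequently
# Show the disconnect history
journalctl -u NetworkManager | grep -i "disconnect\|deactivat"
# Show device state changes
journalctl -u NetworkManager | grep -i "device.*state"
# Watch link state in real time
ip monitor link
# Check Ethernet auto-negotiation
ethtool eth0 | grep -i "speed\|duplex\|link"
# Identify the driver, then check whether Energy-Efficient Ethernet (EEE) is enabled
ethtool -i eth0 | grep driver
ethtool --show-eee eth0
# Disable EEE power saving (only if the NIC supports it)
sudo ethtool --set-eee eth0 eee off
# Search the kernel ring buffer
dmesg | grep -i "eth0\|link\|carrier"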
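To judge how often a link actually drops, the state-change lines can be counted per device. The sketch below assumes NetworkManager log lines of the form `device (eth0): state change: activated -> unavailable (reason ...)`; the exact wording can differ between NM versions, and `count_drops` is a hypothetical name.

```shell
# count_drops: read NetworkManager log lines on stdin and count, per device,
# how often the state left "activated".
# Assumed line format (may vary by NM version):
#   device (eth0): state change: activated -> unavailable (reason ...)
count_drops() {
    sed -nE 's/.*device \(([^)]+)\): state change: activated -> .*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage on a live system:
#   journalctl -u NetworkManager --since "1 hour ago" | count_drops
```

A device with many drops in a short window is a candidate for the cable, auto-negotiation, and EEE checks listed above.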
Problem: connection activation times out
# Inspect the connection profile
nmcli connection show "problem-connection"
# Common causes:
# 1. The interface named in the profile does not exist
nmcli -t -f DEVICE device status
# 2. The MAC address does not match
nmcli connection show "problem-connection" | grep mac-address
ip link show eth0 | grep ether
# 3. The MTU does not match
nmcli connection show "problem-connection" | grep mtu
# Fix: clear the MAC binding
nmcli connection modify "problem-connection" \
  ethernet.mac-address ""
# Fix: rebind the profile to the interface
nmcli connection modify "problem-connection" \
  connection.interface-name eth0
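Comparing the two MAC addresses by eye is error-prone, because nmcli prints upper case while `ip link` prints lower case, and nmcli's terse mode escapes colons as `\:`. A minimal comparison sketch follows; `normalize_mac` and `same_mac` are hypothetical helper names.

```shell
# normalize_mac: lower-case a MAC address and drop the backslashes that
# nmcli's terse output puts in front of colons.
normalize_mac() {
    printf '%s' "$1" | tr -d '\\' | tr 'A-F' 'a-f'
}

# same_mac A B: succeed only if both addresses are non-empty and equal.
same_mac() {
    local a b
    a=$(normalize_mac "$1")
    b=$(normalize_mac "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage on a live system (an empty profile value means no MAC binding at all):
#   bound=$(nmcli -g 802-3-ethernet.mac-address connection show "problem-connection")
#   [ -z "$bound" ] || same_mac "$bound" "$(cat /sys/class/net/eth0/address)" \
#       || echo "profile is bound to a different MAC"
```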
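11.3 IP configuration problems
Problem: a static IP does not take effect
# Inspect the IPv4 settings of the profile
nmcli connection show "my-connection" | grep -i "ipv4"
# Confirm that ipv4.method is manual
nmcli connection show "my-connection" | grep ipv4.method
# ipv4.method: manual
# Confirm that the IP address is correct
nmcli connection show "my-connection" | grep ipv4.addresses
# Confirm that the profile is bound to the right interface
nmcli connection show "my-connection" | grep connection.interface-name
# Reactivate the connection
nmcli connection down "my-connection"
nmcli connection up "my-connection"
# Or reapply the modified profile to the device without a full reactivation
nmcli device reapply eth0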
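Problem: conflicts between multiple IP addresses
# List every IP address on the interface
ip addr show eth0
# Check whether more than one active connection is bound to the same interface
nmcli -t -f NAME,DEVICE connection show | grep eth0
# Delete the conflicting connection
nmcli connection delete "conflicting-connection"
# Inspect the routing tables
ip route show
ip -6 route show
# Check for multiple default routes
ip route | grep default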
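When more than one default route exists, the kernel prefers the one with the lowest metric. The sketch below lists default routes in preference order from `ip route` output; `default_routes` is a hypothetical name, and routes without an explicit metric are shown as metric 0.

```shell
# default_routes: read "ip route" output on stdin and print "METRIC DEVICE"
# for each default route, lowest metric (most preferred) first.
default_routes() {
    awk '$1 == "default" {
        dev = "?"; metric = 0
        for (i = 2; i < NF; i++) {
            if ($i == "dev") dev = $(i + 1)
            if ($i == "metric") metric = $(i + 1)
        }
        print metric, dev
    }' | sort -n
}

# Usage on a live system:
#   ip route | default_routes
```

If the first line names an interface you did not expect, adjusting `ipv4.route-metric` on one of the profiles is usually less disruptive than deleting a connection.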
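Problem: IPv6 issues
# Check the IPv6 configuration
nmcli connection show "my-connection" | grep ipv6
# Disable IPv6 temporarily (for testing)
sudo sysctl -w net.ipv6.conf.eth0.disable_ipv6=1
# Disable IPv6 permanently for this profile (takes effect on the next activation)
nmcli connection modify "my-connection" ipv6.method disabled
# Neighbor discovery problems
ip -6 neigh show
ping -6 -c 3 fe80::1%eth0
# SLAAC does not work
# Capture router advertisements (ICMPv6 type 134)
sudo tcpdump -i eth0 -n 'icmp6 and ip6[40] == 134'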
11.4 DNS problems
Problem: domain names do not resolve
# Step 1: check resolv.conf
cat /etc/resolv.conf
# Step 2: check the DNS servers NM has configured
nmcli device show | grep DNS
# Step 3: query a known-good server directly
nslookup example.com 8.8.8.8
dig @8.8.8.8 example.com
# Step 4: check the DNS backend
resolvectl status # systemd-resolved
systemctl status dnsmasq # dnsmasq
# Step 5: flush the cache
sudo resolvectl flush-caches
sudo systemctl restart dnsmasq
# Step 6: check firewall rules for port 53
sudo iptables -L -n | grep -w 53
sudo iptables -L -n -t nat | grep -w 53
# Common cause: resolv.conf is locked
ls -l /etc/resolv.conf
lsattr /etc/resolv.conf
# If the immutable flag (i) is set, clear it, remove the file, and restart NM
sudo chattr -i /etc/resolv.conf
sudo rm /etc/resolv.conf
sudo systemctl restart NetworkManager
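Step 1 is easier to interpret once you know whether resolv.conf points at a local stub resolver or directly at upstream servers. A small sketch, with the hypothetical helper names `list_nameservers` and `uses_local_stub`:

```shell
# list_nameservers [FILE]: print the nameserver addresses from a resolv.conf file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# uses_local_stub [FILE]: succeed if every nameserver is a loopback address,
# such as 127.0.0.53 (systemd-resolved) or 127.0.0.1 (a local dnsmasq).
uses_local_stub() {
    local servers
    servers=$(list_nameservers "$1")
    [ -n "$servers" ] && ! printf '%s\n' "$servers" | grep -qv '^127\.'
}

# Usage on a live system:
#   if uses_local_stub; then resolvectl status; else list_nameservers; fi
```

With a local stub in place, the real upstream servers are the ones reported in step 2 and step 4, not the address in the file.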
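Problem: DNS resolution is slow
# Measure the DNS response time
time nslookup example.com
# List the DNS servers in use
nmcli device show | grep DNS
# Check the search domains of a profile (each one can cause an extra query)
nmcli connection show "my-connection" | grep dns-search
# Show systemd-resolved statistics
resolvectl statistics
# Optimization: shorten the DNS timeout
# Add this line to /etc/resolv.conf (only if you manage the file by hand)
options timeout:1 attempts:1
# Optimization: use a local DNS cache (package name as on Debian/Ubuntu)
sudo apt install systemd-resolved
sudo systemctl enable --now systemd-resolved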
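`time nslookup` includes process start-up and any search-domain retries, so comparing resolvers is cleaner with the query time that `dig` itself reports. A minimal sketch, where `query_ms` is a hypothetical helper name:

```shell
# query_ms: read dig output on stdin and print the "Query time" value
# in milliseconds.
query_ms() {
    awk '/Query time:/ { print $4; exit }'
}

# Usage on a live system: compare the configured resolver with 8.8.8.8.
#   dig example.com | query_ms
#   dig @8.8.8.8 example.com | query_ms
```

Run each query twice: the second result usually reflects a cache hit and shows how much a local cache would help.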
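Problem: DNS leaks
# Check the DNS servers in use
resolvectl status | grep "DNS Servers"
# Check again after connecting to the VPN
# 1. Visit https://dnsleaktest.com
# 2. Or use a command-line probe; the answer is the address of the resolver that queried Akamai
nslookup whoami.akamai.net
# Fix a VPN DNS leak: a negative priority makes the VPN's DNS servers take precedence exclusively
nmcli connection modify "VPN" \
  ipv4.dns-priority -50 \
  ipv4.ignore-auto-dns no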
11.5 WiFi problems
Problem: no WiFi networks are visible
# Step 1: check the WiFi hardware state
rfkill list
# If soft blocked
sudo rfkill unblock wifi
# If hard blocked: use the physical switch or the Fn+Fx key
# Step 2: check the NM WiFi radio state
nmcli radio wifi
# If disabled
sudo nmcli radio wifi on
# Step 3: check the device state
nmcli device status | grep wifi
# If unavailable: usually a driver, firmware, or rfkill problem
# If unmanaged: NM is not managing the device
sudo nmcli device set wlan0 managed yes
# Step 4: check that a wireless driver is loaded
lsmod | grep -i "cfg80211\|mac80211"
lsmod | grep -i iwl # Intel
lsmod | grep -i ath # Atheros
lsmod | grep -i rtl # Realtek
# Step 5: scan
sudo nmcli device wifi rescan
sleep 2 && nmcli device wifi list
# Step 6: check the firmware
dmesg | grep -i "firmware\|wlan\|wifi"
journalctl -u NetworkManager | grep -i "firmware"
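Step 1 can be turned into a quick report that separates soft blocks, which `rfkill unblock` clears, from hard blocks, which need the physical switch. The sketch below parses the standard `rfkill list` layout; `rfkill_blocked` is a hypothetical name.

```shell
# rfkill_blocked: read "rfkill list" output on stdin and print
# "<device> soft" or "<device> hard" for every blocked radio.
rfkill_blocked() {
    awk '
        /^[0-9]+:/         { name = $2; sub(/:$/, "", name) }
        /Soft blocked: yes/ { print name, "soft" }
        /Hard blocked: yes/ { print name, "hard" }
    '
}

# Usage on a live system:
#   rfkill list | rfkill_blocked
```

No output means nothing is blocked, so the search moves on to the NM radio state and the driver.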
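Problem: WiFi fails to connect
# Check the connection log
journalctl -u NetworkManager | grep -i "wifi\|wlan"
# Check wpa_supplicant
journalctl | grep -i "wpa_supplicant"
# Check the stored password (secrets are hidden unless --show-secrets is given)
nmcli --show-secrets connection show "WiFi-Name" | grep "wireless-security.psk"
# Check the security type
nmcli connection show "WiFi-Name" | grep "wireless-security.key-mgmt"
# Delete and recreate the connection
nmcli connection delete "WiFi-Name"
nmcli device wifi connect "SSID" password "password"
# Check for MAC filtering on the access point
# Show the MAC address
ip link show wlan0 | grep ether
# Try disabling MAC randomization by using the permanent hardware address
nmcli connection modify "WiFi-Name" \
  wifi.cloned-mac-address permanent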
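Problem: weak WiFi signal or frequent disconnects
# Show the current signal strength
iwconfig wlan0 | grep "Signal level"
# Scan and sort by signal strength, strongest first
nmcli -t -f SIGNAL,SSID device wifi list | sort -t: -k1,1nr
# Check for roaming between access points
journalctl -u NetworkManager | grep -i "roam\|bssid"
# Pin the profile to a specific AP (BSSID)
nmcli connection modify "WiFi-Name" \
  wifi.bssid "AA:BB:CC:DD:EE:FF"
# Tune WiFi driver parameters (if the driver supports it)
# Inspect the interface settings and identify the driver
iwconfig wlan0
ethtool -i wlan0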
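Before pinning a profile to a BSSID, it helps to pick the access point with the strongest signal for that SSID. The sketch below assumes input from `nmcli -t -f SIGNAL,SSID,BSSID device wifi list`, where literal colons inside a value are escaped as `\:`; `strongest_ap` is a hypothetical name.

```shell
# strongest_ap SSID: read "SIGNAL:SSID:BSSID" lines on stdin and print the
# BSSID and signal percentage of the strongest access point for SSID.
strongest_ap() {
    SSID_WANTED=$1 awk -F: '
        {
            gsub(/\\:/, "\001")       # hide escaped colons; awk then re-splits the fields
            ssid = $2; bssid = $3
            gsub("\001", ":", ssid)   # restore the colons inside each value
            gsub("\001", ":", bssid)
            if (ssid == ENVIRON["SSID_WANTED"] && $1 + 0 > best) {
                best = $1 + 0
                top = bssid
            }
        }
        END { if (top != "") print top, best }
    '
}

# Usage on a live system:
#   nmcli -t -f SIGNAL,SSID,BSSID device wifi list | strongest_ap "WiFi-Name"
```

Pinning trades roaming for stability, so it suits a stationary machine better than a laptop that moves between rooms.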
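11.6 VPN problems
Problem: the VPN fails to connect
# Show VPN log entries
journalctl -u NetworkManager | grep -i "vpn"
# OpenVPN log entries
journalctl -u NetworkManager | grep -i "openvpn"
# WireGuard debugging
sudo wg show
# IPsec debugging
journalctl -u strongswan
# or
journalctl -u ipsec
# Check the VPN profile
nmcli connection show "VPN-Name" | grep vpn
# Check the certificates
openssl x509 -in client.crt -noout -dates # validity period
openssl verify -CAfile ca.crt client.crt # chain verification
# Check port reachability (UDP probes are only indicative: "open" may just mean that nothing was rejected)
nc -zuv vpn.example.com 1194 # OpenVPN (UDP by default; use -zv if the server runs over TCP)
nc -zuv vpn.example.com 51820 # WireGuard (UDP)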
Problem: the VPN connects but the internal network is unreachable
# Check the VPN interface
ip addr show tun0 # OpenVPN
ip addr show wg0 # WireGuard
# Check the routes
ip route | grep tun0
ip route | grep wg0
# Check the split-tunnel setting
nmcli connection show "VPN-Name" | grep "ipv4.never-default"
# Add a route by hand through the tunnel interface
sudo ip route add 10.0.0.0/8 dev tun0
# Check whether the VPN server pushed any routes
journalctl -u NetworkManager | grep -i "route\|push"
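Rather than reading the whole routing table, you can ask the kernel which interface it would use for one internal address. A sketch built on `ip route get`; the helper names `route_dev` and `via_tunnel` and the address 10.1.2.3 are illustrative assumptions.

```shell
# route_dev: read "ip route get" output on stdin and print the outgoing
# interface (the word that follows "dev").
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# via_tunnel DEST IFACE: succeed if the kernel would send DEST through IFACE.
via_tunnel() {
    [ "$(ip route get "$1" 2>/dev/null | route_dev)" = "$2" ]
}

# Usage on a live system (10.1.2.3 stands for a host on the internal network):
#   via_tunnel 10.1.2.3 tun0 || echo "traffic to 10.1.2.3 bypasses the VPN"
```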
11.7 Log analysis tools
Key log commands
# NM core log
journalctl -u NetworkManager
# NM dispatcher log
journalctl -u NetworkManager-dispatcher
# wpa_supplicant log (WiFi)
journalctl | grep wpa_supplicant
# DHCP client log
journalctl | grep -i "dhclient\|dhcp"
# Kernel network messages
dmesg | grep -i "net\|eth\|wlan\|bond\|bridge\|vlan"
# Restrict the output to a time range
journalctl -u NetworkManager --since "1 hour ago"
journalctl -u NetworkManager --since "2026-05-10 09:00" --until "2026-05-10 12:00"
# JSON output (easy to parse)
journalctl -u NetworkManager -o json | jq -r '.MESSAGE'
Real-time debugging
# Enable verbose logging for all domains
sudo nmcli general logging level DEBUG domains ALL
# Debug specific domains only
sudo nmcli general logging level DEBUG domains WIFI,DHCP4,DEVICE
# Follow the log in real time
journalctl -u NetworkManager -f
# Restore the default log level
sudo nmcli general logging level INFO domains DEFAULT
Log pattern matching
# Common error patterns
# DHCP timeout
journalctl -u NetworkManager | grep -iE "dhcp.*(timeout|timed out)"
# Authentication failure
journalctl -u NetworkManager | grep -i "auth.*fail\|authentication.*failed"
# Device state changes
journalctl -u NetworkManager | grep -i "device.*state change"
# Connection failure
journalctl -u NetworkManager | grep -i "connection.*failed\|activation.*failed"
# DNS resolution failure
journalctl -u NetworkManager | grep -i "dns.*fail\|resolve.*fail"
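The individual patterns above can be combined into one pass that counts each failure class, which gives a quick picture of where a noisy log is concentrated. This is a sketch only: `summarize_nm_errors` is a hypothetical name, the category labels are invented for illustration, and the regular expressions are the loose patterns from this section rather than exact NM message formats.

```shell
# summarize_nm_errors: read log lines on stdin and print "category count"
# for each failure class, sorted by category name.
summarize_nm_errors() {
    awk '{
        line = tolower($0)
        if (line ~ /dhcp.*(timeout|timed out)/) c["dhcp_timeout"]++
        else if (line ~ /auth.*fail/)           c["auth_failed"]++
        else if (line ~ /activation.*fail/)     c["activation_failed"]++
    }
    END { for (k in c) print k, c[k] }' | sort
}

# Usage on a live system:
#   journalctl -u NetworkManager --since today | summarize_nm_errors
```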
11.8 Debugging toolchain
# 1. ip (the basics)
ip addr show # interfaces and IP addresses
ip route show # routing table
ip link show # link state
ip neigh show # neighbor table (ARP/NDP)
ip monitor # watch changes in real time
# 2. ss (socket statistics)
ss -tlnp # listening TCP ports
ss -ulnp # listening UDP ports
ss -s # summary statistics
# 3. ethtool (Ethernet tool)
ethtool eth0 # interface details
ethtool -i eth0 # driver information
ethtool -S eth0 # NIC statistics
ethtool -k eth0 # offload features
# 4. tcpdump (packet capture)
sudo tcpdump -i eth0 -n
sudo tcpdump -i eth0 port 53 # DNS traffic
sudo tcpdump -i eth0 port 67 or port 68 # DHCP traffic
# 5. nmap (port scanning)
nmap -sn 192.168.1.0/24 # host discovery
sudo nmap -sU -p 53 192.168.1.1 # DNS port (UDP scans need root)
# 6. mtr (route tracing)
mtr 8.8.8.8
mtr -r -c 10 8.8.8.8 # report mode
# 7. dig/nslookup (DNS debugging)
dig +trace example.com # full delegation trace
dig @8.8.8.8 example.com # query a specific server
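11.9 Resetting and recovering NM
Full reset of the NM configuration
#!/bin/bash
# nm-reset.sh - completely reset the NM configuration (use with care!)
echo "⚠️ Warning: this operation deletes all NM connection profiles"
read -r -p "Continue? (yes/no): " confirm
[ "$confirm" != "yes" ] && exit 1
# Back up (root is needed to read the stored profiles)
sudo cp -a /etc/NetworkManager /etc/NetworkManager.bak.$(date +%Y%m%d)
# Stop the service
sudo systemctl stop NetworkManager
# Delete all connection profiles
sudo rm -f /etc/NetworkManager/system-connections/*
# Delete custom configuration snippets
sudo rm -f /etc/NetworkManager/conf.d/*.conf
# Reset the main configuration file
sudo tee /etc/NetworkManager/NetworkManager.conf << 'EOF'
[main]
plugins=keyfile
[device]
wifi.scan-rand-mac-address=yes
EOF
# Reset resolv.conf
sudo rm -f /etc/resolv.conf
# Start the service
sudo systemctl start NetworkManager
# Wait for device detection
sleep 5
# Check the result
nmcli device status
nmcli connection show
echo "Reset complete. Reconfigure your network connections."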
Quick recovery methods
# Method 1: restore the backup (if several backups exist, name one directory instead of using the glob)
sudo cp -a /etc/NetworkManager.bak.*/system-connections/* \
  /etc/NetworkManager/system-connections/
sudo nmcli connection reload
# Method 2: quick recovery with DHCP
nmcli connection add type ethernet con-name "recovery" ifname eth0
nmcli connection up "recovery"
# Method 3: temporary static IP
sudo ip addr add 192.168.1.100/24 dev eth0
sudo ip route add default via 192.168.1.1
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
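The glob in method 1 matches every backup directory at once, which mixes profiles from different days. A small sketch for choosing only the newest backup follows; it assumes the `NetworkManager.bak.YYYYMMDD` naming produced by the reset script, and `latest_backup` is a hypothetical name.

```shell
# latest_backup PREFIX: print the newest directory named PREFIX.<date>.
# Globs expand in sorted order, so with YYYYMMDD suffixes the last match
# is the most recent backup. Fails if no backup directory exists.
latest_backup() {
    local prefix=$1 latest="" dir
    for dir in "$prefix".*; do
        [ -d "$dir" ] || continue
        latest=$dir
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage on a live system:
#   src=$(latest_backup /etc/NetworkManager.bak) &&
#       sudo cp -a "$src"/system-connections/. /etc/NetworkManager/system-connections/
```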
11.10 Chapter summary
| Problem type | First checks |
|---|---|
| Connection failure | nmcli device status → physical link → managed state → DHCP |
| DNS problems | resolvectl status → cat /etc/resolv.conf → dig |
| WiFi problems | rfkill list → nmcli radio wifi → nmcli device wifi list |
| VPN problems | journalctl → certificates/keys → port reachability |
| IP conflicts | ip addr → nmcli connection show → routing table |
| General | journalctl -u NetworkManager -f to follow the log in real time |
Further reading