TCP/UDP 网络协议教程 / 07-TCP 拥塞控制

07 - TCP 拥塞控制

7.1 拥塞控制概述

拥塞的原因：
多个发送方共享同一条链路

     ┌─────┐      ┌─────┐
     │  A  │──→──│     │
     └─────┘     │路由 │──→── 链路（带宽有限）──→── 接收方
     ┌─────┐     │ 器  │
     │  B  │──→──│     │
     └─────┘      └─────┘

如果 A 和 B 的发送速率之和 > 链路带宽：
→ 路由器缓冲区溢出 → 丢包

拥塞的影响：
1. 丢包 → 重传 → 更多数据 → 更多丢包
2. 严重时导致拥塞崩溃 (Congestion Collapse)

7.2 拥塞窗口 (cwnd)

发送窗口 = min(cwnd, rwnd)

cwnd (Congestion Window):
• 发送方维护的虚拟窗口
• 根据网络状况动态调整
• 单位：MSS (最大段大小)

拥塞控制的目标：
• 高吞吐量：充分利用带宽
• 低延迟：避免缓冲区堆积
• 公平性：多个连接共享带宽

7.3 经典算法：Reno

慢启动 (Slow Start)

初始状态：cwnd = 1 MSS

规则：每收到一个 ACK，cwnd += 1 MSS
效果：每个 RTT，cwnd 翻倍（指数增长）

RTT 0: cwnd = 1
RTT 1: cwnd = 2
RTT 2: cwnd = 4
RTT 3: cwnd = 8
RTT 4: cwnd = 16
...

当 cwnd >= ssthresh 时，进入拥塞避免

class SlowStart:
    """慢启动算法"""
    
    def __init__(self, ssthresh=65535):
        self.cwnd = 1  # MSS
        self.ssthresh = ssthresh
    
    def on_ack(self, acked_bytes):
        """收到 ACK 时更新 cwnd"""
        # 每收到一个 ACK，cwnd += MSS
        # 等效于每个 RTT 翻倍
        mss = 1460
        self.cwnd += (acked_bytes / mss) * 1
        
        if self.cwnd >= self.ssthresh:
            return "CONGESTION_AVOIDANCE"
        return "SLOW_START"

拥塞避免 (Congestion Avoidance)

进入条件：cwnd >= ssthresh

规则：每收到一个 ACK，cwnd += MSS * MSS / cwnd
效果：每个 RTT，cwnd 增加 1 MSS（线性增长）

这是"加性增加" (Additive Increase)

class CongestionAvoidance:
    """拥塞避免算法"""
    
    def __init__(self, cwnd, ssthresh):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
    
    def on_ack(self, acked_bytes):
        """收到 ACK 时更新 cwnd"""
        mss = 1460
        # 线性增长：每个 RTT 增加 1 MSS
        self.cwnd += (acked_bytes * mss) / self.cwnd

快速恢复 (Fast Recovery)

触发条件：收到 3 个重复 ACK

步骤：
1. ssthresh = cwnd / 2
2. cwnd = ssthresh + 3 MSS
3. 重传丢失的段
4. 收到新数据的 ACK 后：cwnd = ssthresh（恢复到拥塞避免）

效果：避免了慢启动的大幅降低

乘性减少 (Multiplicative Decrease)

丢包时：
ssthresh = cwnd / 2

严重丢包（超时）：
cwnd = 1 MSS（回到慢启动）

3 个重复 ACK：
cwnd = cwnd / 2（快速恢复）

AIMD 策略：
Additive Increase, Multiplicative Decrease
增加时慢慢加（每个 RTT +1 MSS）
减少时快速减（直接减半）

class RenoCongestionControl:
    """Reno 拥塞控制算法"""
    
    SLOW_START = "slow_start"
    CONGESTION_AVOIDANCE = "congestion_avoidance"
    FAST_RECOVERY = "fast_recovery"
    
    def __init__(self):
        self.cwnd = 1.0 * 1460  # 1 MSS
        self.ssthresh = 65535
        self.state = self.SLOW_START
        self.dup_ack_count = 0
    
    def on_ack(self, ack_num, new_data=False):
        """收到 ACK"""
        if self.state == self.SLOW_START:
            if new_data:
                self.cwnd += 1460  # 每个 ACK 增加 1 MSS
                self.dup_ack_count = 0
                if self.cwnd >= self.ssthresh:
                    self.state = self.CONGESTION_AVOIDANCE
            else:
                self.dup_ack_count += 1
                if self.dup_ack_count >= 3:
                    self.ssthresh = self.cwnd / 2
                    self.cwnd = self.ssthresh + 3 * 1460
                    self.state = self.FAST_RECOVERY
        
        elif self.state == self.CONGESTION_AVOIDANCE:
            if new_data:
                self.cwnd += 1460 * (1460 / self.cwnd)
                self.dup_ack_count = 0
            else:
                self.dup_ack_count += 1
                if self.dup_ack_count >= 3:
                    self.ssthresh = self.cwnd / 2
                    self.cwnd = self.ssthresh + 3 * 1460
                    self.state = self.FAST_RECOVERY
        
        elif self.state == self.FAST_RECOVERY:
            if new_data:
                self.cwnd = self.ssthresh
                self.dup_ack_count = 0
                self.state = self.CONGESTION_AVOIDANCE
            else:
                self.cwnd += 1460  # 每个重复 ACK 增加 1 MSS
    
    def on_timeout(self):
        """超时"""
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1460  # 回到 1 MSS
        self.dup_ack_count = 0
        self.state = self.SLOW_START

7.4 CUBIC 算法

CUBIC 是 Linux 默认的拥塞控制算法。

CUBIC 特点：
1. 使用三次函数（cubic function）决定窗口增长
2. 更适合高带宽长延迟网络
3. 更好的公平性

cwnd 增长函数：
W(t) = C × (t - K)³ + Wmax

C = 0.4（常数）
K = (Wmax × β / C)^(1/3)
Wmax = 丢包时的窗口大小
β = 0.7（乘法减少因子）

import math

class CUBIC:
    """CUBIC 拥塞控制算法"""
    
    C = 0.4
    BETA = 0.7
    
    def __init__(self):
        self.cwnd = 1460.0  # 1 MSS
        self.wmax = 0       # 上次丢包时的窗口
        self.k = 0          # 时间偏移
        self.epoch_start = 0  # 当前周期开始时间
    
    def on_loss(self):
        """丢包时"""
        self.wmax = self.cwnd
        self.cwnd = self.cwnd * (1 - self.BETA)
        self.k = (self.wmax * self.BETA / self.C) ** (1/3)
        self.epoch_start = 0  # 重置 epoch
    
    def update_cwnd(self, t):
        """更新窗口大小（每个 ACK 调用）"""
        if self.epoch_start == 0:
            self.epoch_start = t
        
        elapsed = t - self.epoch_start
        target = self.C * ((elapsed - self.k) ** 3) + self.wmax
        
        if target > self.cwnd:
            self.cwnd = target
        else:
            # 确保至少和标准 TCP 一样快
            self.cwnd += 1460 / self.cwnd

# 模拟 CUBIC 增长
cubic = CUBIC()
cubic.cwnd = 100 * 1460  # 初始 100 MSS
cubic.wmax = 200 * 1460  # 丢包时 200 MSS

print("CUBIC 窗口增长曲线：")
for t in range(0, 10):
    cubic.update_cwnd(t)
    mss = cubic.cwnd / 1460
    print(f"t={t}: cwnd={mss:.1f} MSS")

7.5 BBR 算法

BBR (Bottleneck Bandwidth and RTT) 是 Google 提出的基于模型的拥塞控制。

传统算法（Reno/CUBIC）的问题：
• 基于丢包的信号 → 缓冲区填满才丢包
• 无法区分拥塞丢包和随机丢包
• 无法充分利用带宽

BBR 的思路：
• 测量瓶颈带宽 (BtlBw)
• 测量最小 RTT (RTprop)
• 计算最优发送速率

目标发送速率 = BtlBw
目标在途数据量 = BtlBw × RTprop (BDP)

BBR 状态机：

┌───────────────────────────────────────────────────┐
│                                                   │
│  ┌─────────┐    ┌─────────┐    ┌─────────────┐  │
│  │ Startup │───→│  Drain  │───→│ Probe BW    │  │
│  │ (启动)   │    │ (排空)  │    │ (探测带宽)  │  │
│  └─────────┘    └─────────┘    └──────┬──────┘  │
│                                       │          │
│                               ┌───────┴───────┐  │
│                               │               │  │
│                               ▼               │  │
│                        ┌────────────┐         │  │
│                        │ Probe RTT  │─────────┘  │
│                        │ (探测 RTT) │            │
│                        └────────────┘            │
└───────────────────────────────────────────────────┘

状态	目标	增益
Startup	找到 BtlBw	2/ln(2) ≈ 2.89
Drain	排空队列	ln(2)/2 ≈ 0.35
Probe BW	维持带宽	循环 1.25, 0.75, 1, 1, 1, 1, 1, 1
Probe RTT	测量 RTprop	0.5（极小窗口）

# 启用 BBR
$ sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

# 加载 BBR 模块
$ sudo modprobe tcp_bbr

# 验证
$ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = bbr

7.6 算法对比

特性	Reno	CUBIC	BBR
信号	丢包	丢包	带宽+延迟
增长方式	线性	三次函数	模型驱动
默认系统	旧 Linux	Linux	需要手动启用
高带宽网络	差	好	最好
无线网络	差	中等	好
缓冲区填充	是	是	否
公平性	中等	好	中等

7.7 拥塞控制参数

# 查看当前拥塞控制算法
$ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = cubic

# 可用算法
$ sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = reno cubic bbr

# 设置默认算法
$ sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

# 持久化配置
$ echo "net.ipv4.tcp_congestion_control = bbr" | sudo tee -a /etc/sysctl.d/99-tcp.conf

# 初始拥塞窗口大小
$ sysctl net.ipv4.tcp_init_cwnd

7.8 应用层影响

"""
拥塞控制对应用的影响：

1. 上传大文件时，速度受 cwnd 限制
2. 跨洋连接需要更大的窗口
3. 切换拥塞控制算法可能影响性能
"""

def estimate_transfer_time(file_size_mb, rtt_ms, cwnd_init_kb=10):
    """估算文件传输时间"""
    cwnd = cwnd_init_kb * 1024  # bytes
    rtt = rtt_ms / 1000
    transferred = 0
    time_elapsed = 0
    
    while transferred < file_size_mb * 1024 * 1024:
        # 每个 RTT 能发送的数据量
        send_size = cwnd
        transferred += send_size
        time_elapsed += rtt
        
        # 慢启动：每个 RTT 翻倍（直到 ssthresh）
        cwnd = min(cwnd * 2, 10 * 1024 * 1024)
    
    return time_elapsed

# 不同 RTT 的传输时间
for rtt in [10, 50, 100, 200]:
    t = estimate_transfer_time(100, rtt)  # 100MB
    print(f"RTT={rtt}ms: 传输 100MB 约需 {t:.1f} 秒")

7.9 注意事项

⚠️ BBR 公平性：BBR 可能抢占其他流的带宽，特别是 Reno/CUBIC 流

⚠️ BBR 版本：BBRv2 已在开发中，解决了一些公平性和丢包响应问题

⚠️ cwnd 限制：应用层无法直接控制 cwnd，但可以通过限制发送速率间接影响

⚠️ 内核版本：BBR 需要 Linux 4.9+，BBRv2 需要更新的内核

7.10 扩展阅读

下一章：08 - TCP 选项机制 - MSS、窗口缩放、时间戳、SACK