PostgreSQL 完全指南 / 07 - 索引详解

第 07 章 · 索引详解

索引是数据库性能优化的核心手段。PostgreSQL 提供了最丰富的索引类型，以适应不同的数据和查询模式。

7.1 索引概览

索引类型	数据结构	适用场景	PG 版本
B-Tree	B+ 树	等值、范围、排序	所有版本
Hash	哈希表	纯等值查询	PG 10+（WAL 安全）
GiST	平衡树	几何、全文搜索、Range	所有版本
GIN	倒排索引	全文搜索、数组、JSONB	所有版本
BRIN	块范围摘要	大表顺序数据	PG 9.5+
Bloom	布隆过滤器	多列等值组合	PG 9.6+

7.2 B-Tree 索引

B-Tree 是默认索引类型，适用于绝大多数场景。

-- 创建 B-Tree 索引（默认类型）
CREATE INDEX idx_emp_email ON employees (email);
-- 等价于
CREATE INDEX idx_emp_email ON employees USING BTREE (email);

-- 多列索引
CREATE INDEX idx_emp_dept_salary ON employees (department, salary);

-- 倒序索引
CREATE INDEX idx_emp_salary_desc ON employees (salary DESC);

-- 表达式索引
CREATE INDEX idx_emp_lower_email ON employees (LOWER(email));

-- 部分索引（带 WHERE 条件）
CREATE INDEX idx_emp_active ON employees (email) WHERE is_active = TRUE;

-- 覆盖索引（INCLUDE，PG 11+）
CREATE INDEX idx_emp_covering ON employees (department) INCLUDE (name, salary);
-- 查询 SELECT name, salary FROM employees WHERE department = '工程部' 可以只用索引

B-Tree 适用操作

操作符	说明
`=`, `<>`	等值/不等
`<`, `<=`, `>`, `>=`	范围比较
`BETWEEN`	范围
`IN`	多值等值
`IS NULL` / `IS NOT NULL`	空值判断
`LIKE 'abc%'`	前缀匹配（可使用索引）
`LIKE '%abc'`	后缀匹配（无法使用索引）
`ORDER BY ... LIMIT`	排序+分页

⚠️ 注意事项：多列索引遵循最左前缀原则。索引 (a, b, c) 可以用于 WHERE a = 1、WHERE a = 1 AND b = 2、WHERE a = 1 AND b = 2 AND c = 3，但不能用于 WHERE b = 2。

7.3 Hash 索引

Hash 索引仅支持等值查询，但比 B-Tree 更小、更快。

CREATE INDEX idx_emp_name_hash ON employees USING HASH (name);

-- 仅适用于等值查询
SELECT * FROM employees WHERE name = 'Alice';  -- ✅ 使用 Hash 索引
SELECT * FROM employees WHERE name > 'Alice';  -- ❌ 不使用 Hash 索引

特性	B-Tree	Hash
等值查询	✅	✅（更快）
范围查询	✅	❌
排序	✅	❌
索引大小	较大	较小
WAL 安全	✅	✅（PG 10+）

⚠️ 注意事项：PG 10 之前 Hash 索引不支持 WAL，崩溃后需要重建。PG 10+ 已解决此问题。但 Hash 索引的实际使用场景较少，通常 B-Tree 已足够。

7.4 GiST 索引

GiST（Generalized Search Tree）支持多种数据类型的复杂查询。

-- 需要 btree_gist 扩展（排他约束用）
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- 几何数据索引
CREATE INDEX idx_geo ON places USING GIST (location);

-- Range 类型索引
CREATE INDEX idx_res_during ON reservations USING GIST (during);

-- 全文搜索索引
CREATE INDEX idx_fts ON articles USING GIST (to_tsvector('english', content));

-- 排他约束
CREATE TABLE rooms (
    id SERIAL PRIMARY KEY,
    room_no INT,
    during TSTZRANGE,
    EXCLUDE USING GIST (room_no WITH =, during WITH &&)
);

适用操作符：<<, >>, @>, <@, &&, ~=, <-> 等。

7.5 GIN 索引

GIN（Generalized Inverted Index）倒排索引，是 PostgreSQL 处理全文搜索、数组、JSONB 的核心。

-- JSONB GIN 索引
CREATE INDEX idx_events_data ON events USING GIN (data);

-- 数组 GIN 索引
CREATE INDEX idx_products_tags ON products USING GIN (tags);

-- 全文搜索 GIN 索引
CREATE INDEX idx_articles_fts ON articles USING GIN (
    to_tsvector('simple', title || ' ' || content)
);

-- jsonb_path_ops（仅支持 @> 操作符，但更小更快）
CREATE INDEX idx_events_data_path ON events USING GIN (data jsonb_path_ops);

GIN 索引类型	支持的操作符	说明
默认 GIN	`@>`, `<@`, `?`, `?	`,` ?&`,` &&`
`jsonb_path_ops`	`@>`	仅包含查询，但体积更小

💡 技巧：如果 JSONB 查询只用 @>（包含查询），使用 jsonb_path_ops 可以大幅减小索引体积，构建速度也更快。

7.6 BRIN 索引

BRIN（Block Range Index）为超大表设计，索引极小。

-- BRIN 索引
CREATE INDEX idx_logs_created ON access_logs USING BRIN (created_at)
    WITH (pages_per_range = 128);

-- 适合场景：时序数据、自增 ID、时间戳
-- 数据特征：物理顺序与逻辑顺序高度相关

特性	B-Tree	BRIN
索引大小	大（每行一个条目）	极小（每个页范围一个摘要）
查询精度	精确	近似（需回表验证）
适用表大小	任意	超大表（百万行以上）
维护	需要 VACUUM	需要 VACUUM 更新摘要
适用数据	任意	物理有序数据

-- 比较索引大小
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'access_logs';

7.7 部分索引（Partial Index）

部分索引只索引满足特定条件的行，减少索引大小。

-- 只索引活跃用户
CREATE INDEX idx_active_users ON users (email) WHERE is_active = TRUE;

-- 只索引未完成订单
CREATE INDEX idx_pending_orders ON orders (created_at)
WHERE status IN (0, 1, 2);

-- 使用部分索引的查询必须包含索引条件
SELECT * FROM users WHERE email = 'alice@example.com' AND is_active = TRUE;  -- ✅
SELECT * FROM users WHERE email = 'alice@example.com';  -- ❌ 可能不使用部分索引

💡 技巧：部分索引在以下场景特别有用：

状态机中只有少量"活跃"记录需要查询
软删除表中大部分记录已删除
需要为 NULL 值单独建立索引

7.8 索引管理

-- 查看表的所有索引
\d+ employees

-- 查看所有索引及其大小
SELECT
    schemaname, tablename, indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS size,
    idx_scan AS scans
FROM pg_indexes
JOIN pg_stat_user_indexes USING (schemaname, tablename, indexname)
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexname::regclass) DESC;

-- 重建索引（不阻塞查询）
REINDEX INDEX CONCURRENTLY idx_emp_email;

-- 重建表的所有索引
REINDEX TABLE CONCURRENTLY employees;

-- 删除索引
DROP INDEX IF EXISTS idx_emp_email;

-- 创建索引时不阻塞写入（必须！）
CREATE INDEX CONCURRENTLY idx_new ON employees (department);

⚠️ 注意事项：生产环境创建索引务必使用 CONCURRENTLY。普通 CREATE INDEX 会锁表，阻塞所有写操作。CREATE INDEX CONCURRENTLY 耗时更长但不阻塞。

7.9 无用索引检测

-- 查找从未使用过的索引
SELECT
    schemaname, relname AS table, indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS size,
    idx_scan AS scans
FROM pg_stat_user_indexes i
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0
  AND NOT indisunique       -- 排除唯一索引
  AND NOT indisprimary      -- 排除主键
ORDER BY pg_relation_size(i.indexrelid) DESC;

-- 查找重复/冗余索引
SELECT
    a.indexrelid::regclass AS index_a,
    b.indexrelid::regclass AS index_b,
    pg_size_pretty(pg_relation_size(a.indexrelid)) AS size_a
FROM pg_index a
JOIN pg_index b ON a.indrelid = b.indrelid
    AND a.indexrelid <> b.indexrelid
    AND a.indkey::TEXT LIKE b.indkey::TEXT || '%'
WHERE a.indrelid::regclass::TEXT NOT LIKE 'pg_%';

7.10 索引策略最佳实践

索引设计 Checklist

检查项	说明
☐ WHERE 条件列	高频查询的过滤列
☐ JOIN 列	外键/关联列
☐ ORDER BY 列	排序列
☐ 覆盖索引	INCLUDE 常用列，避免回表
☐ 部分索引	只索引需要的部分
☐ 表达式索引	函数/计算列
☐ 索引数量	每个表不超过 5-6 个
☐ 索引大小	监控索引膨胀

常见索引误区

误区	正确做法
每列都建索引	只在高频查询的列建索引
`CREATE INDEX` 不用 `CONCURRENTLY`	生产必须用 `CONCURRENTLY`
不关注索引膨胀	定期 REINDEX 或使用 pg_repack
LIKE ‘%xxx’ 建 B-Tree	前缀模糊用 pg_trgm GIN
小表建索引	小于 1 万行的表通常不需要索引

业务场景

场景	推荐索引
用户登录查询	B-Tree（email 唯一索引）
商品模糊搜索	GIN（pg_trgm）
JSONB 条件查询	GIN（data）或 GIN（data jsonb_path_ops）
时序数据查询	BRIN（created_at）或 B-Tree
地理位置查询	GiST（PostGIS geometry）
全文搜索	GIN（tsvector）
标签过滤	GIN（tags 数组）