

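Chapter 05: Advanced Cypher

5.1 Paths

A path is one of the most powerful concepts in Cypher: an ordered, alternating sequence of vertices and edges in a graph.

5.1.1 Basic Concepts of Paths

Structure of a path:
  (v₁)─[e₁]─(v₂)─[e₂]─(v₃)─[e₃]─(v₄)
  │                                      │
  └──────────── Path P ──────────────────┘

  Path length = number of edges = 3
  Number of vertices on the path = 4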
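
To make the counting rule concrete, here is a minimal Python sketch (not AgensGraph code) that models a path as an alternating list of vertices and edges. The names `path`, `path_vertices`, `path_edges`, and `path_length` are illustrative assumptions, not part of any Cypher API.

```python
# Hypothetical model: a path is an alternating list [v1, e1, v2, e2, ..., vN].
path = ["v1", "e1", "v2", "e2", "v3", "e3", "v4"]


def path_vertices(p):
    """Vertices sit at the even positions of the alternating sequence."""
    return p[0::2]


def path_edges(p):
    """Edges sit at the odd positions of the alternating sequence."""
    return p[1::2]


def path_length(p):
    """Path length is defined as the number of edges."""
    return len(path_edges(p))


print(path_length(path), len(path_vertices(path)))  # prints "3 4"
```

A path with N edges always has N + 1 vertex positions, which is why the diagram shows 3 edges and 4 vertices.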

5.1.2 Fixed-Length Paths

-- Paths of length 2 (friends of friends)
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(c:Person)
RETURN c.name;

-- Paths of length 3
MATCH (a:Person {name: 'Alice'})-[:KNOWS*3]->(d:Person)
RETURN d.name;

-- Bind the path to a variable
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*2]->(c:Person)
RETURN p, length(p) AS path_length, nodes(p) AS path_nodes;

5.1.3 Variable-Length Paths

-- At least 1 hop, at most 3 hops
MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
RETURN DISTINCT b.name, b.age;

-- 0 to 5 hops (a lower bound of 0 includes the start vertex itself)
MATCH (a:Person {name: 'Alice'})-[:KNOWS*0..5]->(b:Person)
RETURN DISTINCT b.name;

-- At least 2 hops (no upper bound)
MATCH (a:Person {name: 'Alice'})-[:KNOWS*2..]->(b:Person)
RETURN DISTINCT b.name;

Syntax  Meaning          Example
*N      Exactly N hops   [:KNOWS*3]
*N..M   N to M hops      [:KNOWS*1..3]
*N..    At least N hops  [:KNOWS*2..]
*..M    At most M hops   [:KNOWS*..5]
*       1 or more hops   [:KNOWS*]
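
The hop bounds can be pictured as a depth-limited traversal. The following Python sketch is only a model of the semantics, not how AgensGraph executes the query. It assumes a small hypothetical `KNOWS` adjacency map and the usual Cypher rule that one edge is not traversed twice within a single matched path, while vertices may repeat.

```python
# Hypothetical directed KNOWS graph: vertex -> vertices it knows.
KNOWS = {
    "Alice": ["Bob"],
    "Bob": ["Carol"],
    "Carol": ["Dave", "Alice"],
    "Dave": [],
}


def expand(start, min_hops, max_hops):
    """Return the distinct end vertices reachable in min_hops..max_hops hops."""
    found = set()

    def walk(vertex, hops, used_edges):
        if hops >= min_hops:
            found.add(vertex)
        if hops == max_hops:
            return
        for nxt in KNOWS.get(vertex, []):
            edge = (vertex, nxt)
            # Assumed rule: an edge may appear only once in one path.
            if edge not in used_edges:
                walk(nxt, hops + 1, used_edges | {edge})

    walk(start, 0, frozenset())
    return found


print(sorted(expand("Alice", 1, 3)))  # prints "['Alice', 'Bob', 'Carol', 'Dave']"
```

In this sample graph Alice shows up in her own result for `*1..3` because the cycle Alice → Bob → Carol → Alice is exactly 3 hops long; with `*0..0` only the start vertex is returned.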

5.1.4 Shortest Paths

-- Find the shortest path between two vertices
MATCH p = shortestPath(
  (a:Person {name: 'Alice'})-[:KNOWS*]-(b:Person {name: 'Dave'})
)
RETURN p, length(p) AS distance;

-- All shortest paths
MATCH p = allShortestPaths(
  (a:Person {name: 'Alice'})-[:KNOWS*]-(b:Person {name: 'Dave'})
)
RETURN p, length(p) AS distance;

-- Shortest path with a maximum depth
MATCH p = shortestPath(
  (a:Person {name: 'Alice'})-[:KNOWS*..6]-(b:Person {name: 'Dave'})
)
RETURN p;
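
For unweighted hops, a shortest path can be found with breadth-first search. The sketch below is an illustration under assumptions: the `EDGES` list and the `shortest_path` helper are hypothetical, edges are treated as undirected because the patterns above omit the arrow, and nothing here claims to describe AgensGraph's internal algorithm.

```python
from collections import deque

# Hypothetical KNOWS edges, treated as undirected.
EDGES = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Carol", "Dave"),
    ("Alice", "Erin"),
    ("Erin", "Dave"),
]


def shortest_path(start, goal, max_hops=None):
    """Breadth-first search; returns one shortest vertex list, or None."""
    neighbours = {}
    for a, b in EDGES:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Stop extending once the optional depth limit is reached.
        if max_hops is not None and len(path) - 1 >= max_hops:
            continue
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path("Alice", "Dave"))  # prints "['Alice', 'Erin', 'Dave']"
```

With `max_hops=1` the same call returns `None`, which corresponds to a depth-limited pattern such as `*..6` producing no row when the target is further away than the limit.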

5.1.5 Path Functions

MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(c:Person)
RETURN
  p                        AS full_path,
  length(p)                AS hops,
  nodes(p)                 AS vertices,
  relationships(p)         AS edges,
  startNode(head(relationships(p))) AS start,
  endNode(last(relationships(p)))   AS destination;
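
Function          Returns      Description
length(p)         Integer      Path length (number of edges)
nodes(p)          Vertex list  All vertices on the path
relationships(p)  Edge list    All edges on the path
startNode(r)      Vertex       Start vertex of an edge
endNode(r)        Vertex       End vertex of an edge
head(list)        Element      First element of a list
last(list)        Element      Last element of a list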

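5.2 Advanced Aggregation

5.2.1 Aggregate Functions in Detail

-- Headcount and salary statistics per job title in the 技术部 (Technology) department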
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department {name: '技术部'})
RETURN
  e.title AS title,
  count(e) AS headcount,
  avg(e.salary) AS avg_salary,
  min(e.salary) AS min_salary,
  max(e.salary) AS max_salary,
  sum(e.salary) AS total_cost,
  stdev(e.salary) AS salary_stddev,
  percentileCont(e.salary, 0.5) AS median_salary,
  percentileDisc(e.salary, 0.9) AS p90_salary
ORDER BY avg_salary DESC;

Aggregate function    Description                    Example
count(x)              Count of non-null values       count(e)
count(*)              Total number of rows           count(*)
count(DISTINCT x)     Count of distinct values       count(DISTINCT e.title)
avg(x)                Average                        avg(e.salary)
sum(x)                Sum                            sum(e.salary)
min(x)                Minimum                        min(e.salary)
max(x)                Maximum                        max(e.salary)
stdev(x)              Sample standard deviation      stdev(e.salary)
stdevp(x)             Population standard deviation  stdevp(e.salary)
percentileCont(x, p)  Continuous percentile          percentileCont(e.salary, 0.5)
percentileDisc(x, p)  Discrete percentile            percentileDisc(e.salary, 0.9)
collect(x)            Collect into a list            collect(e.name)

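5.2.2 The collect Aggregate

collect() gathers the matched values into a list and is used very often in graph queries:

-- Collect the list of employees in each department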
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN
  d.name AS department,
  collect(e.name) AS employees,
  count(e) AS headcount;

-- collect combined with DISTINCT
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN
  d.name AS department,
  collect(DISTINCT e.title) AS unique_titles;

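5.2.3 Grouped Aggregation (Implicit GROUP BY)

-- In Cypher, aggregate functions in RETURN group automatically by the non-aggregated expressions
-- This is equivalent to SQL's GROUP BY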
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN d.name AS dept, count(e) AS cnt, avg(e.salary) AS avg_sal;

-- Group by multiple fields
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN d.name AS dept, e.title AS title, count(e) AS cnt;
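
The implicit grouping rule can be modelled in a few lines of Python: every non-aggregated expression becomes part of the grouping key, and the aggregates are computed per key. The `ROWS` data and the `group_count_avg` helper are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Hypothetical rows, as if produced by the MATCH clause: (dept, title, salary).
ROWS = [
    ("R&D", "Engineer", 18000),
    ("R&D", "Engineer", 22000),
    ("R&D", "Director", 41000),
    ("Sales", "Manager", 25000),
]


def group_count_avg(rows, key):
    """Group rows by key(row) and return {group: (count, average salary)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {g: (len(s), sum(s) / len(s)) for g, s in groups.items()}


# RETURN d.name, count(e), avg(e.salary)  -> the key is the department only.
by_dept = group_count_avg(ROWS, key=lambda r: r[0])
# RETURN d.name, e.title, count(e)        -> the key is (department, title).
by_dept_title = group_count_avg(ROWS, key=lambda r: (r[0], r[1]))

print(by_dept["R&D"])  # prints "(3, 27000.0)"
```

Adding `e.title` to the returned columns changes the grouping key, so the same rows produce three groups instead of two.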

5.2.4 HAVING Equivalent

Cypher has no explicit HAVING keyword; apply WHERE after WITH instead:

-- Find departments whose average salary exceeds 20000
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d.name AS dept, avg(e.salary) AS avg_salary, count(e) AS cnt
WHERE avg_salary > 20000
RETURN dept, avg_salary, cnt
ORDER BY avg_salary DESC;

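5.3 WITH: Pipelining

WITH is Cypher's "pipe" operator: like the Unix | pipe, it passes the result of one step to the next.

5.3.1 Basic Usage

-- Stepwise query: filter high-salary employees, then pass them on with their departments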
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WHERE e.salary > 20000
WITH e, d
RETURN e.name, e.salary, d.name AS department;

5.3.2 WITH with Aggregate Filtering

-- Find departments with at least 2 employees, together with their average salary
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, count(e) AS emp_count, avg(e.salary) AS avg_sal
WHERE emp_count >= 2
RETURN d.name AS department, emp_count, round(avg_sal) AS avg_salary
ORDER BY emp_count DESC;

5.3.3 Intermediate Sorting with WITH

-- Find the 3 highest-paid employees, then look up their departments
MATCH (e:Employee)
WITH e ORDER BY e.salary DESC LIMIT 3
MATCH (e)-[:BELONGS_TO]->(d:Department)
RETURN e.name, e.salary, d.name AS department;
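
The point of sorting inside WITH is that the later MATCH only sees the rows that survived the LIMIT. A rough Python model of that two-stage pipeline follows; the `EMPLOYEES` and `BELONGS_TO` data and the function name are hypothetical.

```python
# Hypothetical data: employees and their department membership.
EMPLOYEES = [
    {"name": "Ann", "salary": 30000},
    {"name": "Ben", "salary": 18000},
    {"name": "Cao", "salary": 26000},
    {"name": "Dee", "salary": 22000},
]
BELONGS_TO = {"Ann": "R&D", "Ben": "Sales", "Cao": "R&D", "Dee": "Ops"}


def top_paid_with_department(limit):
    # Stage 1 (WITH e ORDER BY e.salary DESC LIMIT n): sort, then truncate.
    top = sorted(EMPLOYEES, key=lambda e: e["salary"], reverse=True)[:limit]
    # Stage 2 (second MATCH): only the surviving rows are joined to departments.
    return [
        (e["name"], e["salary"], BELONGS_TO[e["name"]])
        for e in top
        if e["name"] in BELONGS_TO
    ]


print(top_paid_with_department(3))
# prints "[('Ann', 30000, 'R&D'), ('Cao', 26000, 'R&D'), ('Dee', 22000, 'Ops')]"
```

Because the second stage is a plain MATCH, an employee without a department would drop out of the result even after making the top 3; the `if` clause in the sketch mirrors that behaviour.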

5.3.4 Passing Multiple Variables Through WITH

-- Multi-step pipeline
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, collect(e) AS employees, count(e) AS cnt
WHERE cnt > 1
WITH d, employees, cnt,
     reduce(total = 0, emp IN employees | total + emp.salary) AS total_salary
RETURN d.name, cnt, total_salary, round(total_salary / cnt) AS avg_salary;

5.4 UNWIND: List Expansion

UNWIND expands a list into multiple rows; it is the inverse of collect():

-- Expand a list into multiple rows
UNWIND [1, 2, 3, 4, 5] AS num
RETURN num;

-- Create vertices in bulk
UNWIND ['Alice', 'Bob', 'Carol', 'Dave'] AS name
CREATE (:Person {name: name, created: datetime()});

-- Bulk import scenario
UNWIND [
  {name: 'Alice', age: 30, city: '北京'},
  {name: 'Bob', age: 28, city: '上海'},
  {name: 'Carol', age: 32, city: '广州'}
] AS data
CREATE (:Person {
  name: data.name,
  age: data.age,
  city: data.city
});

-- Combined with collect (expand, then process)
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, collect(e.name) AS names
UNWIND names AS employee_name
RETURN d.name, employee_name;
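
The "inverse of collect()" relationship can be checked with a small round trip. This Python sketch uses hypothetical `collect` and `unwind` helpers over (department, name) rows. It relies on Python dictionaries keeping insertion order; in a real query the row order is not guaranteed without ORDER BY.

```python
# Hypothetical rows, as if produced by the MATCH clause: (department, name).
ROWS = [("R&D", "Ann"), ("R&D", "Cao"), ("Sales", "Ben")]


def collect(rows):
    """Fold many rows per department into one list per department."""
    grouped = {}
    for dept, name in rows:
        grouped.setdefault(dept, []).append(name)
    return grouped


def unwind(grouped):
    """Expand each list back into one row per element."""
    return [(dept, name) for dept, names in grouped.items() for name in names]


print(collect(ROWS))  # prints "{'R&D': ['Ann', 'Cao'], 'Sales': ['Ben']}"
print(unwind(collect(ROWS)) == ROWS)  # prints "True"
```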

5.5 FOREACH: Loop Operations

-- Add the Visited label to every vertex on the path
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
FOREACH (n IN nodes(p) | SET n:Visited);

-- Set a property on every edge of the path
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
FOREACH (r IN relationships(p) | SET r.traversed = true);

5.6 Subqueries and CALL

5.6.1 CALL Subqueries

-- Run a subquery with CALL
MATCH (d:Department)
CALL {
  WITH d
  MATCH (e:Employee)-[:BELONGS_TO]->(d)
  RETURN count(e) AS emp_count, avg(e.salary) AS avg_salary
}
RETURN d.name, emp_count, avg_salary;

5.6.2 EXISTS Semantics with Subqueries

-- Find employees who have subordinates (similar to SQL EXISTS)
MATCH (e:Employee)
WHERE EXISTS {
  MATCH (e)<-[:REPORTS_TO]-(sub:Employee)
}
RETURN e.name, e.title;

-- Find employees who have no subordinates (similar to SQL NOT EXISTS)
MATCH (e:Employee)
WHERE NOT EXISTS {
  MATCH (e)<-[:REPORTS_TO]-(sub:Employee)
}
RETURN e.name, e.title;

5.6.3 OPTIONAL MATCH (Left-Join Semantics)

-- Similar to SQL's LEFT JOIN
-- The left-hand rows are returned even when nothing matches ('未分配' means "unassigned")
MATCH (e:Employee)
OPTIONAL MATCH (e)-[:BELONGS_TO]->(d:Department)
RETURN e.name, COALESCE(d.name, '未分配') AS department;
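
The left-join behaviour amounts to "keep every left-hand row, fill the missing side with null, then substitute a default". A minimal Python model under assumed data follows; `EMPLOYEES`, `BELONGS_TO`, and the `coalesce` helper are hypothetical, and the English default string stands in for the literal used in the query.

```python
# Hypothetical data: Ben has no BELONGS_TO edge.
EMPLOYEES = ["Ann", "Ben", "Cao"]
BELONGS_TO = {"Ann": "R&D", "Cao": "Sales"}


def coalesce(*values):
    """Return the first value that is not None, like COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def employees_with_department():
    # dict.get returns None when there is no match, which plays the role
    # of the null produced by OPTIONAL MATCH.
    return [(e, coalesce(BELONGS_TO.get(e), "unassigned")) for e in EMPLOYEES]


print(employees_with_department())
# prints "[('Ann', 'R&D'), ('Ben', 'unassigned'), ('Cao', 'Sales')]"
```

With a plain MATCH instead of OPTIONAL MATCH, the row for Ben would disappear from the result rather than showing the default.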

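5.7 CASE Conditional Expressions

-- Simple CASE: map job titles to levels
-- (技术总监 = technical director, 高级工程师 = senior engineer, 工程师 = engineer)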
MATCH (e:Employee)
RETURN e.name, e.salary,
  CASE e.title
    WHEN '技术总监' THEN '管理层'
    WHEN '高级工程师' THEN '高级'
    WHEN '工程师' THEN '中级'
    ELSE '其他'
  END AS level;

-- Searched CASE (高薪 = high, 中等 = medium, 基础 = base)
MATCH (e:Employee)
RETURN e.name, e.salary,
  CASE
    WHEN e.salary >= 30000 THEN '高薪'
    WHEN e.salary >= 20000 THEN '中等'
    ELSE '基础'
  END AS salary_grade;

5.8 Advanced List Operations

5.8.1 List Comprehensions

-- Filter a list
MATCH (d:Department)<-[:BELONGS_TO]-(e:Employee)
RETURN d.name,
  [emp IN collect(e) WHERE emp.salary > 20000 | emp.name] AS high_earners;

-- Transform a list
MATCH (e:Employee)
RETURN collect(e.name) AS names,
  [name IN collect(e.name) | toUpper(name)] AS upper_names;

5.8.2 List Functions

Function                                Description        Example
size(list)                              List length        size(collect(n))
head(list)                              First element      head([1,2,3]) → 1
last(list)                              Last element       last([1,2,3]) → 3
tail(list)                              All but the first  tail([1,2,3]) → [2,3]
reverse(list)                           Reversed list      reverse([1,2,3]) → [3,2,1]
range(start, end, step)                 Generate a range   range(1,10,2) → [1,3,5,7,9]
extract(x IN list | expr)               Map over elements  extract(n IN [1,2,3] | n * 2) → [2,4,6]
filter(x IN list WHERE cond)            Filter elements    filter(n IN [1,2,3,4] WHERE n > 2) → [3,4]
reduce(accum = init, x IN list | expr)  Fold to one value  reduce(s=0, n IN [1,2,3] | s+n) → 6

5.8.3 reduce as an Accumulator

-- Use reduce to compute the total weight of a path
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
RETURN
  [n IN nodes(p) | n.name] AS path_names,
  reduce(weight = 0, r IN relationships(p) | weight + r.weight) AS total_weight;

-- Use reduce to concatenate strings
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, collect(e.name) AS names
RETURN d.name,
  reduce(s = '', name IN names | s + CASE WHEN s <> '' THEN ', ' ELSE '' END + name) AS employee_list;
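
Both reduce expressions above are left folds: an accumulator starts at an initial value and is updated once per list element. Python's `functools.reduce` has the same shape, so the two queries can be mirrored as a sketch; the function names and sample inputs are hypothetical.

```python
from functools import reduce


def total_weight(weights):
    """Mirror of reduce(weight = 0, r IN relationships(p) | weight + r.weight)."""
    return reduce(lambda weight, w: weight + w, weights, 0)


def join_names(names):
    """Mirror of the string fold: add ', ' only when the accumulator is non-empty."""
    return reduce(
        lambda s, name: s + (", " if s != "" else "") + name,
        names,
        "",
    )


print(total_weight([2, 3, 5]))  # prints "10"
print(join_names(["Ann", "Ben", "Cao"]))  # prints "Ann, Ben, Cao"
```

The conditional separator is what keeps the result from starting with a stray ", "; for an empty list the fold simply returns the initial empty string.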

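5.9 Advanced Business Scenario: Knowledge-Graph Reasoning

Scenario: building a drug-interaction knowledge graph

-- Create drug vertices (阿司匹林 = aspirin, 华法林 = warfarin, 布洛芬 = ibuprofen, 氯吡格雷 = clopidogrel)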
CREATE (:Drug {name: '阿司匹林', category: '解热镇痛', dosage: '100mg'});
CREATE (:Drug {name: '华法林', category: '抗凝血', dosage: '5mg'});
CREATE (:Drug {name: '布洛芬', category: '解热镇痛', dosage: '400mg'});
CREATE (:Drug {name: '氯吡格雷', category: '抗血小板', dosage: '75mg'});

-- Create disease vertices (心血管疾病 = cardiovascular disease, 头痛 = headache)
CREATE (:Disease {name: '心血管疾病', severity: 'high'});
CREATE (:Disease {name: '头痛', severity: 'low'});

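-- Create drug relationships
-- (增加出血风险 = increases bleeding risk, 降低阿司匹林效果 = reduces aspirin's effectiveness)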
MATCH (a:Drug {name: '阿司匹林'}), (b:Drug {name: '华法林'})
CREATE (a)-[:INTERACTS_WITH {risk: 'high', effect: '增加出血风险'}]->(b);

MATCH (a:Drug {name: '阿司匹林'}), (b:Drug {name: '布洛芬'})
CREATE (a)-[:INTERACTS_WITH {risk: 'medium', effect: '降低阿司匹林效果'}]->(b);

MATCH (a:Drug {name: '阿司匹林'}), (d:Disease {name: '心血管疾病'})
CREATE (a)-[:TREATS]->(d);

MATCH (a:Drug {name: '布洛芬'}), (d:Disease {name: '头痛'})
CREATE (a)-[:TREATS]->(d);

Query: drug safety check

-- Find all high-risk drug interactions
MATCH (d1:Drug)-[r:INTERACTS_WITH]->(d2:Drug)
WHERE r.risk = 'high'
RETURN d1.name AS drug1, d2.name AS drug2, r.effect AS interaction_effect
ORDER BY r.risk;

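Query: find all drugs available for a disease, and their interactions

-- For cardiovascular disease, find the treating drugs and their interactions with other drugs
-- ('无已知交互' means "no known interaction")
MATCH (drug:Drug)-[:TREATS]->(disease:Disease {name: '心血管疾病'})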
OPTIONAL MATCH (drug)-[r:INTERACTS_WITH]->(other:Drug)
RETURN drug.name AS treatment,
       drug.dosage AS dosage,
       COALESCE(other.name, '无已知交互') AS interacting_drug,
       COALESCE(r.effect, '-') AS effect,
       COALESCE(r.risk, '-') AS risk;

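5.10 Advanced Cypher Operator Quick Reference

Operation       Syntax                        Description
UNION           query1 UNION query2           Combine result sets (duplicates removed)
UNION ALL       query1 UNION ALL query2       Combine result sets (duplicates kept)
DISTINCT        RETURN DISTINCT n             Remove duplicates
OPTIONAL MATCH  OPTIONAL MATCH pattern        Optional match (left join)
WITH            WITH expr AS alias            Pipe results to the next clause
UNWIND          UNWIND list AS item           Expand a list into rows
FOREACH         FOREACH (x IN list | ops)     Loop over a list
CASE WHEN       CASE WHEN cond THEN expr END  Conditional expression
EXISTS {}       WHERE EXISTS { MATCH ... }    Existence check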
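
The difference between UNION and UNION ALL is only whether duplicate rows are dropped. A small Python sketch makes that visible; the helper names and sample rows are hypothetical, and rows are modelled as tuples so they can be hashed.

```python
def union_all(first, second):
    """Concatenate both result sets and keep every row."""
    return list(first) + list(second)


def union(first, second):
    """Concatenate both result sets and drop rows that were already seen."""
    seen = set()
    result = []
    for row in union_all(first, second):
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


a = [("Ann",), ("Ben",)]
b = [("Ben",), ("Cao",)]
print(union_all(a, b))  # prints "[('Ann',), ('Ben',), ('Ben',), ('Cao',)]"
print(union(a, b))  # prints "[('Ann',), ('Ben',), ('Cao',)]"
```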

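5.11 Chapter Summary

Topic           Summary
Paths           Variable-length paths *N..M, shortest paths with shortestPath()
Aggregation     count, avg, collect, reduce
WITH            Pipelining, with intermediate filtering and sorting
UNWIND          Expands a list into rows
Subqueries      CALL {}, EXISTS {}
OPTIONAL MATCH  Left-join semantics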

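5.12 Exercises

  1. In the social-network graph, use a variable-length path to find all paths from Alice to Dave (length at most 4).
  2. Use collect() and UNWIND to "list all colleagues of each employee".
  3. Use reduce() to sum an edge property over all edges of a path.
  4. Use an EXISTS {} subquery to find everyone who is both a manager and a high earner.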

5.13 Further Reading