
AgensGraph Complete Guide

Chapter 05: Advanced Cypher

5.1 Paths

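A path is one of the most powerful concepts in Cypher. It is an ordered sequence that alternates between vertices and edges, starting and ending with a vertex.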

5.1.1 Path Basics

Structure of a path:
  (v₁)─[e₁]─(v₂)─[e₂]─(v₃)─[e₃]─(v₄)
  │                                      │
  └──────────── Path P ──────────────────┘

  Path length = number of edges = 3
  Number of vertices on the path = 4

5.1.2 Fixed-Length Paths

-- Path of length 2 (friends of friends)
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(c:Person)
RETURN c.name;

-- Path of length 3
MATCH (a:Person {name: 'Alice'})-[:KNOWS*3]->(d:Person)
RETURN d.name;

-- Bind the whole path to a variable
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*2]->(c:Person)
RETURN p, length(p) AS path_length, nodes(p) AS path_nodes;

5.1.3 Variable-Length Paths

-- At least 1 hop, at most 3 hops
MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
RETURN DISTINCT b.name, b.age;

-- 0 to 5 hops (the 0-hop case matches Alice herself)
MATCH (a:Person {name: 'Alice'})-[:KNOWS*0..5]->(b:Person)
RETURN DISTINCT b.name;

-- At least 2 hops (no upper bound)
MATCH (a:Person {name: 'Alice'})-[:KNOWS*2..]->(b:Person)
RETURN DISTINCT b.name;

Syntax   Meaning            Example
*N       Exactly N hops     [:KNOWS*3]
*N..M    N to M hops        [:KNOWS*1..3]
*N..     At least N hops    [:KNOWS*2..]
*..M     1 to M hops        [:KNOWS*..5]
*        1 or more hops     [:KNOWS*]
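
The hop-range rules can be checked with a small Python model of -[:KNOWS*min..max]-> over an in-memory edge list. This is a sketch, not AgensGraph's executor. The graph, the edge ids, and the expand function are hypothetical, and the model assumes the openCypher rule that a single path never traverses the same edge twice. That rule is what keeps an open upper bound such as *2.. finite on a cyclic graph.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical directed KNOWS edges: (edge_id, source, target).
# The last edge closes a cycle back to Alice.
KNOWS: List[Tuple[str, str, str]] = [
    ("e1", "Alice", "Bob"),
    ("e2", "Bob", "Carol"),
    ("e3", "Carol", "Dave"),
    ("e4", "Dave", "Alice"),
]


def expand(edges: List[Tuple[str, str, str]], start: str,
           min_hops: int, max_hops: Optional[int]) -> List[List[str]]:
    """Model of (start)-[:KNOWS*min..max]->(x): every vertex sequence found.

    max_hops=None stands for an open upper bound (*N..). A path never reuses
    an edge, which keeps the search finite on a cyclic graph.
    """
    outgoing: Dict[str, List[Tuple[str, str, str]]] = {}
    for edge in edges:
        outgoing.setdefault(edge[1], []).append(edge)

    results: List[List[str]] = []

    def walk(vertex: str, path: List[str], used: Set[str]) -> None:
        hops = len(path) - 1
        if hops >= min_hops:
            results.append(list(path))
        if max_hops is not None and hops == max_hops:
            return
        for edge_id, _, target in outgoing.get(vertex, []):
            if edge_id in used:
                continue
            used.add(edge_id)
            path.append(target)
            walk(target, path, used)
            path.pop()
            used.remove(edge_id)

    walk(start, [start], set())
    return results


for p in expand(KNOWS, "Alice", 2, None):
    print(" -> ".join(p))
```

The last path ends at Alice again. In general, the same vertex can end more than one matched path, which is why the queries above use RETURN DISTINCT.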

5.1.4 Shortest Paths

-- Find a shortest path between two nodes
MATCH p = shortestPath(
  (a:Person {name: 'Alice'})-[:KNOWS*]-(b:Person {name: 'Dave'})
)
RETURN p, length(p) AS distance;

-- All shortest paths
MATCH p = allShortestPaths(
  (a:Person {name: 'Alice'})-[:KNOWS*]-(b:Person {name: 'Dave'})
)
RETURN p, length(p) AS distance;

-- Shortest path limited to a maximum depth of 6 hops
MATCH p = shortestPath(
  (a:Person {name: 'Alice'})-[:KNOWS*..6]-(b:Person {name: 'Dave'})
)
RETURN p;
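
Conceptually, a shortest path over unweighted edges is what a breadth-first search finds. The Python sketch below models shortestPath() and allShortestPaths() on an undirected adjacency map, with an optional depth cap like *..6. It makes no claim about how AgensGraph implements the search internally. The graph and function names are hypothetical. When several paths tie, shortestPath() returns one of them, and which one should not be relied on. The sketch simply returns the first in its own ordering.

```python
from typing import Dict, List, Optional, Set, Tuple


def undirected(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Adjacency sets for an undirected pattern such as -[:KNOWS*]-."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def all_shortest_paths(adj: Dict[str, Set[str]], src: str, dst: str,
                       max_hops: Optional[int] = None) -> List[List[str]]:
    """Breadth-first search that keeps every predecessor on a shortest route.

    Assumes src != dst. Returns [] when dst is unreachable within max_hops.
    """
    dist: Dict[str, int] = {src: 0}
    preds: Dict[str, List[str]] = {src: []}
    frontier = [src]
    while frontier and dst not in dist:
        depth = dist[frontier[0]] + 1
        if max_hops is not None and depth > max_hops:
            break
        next_frontier: List[str] = []
        for v in frontier:
            for w in sorted(adj.get(v, set())):
                if w not in dist:          # first time w is reached
                    dist[w] = depth
                    preds[w] = [v]
                    next_frontier.append(w)
                elif dist[w] == depth:     # another route of the same length
                    preds[w].append(v)
        frontier = next_frontier
    if dst not in dist:
        return []

    def back(v: str) -> List[List[str]]:
        # Rebuild every path by walking predecessors back to src
        if v == src:
            return [[src]]
        return [p + [v] for u in preds[v] for p in back(u)]

    return back(dst)


def shortest_path(adj, src, dst, max_hops=None) -> Optional[List[str]]:
    """shortestPath(): any one of the shortest paths, or None."""
    paths = all_shortest_paths(adj, src, dst, max_hops)
    return paths[0] if paths else None


graph = undirected([("Alice", "Bob"), ("Alice", "Carol"),
                    ("Bob", "Dave"), ("Carol", "Dave"), ("Dave", "Eve")])
print(all_shortest_paths(graph, "Alice", "Dave"))
```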

5.1.5 Path Functions

MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(c:Person)
RETURN
  p                        AS full_path,
  length(p)                AS hops,
  nodes(p)                 AS vertices,
  relationships(p)         AS edges,
  startNode(head(relationships(p))) AS start,
  endNode(last(relationships(p)))   AS destination;
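
Function           Returns           Description
length(p)          integer           Path length (number of edges)
nodes(p)           list of vertices  All vertices on the path
relationships(p)   list of edges     All edges on the path
startNode(r)       vertex            Start vertex of an edge
endNode(r)         vertex            End vertex of an edge
head(list)         element           First element of a list
last(list)         element           Last element of a list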
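
The startNode/endNode pair in the query above deserves a second look. In Neo4j's Cypher, startNode(r) returns the start vertex the edge was stored with, not the vertex the traversal came from. For a directed pattern such as -[:KNOWS*1..3]-> the two agree, so startNode(head(relationships(p))) is the same vertex as head(nodes(p)). For an undirected pattern they can differ. The Python model below uses hypothetical Rel and Path classes, not an AgensGraph API, to show the difference. That AgensGraph follows the same rule is an assumption worth checking.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rel:
    start: str   # what startNode(r) returns: the stored start vertex
    label: str
    end: str     # what endNode(r) returns: the stored end vertex


@dataclass
class Path:
    vertices: List[str]
    edges: List[Rel]

    def __post_init__(self) -> None:
        # A path alternates vertex, edge, vertex: k vertices need k - 1 edges,
        # and each edge must join its two neighbouring vertices (either way round).
        if len(self.vertices) != len(self.edges) + 1:
            raise ValueError("a path needs exactly one more vertex than edges")
        for (a, b), r in zip(zip(self.vertices, self.vertices[1:]), self.edges):
            if {a, b} != {r.start, r.end}:
                raise ValueError(f"edge {r} does not join {a} and {b}")

    def length(self) -> int:               # length(p)
        return len(self.edges)

    def nodes(self) -> List[str]:          # nodes(p)
        return list(self.vertices)

    def relationships(self) -> List[Rel]:  # relationships(p)
        return list(self.edges)


# The same stored edge, traversed with and against its direction
forward = Path(["Alice", "Bob"], [Rel("Alice", "KNOWS", "Bob")])
backward = Path(["Bob", "Alice"], [Rel("Alice", "KNOWS", "Bob")])
print(forward.relationships()[0].start == forward.nodes()[0])
print(backward.relationships()[0].start == backward.nodes()[0])
```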

5.2 Advanced Aggregation

5.2.1 Aggregate Functions in Detail

-- Headcount and salary statistics per job title in the '技术部' (engineering) department
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department {name: '技术部'})
RETURN
  e.title AS title,
  count(e) AS headcount,
  avg(e.salary) AS avg_salary,
  min(e.salary) AS min_salary,
  max(e.salary) AS max_salary,
  sum(e.salary) AS total_cost,
  stdev(e.salary) AS salary_stddev,
  percentileCont(e.salary, 0.5) AS median_salary,
  percentileDisc(e.salary, 0.9) AS p90_salary
ORDER BY avg_salary DESC;
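
Function               Description                                    Example
count(x)               Number of non-null values                      count(e)
count(*)               Number of rows                                 count(*)
count(DISTINCT x)      Number of distinct non-null values             count(DISTINCT e.title)
avg(x)                 Average                                        avg(e.salary)
sum(x)                 Sum                                            sum(e.salary)
min(x)                 Minimum                                        min(e.salary)
max(x)                 Maximum                                        max(e.salary)
stdev(x)               Sample standard deviation (divides by n - 1)   stdev(e.salary)
stdevp(x)              Population standard deviation (divides by n)   stdevp(e.salary)
percentileCont(x, p)   Continuous percentile (interpolated)           percentileCont(e.salary, 0.5)
percentileDisc(x, p)   Discrete percentile (an actual value of x)     percentileDisc(e.salary, 0.9)
collect(x)             Collect values into a list                     collect(e.name)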
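
percentileCont and percentileDisc differ in whether the answer may fall between two input values. stdev and stdevp differ only in the divisor. The Python functions below model those definitions and are not AgensGraph code. The nearest-rank rule used for percentile_disc is an assumption about the exact rounding, and engines can differ at rank boundaries. Returning None for fewer than two values in stdev is also this sketch's own choice.

```python
import math
import statistics
from typing import Iterable, List, Optional


def _non_null(values: Iterable[Optional[float]]) -> List[float]:
    # Aggregates ignore nulls; None plays the role of null here.
    return sorted(v for v in values if v is not None)


def percentile_cont(values, p: float) -> Optional[float]:
    """Linear interpolation between the two nearest ranks (0 <= p <= 1)."""
    xs = _non_null(values)
    if not xs:
        return None
    pos = p * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_disc(values, p: float) -> Optional[float]:
    """Nearest-rank percentile: the result is always one of the inputs."""
    xs = _non_null(values)
    if not xs:
        return None
    rank = max(1, math.ceil(p * len(xs)))
    return xs[rank - 1]


def stdev(values) -> Optional[float]:
    """Sample standard deviation (divides by n - 1)."""
    xs = _non_null(values)
    return statistics.stdev(xs) if len(xs) >= 2 else None


def stdevp(values) -> Optional[float]:
    """Population standard deviation (divides by n)."""
    xs = _non_null(values)
    return statistics.pstdev(xs) if xs else None


ages = [13, 33, None, 44]
print(percentile_cont(ages, 0.4))  # interpolated between 13 and 33
print(percentile_disc(ages, 0.5))  # always an actual input value
```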

5.2.2 collect Aggregation

collect() gathers the matched values into a single list. It is one of the most frequently used aggregates in graph queries:

-- Collect the employee names of each department
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN
  d.name AS department,
  collect(e.name) AS employees,
  count(e) AS headcount;

-- collect combined with DISTINCT
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN
  d.name AS department,
  collect(DISTINCT e.title) AS unique_titles;
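
Two details of collect() are easy to miss. Null values are left out of the list, and the element order simply follows the incoming row order, which is not guaranteed unless an earlier WITH ... ORDER BY fixes it. The Python function below is a hypothetical model of that behavior, and the sample titles are invented.

```python
from typing import Any, Iterable, List


def collect(values: Iterable[Any], distinct: bool = False) -> List[Any]:
    """collect(x) / collect(DISTINCT x): nulls (None) are skipped."""
    out: List[Any] = []
    seen: set = set()
    for v in values:
        if v is None:
            continue
        if distinct:
            if v in seen:
                continue
            seen.add(v)
        out.append(v)
    return out


titles = ["Engineer", None, "Engineer", "Manager"]
print(collect(titles))                 # the None is dropped, duplicates kept
print(collect(titles, distinct=True))  # duplicates removed as well
```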

5.2.3 Grouped Aggregation (Implicit GROUP BY)

-- When RETURN contains aggregate functions, Cypher automatically groups rows by all non-aggregated expressions
-- This is equivalent to SQL's GROUP BY
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN d.name AS dept, count(e) AS cnt, avg(e.salary) AS avg_sal;

-- Group by several keys
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN d.name AS dept, e.title AS title, count(e) AS cnt;
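
The grouping rule can be stated operationally. Rows are bucketed by the tuple of all non-aggregated RETURN expressions, and then each aggregate runs once per bucket. The Python sketch below models that. The rows, field names, and group_return are hypothetical. Adding a key, as in the second query, splits the buckets further.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


def group_return(rows: List[Dict[str, Any]], keys: List[str],
                 aggregates: Dict[str, Callable[[List[Dict[str, Any]]], Any]]
                 ) -> List[Dict[str, Any]]:
    """Model of RETURN k1, k2, agg(...): the non-aggregated keys form the groups."""
    groups: Dict[tuple, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    result = []
    for key, members in groups.items():
        out = dict(zip(keys, key))
        for alias, fn in aggregates.items():
            out[alias] = fn(members)   # each aggregate sees one bucket only
        result.append(out)
    return result


rows = [
    {"dept": "R&D", "title": "Engineer", "salary": 20000},
    {"dept": "R&D", "title": "Engineer", "salary": 24000},
    {"dept": "R&D", "title": "Lead", "salary": 30000},
    {"dept": "Sales", "title": "Rep", "salary": 15000},
]
print(group_return(rows, ["dept"], {"cnt": len}))
print(group_return(rows, ["dept", "title"], {"cnt": len}))
```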

5.2.4 The HAVING Equivalent

Cypher has no explicit HAVING keyword. Instead, aggregate in a WITH clause and filter with a WHERE placed directly after it:

-- Departments whose average salary exceeds 20000
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d.name AS dept, avg(e.salary) AS avg_salary, count(e) AS cnt
WHERE avg_salary > 20000
RETURN dept, avg_salary, cnt
ORDER BY avg_salary DESC;
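
The WHERE after WITH sees only the aggregated rows, so it filters whole groups rather than individual employees. In the Python model below, R&D passes even though one of its employees earns exactly 20000, because only its average is tested. The data and the departments_above function are hypothetical.

```python
from typing import Dict, List, Tuple


def departments_above(rows: List[Tuple[str, int]], threshold: float) -> List[dict]:
    """rows are (dept, salary) pairs, one per MATCHed employee."""
    # WITH d.name AS dept, avg(e.salary) AS avg_salary, count(e) AS cnt
    totals: Dict[str, Tuple[int, int]] = {}
    for dept, salary in rows:
        s, n = totals.get(dept, (0, 0))
        totals[dept] = (s + salary, n + 1)
    aggregated = [{"dept": d, "avg_salary": s / n, "cnt": n}
                  for d, (s, n) in totals.items()]
    # WHERE avg_salary > threshold: filters groups, not individual employees
    kept = [r for r in aggregated if r["avg_salary"] > threshold]
    # ORDER BY avg_salary DESC
    return sorted(kept, key=lambda r: r["avg_salary"], reverse=True)


rows = [("R&D", 20000), ("R&D", 24000), ("R&D", 30000),
        ("Sales", 15000), ("Ops", 21000), ("Ops", 21000)]
print(departments_above(rows, 20000))
```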

5.3 WITH: Pipelining

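WITH is Cypher's pipe operator. Like the Unix | pipe, it passes the results of one step to the next. Only the variables listed in WITH remain in scope after it.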

5.3.1 Basic Usage

-- Two steps: first find high-earning employees, then look up their departments
MATCH (e:Employee)
WHERE e.salary > 20000
WITH e
MATCH (e)-[:BELONGS_TO]->(d:Department)
RETURN e.name, e.salary, d.name AS department;

5.3.2 Filtering Aggregates with WITH

-- Departments with at least 2 employees, and their average salary
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, count(e) AS emp_count, avg(e.salary) AS avg_sal
WHERE emp_count >= 2
RETURN d.name AS department, emp_count, round(avg_sal) AS avg_salary
ORDER BY emp_count DESC;

5.3.3 Intermediate Sorting with WITH

-- Take the 3 highest-paid employees, then look up their departments
MATCH (e:Employee)
WITH e ORDER BY e.salary DESC LIMIT 3
MATCH (e)-[:BELONGS_TO]->(d:Department)
RETURN e.name, e.salary, d.name AS department;
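
Clause order matters here. LIMIT runs before the second MATCH, so if one of the top three earners has no BELONGS_TO edge, the query returns only two rows. It does not reach down to the fourth earner. The Python sketch below models that ordering with hypothetical employees. Moving the MATCH before the WITH would change the answer.

```python
from typing import Dict, List, Tuple


def top_earners_with_department(employees: List[Tuple[str, int]],
                                belongs_to: Dict[str, str],
                                k: int = 3) -> List[Tuple[str, int, str]]:
    # WITH e ORDER BY e.salary DESC LIMIT k: only k rows continue
    top = sorted(employees, key=lambda e: e[1], reverse=True)[:k]
    # MATCH (e)-[:BELONGS_TO]->(d) behaves like an inner join, so an
    # employee without a department is dropped rather than replaced
    return [(name, salary, belongs_to[name])
            for name, salary in top if name in belongs_to]


employees = [("Ann", 50000), ("Ben", 40000), ("Cid", 30000), ("Dee", 20000)]
belongs_to = {"Ann": "R&D", "Cid": "Sales", "Dee": "Ops"}  # Ben has no department
print(top_earners_with_department(employees, belongs_to))
```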

5.3.4 Passing Multiple Variables Through WITH

-- Multi-step pipeline: group, filter, then compute derived values
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, collect(e) AS employees, count(e) AS cnt
WHERE cnt > 1
WITH d, employees, cnt,
     reduce(total = 0, emp IN employees | total + emp.salary) AS total_salary
-- Multiplying by 1.0 avoids integer division when salaries are stored as integers
RETURN d.name, cnt, total_salary, round(total_salary * 1.0 / cnt) AS avg_salary;
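
The average step needs care. In Neo4j's Cypher, dividing one integer by another performs integer division, so round() would only ever see an already truncated value. Multiplying by 1.0 first forces floating-point division. Whether AgensGraph truncates in the same way depends on how it types numeric properties, which is worth verifying on your version. The Python model below does the same fold with functools.reduce and compares the two results. The salaries are made up.

```python
from functools import reduce
from typing import List, Tuple


def salary_summary(salaries: List[int]) -> Tuple[int, int, int]:
    """Return (total, truncated average, rounded average) for one department."""
    # reduce(total = 0, emp IN employees | total + emp.salary)
    total = reduce(lambda acc, s: acc + s, salaries, 0)
    cnt = len(salaries)
    truncated = total // cnt            # integer / integer (for positive values)
    rounded = round(total * 1.0 / cnt)  # float division first, then round
    # Note: Python's round() may break exact .5 ties differently from Cypher's
    return total, truncated, rounded


print(salary_summary([20000, 20001, 20001]))
```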

5.4 UNWIND: Expanding Lists

UNWIND expands a list into multiple rows, one per element. It is essentially the inverse of collect():

-- Expand a list into rows
UNWIND [1, 2, 3, 4, 5] AS num
RETURN num;

-- Create nodes in bulk
UNWIND ['Alice', 'Bob', 'Carol', 'Dave'] AS name
CREATE (:Person {name: name, created: datetime()});

-- Bulk import scenario
UNWIND [
  {name: 'Alice', age: 30, city: 'Beijing'},
  {name: 'Bob', age: 28, city: 'Shanghai'},
  {name: 'Carol', age: 32, city: 'Guangzhou'}
] AS data
CREATE (:Person {
  name: data.name,
  age: data.age,
  city: data.city
});

-- Combined with collect: expand the collected list again for per-row processing
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, collect(e.name) AS names
UNWIND names AS employee_name
RETURN d.name, employee_name;
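
UNWIND's row semantics have a practical consequence. An empty list produces no rows at all, so a department whose collected list is empty disappears from the result. In Neo4j, UNWIND of null behaves the same way. The Python function below models this, and its rows and field names are hypothetical.

```python
from typing import Any, Dict, List


def unwind(rows: List[Dict[str, Any]], list_key: str, alias: str) -> List[Dict[str, Any]]:
    """UNWIND row[list_key] AS alias: one output row per element."""
    out: List[Dict[str, Any]] = []
    for row in rows:
        for item in row[list_key] or []:   # an empty (or null) list yields no rows
            new_row = dict(row)
            new_row[alias] = item
            out.append(new_row)
    return out


rows = [{"dept": "R&D", "names": ["Alice", "Bob"]},
        {"dept": "Legal", "names": []}]
for r in unwind(rows, "names", "employee_name"):
    print(r["dept"], r["employee_name"])
```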

5.5 FOREACH: Looping Updates

-- Set a property on every node of the matched paths
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
FOREACH (n IN nodes(p) | SET n.visited = true);

-- Set a property on every edge of the matched paths
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
FOREACH (r IN relationships(p) | SET r.traversed = true);
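
MATCH with *1..3 returns one row per path, and paths from the same start overlap. As a result, FOREACH visits shared nodes (Alice, at least) several times. That is harmless for an idempotent update such as setting a flag, but it would matter for a non-idempotent one such as incrementing a counter. The Python model below counts how often the FOREACH body runs. The paths and the mark_paths function are hypothetical.

```python
from typing import Dict, List


def mark_paths(paths: List[List[str]], visited: Dict[str, bool]) -> int:
    """Run the FOREACH body for every node of every matched path.

    Returns how many times the body ran. The final state only records which
    nodes were marked, because setting the same flag twice changes nothing.
    """
    runs = 0
    for path in paths:          # one input row per matched path p
        for node in path:       # FOREACH (n IN nodes(p) | ...)
            visited[node] = True
            runs += 1
    return runs


# Paths that *1..3 could return from Alice in a hypothetical chain Alice->Bob->Carol
paths = [["Alice", "Bob"], ["Alice", "Bob", "Carol"]]
state: Dict[str, bool] = {}
print(mark_paths(paths, state), sorted(state))
```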

5.6 Subqueries and CALL

5.6.1 CALL Subqueries

-- Run a subquery with CALL once per department
MATCH (d:Department)
CALL {
  WITH d
  MATCH (e:Employee)-[:BELONGS_TO]->(d)
  RETURN count(e) AS emp_count, avg(e.salary) AS avg_salary
}
RETURN d.name, emp_count, avg_salary;
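
The subquery is evaluated once per outer row. Under Neo4j's semantics for CALL {}, a subquery that aggregates without a grouping key returns exactly one row even when its MATCH finds nothing. A department with no employees therefore still appears, with emp_count 0 and a null avg_salary. A plain MATCH (e)-[:BELONGS_TO]->(d) ... RETURN d.name, count(e) would drop that department instead. The Python sketch below models this. Whether your AgensGraph version accepts CALL {} at all is worth checking first.

```python
from typing import Dict, List, Optional, Tuple


def per_department(departments: List[str],
                   salaries_by_dept: Dict[str, List[int]]
                   ) -> List[Tuple[str, int, Optional[float]]]:
    rows = []
    for d in departments:                        # outer MATCH (d:Department)
        salaries = salaries_by_dept.get(d, [])   # inner MATCH, correlated via WITH d
        emp_count = len(salaries)                # count() over zero rows is 0
        avg_salary = sum(salaries) / emp_count if salaries else None  # avg() is null
        rows.append((d, emp_count, avg_salary))
    return rows


print(per_department(["R&D", "Legal"], {"R&D": [20000, 30000]}))
```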

5.6.2 EXISTS Semantics with Subqueries

-- Employees who have direct reports (like SQL EXISTS)
MATCH (e:Employee)
WHERE EXISTS {
  MATCH (e)<-[:REPORTS_TO]-(sub:Employee)
}
RETURN e.name, e.title;

-- Employees without direct reports (like SQL NOT EXISTS)
MATCH (e:Employee)
WHERE NOT EXISTS {
  MATCH (e)<-[:REPORTS_TO]-(sub:Employee)
}
RETURN e.name, e.title;
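
Unlike an extra MATCH, an existence check does not multiply rows. A manager with two direct reports is still returned once. The Python model below expresses that with a set lookup, and the names are hypothetical.

```python
from typing import List, Tuple


def has_reports(employees: List[str], reports_to: List[Tuple[str, str]],
                negate: bool = False) -> List[str]:
    """reports_to holds (subordinate, manager) pairs for REPORTS_TO edges."""
    managers = {manager for _, manager in reports_to}
    # EXISTS only asks "is there at least one match?", so each employee
    # appears at most once no matter how many subordinates they have.
    return [e for e in employees if (e in managers) != negate]


employees = ["Ann", "Ben", "Cid"]
reports_to = [("Ben", "Ann"), ("Cid", "Ann")]
print(has_reports(employees, reports_to))               # EXISTS
print(has_reports(employees, reports_to, negate=True))  # NOT EXISTS
```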

5.6.3 OPTIONAL MATCH (Left Join Semantics)

-- Similar to a SQL LEFT JOIN:
-- every employee is returned even when no department matches (d is then null)
MATCH (e:Employee)
OPTIONAL MATCH (e)-[:BELONGS_TO]->(d:Department)
RETURN e.name, COALESCE(d.name, 'Unassigned') AS department;
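
OPTIONAL MATCH keeps a row with nulls when nothing matches. Otherwise it yields one row per match, just as MATCH does, so an employee linked to two departments appears twice. Below is a hypothetical Python model of that left-join behavior, with the default label passed in as COALESCE would supply it.

```python
from typing import Dict, List, Tuple


def employees_with_department(employees: List[str],
                              belongs_to: Dict[str, List[str]],
                              default: str) -> List[Tuple[str, str]]:
    rows: List[Tuple[str, str]] = []
    for e in employees:
        matches = belongs_to.get(e, [])
        if not matches:
            rows.append((e, default))   # no match: d is null, COALESCE supplies default
        for d in matches:
            rows.append((e, d))         # one row per match, exactly like MATCH
    return rows


print(employees_with_department(["Ann", "Ben"], {"Ann": ["R&D"]}, "Unassigned"))
```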

5.7 CASE Expressions

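-- Simple CASE: compare one expression against fixed values
MATCH (e:Employee)
RETURN e.name, e.salary,
  CASE e.title
    WHEN '技术总监' THEN 'Management'    -- title: technical director
    WHEN '高级工程师' THEN 'Senior'       -- title: senior engineer
    WHEN '工程师' THEN 'Mid-level'        -- title: engineer
    ELSE 'Other'
  END AS level;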

-- Searched CASE: conditions are checked top to bottom and the first true WHEN wins
MATCH (e:Employee)
RETURN e.name, e.salary,
  CASE
    WHEN e.salary >= 30000 THEN 'High'
    WHEN e.salary >= 20000 THEN 'Medium'
    ELSE 'Base'
  END AS salary_grade;
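
Two properties of the searched form are worth making explicit. First, the first WHEN that evaluates to true wins, which is why the >= 30000 test must come before >= 20000. Second, a null salary makes every comparison null, so it falls through to ELSE. The Python function below models that logic with English stand-in labels.

```python
from typing import Optional


def salary_grade(salary: Optional[int]) -> str:
    """Mirror of the searched CASE: branches are tried top to bottom."""
    # In Cypher, null >= 30000 evaluates to null, which is not true,
    # so a missing salary falls through to ELSE.
    if salary is not None and salary >= 30000:
        return "High"
    if salary is not None and salary >= 20000:
        return "Medium"
    return "Base"


for s in (30000, 25000, 20000, 19999, None):
    print(s, salary_grade(s))
```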

5.8 Advanced List Operations

5.8.1 List Comprehensions

-- Filter a list (here, names of employees earning more than 20000)
MATCH (d:Department)<-[:BELONGS_TO]-(e:Employee)
RETURN d.name,
  [emp IN collect(e) WHERE emp.salary > 20000 | emp.name] AS high_earners;

-- Transform a list (here, upper-case every name)
MATCH (e:Employee)
RETURN collect(e.name) AS names,
  [name IN collect(e.name) | toUpper(name)] AS upper_names;

5.8.2 List Functions

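Function                                Description                        Example
size(list)                              Number of elements                 size(collect(n))
head(list)                              First element                      head([1,2,3]) → 1
last(list)                              Last element                       last([1,2,3]) → 3
tail(list)                              All elements except the first      tail([1,2,3]) → [2,3]
reverse(list)                           Reversed list                      reverse([1,2,3]) → [3,2,1]
range(start, end, step)                 Integer range, end inclusive       range(1,10,2) → [1,3,5,7,9]
extract(x IN list | expr)               Map each element (legacy)          extract(n IN [1,2,3] | n * 2) → [2,4,6]
filter(x IN list WHERE cond)            Keep matching elements (legacy)    filter(n IN [1,2,3,4] WHERE n > 2) → [3,4]
reduce(accum = init, x IN list | expr)  Fold a list into one value         reduce(s=0, n IN [1,2,3] | s+n) → 6

extract() and filter() are older syntax that Neo4j 4.0 removed. The list comprehensions [x IN list | expr] and [x IN list WHERE cond] from 5.8.1 do the same job.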
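
The table's examples can be reproduced in Python with one trap. Cypher's range() includes its end value and Python's excludes it, so Cypher's range(1,10,2) corresponds to Python's range(1, 11, 2). The helpers below are a hypothetical model. The empty-list results follow how Neo4j's Cypher behaves and are assumed to carry over to AgensGraph.

```python
from functools import reduce
from typing import Any, List, Optional


def head(xs: List[Any]) -> Optional[Any]:
    return xs[0] if xs else None      # head([]) is null


def last(xs: List[Any]) -> Optional[Any]:
    return xs[-1] if xs else None     # last([]) is null


def tail(xs: List[Any]) -> List[Any]:
    return xs[1:]                     # tail([]) is []


def cypher_range(start: int, end: int, step: int = 1) -> List[int]:
    """Cypher's range() includes end; Python's range() excludes it."""
    if step == 0:
        raise ValueError("step must not be 0")
    return list(range(start, end + (1 if step > 0 else -1), step))


# extract() and filter() correspond to comprehensions
doubled = [n * 2 for n in [1, 2, 3]]          # [n IN [1,2,3] | n * 2]
big = [n for n in [1, 2, 3, 4] if n > 2]      # [n IN [1,2,3,4] WHERE n > 2]
total = reduce(lambda s, n: s + n, [1, 2, 3], 0)
print(cypher_range(1, 10, 2), doubled, big, total)
```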

5.8.3 reduce as an Accumulator

-- Use reduce to compute the total weight of a path (assumes KNOWS edges carry a weight property)
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
RETURN
  [n IN nodes(p) | n.name] AS path_names,
  reduce(weight = 0, r IN relationships(p) | weight + r.weight) AS total_weight;

-- Use reduce to join strings with a ", " separator
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, collect(e.name) AS names
RETURN d.name,
  reduce(s = '', name IN names | s + CASE WHEN s <> '' THEN ', ' ELSE '' END + name) AS employee_list;
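
Two behaviors of these reduce() calls follow from Cypher's null rules and are easy to overlook. If any edge lacks a weight, null propagates and the total becomes null; wrap r.weight in COALESCE if that is not wanted. Also, the separator test checks whether the accumulator is still empty, so an empty-string first name suppresses the next separator. The Python model below demonstrates both with made-up data.

```python
from functools import reduce
from typing import List, Optional


def total_weight(weights: List[Optional[float]]) -> Optional[float]:
    # weight + r.weight: one null weight turns the whole sum into null
    return reduce(lambda acc, w: None if acc is None or w is None else acc + w,
                  weights, 0)


def join_names(names: List[str]) -> str:
    # s + CASE WHEN s <> '' THEN ', ' ELSE '' END + name
    return reduce(lambda s, name: s + (", " if s != "" else "") + name, names, "")


print(total_weight([1.5, 2.0]), total_weight([1.0, None]))
print(join_names(["Alice", "Bob", "Carol"]))
```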

5.9 Advanced Use Case: Knowledge-Graph Reasoning

Scenario: building a drug-interaction knowledge graph

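-- Create drug nodes (names and categories are stored in Chinese)
CREATE (:Drug {name: '阿司匹林', category: '解热镇痛', dosage: '100mg'});  -- aspirin, antipyretic analgesic
CREATE (:Drug {name: '华法林', category: '抗凝血', dosage: '5mg'});       -- warfarin, anticoagulant
CREATE (:Drug {name: '布洛芬', category: '解热镇痛', dosage: '400mg'});  -- ibuprofen, antipyretic analgesic
CREATE (:Drug {name: '氯吡格雷', category: '抗血小板', dosage: '75mg'});  -- clopidogrel, antiplatelet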

-- Create disease nodes
CREATE (:Disease {name: '心血管疾病', severity: 'high'});  -- cardiovascular disease
CREATE (:Disease {name: '头痛', severity: 'low'});          -- headache

-- Create drug relationships
MATCH (a:Drug {name: '阿司匹林'}), (b:Drug {name: '华法林'})
CREATE (a)-[:INTERACTS_WITH {risk: 'high', effect: '增加出血风险'}]->(b);  -- effect: increases bleeding risk

MATCH (a:Drug {name: '阿司匹林'}), (b:Drug {name: '布洛芬'})
CREATE (a)-[:INTERACTS_WITH {risk: 'medium', effect: '降低阿司匹林效果'}]->(b);  -- effect: reduces aspirin's effect

MATCH (a:Drug {name: '阿司匹林'}), (d:Disease {name: '心血管疾病'})
CREATE (a)-[:TREATS]->(d);

MATCH (a:Drug {name: '布洛芬'}), (d:Disease {name: '头痛'})
CREATE (a)-[:TREATS]->(d);

Query: drug safety check

-- Find all high-risk drug interactions
MATCH (d1:Drug)-[r:INTERACTS_WITH]->(d2:Drug)
WHERE r.risk = 'high'
RETURN d1.name AS drug1, d2.name AS drug2, r.effect AS interaction_effect
ORDER BY drug1, drug2;

Query: all drugs that treat a disease, with their interactions

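-- For cardiovascular disease ('心血管疾病'), list the drugs that treat it and their interactions
-- The INTERACTS_WITH pattern is undirected because an interaction applies to both drugs
MATCH (drug:Drug)-[:TREATS]->(disease:Disease {name: '心血管疾病'})
OPTIONAL MATCH (drug)-[r:INTERACTS_WITH]-(other:Drug)
RETURN drug.name AS treatment,
       drug.dosage AS dosage,
       COALESCE(other.name, 'No known interaction') AS interacting_drug,
       COALESCE(r.effect, '-') AS effect,
       COALESCE(r.risk, '-') AS risk;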
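
Interactions are stored as one directed edge per drug pair, but an interaction applies to both drugs. That is why the direction of the INTERACTS_WITH pattern matters. With a directed ->, the same query run for headache would report nothing for ibuprofen, because its only interaction edge points at it from aspirin. The Python model below uses English stand-ins for the Chinese property values and an undirected lookup. The TREATS and INTERACTS_WITH lists are a hypothetical in-memory copy of the example graph.

```python
from typing import List, Tuple

# Hypothetical in-memory copy of the example graph, with English names.
TREATS = [("Aspirin", "Cardiovascular disease"), ("Ibuprofen", "Headache")]
INTERACTS_WITH = [  # (from, to, risk, effect); each stored in one direction only
    ("Aspirin", "Warfarin", "high", "Increases bleeding risk"),
    ("Aspirin", "Ibuprofen", "medium", "Reduces aspirin's effect"),
]


def safety_report(disease: str) -> List[Tuple[str, str, str, str]]:
    """Rows of (treatment, interacting_drug, effect, risk)."""
    rows: List[Tuple[str, str, str, str]] = []
    for drug, target in TREATS:
        if target != disease:
            continue
        # Undirected match: the drug may be at either end of the edge
        hits = [(b if a == drug else a, effect, risk)
                for a, b, risk, effect in INTERACTS_WITH if drug in (a, b)]
        if not hits:  # OPTIONAL MATCH found nothing: COALESCE defaults apply
            rows.append((drug, "No known interaction", "-", "-"))
        rows.extend((drug, other, effect, risk) for other, effect, risk in hits)
    return rows


for row in safety_report("Headache"):
    print(row)
```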

5.10 Advanced Cypher Operators: Quick Reference

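Operation        Syntax                           Description
UNION            query1 UNION query2              Combine result sets, removing duplicates
UNION ALL        query1 UNION ALL query2          Combine result sets, keeping duplicates
DISTINCT         RETURN DISTINCT n                Remove duplicate rows
OPTIONAL MATCH   OPTIONAL MATCH pattern           Optional match (left join)
WITH             WITH expr AS alias               Pipe results to the next clause
UNWIND           UNWIND list AS item              Expand a list into rows
FOREACH          FOREACH (x IN list | ops)        Run update clauses for each element
CASE WHEN        CASE WHEN cond THEN expr END     Conditional expression
EXISTS {}        WHERE EXISTS { MATCH ... }       Existence check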
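
UNION removes duplicates from the combined result, including duplicates that come from a single branch. UNION ALL keeps every row. In both cases, each branch must return the same column names. The Python functions below model only the row handling. Rows are tuples, the data is hypothetical, and the output order is this model's own, since Cypher does not promise one.

```python
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def union_all(*branches: Sequence[Row]) -> List[Row]:
    """UNION ALL: concatenate the branches, duplicates kept."""
    return [row for branch in branches for row in branch]


def union(*branches: Sequence[Row]) -> List[Row]:
    """UNION: concatenate, then drop every duplicate row, within or across branches."""
    seen = set()
    out: List[Row] = []
    for row in union_all(*branches):
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


managers = [("Alice",), ("Alice",)]
high_earners = [("Bob",), ("Alice",)]
print(union(managers, high_earners), len(union_all(managers, high_earners)))
```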

5.11 Chapter Summary

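Topic            Key points
Paths            Variable-length paths *N..M, shortest paths with shortestPath()
Aggregation      count, avg, collect; reduce() folds a list into one value
WITH             Pipelining, with intermediate filtering and sorting
UNWIND           Expands a list into rows
Subqueries       CALL {} and EXISTS {}
OPTIONAL MATCH   Left-join semantics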

5.12 Exercises

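  1. In the social-network graph, use a variable-length path to find all paths from Alice to Dave of length at most 4.
  2. Use collect() and UNWIND to list all colleagues of each employee.
  3. Use reduce() to sum an edge property over all edges of a path.
  4. Use an EXISTS {} subquery to find everyone who is both a manager and a high earner.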

5.13 Further Reading