SQL 从入门到精通 —— 大师手笔,深入浅出

作者:taohuaracing

这不是一本罗列语法的文档。这是一个老程序员手把手带你 理解 SQL 的思维方式

读完它,你不仅能写 SQL,还能 像数据库一样思考


📖 目录

  1. 为什么要学 SQL?
  2. SQL 的本质——先理解,再记忆
  3. 环境准备——动手实践才是王道
  4. 基础篇 —— CURD 四板斧
  5. 进阶篇 —— 筛选与排序
  6. 核心篇 —— JOIN 与表关系
  7. 聚合与分组
  8. 子查询与 CTE
  9. 窗口函数 —— 高手的分水岭
  10. 索引与性能——别让你的查询慢成狗
  11. 实战篇 —— 从零到一搭一个真实场景
  12. 常见陷阱与最佳实践
  13. 给新手的进阶路线图
  14. 参考与出处

1. 为什么要学 SQL?

一句话:数据是新时代的石油,SQL 是抽油机。

无论你是:

  • 后端开发 —— 80% 的接口背后是数据库查询
  • 数据分析师 —— SQL 是安身立命的根本
  • 产品经理 —— 会 SQL 的产品比别人的需求清晰十倍
  • 运维 / DBA —— 你懂的

SQL 不会过时。从 1974 年诞生到今天,它依然是数据领域的通用语言。


2. SQL 的本质——先理解,再记忆

很多初学者死记语法,然后抱怨 SQL 难。问题不在你,在方法

🧠 核心思维:SQL = 集合论

SQL 的底层是关系代数。说人话:

每一条 SQL 语句,都是对「集合」的操作。

  • SELECT — 从集合中挑出某些列
  • WHERE — 过滤集合中的行
  • JOIN — 把两个集合拼在一起
  • GROUP BY — 把一个集合拆成多个子集合
  • HAVING — 过滤子集合

一旦你理解了 SQL 操作的是「集合」,而不是「一行一行」的数据,你就打通了任督二脉。

⚡ 执行顺序(背下来,刻进 DNA)

这不是你写的顺序,这是数据库执行 SQL 的顺序:

FROM        →  1. 先确定数据来源(表 / JOIN 结果)
WHERE       →  2. 过滤行
GROUP BY    →  3. 分组
HAVING      →  4. 过滤分组
SELECT      →  5. 挑选列(可以起别名)
ORDER BY    →  6. 排序
LIMIT       →  7. 取前 N 行

💡 Why this matters: 为什么 WHERE 里不能直接用 SELECT 中定义的别名?因为 SELECT 在第 5 步才执行,WHERE 在第 2 步。

为什么 HAVING 可以用聚合函数而 WHERE 不行?因为 WHERE 执行时还没分组。

理解了执行顺序,这些「为什么」就自然懂了。


3. 环境准备——动手实践才是王道

看一万遍不如写一遍。 以下三个选择,三选一:

🅰 最简单:SQLite + 在线工具

  • 推荐: https://sqliteonline.com/ (在线,零安装)
  • 或者 https://www.db-fiddle.com/
  • 本地装 SQLite:pip install sqlite3 / brew install sqlite3

🅱 真实开发:MySQL / PostgreSQL

# Windows (用 Scoop / Chocolatey)
scoop install mysql
# 或者 Docker
docker run --name mysql-demo -e MYSQL_ROOT_PASSWORD=root -p 3306:3306 -d mysql:8.0
# Mac
brew install mysql
brew services start mysql

🅲 推荐新手方案

SQLite 起步,因为它零配置、轻量、语法干净。等 SQLite 玩熟了,再切 MySQL / PostgreSQL 几乎无障碍——核心语法 95% 一样。

📦 本文配套练习数据

建议你直接建一个库,把下面的表和数据跑一遍:

-- ============ 员工表 ============
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department TEXT NOT NULL,
    salary REAL NOT NULL,
    age INTEGER,
    join_date DATE
);

INSERT INTO employees VALUES (1, '张三', '技术部', 15000, 28, '2021-03-01');
INSERT INTO employees VALUES (2, '李四', '技术部', 18000, 32, '2020-06-15');
INSERT INTO employees VALUES (3, '王五', '市场部', 12000, 26, '2022-01-10');
INSERT INTO employees VALUES (4, '赵六', '市场部', 14000, 30, '2021-08-20');
INSERT INTO employees VALUES (5, '翠花', '人事部', 11000, 35, '2019-11-01');
INSERT INTO employees VALUES (6, '狗蛋', '技术部', 22000, 38, '2018-04-10');
INSERT INTO employees VALUES (7, '小明', '市场部', 9000, 23, '2023-02-28');
INSERT INTO employees VALUES (8, '小红', '人事部', 13000, 29, '2020-09-05');

-- ============ 订单表 ============
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    product TEXT NOT NULL,
    amount REAL NOT NULL,
    order_date DATE,
    employee_id INTEGER,
    FOREIGN KEY (employee_id) REFERENCES employees(id)
);

INSERT INTO orders VALUES (1, '阿里巴巴', '服务器', 50000, '2024-01-15', 1);
INSERT INTO orders VALUES (2, '腾讯', '云服务', 80000, '2024-01-20', 2);
INSERT INTO orders VALUES (3, '阿里巴巴', '数据库', 30000, '2024-02-10', 6);
INSERT INTO orders VALUES (4, '字节跳动', 'AI平台', 120000, '2024-02-15', 1);
INSERT INTO orders VALUES (5, '腾讯', '安全服务', 45000, '2024-03-01', 2);
INSERT INTO orders VALUES (6, '小米', '云存储', 25000, '2024-03-10', 6);
INSERT INTO orders VALUES (7, '阿里巴巴', 'CDN', 35000, '2024-03-20', 1);
INSERT INTO orders VALUES (8, '华为', 'AI平台', 95000, '2024-04-05', 2);

4. 基础篇 —— CURD 四板斧

4.1 SELECT —— 查

-- 查所有列
SELECT * FROM employees;

-- 查指定列
SELECT name, salary FROM employees;

-- 列运算
SELECT name, salary * 12 AS annual_salary FROM employees;

-- 字符串拼接 (不同数据库语法略有不同)
-- SQLite: ||
SELECT name || ' - ' || department AS info FROM employees;
-- MySQL: CONCAT(name, ' - ', department)
-- PG:    name || ' - ' || department

-- 常量列
SELECT name, '在职' AS status FROM employees;

4.2 INSERT —— 增

-- 指定列插入 (推荐)
INSERT INTO employees (name, department, salary, age, join_date)
VALUES ('大壮', '技术部', 16000, 27, '2024-05-01');

-- 插入全部列 (顺序必须匹配表定义)
INSERT INTO employees VALUES (10, '小芳', '人事部', 10000, 24, '2024-06-01');

-- 批量插入
INSERT INTO employees (name, department, salary, age, join_date) VALUES
    ('阿强', '市场部', 11000, 25, '2024-04-01'),
    ('阿珍', '技术部', 19000, 31, '2023-10-01');

-- 从查询结果插入
INSERT INTO high_salary_employees (name, salary)
SELECT name, salary FROM employees WHERE salary > 15000;

4.3 UPDATE —— 改

-- ⚠️ 一定要加 WHERE!不加就全改了!
UPDATE employees SET salary = 16000 WHERE name = '小明';

-- 更新多列
UPDATE employees 
SET salary = salary * 1.1, age = age + 1
WHERE department = '技术部';

-- 用子查询更新
UPDATE orders 
SET amount = amount * 0.95
WHERE employee_id = (SELECT id FROM employees WHERE name = '张三');

4.4 DELETE —— 删

-- ⚠️ 一定要加 WHERE!不加就清表了!
DELETE FROM employees WHERE name = '大壮';

-- 删除所有行但保留表结构
DELETE FROM employees;   -- 👎 慢,逐行删
TRUNCATE TABLE employees; -- 👍 快,一次性释放

-- 用子查询删除
DELETE FROM orders
WHERE employee_id IN (SELECT id FROM employees WHERE department = '市场部');

💡 黄金法则: INSERT 是原子操作,UPDATE 和 DELETE 一定要先 SELECT 确认再执行:

-- 先看看要改哪些
SELECT * FROM employees WHERE name = '小明';
-- 确认无误再改
UPDATE employees SET salary = 16000 WHERE name = '小明';

5. 进阶篇 —— 筛选与排序

5.1 WHERE 条件筛选

-- ========== 比较运算符 ==========
SELECT * FROM employees WHERE salary > 15000;
SELECT * FROM employees WHERE salary BETWEEN 10000 AND 15000;

-- ========== 字符串匹配 ==========
-- % 匹配任意多个字符,_ 匹配一个字符
SELECT * FROM employees WHERE name LIKE '张%';  -- 张开头
SELECT * FROM employees WHERE name LIKE '%小%'; -- 包含"小"
SELECT * FROM employees WHERE name LIKE '张_';  -- 张+一个字

-- ========== IN 运算符 ==========
SELECT * FROM employees WHERE department IN ('技术部', '人事部');

-- ========== NULL 判断 (不是 = NULL!) ==========
SELECT * FROM employees WHERE salary IS NULL;   -- ✅
SELECT * FROM employees WHERE salary = NULL;    -- ❌ 永远不成立!

-- ========== 复合条件 ==========
SELECT * FROM employees 
WHERE department = '技术部' 
  AND salary > 15000 
  AND (age < 30 OR age > 35);

5.2 ORDER BY 排序

-- 升序 (默认)
SELECT name, salary FROM employees ORDER BY salary;

-- 降序
SELECT name, salary FROM employees ORDER BY salary DESC;

-- 多字段排序
SELECT department, salary, name 
FROM employees 
ORDER BY department ASC, salary DESC;

-- 用数字代表第几列 (不推荐,维护噩梦)
SELECT name, department, salary FROM employees ORDER BY 3 DESC;

5.3 LIMIT 分页

-- 前 3 条
SELECT * FROM employees ORDER BY salary DESC LIMIT 3;

-- 跳过 2 条取 3 条 (第 3~5 名)
SELECT * FROM employees ORDER BY salary DESC LIMIT 3 OFFSET 2;
-- 简写: LIMIT 2, 3  (MySQL only,注意顺序是 offset, limit)

-- 分页通用公式:
-- LIMIT page_size OFFSET (page_number - 1) * page_size
-- 第 2 页,每页 3 条
SELECT * FROM employees ORDER BY salary DESC LIMIT 3 OFFSET 3;

5.4 DISTINCT 去重

-- 有哪些部门?
SELECT DISTINCT department FROM employees;

-- 多列去重 (组合唯一)
SELECT DISTINCT department, age FROM employees;

-- DISTINCT 的性能问题: 它需要对结果排序去重,大表上慎用
-- 替代方案:GROUP BY 通常更快
SELECT department FROM employees GROUP BY department;

6. 核心篇 —— JOIN 与表关系

这是 SQL 最重要的概念,没有之一。

数据库设计的第一课:用关联表,不要把所有东西塞一个表里。

6.1 INNER JOIN

只返回两表中匹配的行:

-- 查询每个订单对应的员工
SELECT o.id AS order_id, 
       o.product, 
       o.amount, 
       e.name AS employee_name
FROM orders o
INNER JOIN employees e ON o.employee_id = e.id;

-- 结果: 只有有订单的员工才会出现。没有订单的员工不显示。

6.2 LEFT / RIGHT JOIN

左连接保留左表所有行,右表不匹配时填 NULL:

-- 显示所有员工及其订单(包括没订单的员工)
SELECT e.name, 
       e.department, 
       o.product, 
       o.amount
FROM employees e
LEFT JOIN orders o ON e.id = o.employee_id;

-- 结果: 所有员工都在,没订单的 product/amount 为 NULL

💡 LEFT JOIN 常用套路:找"没有"的东西

-- 哪些员工没有任何订单?
SELECT e.name, e.department
FROM employees e
LEFT JOIN orders o ON e.id = o.employee_id
WHERE o.id IS NULL;

6.3 CROSS JOIN

笛卡尔积——左表每行 × 右表每行:

-- 每个员工 × 每个部门(别在实际场景乱用)
SELECT e.name, d.department
FROM employees e
CROSS JOIN (SELECT DISTINCT department FROM employees) d;

6.4 SELF JOIN

自己连自己——同一张表起两个别名:

-- 找到同一部门里,工资比我高的人
SELECT e1.name AS employee,
       e2.name AS higher_earner,
       e1.salary AS my_salary,
       e2.salary AS their_salary
FROM employees e1
INNER JOIN employees e2 
    ON e1.department = e2.department 
   AND e1.salary < e2.salary;

7. 聚合与分组

7.1 聚合函数

-- 常用聚合: COUNT, SUM, AVG, MAX, MIN
SELECT COUNT(*)           AS total_employees,
       SUM(salary)        AS total_salary,
       AVG(salary)        AS avg_salary,
       MAX(salary)        AS max_salary,
       MIN(salary)        AS min_salary
FROM employees;

-- COUNT(*) vs COUNT(列)
SELECT COUNT(*)      FROM employees;           -- 8 (所有行)
SELECT COUNT(age)    FROM employees;           -- 8 (非 NULL 的行数)
SELECT COUNT(DISTINCT department) FROM employees; -- 3

7.2 GROUP BY + HAVING

-- 每个部门的统计
SELECT department,
       COUNT(*)       AS emp_count,
       AVG(salary)    AS avg_salary,
       SUM(salary)    AS total_salary
FROM employees
GROUP BY department;

-- HAVING: 过滤分组 (WHERE 是对行过滤,HAVING 是对组过滤)
SELECT department,
       AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 13000;

-- 完整的 SQL 实例(写上最佳风格)
SELECT 
    department,
    COUNT(*) AS emp_count,
    ROUND(AVG(salary), 0) AS avg_salary
FROM employees
WHERE join_date >= '2020-01-01'  -- 先过滤行
GROUP BY department
HAVING COUNT(*) >= 2              -- 再过滤组
ORDER BY avg_salary DESC          -- 排序
LIMIT 3;

-- 💡 执行顺序: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

8. 子查询与 CTE

8.1 标量子查询

返回单个值的子查询,可以用在 SELECT、WHERE 里:

-- 在 SELECT 中: 每人工资与平均的差距
SELECT name, salary,
       (SELECT AVG(salary) FROM employees) AS avg_salary,
       salary - (SELECT AVG(salary) FROM employees) AS diff
FROM employees;

-- 在 WHERE 中: 高于平均工资的人
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

8.2 表子查询

返回多行多列,必须起别名:

-- 找出高于部门平均工资的员工
SELECT e.name, e.department, e.salary, dept_avg.avg_salary
FROM employees e
INNER JOIN (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
) dept_avg ON e.department = dept_avg.department
WHERE e.salary > dept_avg.avg_salary;

8.3 EXISTS / IN

-- IN: 有订单的员工
SELECT * FROM employees 
WHERE id IN (SELECT DISTINCT employee_id FROM orders);

-- EXISTS: 更高效的写法(尤其大表)
SELECT * FROM employees e
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.employee_id = e.id
);

-- NOT EXISTS: 没订单的员工
SELECT * FROM employees e
WHERE NOT EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.employee_id = e.id
);

-- 💡 EXISTS vs IN:
-- 1. EXISTS 找到第一个匹配就停止,IN 要生成完整结果集
-- 2. 如果子查询结果集很大,EXISTS 通常更快
-- 3. IN 不能处理 NULL(NULL IN (...) 返回 NULL 不是 FALSE)
-- 4. 现代优化器在很多场景下会等价重写,但 EXISTS 写法更清晰

8.4 CTE (WITH) —— 现代 SQL 的基石

CTE (Common Table Expression) 是给子查询起名字。它让你把复杂的 SQL 拆成清晰的步骤。

-- 基础 CTE
WITH high_salary AS (
    SELECT * FROM employees WHERE salary > 15000
)
SELECT name, department, salary
FROM high_salary
ORDER BY salary DESC;

-- 多层 CTE (这才是威力所在)
WITH 
dept_stats AS (
    SELECT department, 
           AVG(salary) AS avg_salary,
           COUNT(*) AS emp_count
    FROM employees
    GROUP BY department
),
top_employees AS (
    SELECT e.name, e.department, e.salary
    FROM employees e
    INNER JOIN dept_stats ds ON e.department = ds.department
    WHERE e.salary > ds.avg_salary
)
SELECT * FROM top_employees ORDER BY salary DESC;

-- CTE 可以复用同一个名字多次引用
WITH department_counts AS (
    SELECT department, COUNT(*) AS cnt
    FROM employees
    GROUP BY department
)
SELECT 
    (SELECT cnt FROM department_counts WHERE department = '技术部') AS tech_count,
    (SELECT cnt FROM department_counts WHERE department = '市场部') AS market_count,
    (SELECT cnt FROM department_counts WHERE department = '人事部') AS hr_count;

💡 CTE 的最佳实践:

  • 把复杂查询拆成:取数据 → 算中间结果 → 最终输出 三个步骤
  • 每个 CTE 只做一件事
  • 名字起得有意义(dept_stats 而不是 cte1
  • 调试时可以先 SELECT * FROM 某个CTE 单独跑

9. 窗口函数 —— 高手的分水岭

窗口函数 = 保留所有行 + 在行之间做计算。

普通 GROUP BY 会把多行压成一行(丢失细节)。
窗口函数不压缩行数,而是在每行之外额外计算聚合值。

9.1 ROW_NUMBER / RANK / DENSE_RANK

-- 按工资排名
SELECT name, department, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,      -- 1,2,3,4,5...
       RANK()       OVER (ORDER BY salary DESC) AS rank,         -- 1,2,2,4,5... (并列跳号)
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank    -- 1,2,2,3,4... (并列不跳号)
FROM employees;

-- 每个部门内部分组排名
SELECT name, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

-- 每个部门工资最高的前两名
WITH ranked AS (
    SELECT name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT * FROM ranked WHERE rn <= 2;

9.2 LAG / LEAD

前后行引用——用于计算同比、环比:

-- 按工资排序,看前后人的工资
SELECT name, salary,
       LAG(salary, 1)  OVER (ORDER BY salary) AS prev_salary,
       LEAD(salary, 1) OVER (ORDER BY salary) AS next_salary,
       salary - LAG(salary, 1) OVER (ORDER BY salary) AS diff
FROM employees;

-- 用金额分析订单增长
SELECT 
    order_date,
    SUM(amount) AS daily_amount,
    LAG(SUM(amount), 1) OVER (ORDER BY order_date) AS prev_day,
    SUM(amount) - LAG(SUM(amount), 1) OVER (ORDER BY order_date) AS growth
FROM orders
GROUP BY order_date
ORDER BY order_date;

9.3 SUM / AVG 窗口

移动/累计统计:

-- 累计求和(running total)
SELECT 
    order_date,
    product,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

-- 分区累计(每个员工的累计订单额)
SELECT 
    e.name,
    o.order_date,
    o.amount,
    SUM(o.amount) OVER (
        PARTITION BY e.id 
        ORDER BY o.order_date
    ) AS running_total
FROM employees e
LEFT JOIN orders o ON e.id = o.employee_id;

-- 移动平均(最近 3 天的平均)
SELECT 
    order_date,
    SUM(amount) AS daily_amount,
    AVG(SUM(amount)) OVER (
        ORDER BY order_date 
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3day
FROM orders
GROUP BY order_date
ORDER BY order_date;

💡 窗口函数框架语法:

SUM() OVER (
    PARTITION BY ...   -- 分组(可选)
    ORDER BY ...       -- 排序(可选)
    ROWS BETWEEN ...   -- 窗口范围(可选)
)

窗口范围选项:

  • ROWS UNBOUNDED PRECEDING — 从分区第一行到当前行
  • ROWS BETWEEN 2 PRECEDING AND CURRENT ROW — 前两行到当前行
  • ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING — 前后各一行
  • RANGEROWS 的区别:RANGE 按值范围,ROWS 按行数

10. 索引与性能——别让你的查询慢成狗

为什么查询慢?

数据库的数据存在磁盘上。没有索引 = 全表扫描 = 从头到尾翻一遍。

有索引 = 像书的目录一样直接翻到那一页。

什么时候建索引?

-- 在经常 WHERE 的列上建索引
CREATE INDEX idx_employees_department ON employees(department);

-- 在经常 JOIN 的列上建索引
CREATE INDEX idx_orders_employee_id ON orders(employee_id);

-- 复合索引(多列联合查询时)
CREATE INDEX idx_employees_dept_salary ON employees(department, salary);

索引的代价

  • 写入变慢(建索引的表,INSERT/UPDATE/DELETE 要同时维护索引)
  • 占用磁盘空间
  • 不是越多越好

怎么知道查询慢不慢?

-- SQLite
EXPLAIN QUERY PLAN SELECT * FROM employees WHERE salary > 15000;

-- MySQL
EXPLAIN SELECT * FROM employees WHERE salary > 15000;

-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM employees WHERE salary > 15000;

💡 索引黄金法则

应该建索引 不应该建索引
频繁出现在 WHERE 中的列 几乎不查询的列
频繁 JOIN 的列 值非常少的列(如性别、布尔值)
需要排序/分组的列 频繁更新的列
高选择性的列(唯一值多) 大文本/BLOB列

11. 实战篇 —— 从零到一搭一个真实场景

11.1 电商订单系统

-- ========== 建表 ==========
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT UNIQUE,
    city TEXT,
    created_at DATE DEFAULT CURRENT_DATE
);

CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,
    price REAL NOT NULL,
    stock INTEGER DEFAULT 0
);

CREATE TABLE orders_shop (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date DATE DEFAULT CURRENT_DATE,
    status TEXT DEFAULT 'pending'  -- pending, paid, shipped, cancelled
);

CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders_shop(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    unit_price REAL NOT NULL
);

-- ========== 插入示例数据 ==========
INSERT INTO customers (name, email, city) VALUES
    ('张三', 'zhangsan@email.com', '北京'),
    ('李四', 'lisi@email.com', '上海'),
    ('王五', 'wangwu@email.com', '广州');

INSERT INTO products (name, category, price, stock) VALUES
    ('iPhone 16', '手机', 8999, 50),
    ('MacBook Pro', '电脑', 14999, 30),
    ('AirPods Pro', '耳机', 1999, 100),
    ('iPad Air', '平板', 4999, 40);

INSERT INTO orders_shop (customer_id, order_date, status) VALUES
    (1, '2024-06-01', 'paid'),
    (1, '2024-06-05', 'shipped'),
    (2, '2024-06-10', 'paid'),
    (3, '2024-06-15', 'cancelled');

INSERT INTO order_items (order_id, product_id, quantity, unit_price) VALUES
    (1, 1, 1, 8999),
    (1, 3, 2, 1999),
    (2, 2, 1, 14999),
    (3, 4, 1, 4999);

-- ========== 实战查询 ==========

-- 1. 某客户的订单总金额
SELECT c.name, SUM(oi.quantity * oi.unit_price) AS total_spent
FROM customers c
JOIN orders_shop os ON c.id = os.customer_id
JOIN order_items oi ON os.id = oi.order_id
WHERE os.status != 'cancelled'
GROUP BY c.id;

-- 2. 热销产品排行榜
SELECT p.name, 
       SUM(oi.quantity) AS total_sold,
       SUM(oi.quantity * oi.unit_price) AS revenue
FROM products p
JOIN order_items oi ON p.id = oi.product_id
JOIN orders_shop os ON oi.order_id = os.id
WHERE os.status != 'cancelled'
GROUP BY p.id
ORDER BY total_sold DESC;

-- 3. 各城市消费排行
SELECT c.city,
       COUNT(DISTINCT c.id) AS customer_count,
       SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM customers c
JOIN orders_shop os ON c.id = os.customer_id
JOIN order_items oi ON os.id = oi.order_id
WHERE os.status != 'cancelled'
GROUP BY c.city
ORDER BY total_revenue DESC;

-- 4. 复购分析(购超过1单的客户)
SELECT c.name, c.email, COUNT(os.id) AS order_count
FROM customers c
JOIN orders_shop os ON c.id = os.customer_id
WHERE os.status != 'cancelled'
GROUP BY c.id
HAVING COUNT(os.id) > 1;

11.2 用户行为分析

-- ========== 用户行为日志表 ==========
CREATE TABLE event_logs (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    event_name TEXT,      -- 'page_view', 'click', 'purchase', 'login'
    page_url TEXT,
    event_time TIMESTAMP,
    duration_seconds INTEGER
);

-- 插入示例
INSERT INTO event_logs VALUES
(1, 1, 'login', '/login', '2024-06-01 09:00:00', NULL),
(2, 1, 'page_view', '/products', '2024-06-01 09:05:00', 30),
(3, 1, 'click', '/products/1', '2024-06-01 09:05:30', NULL),
(4, 1, 'purchase', '/checkout', '2024-06-01 09:10:00', 120),
(5, 2, 'login', '/login', '2024-06-01 10:00:00', NULL),
(6, 2, 'page_view', '/search', '2024-06-01 10:02:00', 15),
(7, 2, 'login', '/login', '2024-06-02 14:00:00', NULL),
(8, 2, 'page_view', '/products', '2024-06-02 14:05:00', 45),
(9, 2, 'purchase', '/checkout', '2024-06-02 14:10:00', 90);

-- 1. 用户会话分析(用户每次登录到购买的事件序列)
WITH user_sessions AS (
    SELECT *,
           SUM(CASE WHEN event_name = 'login' THEN 1 ELSE 0 END) 
               OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
    FROM event_logs
)
SELECT user_id, session_id,
       MIN(event_time) AS session_start,
       MAX(event_time) AS session_end,
       GROUP_CONCAT(event_name, ' → ') AS event_sequence
FROM user_sessions
GROUP BY user_id, session_id;

-- 2. 转化漏斗(从访问到购买的转化率)
WITH funnel AS (
    SELECT 
        COUNT(DISTINCT CASE WHEN event_name = 'page_view' THEN user_id END) AS visited,
        COUNT(DISTINCT CASE WHEN event_name = 'click' THEN user_id END) AS clicked,
        COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN user_id END) AS purchased
    FROM event_logs
)
SELECT visited, clicked,
       ROUND(1.0 * clicked / visited * 100, 1) AS click_rate,
       purchased,
       ROUND(1.0 * purchased / visited * 100, 1) AS purchase_rate
FROM funnel;

-- 3. 每日活跃用户(DAU)
SELECT DATE(event_time) AS day,
       COUNT(DISTINCT user_id) AS dau
FROM event_logs
GROUP BY day
ORDER BY day;

12. 常见陷阱与最佳实践

❌ 坑 1:NULL 参与运算

-- NULL + 10 = NULL (不是 10!)
-- NULL = NULL → 未知(不是 TRUE!)
-- NULL IN (1,2,3) → 未知(不是 FALSE!)

-- ✅ 正确处理
SELECT name, salary + IFNULL(bonus, 0) AS total_salary FROM employees;
SELECT * FROM employees WHERE bonus IS NULL;           -- ✅
SELECT * FROM employees WHERE bonus = NULL;            -- ❌
SELECT * FROM employees WHERE bonus IN (500, 1000);    -- NULL 不会匹配
SELECT * FROM employees WHERE bonus NOT IN (500, 1000); -- ⚠️ NULL 行会被排除!

❌ 坑 2:HAVING vs WHERE

-- ❌ 错误
SELECT department, COUNT(*) 
FROM employees
WHERE COUNT(*) > 2       -- WHERE 里不能有聚合函数
GROUP BY department;

-- ✅ 正确
SELECT department, COUNT(*) 
FROM employees
GROUP BY department
HAVING COUNT(*) > 2;

❌ 坑 3:GROUP BY 非聚合列

-- ❌ 不严谨(在 ONLY_FULL_GROUP_BY 模式下会报错)
SELECT name, department, AVG(salary)
FROM employees
GROUP BY department;

-- ✅ 正确:name 要么在 GROUP BY 里,要么是聚合的
SELECT department, AVG(salary)
FROM employees
GROUP BY department;

❌ 坑 4:表别名的引用

-- SELECT 中定义的别名不能在 WHERE 里用
SELECT salary * 12 AS annual FROM employees
WHERE annual > 180000;  -- ❌ 报错!SELECT 在第 5 步执行

-- ✅ 用子查询/CTE
SELECT * FROM (
    SELECT salary * 12 AS annual FROM employees
) WHERE annual > 180000;

✅ 最佳实践清单

实践 说明
大写关键字 SELECT, FROM, WHERE 全大写,表名列名小写
合理缩进 JOIN、WHERE、GROUP BY 各占一行
用别名 表名太长时用缩写 eo
不用 SELECT * 明确列出需要的列
优先 CTE 别写超过 20 行的嵌套子查询
加注释 复杂逻辑一定写注释
先 EXPLAIN 慢查询先看执行计划
事务包裹 多个 UPDATE/DELETE 用 BEGIN / COMMIT
外键约束 建表时定义外键,保证数据完整性

13. 给新手的进阶路线图

基础 (1-2 周)
├── SELECT / INSERT / UPDATE / DELETE
├── WHERE / ORDER BY / LIMIT / DISTINCT
├── LIKE / IN / BETWEEN / IS NULL
└── 理解 SQL 执行顺序

进阶 (2-4 周)
├── JOIN (INNER / LEFT / RIGHT / SELF)
├── GROUP BY + HAVING
├── 子查询 (标量 / 表 / EXISTS)
├── 联合查询 UNION / UNION ALL
└── CASE WHEN 条件表达式

高级 (4-8 周)
├── 窗口函数 (RANK / LAG / SUM OVER)
├── CTE (WITH) + 递归 CTE
├── 索引原理 + EXPLAIN 分析
├── 事务与隔离级别
└── 数据库设计范式

实战 (持续)
├── 每天在 LeetCode / HackerRank 刷 1-2 道 SQL 题
├── 读开源项目的 SQL 迁移文件
├── 尝试优化真实项目中的慢查询
└── 学习 ORM 生成的 SQL,对比手写 SQL

📚 推荐资源

练习平台:

游戏化学习:

深度阅读:

  • 《SQL 必知必会》(Ben Forta) — 入门圣经
  • 《高性能 MySQL》(Baron Schwartz) — 进阶必读
  • Use The Index, Luke! (https://use-the-index-luke.com/) — 索引深度

14. 参考与出处

本文内容凝聚了以下精华来源,致谢每一位写出这些优秀文章/书籍/工具的人:

📖 书籍

  1. 《SQL 必知必会》 (Sams Teach Yourself SQL in 10 Minutes) — Ben Forta
    • 全球销量最高的 SQL 入门书,没有之一。每节课 10 分钟,4 周拿下 SQL 基础。
  2. 《高性能 MySQL》 (High Performance MySQL, 3rd Edition) — Baron Schwartz 等
    • MySQL 领域的圣经,索引、查询优化、架构设计的终极指南。
  3. 《SQL 反模式》 (SQL Antipatterns) — Bill Karwin
    • 告诉你 SQL 中常见的错误设计模式,每个开发者都应该读。
  4. 《SQL 经典实例》 (SQL Cookbook) — Anthony Molinaro
    • 100+ 个实战 SQL 技巧,随手翻随时学。

🌐 文章与论坛

  1. Stack Overflow — 任何 SQL 问题,这里几乎都有答案
  2. Modern SQL — Markus Winand 的 SQL 现代语法解读
  3. Use The Index, Luke! — Markus Winand 的另一力作,索引的终极指南
  4. PostgreSQL Tutorial — 最清晰的 PG 教程
  5. SQLZoo — 交互式 SQL 学习,支持多种数据库
  6. LeetCode Discussion — 面试 SQL 题讨论区,高手如云

🛠 工具

  1. db-fiddle.com — 在线写 SQL,支持 MySQL/PG/SQLite
  2. sqliteonline.com — 零安装的 SQLite 在线工具
  3. DBeaver — 跨平台数据库管理工具,GUI 良心之作
  4. SQLite Browser — SQLite 可视化工具

🏆 值得关注的人

  1. Markus Winand — SQL 性能优化大师,Modern SQL 布道者
  2. Baron Schwartz — 数据库性能领域泰斗
  3. Bill Karwin — SQL 反模式作者,Stack Overflow 活跃答主
  4. Jennifer Widom — Stanford 数据库课程教授,她的课视频在 YouTube 免费看

🎁 最后的赠言

“Talk is cheap. Show me the code.” — Linus Torvalds

对于 SQL 来说,这句话要改成:

“Reading is cheap. Show me the EXPLAIN PLAN.”

别光看教程了。打开一个 SQL 终端,把上面的例子亲手敲一遍。

如果卡住了:

  • 先自己查 Stack Overflow
  • 查不到再问 ChatGPT
  • 还不行就重新问

本文由 taohua 编写,结合了 5+ 年编程经验与 SQL 实战积累。

最后更新:2026-06-09

🚀 如果把 SQL 学好,代码之路就打下了半壁江山。加油!

Logo

欢迎加入DeepSeek 技术社区。在这里,你可以找到志同道合的朋友,共同探索AI技术的奥秘。

更多推荐