分区是将一个大表物理上分割成多个小表(分区),逻辑上仍是一个完整表。类似书籍分章节存放。
适用场景:数据有自然范围划分(特别是时间)
CREATE TABLE sales (
id INT,
sale_date DATE,
amount DECIMAL(10,2),
PRIMARY KEY(id, sale_date)
) PARTITION BY RANGE(YEAR(sale_date)) (
PARTITION p2020 VALUES LESS THAN(2021),
PARTITION p2021 VALUES LESS THAN(2022),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
特点:
- 明确的范围边界定义
- 适合时间序列数据归档
- 支持高效的范围查询
适用场景:需要均匀分布的无规则数据
CREATE TABLE users (
user_id BIGINT,
username VARCHAR(50),
PRIMARY KEY(user_id)
) PARTITION BY HASH(user_id)
PARTITIONS 4;
特点:
- 数据均匀分布到各分区
- 无法预知具体存储位置
- 分区数建议为2的幂次方
适用场景:离散值精确分类
CREATE TABLE products (
product_id INT,
category VARCHAR(20),
PRIMARY KEY(product_id, category)
) PARTITION BY LIST(category) (
PARTITION p_electronics VALUES IN ('phone', 'laptop'),
PARTITION p_clothing VALUES IN ('shirt', 'pants')
);
特点:
- 精确控制数据到指定分区
- 适合枚举类型数据
- 新增值需修改分区定义
在一级分区基础上再进行细分,形成两级分区结构。
RANGE+HASH(时间+业务ID)是最常用组合:
CREATE TABLE orders (
order_id BIGINT,
user_id BIGINT,
order_date DATE,
PRIMARY KEY(order_id, order_date, user_id)
)
/* 一级:按日期的RANGE分区 */
PARTITION BY RANGE(TO_DAYS(order_date))
/* 二级:按用户ID的HASH分区 */
SUBPARTITION BY HASH(user_id)
SUBPARTITIONS 8 (
PARTITION p202301 VALUES LESS THAN(TO_DAYS('2023-02-01')),
PARTITION p202302 VALUES LESS THAN(TO_DAYS('2023-03-01'))
);
-- RANGE分区添加
ALTER TABLE sales ADD PARTITION (
PARTITION p2023 VALUES LESS THAN(2024)
);
-- LIST分区添加
ALTER TABLE products ADD PARTITION (
PARTITION p_home VALUES IN ('furniture', 'appliance')
);
ALTER TABLE sales REORGANIZE PARTITION p2020,p2021 INTO (
PARTITION p2020_2021 VALUES LESS THAN(2022)
);
-- 删除整个分区(包括数据)
ALTER TABLE sales DROP PARTITION p2020;
-- 删除分区数据但保留结构
ALTER TABLE sales TRUNCATE PARTITION p2021;
-- 查看分区定义
SHOW CREATE TABLE sales;
-- 查看分区数据分布
SELECT
partition_name,
table_rows
FROM information_schema.partitions
WHERE table_name = 'sales';
-- 分析查询命中的分区
EXPLAIN PARTITIONS
SELECT * FROM sales
WHERE sale_date BETWEEN '2021-01-01' AND '2021-12-31';
CREATE TABLE orders (
order_id BIGINT,
user_id BIGINT,
order_date DATETIME,
merchant_id INT,
PRIMARY KEY(order_id, order_date, user_id)
)
PARTITION BY RANGE(TO_DAYS(order_date))
SUBPARTITION BY HASH(merchant_id)
SUBPARTITIONS 16 (
PARTITION p2023q1 VALUES LESS THAN(TO_DAYS('2023-04-01')),
PARTITION p2023q2 VALUES LESS THAN(TO_DAYS('2023-07-01'))
);
CREATE TABLE iot_metrics (
metric_id BIGINT,
device_type VARCHAR(20),
metric_time DATETIME,
value DOUBLE,
PRIMARY KEY(metric_id, device_type, metric_time)
)
PARTITION BY LIST(device_type)
SUBPARTITION BY RANGE(UNIX_TIMESTAMP(metric_time)) (
PARTITION p_sensor VALUES IN ('temp', 'humidity') (
SUBPARTITION p_sensor_202301 VALUES LESS THAN(UNIX_TIMESTAMP('2023-02-01')),
SUBPARTITION p_sensor_202302 VALUES LESS THAN(UNIX_TIMESTAMP('2023-03-01'))
),
PARTITION p_actuator VALUES IN ('valve', 'switch') (
SUBPARTITION p_actuator_202301 VALUES LESS THAN(UNIX_TIMESTAMP('2023-02-01')),
SUBPARTITION p_actuator_202302 VALUES LESS THAN(UNIX_TIMESTAMP('2023-03-01'))
)
);
错误:
CREATE TABLE invalid_tbl (
id INT PRIMARY KEY,
create_time DATETIME
) PARTITION BY RANGE(TO_DAYS(create_time)) (...);
解决:
CREATE TABLE valid_tbl (
id INT,
create_time DATETIME,
PRIMARY KEY(id, create_time)
) PARTITION BY RANGE(TO_DAYS(create_time)) (...);
症状:元数据管理开销大,性能下降
方案:
- 合并小分区
- 减少二级子分区数量
- 考虑按更大时间粒度分区
症状:某些分区数据量过大
方案:
- 调整HASH分区数量
- 考虑使用RANGE-LIST组合
- 对热点数据单独分区