举个简单例子,表 t1 有两个字段,针对字段 r1 有两个索引,一个是基于字段 r1 的普通二级索引,另外一个是基于字段r1的前缀索引。
<localhost|mysql>show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`r1` varchar(300) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_r1` (`r1`),
KEY `idx_r1_p` (`r1`(6))
) ENGINE=InnoDB AUTO_INCREMENT=32755 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
<localhost|mysql>select count(*) from t1;
+----------+
| count(*) |
+----------+
| 24576 |
+----------+
1 row in set (0.04 sec)
对应的执行计划。可以看出,MySQL 选择了体积较小的前缀索引 idx_r1_p。
<localhost|mysql>explain select count(*) from t1 where r1 like 'sample%'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: range
possible_keys: idx_r1,idx_r1_p
key: idx_r1_p
key_len: 27
ref: NULL
rows: 4
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)
SQL 1 – SQL 6 的前缀长度依次为 6 – 1 个字符。
# SQL 1
select count(*) from t1 where r1 like 'sample%';
# SQL 2
select count(*) from t1 where r1 like 'sampl%';
# SQL 3
select count(*) from t1 where r1 like 'samp%';
# SQL 4
select count(*) from t1 where r1 like 'sam%';
# SQL 5
select count(*) from t1 where r1 like 'sa%';
# SQL 6
select count(*) from t1 where r1 like 's%';
为了找到最合适的索引前缀长度, 我写了一个简单的函数,用来依次返回字段 r1 每个前缀长度的数据分散比率。函数 func_calc_prefix_length 返回一个 JSON 对象,对象的 KEY 和 VALUE 分别记录了前缀长度以及对应的分散比率。
DELIMITER $$
USE `ytt`$$
DROP FUNCTION IF EXISTS `func_calc_prefix_length`$$
CREATE DEFINER=`ytt`@`%` FUNCTION `func_calc_prefix_length`() RETURNS JSON
BEGIN
DECLARE v_total_pct DECIMAL(20,4);
DECLARE v_prefix_pct DECIMAL(20,4);
DECLARE v_result JSON DEFAULT '[]';
DECLARE i TINYINT DEFAULT 1;
SELECT TRUNCATE(COUNT(DISTINCT r1) / COUNT(r1),4) INTO v_total_pct FROM t1;
label1:LOOP
SELECT TRUNCATE(COUNT(DISTINCT LEFT(r1,i)) / COUNT(r1),4) INTO v_prefix_pct FROM t1;
SET v_result = JSON_ARRAY_APPEND(v_result,'$',JSON_OBJECT(i,v_prefix_pct));
IF v_prefix_pct >= v_total_pct THEN
LEAVE label1;
END IF;
SET i = i + 1;
END LOOP;
RETURN v_result;
END$$
DELIMITER ;
查看下 sample 为后缀的表记录样例。
<localhost|mysql>select * from t2 where r1 like '%sample' limit 1\G
*************************** 1. row ***************************
id: 14
r1: mysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmysqlmyssample
1 row in set (0.13 sec)
针对此种情形,有两种优化方法:
第一,可以把数据按照后缀做一个拆分,后缀部分单独为一个字段,然后给这个字段加一个索引。除了要加字段,此方法很完美~
建一个表 t3,把表 t2 的数据导进去。
CREATE TABLE `t3` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`r1` varchar(300) DEFAULT NULL,
`suffix_r1` varchar(6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_suffix_r1` (`suffix_r1`)
) ENGINE=InnoDB
<localhost|mysql>insert into t3 select id,r1,right(r1,6) from t2;
Query OK, 24576 rows affected (19.05 sec)
Records: 24576 Duplicates: 0 Warnings: 0
再次执行 SQL 7,查询瞬间出来结果。
<localhost|mysql>select count(*) from t3 where suffix_r1 = 'sample';
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.00 sec)
第二,可以把数据反转过来后建立前缀索引查询记录。对表 t2 克隆一张表 t4。
<localhost|mysql>insert into t4 select id,reverse(r1) from t2;
Query OK, 24576 rows affected (5.25 sec)
Records: 24576 Duplicates: 0 Warnings: 0
查询关键词进行反转查询,
<localhost|mysql>select count(*) from t4 where r1 like 'elpmas%';
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.00 sec)
再看下查询计划,走了前缀索引。不过这样的缺点是查询记录的时候需要在 SQL 层处理记录数据,加上反转函数。
<localhost|mysql>explain select count(*) from t4 where r1 like 'elpmas%'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t4
partitions: NULL
type: range
possible_keys: idx_r1_p
key: idx_r1_p
key_len: 27
ref: NULL
rows: 4
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)