数据库内核月报

一个逻辑运算符析取合取式(Item_cond)下如果有多个子查询，可以尝试对这多个子查询进行折叠，折叠成1个或者改写为true、false。显而易见可以大大加速sql执行。
注：该功能在PolarDB 8.02.2.22 版本中，默认在ro上开启，在rw上开启需设置如下参数
set subquery_coalesce_mode=on; // 开启折叠，默认只执行启发式折叠
set subquery_coalesce=on; // 开启基于代价折叠，折叠会增加新算子，大部分场景是个better plan，但并不总是
// 如果开启了cbqt，会忽略这个开关，基于代价进行枚举。
规则生效场景：
  折叠的对象可以出现在where， having， join on条件的任何位置上。做到应消尽消。
  子查询可以是Exists, IN子查询， ALL/ALL子查询
  算子可以是正值， 负值运算，如in /not in， exists/not exists。
  父亲逻辑算子可以是and， 可以是or
同类型消除
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE c2 = 0)  --子查询1
	AND EXISTS (SELECT 1 FROM t2) 		              --子查询2
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE c2 = 0)
子查询1是子查询2的子集，子查询2直接消除掉
or条件下：
SELECT * FROM t1 WHERE EXIST (SELECT 1 FROM t2 WHERE c2 = 0)  --子查询1
	or EXIST (SELECT 1 FROM t2) 		              --子查询2
SELECT * FROM t1 WHERE EXISTs (SELECT 1 FROM t2)
*注： or条件下改写为 false or EXISTs (SELECT 1 FROM t2), 保留大集合
同类型子查询合并
select * from t1 where exists (select t1.a as f from t1 where a >10 and b < 10)
or exists (select a from t1 where a > 10  and c <3);
select * from t1 where exists (select t1.a as f from t1 where a >10 and (b < 10 or c <3);
也可以是and_cond表达式
select * from t1 where not exists (select t1.a as f from t1 where a >10 and b < 10)
and not exists (select a from t1 where a > 10  and c <3);
select * from t1 where not exists (select t1.a as f from t1 where a >10 and (b < 10 or c <3);
互斥类型子查询消除
exists冲突 //消除子查询，改写为false
  exists VS not exists
  in VS not in
SELECT * FROM t1 WHERE EXISTs (SELECT 1 FROM t2 WHERE c1 = 0)  --子查询1
      AND NOT EXISTs (SELECT 1 FROM t2) 		--子查询2
SELECT * FROM t1 WHERE false
any/all 冲突 //SQL2
  >any vs <all、<=all
  <any vs >all, >=all
    SELECT * FROM t1 WHERE t1.c1 > any (SELECT c1 FROM t2 where c1 > 100) and
t1.c1 < all (select c1 from t2 where  c1 > 10)
SELECT * FROM t1 WHERE false; //any 是all 子集
    Item_or下的消除
    SELECT * FROM t1 WHERE EXISTs (SELECT 1 FROM t2 )  --子查询1
    or NOT EXISTs (SELECT 1 FROM t2 WHERE c1 = 0) 		--子查询2
SELECT * FROM t1 WHERE true
    互斥exists子查询合并 //SQL3
    SELECT * FROM t1 WHERE EXIST (SELECT 1 FROM t2) 	    --子查询1
   AND NOT EXIST (SELECT 1 FROM t2 WHERE c2 = 0)      --子查询2
SELECT * FROM t1 WHERE EXIST (SELECT 1 FROM t2 having sum (case when extra_cond 1 else 0) ==0) 
  增加 为having sum (case when extra_cond 1 else 0) ==0
  并不总是最优的，基于代价选择是否进行折叠，如确认改写较优，开启subquery_coalesce参数。
  tpch Q21 热数据测试对比如下
  in vs not in  并且 (not in集合更小，是左侧子集）
  = any vs != all 并且 (all集合更小，是左侧子集）
    SELECT * FROM t1 WHERE t1.c1 = any (SELECT c1 FROM t2 where c1 > 10) and
t1.c1 != all (select c1 from t2 where  c1 > 100)
SELECT * FROM t1 WHERE t1.c1 = 
any (SELECT c1 FROM t2 where c1 > 10 and LNNVL(c1 >100));
该plan也是基于代价进行改写的，通常都是改写更优。
or 条件下合并
SELECT * FROM t1 WHERE t1.c1 = any (SELECT c1 FROM t2 ) or
t1.c1 != all (select c1 from t2 where  c1 > 100)
SELECT * FROM t1 WHERE 1;
PolarDB子查询折叠基本上是能转尽转，可以折叠在where条件， join on条件， having条件内部的任何冗余子查询，
能大大加速SQL执行。