业务场景的问题,我们有一个刷CUBE的SQL,是Oracle环境,平时跑70多分钟,
?
但是最近突然不动了,这个SQL需要算累计值,比如年累计客户数量。
?
累计值是什么意思呢?我们使用下面的数据来说明问题。
select ‘201901‘ as c_month,100 as c_customers from dual union all select ‘201902‘ as c_month,102 as c_customers from dual union all select ‘201903‘ as c_month,120 as c_customers from dual union all select ‘201904‘ as c_month,111 as c_customers from dual union all select ‘201905‘ as c_month,155 as c_customers from dual union all select ‘201906‘ as c_month,199 as c_customers from dual; C_MONT C_CUSTOMERS ------ ----------- 201901 100 201902 102 201903 120 201904 111 201905 155 201906 199
2019年1月,客户数量是100, 2019年2月,客户数量是102 ,
那么2019年1月的客户累计值是100,
2019年2月的客户年累计值是202(2019年1月的客户数量 + 2019年2月的客户数量),
2019年3月的客户年累计值是322(2019年1月的客户数量 + 2019年2月的客户数量+ 2019年3月的客户数量),
…
?
我使用如下的测试SQL来说明这个场景
create or replace view tab_test1 as select ‘201901‘ as c_month,199 as c_customers from dual union all select ‘201907‘ as c_month,108 as c_customers from dual; create view tab_test2 as select ‘20190131‘ as monthlastday from dual union all select ‘20190228‘ as monthlastday from dual union all select ‘20190331‘ as monthlastday from dual union all select ‘20190430‘ as monthlastday from dual union all select ‘20190531‘ as monthlastday from dual union all select ‘20190630‘ as monthlastday from dual union all select ‘20190731‘ as monthlastday from dual; select * from tab_test1 a join tab_test2 b on to_date(c_month,‘yyyymm‘) <= to_date(b.monthlastday,‘yyyymmdd‘) and to_date(c_month,‘yyyymm‘) >= trunc(to_date(b.monthlastday,‘yyyymmdd‘),‘yyyy‘) order by b.monthlastday,a.c_month; C_MONT C_CUSTOMERS MONTHLAS ------ ----------- -------- 201901 100 20190131 201901 100 20190228 201902 102 20190228 201901 100 20190331 201902 102 20190331 201903 120 20190331 201901 100 20190430 201902 102 20190430 201903 120 20190430 201904 111 20190430 201901 100 20190531 C_MONT C_CUSTOMERS MONTHLAS ------ ----------- -------- 201902 102 20190531 201903 120 20190531 201904 111 20190531 201905 155 20190531 201901 100 20190630 201902 102 20190630 201903 120 20190630 201904 111 20190630 201905 155 20190630 201906 199 20190630 201901 100 20190731 C_MONT C_CUSTOMERS MONTHLAS ------ ----------- -------- 201902 102 20190731 201903 120 20190731 201904 111 20190731 201905 155 20190731 201906 199 20190731 201907 108 20190731 已选择 28 行。
从上面SQL返回的数据能看出来,monthlastday 字段分组,汇总c_customers,就能很轻松算出年累计值。但是如果a表数据太大,无法走hash 关联。
所以需要通过某种方法改成等值关联。?
1. 先创建一个时间维表,可以通过树形查询生成一个时间维度表,由于我的测试数据自小粒度是到月的,所以我的日期维度表也是到月的。
create or replace view tab_test3 as select extract(year from c_date) as c_year,extract(month from c_date) as c_month,to_char(c_date,‘yyyymm‘) as c_month2 from (select add_months(date‘2019-01-01‘,level -1 ) as c_date from dual connect by level <= 8);
2. 通过时间维度表自关联出累计月份对应的日期。
下面SQL,同构过滤t1 表的c_month2 字段,就可以拿到任意月份的累计月份了。比如2019-07月的累计月份是2019年 1-7月份
select t1.c_month2 as groupcolumn,t2.c_month2 joincolumn from tab_test3 t1 join tab_test3 t2 on t1.c_year = t2.c_year and t1.c_month2 >= t2.c_month2 order by 1,2 GROUPC JOINCO ------ ------ 201901 201901 201902 201901 201902 201902 201903 201901 201903 201902 201903 201903 201904 201901 201904 201902 201904 201903 201904 201904 201905 201901 GROUPC JOINCO ------ ------ 201905 201902 201905 201903 201905 201904 201905 201905 201906 201901 201906 201902 201906 201903 201906 201904 201906 201905 201906 201906 201907 201901 GROUPC JOINCO ------ ------ 201907 201902 201907 201903 201907 201904 201907 201905 201907 201906 201907 201907 201908 201901 201908 201902 201908 201903 201908 201904 201908 201905 GROUPC JOINCO ------ ------ 201908 201906 201908 201907 201908 201908 已选择 36 行。
3. 修改原来SQL中关于日期的不等值关联,可以起到走hash的作用。