Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

查询基表 在物化视图 Union Rewrite会扫描所有字段,严重降低查询性能 #55221

Closed
zarah1 opened this issue Jan 18, 2025 · 2 comments · Fixed by #55286
Closed
Labels
sig/materialized view type/bug Something isn't working

Comments

@zarah1
Copy link

zarah1 commented Jan 18, 2025

版本 3.3.9 存算一体部署
基表是小时级的分区表,实时追加写入,物化视图是每天更新一次的天级分区物化视图

Query Profile
17d77a65-d567-11ef-95d9-00163e3d4879profile.txt

执行计划如下,此查询耗时14s920ms
mysql> explain
-> select date_trunc('day', data_date),sum(ssp_req),sum(dsp_req),sum(dsp_fill_req)
-> from adx.mammut_adx_slot_pkg_hi
-> where data_date >= '2025-01-17 00:00:00' and data_date < '2025-01-19 00:00:00'
-> group by date_trunc('day', data_date)
-> order by date_trunc('day', data_date);
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:51: date_trunc | 52: sum | 53: sum | 54: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 15:MERGING-EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 55: data_date |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 15 |
| UNPARTITIONED |
| |
| 14:SORT |
| | order by: <slot 51> 51: date_trunc ASC |
| | offset: 0 |
| | |
| 13:Project |
| | <slot 51> : 55: data_date |
| | <slot 52> : 222: sum |
| | <slot 53> : 223: sum |
| | <slot 54> : 224: sum |
| | |
| 12:AGGREGATE (merge finalize) |
| | output: sum(222: sum), sum(223: sum), sum(224: sum) |
| | group by: 55: data_date |
| | |
| 11:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 11 |
| HASH_PARTITIONED: 55: data_date |
| |
| 10:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(90: ssp_req), sum(94: dsp_req), sum(93: dsp_fill_req) |
| | group by: 55: data_date |
| | |
| 0:UNION |
| | |
| |----9:EXCHANGE |
| | |
| 3:EXCHANGE |
| |
| PLAN FRAGMENT 3 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 197: date_trunc, 198: to_date, 164: media_id, 165: dsp_id, 166: target_id, 169: media_slot_type, 170: media_slot_id, 167: dsp_slot_id, 168: adx_slot_id, 177: platform, 174: sdkv, 175: pkg, 176: dsp_pkg, 173: conv_type_id, 171: dsp_slot_type, 172: adx_dsp_slot_id, 182: julang_task_id, 183: policy_id, 184: experiment_id, 180: media_tier, 179: media_category, 178: integrate_by, 181: version |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 09 |
| RANDOM |
| |
| 8:AGGREGATE (merge finalize) |
| | output: sum(199: sum), sum(202: sum), sum(203: sum) |
| | group by: 197: date_trunc, 198: to_date, 164: media_id, 165: dsp_id, 166: target_id, 169: media_slot_type, 170: media_slot_id, 167: dsp_slot_id, 168: adx_slot_id, 177: platform, 174: sdkv, 175: pkg, 176: dsp_pkg, 173: conv_type_id, 171: dsp_slot_type, 172: adx_dsp_slot_id, 182: julang_task_id, 183: policy_id, 184: experiment_id, 180: media_tier, 179: media_category, 178: integrate_by, 181: version |
| | |
| 7:EXCHANGE |
| |
| PLAN FRAGMENT 4 |
| OUTPUT EXPRS: |
| colocate exec groups: ExecGroup{groupId=6, nodeIds=[4, 5, 6]} |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 07 |
| HASH_PARTITIONED: 197: date_trunc, 198: to_date, 164: media_id, 165: dsp_id, 166: target_id, 169: media_slot_type, 170: media_slot_id, 167: dsp_slot_id, 168: adx_slot_id, 177: platform, 174: sdkv, 175: pkg, 176: dsp_pkg, 173: conv_type_id, 171: dsp_slot_type, 172: adx_dsp_slot_id, 182: julang_task_id, 183: policy_id, 184: experiment_id, 180: media_tier, 179: media_category, 178: integrate_by, 181: version |
| |
| 6:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(185: ssp_req), sum(154: dsp_fill_req), sum(186: dsp_req) |
| | group by: 197: date_trunc, 198: to_date, 164: media_id, 165: dsp_id, 166: target_id, 169: media_slot_type, 170: media_slot_id, 167: dsp_slot_id, 168: adx_slot_id, 177: platform, 174: sdkv, 175: pkg, 176: dsp_pkg, 173: conv_type_id, 171: dsp_slot_type, 172: adx_dsp_slot_id, 182: julang_task_id, 183: policy_id, 184: experiment_id, 180: media_tier, 179: media_category, 178: integrate_by, 181: version |
| | |
| 5:Project |
| | <slot 154> : 154: dsp_fill_req |
| | <slot 164> : 164: media_id |
| | <slot 165> : 165: dsp_id |
| | <slot 166> : 166: target_id |
| | <slot 167> : 167: dsp_slot_id |
| | <slot 168> : 168: adx_slot_id |
| | <slot 169> : 169: media_slot_type |
| | <slot 170> : 170: media_slot_id |
| | <slot 171> : 171: dsp_slot_type |
| | <slot 172> : 172: adx_dsp_slot_id |
| | <slot 173> : 173: conv_type_id |
| | <slot 174> : 174: sdkv |
| | <slot 175> : 175: pkg |
| | <slot 176> : 176: dsp_pkg |
| | <slot 177> : 177: platform |
| | <slot 178> : 178: integrate_by |
| | <slot 179> : 179: media_category |
| | <slot 180> : 180: media_tier |
| | <slot 181> : 181: version |
| | <slot 182> : 182: julang_task_id |
| | <slot 183> : 183: policy_id |
| | <slot 184> : 184: experiment_id |
| | <slot 185> : 185: ssp_req |
| | <slot 186> : 186: dsp_req |
| | <slot 197> : date_trunc('day', 163: data_date) |
| | <slot 198> : to_date(163: data_date) |
| | |
| 4:OlapScanNode |
| TABLE: mammut_adx_slot_pkg_hi |
| PREAGGREGATION: ON |
| PREDICATES: date_trunc('day', 163: data_date) >= '2025-01-17 00:00:00', date_trunc('day', 163: data_date) < '2025-01-19 00:00:00' |
| partitions=15/120 |
| rollup: mammut_adx_slot_pkg_hi |
| tabletRatio=90/90 |
| tabletList=575061,575064,575067,575070,575073,575076,575555,575558,575561,575564 ... |
| cardinality=71305099 |
| avgRowSize=171.23503 |
| |
| PLAN FRAGMENT 5 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 03 |
| RANDOM |
| |
| 2:Project |
| | <slot 101> : 101: data_date |
| | <slot 136> : 136: ssp_req |
| | <slot 139> : 139: dsp_fill_req |
| | <slot 140> : 140: dsp_req |
| | |
| 1:OlapScanNode |
| TABLE: mammut_adx_slot_pkg_day_async_mv01 |
| PREAGGREGATION: ON |
| partitions=1/6 |
| rollup: mammut_adx_slot_pkg_day_async_mv01 |
| tabletRatio=3/3 |
| tabletList=575808,575811,575814 |
| cardinality=4987523 |
| avgRowSize=32.0 |
| MaterializedView: true |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

@zarah1 zarah1 added the type/bug Something isn't working label Jan 18, 2025
@zarah1 zarah1 changed the title 查询基表物化视图 Union Rewrite会扫描所有字段,严重降低查询性能 查询基表 在物化视图 Union Rewrite会扫描所有字段,严重降低查询性能 Jan 18, 2025
@LiShuMing
Copy link
Contributor

Thanks for your feedback. Can you provide the base table and mv's schema, let me check it.

@zarah1
Copy link
Author

zarah1 commented Jan 20, 2025

table schema:
mammut_adx_slot_pkg_hi | CREATE TABLE mammut_adx_slot_pkg_hi (
data_hour varchar(65533) NOT NULL COMMENT "事件时间小时级",
event_min varchar(65533) NOT NULL COMMENT "事件时间分钟级",
media_id int(11) NULL COMMENT "",
dsp_id int(11) NULL COMMENT "",
target_id int(11) NULL COMMENT "",
dsp_slot_id varchar(65533) NULL COMMENT "",
adx_slot_id int(11) NULL COMMENT "",
media_slot_type int(11) NULL COMMENT "",
media_slot_id varchar(65533) NULL COMMENT "",
dsp_slot_type int(11) NULL COMMENT "",
adx_dsp_slot_id int(11) NULL DEFAULT "-1" COMMENT "",
conv_type_id varchar(65533) NULL COMMENT "",
sdkv varchar(65533) NULL COMMENT "",
pkg varchar(65533) NULL COMMENT "",
dsp_pkg varchar(65533) NULL COMMENT "",
platform varchar(65533) NULL COMMENT "",
integrate_by int(11) NULL COMMENT "",
flow_price double NULL DEFAULT "0.2894" COMMENT "流量单价(G/元)",
media_category int(11) NULL COMMENT "",
media_tier int(11) NULL COMMENT "",
version varchar(65533) NULL COMMENT "",
julang_task_id varchar(65533) NULL COMMENT "",
policy_id int(11) NULL DEFAULT "-1" COMMENT "分发规则ID",
experiment_id varchar(65533) NULL COMMENT "实验id",
touch_amount varchar(65533) NULL COMMENT "触达次数",
ssp_req bigint(20) NULL COMMENT "",
dsp_req bigint(20) NULL COMMENT "",
ssp_win bigint(20) NULL COMMENT "",
dsp_win bigint(20) NULL COMMENT "",
imp bigint(20) NULL COMMENT "",
clk bigint(20) NULL COMMENT "",
dsp_floor double NULL COMMENT "",
dsp_fill_price double NULL COMMENT "",
dsp_fee_price double NULL DEFAULT "0" COMMENT "",
imp_dsp_bid_price double NULL DEFAULT "0" COMMENT "",
dsp_req_timeout bigint(20) NULL COMMENT "",
ssp_bid_price double NULL COMMENT "",
ssp_floor double NULL COMMENT "",
dsp_win_price double NULL COMMENT "",
dsp_fill_req bigint(20) NULL COMMENT "",
dsp_fill_valid bigint(20) NULL DEFAULT "0" COMMENT "",
used_time bigint(20) NULL COMMENT "",
dsp_used_time bigint(20) NULL DEFAULT "0" COMMENT "",
revenue double NULL COMMENT "",
revenue_rebate double NULL COMMENT "",
media_revenue double NULL COMMENT "",
body_size bigint(20) NULL COMMENT "",
launch bigint(20) NULL COMMENT "",
media_origin_revenue double NULL DEFAULT "0" COMMENT "",
data_date datetime NOT NULL COMMENT "事件时间精确到分钟"
) ENGINE=OLAP
DUPLICATE KEY(data_hour, event_min, media_id)
PARTITION BY date_trunc('hour', data_date)
DISTRIBUTED BY HASH(pkg, dsp_slot_id, media_id) BUCKETS 6
PROPERTIES (
"compression" = "LZ4",
"fast_schema_evolution" = "true",
"partition_live_number" = "120",
"replicated_storage" = "true",
"replication_num" = "2",
"storage_cooldown_ttl" = "3 days",
"storage_medium" = "SSD"
);

mv schema:
CREATE MATERIALIZED VIEW mammut_adx_slot_pkg_day_async_mv01 (data_date, dt, media_id, dsp_id, target_id, media_slot_type, media_slot_id, dsp_slot_id, adx_slot_id, platform, sdkv, pkg, dsp_pkg, conv_type_id, dsp_slot_type, adx_dsp_slot_id, julang_task_id, policy_id, experiment_id, media_tier, media_category, integrate_by, version, dsp_floor, imp, revenue, dsp_fill_price, dsp_fee_price, imp_dsp_bid_price, dsp_req_timeout, ssp_bid_price, media_revenue, clk, ssp_floor, dsp_win_price, ssp_req, launch, ssp_win, dsp_fill_req, dsp_req, used_time, dsp_used_time, dsp_win, body_size, revenue_rebate, media_origin_revenue)
PARTITION BY (data_date)
DISTRIBUTED BY HASH(pkg, media_id) BUCKETS 3
REFRESH ASYNC START("2024-10-13 01:30:00") EVERY(INTERVAL 1 DAY)
PROPERTIES (
"storage_cooldown_time" = "9999-12-31 23:59:59",
"replicated_storage" = "true",
"partition_refresh_number" = "1",
"resource_group" = "rg1",
"transparent_mv_rewrite_mode" = "true",
"session.enable_spill" = "true",
"storage_medium" = "SSD",
"replication_num" = "2",
"session.spill_mode" = "force",
"auto_refresh_partitions_limit" = "168"
)
AS SELECT date_trunc('day', mammut_adx_slot_pkg_hi.data_date) AS data_date, to_date(mammut_adx_slot_pkg_hi.data_date) AS dt, mammut_adx_slot_pkg_hi.media_id, mammut_adx_slot_pkg_hi.dsp_id, mammut_adx_slot_pkg_hi.target_id, mammut_adx_slot_pkg_hi.media_slot_type, mammut_adx_slot_pkg_hi.media_slot_id, mammut_adx_slot_pkg_hi.dsp_slot_id, mammut_adx_slot_pkg_hi.adx_slot_id, mammut_adx_slot_pkg_hi.platform, mammut_adx_slot_pkg_hi.sdkv, mammut_adx_slot_pkg_hi.pkg, mammut_adx_slot_pkg_hi.dsp_pkg, mammut_adx_slot_pkg_hi.conv_type_id, mammut_adx_slot_pkg_hi.dsp_slot_type, mammut_adx_slot_pkg_hi.adx_dsp_slot_id, mammut_adx_slot_pkg_hi.julang_task_id, mammut_adx_slot_pkg_hi.policy_id, mammut_adx_slot_pkg_hi.experiment_id, mammut_adx_slot_pkg_hi.media_tier, mammut_adx_slot_pkg_hi.media_category, mammut_adx_slot_pkg_hi.integrate_by, mammut_adx_slot_pkg_hi.version, sum(mammut_adx_slot_pkg_hi.dsp_floor) AS dsp_floor, sum(mammut_adx_slot_pkg_hi.imp) AS imp, sum(mammut_adx_slot_pkg_hi.revenue) AS revenue, sum(mammut_adx_slot_pkg_hi.dsp_fill_price) AS dsp_fill_price, sum(mammut_adx_slot_pkg_hi.dsp_fee_price) AS dsp_fee_price, sum(mammut_adx_slot_pkg_hi.imp_dsp_bid_price) AS imp_dsp_bid_price, sum(mammut_adx_slot_pkg_hi.dsp_req_timeout) AS dsp_req_timeout, sum(mammut_adx_slot_pkg_hi.ssp_bid_price) AS ssp_bid_price, sum(mammut_adx_slot_pkg_hi.media_revenue) AS media_revenue, sum(mammut_adx_slot_pkg_hi.clk) AS clk, sum(mammut_adx_slot_pkg_hi.ssp_floor) AS ssp_floor, sum(mammut_adx_slot_pkg_hi.dsp_win_price) AS dsp_win_price, sum(mammut_adx_slot_pkg_hi.ssp_req) AS ssp_req, sum(mammut_adx_slot_pkg_hi.launch) AS launch, sum(mammut_adx_slot_pkg_hi.ssp_win) AS ssp_win, sum(mammut_adx_slot_pkg_hi.dsp_fill_req) AS dsp_fill_req, sum(mammut_adx_slot_pkg_hi.dsp_req) AS dsp_req, sum(mammut_adx_slot_pkg_hi.used_time) AS used_time, sum(mammut_adx_slot_pkg_hi.dsp_used_time) AS dsp_used_time, sum(mammut_adx_slot_pkg_hi.dsp_win) AS dsp_win, sum(mammut_adx_slot_pkg_hi.body_size) AS body_size, sum(mammut_adx_slot_pkg_hi.revenue_rebate) AS revenue_rebate, sum(mammut_adx_slot_pkg_hi.media_origin_revenue) AS media_origin_revenue
FROM adx.mammut_adx_slot_pkg_hi
WHERE mammut_adx_slot_pkg_hi.data_date >= '2024-10-11 00:00:00'
GROUP BY date_trunc('day', mammut_adx_slot_pkg_hi.data_date), to_date(mammut_adx_slot_pkg_hi.data_date), mammut_adx_slot_pkg_hi.media_id, mammut_adx_slot_pkg_hi.dsp_id, mammut_adx_slot_pkg_hi.target_id, mammut_adx_slot_pkg_hi.media_slot_type, mammut_adx_slot_pkg_hi.media_slot_id, mammut_adx_slot_pkg_hi.dsp_slot_id, mammut_adx_slot_pkg_hi.adx_slot_id, mammut_adx_slot_pkg_hi.platform, mammut_adx_slot_pkg_hi.sdkv, mammut_adx_slot_pkg_hi.pkg, mammut_adx_slot_pkg_hi.dsp_pkg, mammut_adx_slot_pkg_hi.conv_type_id, mammut_adx_slot_pkg_hi.dsp_slot_type, mammut_adx_slot_pkg_hi.adx_dsp_slot_id, mammut_adx_slot_pkg_hi.julang_task_id, mammut_adx_slot_pkg_hi.policy_id, mammut_adx_slot_pkg_hi.experiment_id, mammut_adx_slot_pkg_hi.media_tier, mammut_adx_slot_pkg_hi.media_category, mammut_adx_slot_pkg_hi.integrate_by, mammut_adx_slot_pkg_hi.version;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/materialized view type/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants