[HUDI-8920] Optimized SerDe costs of Flink write, simple bucket and non bucket cases #12796

Open · wants to merge 1 commit into master from master-serde-non-bucket
Conversation

@geserdugarov (Contributor) commented on Feb 6, 2025

Change Logs

Changes to Flink stream write into a Hudi table, corresponding to RFC #12697. This PR implements the simple bucket index and non-bucket cases. The only remaining work is to support consistent hashing and the bounded context:
(image: DataStream optimization progress - 1)

Main points:

  • HoodieFlinkRecord is introduced. It doesn't extend HoodieRecord because we need a data structure that holds Flink row data together with Hudi metadata, constructed from Flink internal data types.
  • HoodieFlinkRecord is efficient for Flink processing due to HoodieFlinkRecordTypeInfo and HoodieFlinkRecordSerializer, which implement custom serialize and deserialize methods (see the sketch after this list).
  • We don't rewrite the classes used in Flink write pipelines to the new optimized ones because we want to preserve the previous behavior. Instead, the new behavior can be enabled via the write.fast.mode configuration option, which is off by default. After proper testing we could enable it by default, then deprecate the previous behavior, and refactor all classes once it is dropped.
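
To make the idea concrete, here is a minimal sketch under stated assumptions: FlinkInternalRowSketch and its write/read methods are hypothetical illustrations, not the PR's actual classes. The point is that Hudi metadata is kept as plain fields next to the RowData payload, so a custom serializer can write the metadata directly and delegate the payload to Flink's own RowData serializer instead of falling back to Kryo.

```java
// Hypothetical sketch, not the PR's actual HoodieFlinkRecord/HoodieFlinkInternalRow.
import java.io.IOException;

import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.core.memory.DataInputView;
import org.apache.flink.core.memory.DataOutputView;
import org.apache.flink.table.data.RowData;

public class FlinkInternalRowSketch {
  private final String recordKey;      // Hudi record key
  private final String partitionPath;  // Hudi partition path
  private final String fileId;         // target file group id
  private final String instantTime;    // commit instant time
  private final RowData rowData;       // row payload, already in Flink's internal format

  public FlinkInternalRowSketch(String recordKey, String partitionPath,
                                String fileId, String instantTime, RowData rowData) {
    this.recordKey = recordKey;
    this.partitionPath = partitionPath;
    this.fileId = fileId;
    this.instantTime = instantTime;
    this.rowData = rowData;
  }

  // Core of a custom SerDe: metadata fields are written as compact UTF strings,
  // and the payload is delegated to Flink's RowData serializer, avoiding Kryo.
  public void write(TypeSerializer<RowData> rowSerializer, DataOutputView out) throws IOException {
    out.writeUTF(recordKey);
    out.writeUTF(partitionPath);
    out.writeUTF(fileId);
    out.writeUTF(instantTime);
    rowSerializer.serialize(rowData, out);
  }

  public static FlinkInternalRowSketch read(TypeSerializer<RowData> rowSerializer, DataInputView in)
      throws IOException {
    return new FlinkInternalRowSketch(
        in.readUTF(), in.readUTF(), in.readUTF(), in.readUTF(),
        rowSerializer.deserialize(in));
  }
}
```

In the actual PR these roles are presumably played by HoodieFlinkRecordTypeInfo and HoodieFlinkRecordSerializer, which plug the record type into Flink's type system so that shuffles between operators use this compact format.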

Benchmark description

The lineitem table from the TPC-H benchmark was used: 60 million rows, of which 20 million are unique.

Performance estimation results

| Case | Metric | Current (with Kryo) | HoodieFlinkRecord | Optimization |
|---|---|---|---|---|
| Non bucket | Data passed, GB | 43.9 | 29.3 | 33.3% |
| Non bucket | Total time, s | 578 | 384 | 33.6% |
| Simple bucket index | Data passed, GB | 19.4 | 13.6 | 29.9% |
| Simple bucket index | Total time, s | 297 | 236 | 20.5% |

Flink operators

Non bucket case:
(image: operators - non bucket - 3 merged)

Simple bucket case:
(image: operators - simple bucket - 3 merged)

Impact

Flink write performance improvement.

Risk level (write none, low, medium, or high below)

Low

Documentation Update

After merge

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions bot added the size:XL (PR with lines of changes > 1000) label on Feb 6, 2025
@geserdugarov changed the title from "[HUDI-8946] [HUDI-8921] Optimized SerDe costs of Flink write, simple bucket and non bucket cases" to "[HUDI-8920] Optimized SerDe costs of Flink write, simple bucket and non bucket cases" on Feb 6, 2025
@geserdugarov force-pushed the master-serde-non-bucket branch 2 times, most recently from 68028fc to 9377d36, on February 6, 2025 at 16:28
```java
    .booleanType()
    .defaultValue(false)
    .withDescription("Optimized Flink write into Hudi table, which uses customized serialization/deserialization. "
        + "Note, that only SIMPLE BUCKET index is supported for now.");
```
Contributor commented:
PR's title says "simple bucket and non bucket cases"

@geserdugarov (Contributor, Author) replied on Feb 7, 2025:

Missed it. Thanks! Fixed in 5a81536.
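
As a usage note for the option quoted above, here is a minimal, hypothetical sketch of enabling the optimized path from the Flink Table API. The table schema and path are made up for illustration; 'index.type' = 'BUCKET' selects the simple bucket index, and 'write.fast.mode' is the switch introduced by this PR.

```java
// Hypothetical usage sketch; the table schema and path are illustrative only.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class FastModeUsageSketch {
  public static void main(String[] args) {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
    tableEnv.executeSql(
        "CREATE TABLE hudi_sink ("
            + "  uuid STRING PRIMARY KEY NOT ENFORCED,"
            + "  name STRING,"
            + "  ts TIMESTAMP(3)"
            + ") WITH ("
            + "  'connector' = 'hudi',"
            + "  'path' = 'file:///tmp/hudi_sink',"  // hypothetical table path
            + "  'index.type' = 'BUCKET',"           // simple bucket index case
            + "  'write.fast.mode' = 'true'"         // opt in to the optimized SerDe path
            + ")");
  }
}
```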

@geserdugarov commented on Feb 11, 2025

@danny0405, @xiarixiaoyao, @yuzhaojing, @wombatu-kun, hi!
If you don't mind and have time, could you please review this PR, which relates to the corresponding RFC #12697?

Actually, the main part of the proposed changes has been done in this PR. The only missing parts for now are consistent hashing support (in progress) and the bounded context (will check it next).

I've also finished testing.

Error:  testScheduleSplitPlan  Time elapsed: 0.034 s  <<< ERROR!
org.apache.hudi.exception.HoodieNotSupportedException: Currently, consistent hashing is not supported with enabled 'write.fast.mode'
	at org.apache.hudi.sink.cluster.ITTestFlinkConsistentHashingClustering.prepareData(ITTestFlinkConsistentHashingClustering.java:126)
	at org.apache.hudi.sink.cluster.ITTestFlinkConsistentHashingClustering.testScheduleSplitPlan(ITTestFlinkConsistentHashingClustering.java:79)

Error:  testScheduleMergePlan  Time elapsed: 0.027 s  <<< ERROR!
org.apache.hudi.exception.HoodieNotSupportedException: Currently, consistent hashing is not supported with enabled 'write.fast.mode'
	at org.apache.hudi.sink.cluster.ITTestFlinkConsistentHashingClustering.prepareData(ITTestFlinkConsistentHashingClustering.java:126)
	at org.apache.hudi.sink.cluster.ITTestFlinkConsistentHashingClustering.testScheduleMergePlan(ITTestFlinkConsistentHashingClustering.java:104)

These errors are expected: consistent hashing is not supported yet.
All other cases passed successfully.
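
For readers wondering where these errors come from, here is a hedged sketch of the kind of guard that could produce them. The bucket-engine option key below is an assumption for illustration, not verified against the PR; only the write.fast.mode key and the error message come from this thread.

```java
// Hypothetical validation sketch; the bucket-engine option key is an assumption.
import org.apache.flink.configuration.Configuration;
import org.apache.hudi.exception.HoodieNotSupportedException;

public final class FastModeValidationSketch {
  private FastModeValidationSketch() {}

  // Reject the consistent hashing bucket engine while 'write.fast.mode' is on.
  public static void checkFastModeCompatibility(Configuration conf) {
    boolean fastMode = conf.getBoolean("write.fast.mode", false);
    // "hoodie.bucket.index.engine.type" is a hypothetical key used for illustration.
    String bucketEngine = conf.getString("hoodie.bucket.index.engine.type", "SIMPLE");
    if (fastMode && "CONSISTENT_HASHING".equals(bucketEngine)) {
      throw new HoodieNotSupportedException(
          "Currently, consistent hashing is not supported with enabled 'write.fast.mode'");
    }
  }
}
```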

… Flink write, non bucket and simple bucket index
@geserdugarov force-pushed the master-serde-non-bucket branch from 04a23b3 to cb090d8 on February 12, 2025 at 13:58
@geserdugarov commented:
I've squashed all commits into one, cb090d8, and renamed HoodieFlinkRecord to HoodieFlinkInternalRow to prevent confusion, because this class doesn't extend HoodieRecord.

@hudi-bot

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build
