docs: produce message size rfc #4202

Merged · 4 commits merged into infinyon:master from the rfc_produce_msg_size branch on Oct 21, 2024

Conversation

@fraidev (Contributor) commented Oct 2, 2024

TL;DR

- The `batch_size` producer config must not reject large records; it should just send them directly.
- Create a new `max_request_size` producer config that rejects messages larger than the limit. I am using `max_request_size` because Kafka uses `max.request.size`, but we can change it to another config name.
- Compressed sizes should not be used for these producer configs.
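For illustration, here is a minimal sketch of how the two settings could be applied on a producer using Fluvio's Rust client. `TopicProducerConfigBuilder` and `batch_size` are existing API; the `max_request_size` builder method is what this RFC proposes, so treat it as hypothetical until implemented:

```rust
use fluvio::{Fluvio, RecordKey, TopicProducerConfigBuilder};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let fluvio = Fluvio::connect().await?;

    // batch_size groups records for throughput; per this RFC, a record
    // larger than this is sent as a standalone request, never rejected.
    // max_request_size (proposed here, hypothetical until implemented)
    // is the hard per-record limit: anything above it is rejected.
    let config = TopicProducerConfigBuilder::default()
        .batch_size(16 * 1024) // 16 KB, matching Kafka's default batch.size
        .max_request_size(1024 * 1024) // 1 MB, matching Kafka's max.request.size
        .build()?;

    let producer = fluvio.topic_producer_with_config("my-topic", config).await?;
    producer.send(RecordKey::NULL, "hello").await?;
    producer.flush().await?;
    Ok(())
}
```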

Related:

@fraidev force-pushed the rfc_produce_msg_size branch 3 times, most recently from 8b495a7 to 7efe6bb on October 2, 2024 02:30
@fraidev marked this pull request as ready for review on October 2, 2024 02:32
@fraidev force-pushed the rfc_produce_msg_size branch 2 times, most recently from f60994d to eaab9a5 on October 2, 2024 02:43
@fraidev requested review from @digikata, @sehz and @ajhunyady on October 2, 2024 02:43

1. Handling Larger Messages than Batch Size

If a single record exceeds the defined `batch_size`, Fluvio will process the record as a standalone request, ensuring that larger messages are not discarded or delayed. If the record does not exceed the `batch_size`, Fluvio will add it to an existing batch, or create a new batch if the current one is full.
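Reviewers question below whether the current implementation actually behaves this way; as written, the RFC's intended routing could be sketched like this (hypothetical names and types, not Fluvio's actual internals):

```rust
// Hypothetical sketch of the routing the RFC describes;
// names and types are illustrative, not Fluvio's internals.
struct Batch {
    capacity: usize, // configured batch_size
    used: usize,     // bytes already queued in this batch
}

enum Action {
    SendStandalone,        // record bigger than batch_size: its own request
    AppendToCurrentBatch,  // record fits in the open batch
    FlushAndStartNewBatch, // open batch is full: flush, then start fresh
}

fn route_record(record_size: usize, batch_size: usize, batch: &Batch) -> Action {
    if record_size > batch_size {
        Action::SendStandalone
    } else if batch.used + record_size <= batch.capacity {
        Action::AppendToCurrentBatch
    } else {
        Action::FlushAndStartNewBatch
    }
}
```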
Contributor:
Hmmm, is this true? Customers are complaining of packets being dropped if they are larger than the batch size.

Contributor Author:

I wrote the expected behavior; I'll rewrite it to describe what it does now and what should change.

Contributor:

Not sure why this behavior is needed. This should move to an enhancement if it doesn't apply to the original problem.

Contributor:

This can be subsumed into the existing batching behavior: if `batch_size` is exceeded, then the batch is flushed.

rfc/produce-message-size.md (thread resolved)
@sehz (Contributor) commented Oct 2, 2024

Never mind the previous comments. Here are the clarified semantics:

`max_request_size` applies to the pre-compression record size, so it can serve as a per-record constraint. `batch_size` applies to the post-compression limit (i.e. the underlying raw records limit). We can set `max_request_size` high for the default case and bump up `batch_size` as well.
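Under this reading, the two limits would be checked at different stages of the pipeline. A minimal sketch, assuming sehz's proposed semantics (all names hypothetical, not actual Fluvio code):

```rust
// Hypothetical sketch of sehz's proposed semantics:
// max_request_size is checked per record, before compression;
// batch_size bounds the final (post-compression) batch on the wire.

fn check_record(uncompressed_len: usize, max_request_size: usize) -> Result<(), String> {
    if uncompressed_len > max_request_size {
        return Err(format!(
            "record of {uncompressed_len} bytes exceeds max_request_size of {max_request_size}"
        ));
    }
    Ok(())
}

fn batch_is_full(compressed_batch_len: usize, batch_size: usize) -> bool {
    // Post-compression limit: flush once the raw wire batch reaches batch_size.
    compressed_batch_len >= batch_size
}
```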


### Batch Size

`batch_size` will define the maximum size of a batch of records that can be sent by the producer. If a record exceeds this size, Fluvio will process the record as a standalone message.
Contributor:
`batch_size` should be the ultimate raw limit. Again, not sure what this addresses.

Contributor Author:
I think this behavior would be confusing; other products have the same `batch_size` config parameter, and there its behavior is just to check whether the record fits inside a batch or not.

Contributor Author (@fraidev) commented Oct 2, 2024:
Never reject them.

@fraidev (Contributor Author) commented Oct 2, 2024

> `max_request_size` applies to the pre-compression record size, so it can serve as a per-record constraint. `batch_size` applies to the post-compression limit (i.e. the underlying raw records limit).

I think we should use the compressed size for `batch_size`, since we only compress the whole batch, not each record of the batch.

> We can set `max_request_size` high for the default case and bump up `batch_size` as well.

Sure. Kafka, for example, uses 16 KB for `batch.size` (like us) and 1 MB for `max.request.size`.


### Compression

`batch_size` and `max_request_size` will only use the uncompressed message size.
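A worked example of this rule with hypothetical numbers: because both limits compare the uncompressed size, the compression ratio never changes how a record is routed.

```rust
fn main() {
    let batch_size = 16 * 1024; // 16 KB
    let max_request_size = 1024 * 1024; // 1 MB

    let record_uncompressed = 100 * 1024; // a 100 KB record...
    let record_compressed = 10 * 1024; // ...that would compress to ~10 KB

    // max_request_size check (uncompressed): 100 KB <= 1 MB, so accepted.
    assert!(record_uncompressed <= max_request_size);

    // batch_size check is also on the uncompressed size: 100 KB > 16 KB,
    // so the record is sent standalone, even though its compressed form
    // (10 KB) would have fit inside a 16 KB batch.
    assert!(record_uncompressed > batch_size);
    let _ = record_compressed; // compressed size is intentionally ignored
}
```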
Contributor:
See suggestion. Having `batch_size` apply to the final raw batch should simplify the calculation.

@fraidev force-pushed the rfc_produce_msg_size branch from eaab9a5 to 57a0bd7 on October 14, 2024 19:23
@fraidev force-pushed the rfc_produce_msg_size branch from fe81e0f to bc03205 on October 17, 2024 16:26
@ajhunyady (Contributor) left a comment:
Let's check this in. @sehz, you had some comments; are they resolved?

@digikata (Contributor) left a comment:
LGTM

@digikata added this pull request to the merge queue on Oct 21, 2024
Merged via the queue into infinyon:master with commit b650cae on Oct 21, 2024
106 checks passed