# Enhancing Fluvio's Handling of Message Sizes

This RFC proposes changes to how Fluvio handles message sizes. The goal is to ensure that the `batch_size`, `max_request_size`, and compression settings do not block the processing of messages.

## Proposed Enhancements

1. Handling Messages Larger than the Batch Size

If a single record exceeds the configured `batch_size`, Fluvio will process the record as a standalone request, so that larger messages are neither discarded nor delayed. If the record does not exceed `batch_size`, Fluvio will add it to the existing batch, or create a new batch if the current one is full.

2. Handling Messages Larger than the Max Request Size

Fluvio will have a new configuration parameter, `max_request_size`, that defines the maximum size of a request the producer can send. Fluvio will report an error when a message exceeds `max_request_size`, whether the message contains a single record or a batch of records.

3. Compression Behavior

Size limits such as `batch_size` and `max_request_size` will be checked against the uncompressed message size only. Compression affects the size of messages in transit, but it does not change how the maximum request size constraint is applied.

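The three rules above can be sketched as a small decision function. This is only an illustration, not Fluvio's implementation: the function name `plan_record`, the order of the checks, the `1048576` value used for `max_request_size`, and the reading of "full" as "the record no longer fits in the current batch" are all assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not a Fluvio API: decide how one record would be handled.
# All sizes are UNCOMPRESSED byte counts (rule 3).
#   $1 record size, $2 bytes already in the current batch,
#   $3 batch_size,  $4 max_request_size
plan_record() {
  local record=$1 pending=$2 batch_size=$3 max_request_size=$4

  if [ "$record" -gt "$max_request_size" ]; then
    # Rule 2: larger than max_request_size is an error.
    echo "error"
  elif [ "$record" -gt "$batch_size" ]; then
    # Rule 1: larger than batch_size goes out as a standalone request.
    echo "standalone"
  elif [ $((pending + record)) -le "$batch_size" ]; then
    # The record still fits in the current batch.
    echo "append"
  else
    # The current batch is full, so a new one is started.
    echo "new-batch"
  fi
}

plan_record 500000 0 16536 1048576     # -> standalone
plan_record 500000 0 16536 16384       # -> error
plan_record 100 16500 16536 1048576    # -> new-batch
plan_record 100 0 16536 1048576        # -> append
```

The sketch checks `max_request_size` first so that an oversized record is rejected before any batching decision is made; the RFC text does not state which check runs first.
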
## Fluvio CLI

Preparing the environment, with a topic and a large data file:

```bash
fluvio topic create large-data-topic
printf 'This is a sample line. ' | awk -v b=500000 '{while(length($0) < b) $0 = $0 $0}1' | cut -c1-500000 > large-data-file.txt
```

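Before running the examples, it can help to confirm that the generated file really is much larger than the limits used below (16536 and 16384 bytes). A minimal check using only standard tools, regenerating the file with the same pipeline:

```bash
# Regenerate the sample file exactly as in the preparation step.
printf 'This is a sample line. ' | awk -v b=500000 '{while(length($0) < b) $0 = $0 $0}1' | cut -c1-500000 > large-data-file.txt

# Total bytes on disk, including the trailing newline emitted by cut.
wc -c < large-data-file.txt

# Characters on the single line, excluding the newline.
head -n 1 large-data-file.txt | tr -d '\n' | wc -c
```

Both numbers should be around 500,000, far above either limit, so the file exercises the "larger than the limit" path in every example.
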
### Batch Size

`batch_size` will define the maximum size of a batch of records that can be sent by the producer. If a record exceeds this size, Fluvio will process the record as a standalone message.

```bash
fluvio produce large-data-topic --batch-size 16536 --file large-data-file.txt --raw
```

No error will be displayed, even though the message exceeds the batch size. Instead, the record will be processed as a standalone message.

### Max Request Size

`max_request_size` will define the maximum size of a message that can be sent by the producer. If a message exceeds this size, Fluvio will return an error, whether the message contains a single record or a batch of records.

```bash
fluvio produce large-data-topic --max-request-size 16384 --file large-data-file.txt --raw
```

The following error will be displayed:

```bash
the given record is larger than the max_request_size (16384 bytes).
```

### Compression

`batch_size` and `max_request_size` will be checked against the uncompressed message size only.

```bash
fluvio produce large-data-topic --batch-size 16536 --compression gzip --file large-data-file.txt --raw
fluvio produce large-data-topic --max-request-size 16384 --compression gzip --file large-data-file.txt --raw
```

Both commands will be evaluated using the uncompressed message size. Only the second one will display an error, because the uncompressed message exceeds the max request size.

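To see why the choice of uncompressed size matters, the sample data can be compared before and after compression. The sketch below uses the `gzip` command-line tool as a stand-in for producer-side compression; that substitution is an assumption, and the compressed size Fluvio itself would produce may differ. The variable names are illustrative only.

```bash
# Build the same sample data in a temporary file so nothing is overwritten.
sample=$(mktemp)
printf 'This is a sample line. ' | awk -v b=500000 '{while(length($0) < b) $0 = $0 $0}1' | cut -c1-500000 > "$sample"

limit=16384                                        # max_request_size from the example
raw_size=$(wc -c < "$sample" | tr -d ' ')          # uncompressed bytes
gz_size=$(gzip -c "$sample" | wc -c | tr -d ' ')   # bytes after gzip

echo "uncompressed: $raw_size bytes, gzip: $gz_size bytes, limit: $limit bytes"

# Under this proposal only the uncompressed size is compared with the limit.
if [ "$raw_size" -gt "$limit" ]; then
  echo "rejected: uncompressed size exceeds max_request_size"
else
  echo "accepted"
fi

rm -f "$sample"
```

Because the sample text is highly repetitive, the gzip output is far smaller than 16384 bytes, yet the record is still rejected: the limit applies to the data as the producer received it, not to what travels over the wire.
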
## References

### Kafka Behavior

Kafka behaves similarly when handling messages larger than the batch size and the max request size.

Preparing the environment, with a topic and a large data file:

```bash
kafka-topics --create --topic large-data-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
printf 'This is a sample line. ' | awk -v b=500000 '{while(length($0) < b) $0 = $0 $0}1' | cut -c1-500000 > large-data-file.txt
```

Producing large messages to the topic with a small batch size will not display any errors:

```bash
kafka-console-producer --topic large-data-topic --bootstrap-server localhost:9092 --producer-property batch.size=16384 < large-data-file.txt
```

Producing large messages for the topic with a small max request size will display an error:

```bash
kafka-console-producer --topic large-data-topic --bootstrap-server localhost:9092 --producer-property max.request.size=16384 < large-data-file.txt
org.apache.kafka.common.errors.RecordTooLargeException: The message is 500087 bytes when serialized which is larger than 16384, which is the value of the max.request.size configuration.
```

Producing large messages to the topic with compression will not use the compressed size when applying the batch size or the max request size:

```bash
kafka-console-producer --topic large-data-topic --bootstrap-server localhost:9092 --producer-property batch.size=16384 --producer-property compression.type=gzip < large-data-file.txt
kafka-console-producer --topic large-data-topic --bootstrap-server localhost:9092 --producer-property max.request.size=16384 --producer-property compression.type=gzip < large-data-file.txt
```

Neither command uses the compressed size when applying the batch size and the max request size, respectively. Only the second one will display an error, because the message exceeds the max request size.