[Proposal]: Add UUID conversion to and from 16 byte fixed sequences #100
UUIDs are often passed around in application code in their canonical hex-string representation, e.g. "550e8400-e29b-41d4-a716-446655440000". Encoding a UUID as an Avro "string" takes 37 bytes (a 1-byte length prefix plus 36 characters), while its binary form fits into a 16-byte "fixed", saving 21 bytes per encoded value.
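The size figures above can be checked with a short sketch. It assumes only the Avro binary encoding rules (a string is written as a zig-zag varint length followed by its UTF-8 bytes, and a "fixed" is written as raw bytes with no prefix). The module name `UuidSizeSketch` is hypothetical and is not part of this change.

```elixir
defmodule UuidSizeSketch do
  # Hypothetical helper for illustration only, not part of this PR.

  # Size in bytes of an Avro "string": varint-encoded length plus the bytes themselves.
  def avro_string_size(string) do
    len = byte_size(string)
    # Zig-zag encoding maps a non-negative length n to 2 * n.
    varint_size(len * 2) + len
  end

  # A 16-byte "fixed" is written as exactly 16 raw bytes.
  def avro_uuid_fixed_size, do: 16

  # A varint stores 7 payload bits per byte.
  defp varint_size(n) when n < 128, do: 1
  defp varint_size(n), do: 1 + varint_size(div(n, 128))
end

uuid = "550e8400-e29b-41d4-a716-446655440000"
string_size = UuidSizeSketch.avro_string_size(uuid)
fixed_size = UuidSizeSketch.avro_uuid_fixed_size()

IO.puts("string: #{string_size} bytes, fixed: #{fixed_size} bytes, saved: #{string_size - fixed_size}")
```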
This change allows application code to keep passing around canonical hex UUIDs while converting to the compact encoding, requiring only `uuid_format: :canonical_string` to be given in decode options.
The Java reference implementation also supports encoding UUIDs as both strings and 16-byte fixed sequences.
Encoding is augmented such that a 16-byte fixed schema with `%{"logicalType" => "uuid"}` converts a hex-string UUID to the 16-byte binary representation.
Decoding is augmented such that, given `uuid_format: :canonical_string` in decode options, the binary representation is converted to the canonical hex-string representation.
The encoding change is nearly backwards-compatible: previously, when given an incorrectly sized "fixed" with `{"logicalType": "uuid"}`, an error was raised, while now conversion is attempted.
The decoding change is fully backwards-compatible, as `uuid_format` defaults to `:binary`.
For the UUID codec, the `uniq` library was added (no transitive dependencies).
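As a rough illustration of the conversion described above, the sketch below round-trips a canonical UUID through its 16-byte form using only the Elixir standard library. It is not the code in this PR, which delegates to `uniq`. The module `UuidFixedSketch` and its functions `to_fixed/1` and `from_fixed/2` are hypothetical names, and the second argument merely mirrors the `uuid_format` decode option and its `:binary` default.

```elixir
defmodule UuidFixedSketch do
  # Hypothetical helper for illustration only; the PR itself uses the uniq library.

  # Encode direction: canonical 36-character hex string -> 16 raw bytes.
  def to_fixed(<<_::binary-size(36)>> = canonical) do
    canonical
    |> String.replace("-", "")
    |> Base.decode16!(case: :mixed)
  end

  # Decode direction: the format argument mirrors the uuid_format option.
  def from_fixed(raw, uuid_format \\ :binary)

  # Default: hand back the 16 bytes untouched, so existing callers see no change.
  def from_fixed(<<_::binary-size(16)>> = raw, :binary), do: raw

  # Opt-in: split into the 4-2-2-2-6 byte groups and hex-encode each one.
  def from_fixed(
        <<a::binary-size(4), b::binary-size(2), c::binary-size(2), d::binary-size(2),
          e::binary-size(6)>>,
        :canonical_string
      ) do
    Enum.map_join([a, b, c, d, e], "-", &Base.encode16(&1, case: :lower))
  end
end

uuid = "550e8400-e29b-41d4-a716-446655440000"
raw = UuidFixedSketch.to_fixed(uuid)

IO.inspect(byte_size(raw))
IO.inspect(UuidFixedSketch.from_fixed(raw, :canonical_string))
IO.inspect(UuidFixedSketch.from_fixed(raw) == raw)
```

Keeping `:binary` as the default in this sketch reflects the backwards-compatibility point: callers that pass no format get the raw bytes back unchanged.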