From 0bc975e18ea8d804473004da6e1107ec2b52040b Mon Sep 17 00:00:00 2001 From: Aihua Xu Date: Fri, 18 Oct 2024 11:19:28 -0700 Subject: [PATCH] Update sorting/bucketing/json sections for variant --- format/spec.md | 204 +++++++++++++++++++++++++------------------------ 1 file changed, 106 insertions(+), 98 deletions(-) diff --git a/format/spec.md b/format/spec.md index ff163a31379d..c9cd5ebd4383 100644 --- a/format/spec.md +++ b/format/spec.md @@ -180,8 +180,8 @@ A **`map`** is a collection of key-value pairs with a key type and a value type. ### Semi-structured Types -A **`variant`** is a type to represent semi-structured data. A variant value can store a value of any other type, including any primitive, struct, list or map value. The variant encoding is defined the [Apache Parquet Project](https://github.com/apache/parquet-format/blob/4f208158dba80ff4bff4afaa4441d7270103dff6/VariantEncoding.md). Variant type is added in [v3](#version-3). -Limitation: only map value with string-type keys is supported in variant. +A **`variant`** is a type to represent semi-structured data. A variant value can store a value of any other type, including `null`, any primitive, struct, list or map value. The variant encoding is defined by the [Apache Parquet Project](https://github.com/apache/parquet-format/blob/4f208158dba80ff4bff4afaa4441d7270103dff6/VariantEncoding.md). Variant type is added in [v3](#version-3). +Limitation: a map value is supported in variant only if its keys are string-typed. 
### Primitive Types @@ -362,16 +362,16 @@ Partition field IDs must be reused if an existing partition spec contains an equ ### Partition Transforms -| Transform name | Description | Source types | Result type | -|-------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------| -| **`identity`** | Source value, unmodified | Any | Source type | +| Transform name | Description | Source types | Result type | +|-------------------|--------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|-------------| +| **`identity`** | Source value, unmodified | Any other than `variant` | Source type | | **`bucket[N]`** | Hash of value, mod `N` (see below) | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`, `string`, `uuid`, `fixed`, `binary` | `int` | -| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, `binary` | Source type | -| **`year`** | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`hour`** | Extract a timestamp hour, as hours from 1970-01-01 00:00:00 | `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`void`** | Always produces `null` | Any | Source type or `int` | +| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, 
`binary` | Source type | +| **`year`** | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | +| **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | +| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | +| **`hour`** | Extract a timestamp hour, as hours from 1970-01-01 00:00:00 | `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | +| **`void`** | Always produces `null` | Any | Source type or `int` | All transforms must return `null` for a `null` input value. @@ -449,6 +449,9 @@ Sorting floating-point numbers should produce the following behavior: `-NaN` < ` A data or delete file is associated with a sort order by the sort order's id within [a manifest](#manifests). Therefore, the table must declare all the sort orders for lookup. A table could also be configured with a default sort order id, indicating how the new data should be sorted by default. Writers should use this default sort order to sort the data on write, but are not required to if the default order is prohibitively expensive, as it would be for streaming writes. +Note: + +1. `variant` columns are not valid for sorting. ## Manifests @@ -1030,28 +1033,29 @@ Values should be stored in Parquet using the types and logical type annotations Lists must use the [3-level representation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists). 
-| Type | Parquet physical type | Logical type | Notes | -|--------------------|--------------------------------------------------------------------|---------------------------------------------|----------------------------------------------------------------| -| **`unknown`** | None | | Omit from data files | -| **`boolean`** | `boolean` | | | -| **`int`** | `int` | | | -| **`long`** | `long` | | | -| **`float`** | `float` | | | -| **`double`** | `double` | | | -| **`decimal(P,S)`** | `P <= 9`: `int32`,
`P <= 18`: `int64`,
`fixed` otherwise | `DECIMAL(P,S)` | Fixed must use the minimum number of bytes that can store `P`. | -| **`date`** | `int32` | `DATE` | Stores days from 1970-01-01. | -| **`time`** | `int64` | `TIME_MICROS` with `adjustToUtc=false` | Stores microseconds from midnight. | -| **`timestamp`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=false` | Stores microseconds from 1970-01-01 00:00:00.000000. | -| **`timestamptz`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=true` | Stores microseconds from 1970-01-01 00:00:00.000000 UTC. | -| **`timestamp_ns`** | `int64` | `TIMESTAMP_NANOS` with `adjustToUtc=false` | Stores nanoseconds from 1970-01-01 00:00:00.000000000. | -| **`timestamptz_ns`** | `int64` | `TIMESTAMP_NANOS` with `adjustToUtc=true` | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC. | -| **`string`** | `binary` | `UTF8` | Encoding must be UTF-8. | -| **`uuid`** | `fixed_len_byte_array[16]` | `UUID` | | -| **`fixed(L)`** | `fixed_len_byte_array[L]` | | | -| **`binary`** | `binary` | | | -| **`struct`** | `group` | | | -| **`list`** | `3-level list` | `LIST` | See Parquet docs for 3-level representation. | -| **`map`** | `3-level map` | `MAP` | See Parquet docs for 3-level representation. | +| Type | Parquet physical type | Logical type | Notes | +|----------------------|--------------------------------------------------------------------|---------------------------------------|----------------------------------------------------------------| +| **`unknown`** | None | | Omit from data files | +| **`boolean`** | `boolean` | | | +| **`int`** | `int` | | | +| **`long`** | `long` | | | +| **`float`** | `float` | | | +| **`double`** | `double` | | | +| **`decimal(P,S)`** | `P <= 9`: `int32`,
`P <= 18`: `int64`,
`fixed` otherwise | `DECIMAL(P,S)` | Fixed must use the minimum number of bytes that can store `P`. | +| **`date`** | `int32` | `DATE` | Stores days from 1970-01-01. | +| **`time`** | `int64` | `TIME_MICROS` with `adjustToUtc=false` | Stores microseconds from midnight. | +| **`timestamp`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=false` | Stores microseconds from 1970-01-01 00:00:00.000000. | +| **`timestamptz`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=true` | Stores microseconds from 1970-01-01 00:00:00.000000 UTC. | +| **`timestamp_ns`** | `int64` | `TIMESTAMP_NANOS` with `adjustToUtc=false` | Stores nanoseconds from 1970-01-01 00:00:00.000000000. | +| **`timestamptz_ns`** | `int64` | `TIMESTAMP_NANOS` with `adjustToUtc=true` | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC. | +| **`string`** | `binary` | `UTF8` | Encoding must be UTF-8. | +| **`uuid`** | `fixed_len_byte_array[16]` | `UUID` | | +| **`fixed(L)`** | `fixed_len_byte_array[L]` | | | +| **`binary`** | `binary` | | | +| **`struct`** | `group` | | | +| **`list`** | `3-level list` | `LIST` | See Parquet docs for 3-level representation. | +| **`map`** | `3-level map` | `MAP` | See Parquet docs for 3-level representation. | +| **`variant`** | `group` with `Data` and `Metadata` fields of `binary` type | | See Parquet docs for Variant encoding. | When reading an `unknown` column, any corresponding column must be ignored and replaced with `null` values. @@ -1138,6 +1142,7 @@ Hash results are not dependent on decimal scale, which is part of the type, not 4. UUIDs are encoded using big endian. The test UUID for the example above is: `f79c3e09-677c-4bbd-a479-3f349cb785e7`. This UUID encoded as a byte array is: `F7 9C 3E 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7` 5. `doubleToLongBits` must give the IEEE 754 compliant bit representation of the double value. All `NaN` bit patterns must be canonicalized to `0x7ff8000000000000L`. 
Negative zero (`-0.0`) must be canonicalized to positive zero (`0.0`). Float hash values are the result of hashing the float cast to double to ensure that schema evolution does not change hash values if float types are promoted. +6. `variant` values are currently not valid for bucketing and so they are not hashed. ## Appendix C: JSON serialization @@ -1153,28 +1158,29 @@ Schemas are serialized as a JSON object with the same fields as a struct in the Types are serialized according to this table: -|Type|JSON representation|Example| -|--- |--- |--- | -|**`unknown`**|`JSON string: "unknown"`|`"unknown"`| -|**`boolean`**|`JSON string: "boolean"`|`"boolean"`| -|**`int`**|`JSON string: "int"`|`"int"`| -|**`long`**|`JSON string: "long"`|`"long"`| -|**`float`**|`JSON string: "float"`|`"float"`| -|**`double`**|`JSON string: "double"`|`"double"`| -|**`date`**|`JSON string: "date"`|`"date"`| -|**`time`**|`JSON string: "time"`|`"time"`| -|**`timestamp, microseconds, without zone`**|`JSON string: "timestamp"`|`"timestamp"`| -|**`timestamp, microseconds, with zone`**|`JSON string: "timestamptz"`|`"timestamptz"`| -|**`timestamp, nanoseconds, without zone`**|`JSON string: "timestamp_ns"`|`"timestamp_ns"`| -|**`timestamp, nanoseconds, with zone`**|`JSON string: "timestamptz_ns"`|`"timestamptz_ns"`| -|**`string`**|`JSON string: "string"`|`"string"`| -|**`uuid`**|`JSON string: "uuid"`|`"uuid"`| -|**`fixed(L)`**|`JSON string: "fixed[]"`|`"fixed[16]"`| -|**`binary`**|`JSON string: "binary"`|`"binary"`| -|**`decimal(P, S)`**|`JSON string: "decimal(

,)"`|`"decimal(9,2)"`,
`"decimal(9, 2)"`| -|**`struct`**|`JSON object: {`
  `"type": "struct",`
  `"fields": [ {`
    `"id": ,`
    `"name": ,`
    `"required": ,`
    `"type": ,`
    `"doc": ,`
    `"initial-default": ,`
    `"write-default": `
    `}, ...`
  `] }`|`{`
  `"type": "struct",`
  `"fields": [ {`
    `"id": 1,`
    `"name": "id",`
    `"required": true,`
    `"type": "uuid",`
    `"initial-default": "0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb",`
    `"write-default": "ec5911be-b0a7-458c-8438-c9a3e53cffae"`
  `}, {`
    `"id": 2,`
    `"name": "data",`
    `"required": false,`
    `"type": {`
      `"type": "list",`
      `...`
    `}`
  `} ]`
`}`| -|**`list`**|`JSON object: {`
  `"type": "list",`
  `"element-id": ,`
  `"element-required": `
  `"element": `
`}`|`{`
  `"type": "list",`
  `"element-id": 3,`
  `"element-required": true,`
  `"element": "string"`
`}`| -|**`map`**|`JSON object: {`
  `"type": "map",`
  `"key-id": ,`
  `"key": ,`
  `"value-id": ,`
  `"value-required": `
  `"value": `
`}`|`{`
  `"type": "map",`
  `"key-id": 4,`
  `"key": "string",`
  `"value-id": 5,`
  `"value-required": false,`
  `"value": "double"`
`}`| +| Type | JSON representation | Example | +|---------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **`unknown`** | `JSON string: "unknown"` | `"unknown"` | +| **`boolean`** | `JSON string: "boolean"` | `"boolean"` | +| **`int`** | `JSON string: "int"` | `"int"` | +| **`long`** | `JSON string: "long"` | `"long"` | +| **`float`** | `JSON string: "float"` | `"float"` | +| **`double`** | `JSON string: "double"` | `"double"` | +| **`date`** | `JSON string: "date"` | `"date"` | +| **`time`** | `JSON string: "time"` | `"time"` | +| **`timestamp, microseconds, without zone`** | `JSON string: "timestamp"` | 
`"timestamp"` | +| **`timestamp, microseconds, with zone`** | `JSON string: "timestamptz"` | `"timestamptz"` | +| **`timestamp, nanoseconds, without zone`** | `JSON string: "timestamp_ns"` | `"timestamp_ns"` | +| **`timestamp, nanoseconds, with zone`** | `JSON string: "timestamptz_ns"` | `"timestamptz_ns"` | +| **`string`** | `JSON string: "string"` | `"string"` | +| **`uuid`** | `JSON string: "uuid"` | `"uuid"` | +| **`fixed(L)`** | `JSON string: "fixed[]"` | `"fixed[16]"` | +| **`binary`** | `JSON string: "binary"` | `"binary"` | +| **`decimal(P, S)`** | `JSON string: "decimal(

,)"` | `"decimal(9,2)"`,
`"decimal(9, 2)"` | +| **`struct`** | `JSON object: {`
  `"type": "struct",`
  `"fields": [ {`
    `"id": ,`
    `"name": ,`
    `"required": ,`
    `"type": ,`
    `"doc": ,`
    `"initial-default": ,`
    `"write-default": `
    `}, ...`
  `] }` | `{`
  `"type": "struct",`
  `"fields": [ {`
    `"id": 1,`
    `"name": "id",`
    `"required": true,`
    `"type": "uuid",`
    `"initial-default": "0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb",`
    `"write-default": "ec5911be-b0a7-458c-8438-c9a3e53cffae"`
  `}, {`
    `"id": 2,`
    `"name": "data",`
    `"required": false,`
    `"type": {`
      `"type": "list",`
      `...`
    `}`
  `} ]`
`}` | +| **`list`** | `JSON object: {`
  `"type": "list",`
  `"element-id": ,`
  `"element-required": `
  `"element": `
`}` | `{`
  `"type": "list",`
  `"element-id": 3,`
  `"element-required": true,`
  `"element": "string"`
`}` | +| **`map`** | `JSON object: {`
  `"type": "map",`
  `"key-id": ,`
  `"key": ,`
  `"value-id": ,`
  `"value-required": `
  `"value": `
`}` | `{`
  `"type": "map",`
  `"key-id": 4,`
  `"key": "string",`
  `"value-id": 5,`
  `"value-required": false,`
  `"value": "double"`
`}` | +| **`variant`** | `JSON string: "variant"` | `"variant"` | Note that default values are serialized using the JSON single-value serialization in [Appendix D](#appendix-d-single-value-serialization). @@ -1302,54 +1308,56 @@ Example This serialization scheme is for storing single values as individual binary values in the lower and upper bounds maps of manifest files. -| Type | Binary serialization | -|------------------------------|--------------------------------------------------------------------------------------------------------------| -| **`unknown`** | Not supported | -| **`boolean`** | `0x00` for false, non-zero byte for true | -| **`int`** | Stored as 4-byte little-endian | -| **`long`** | Stored as 8-byte little-endian | -| **`float`** | Stored as 4-byte little-endian | -| **`double`** | Stored as 8-byte little-endian | -| **`date`** | Stores days from the 1970-01-01 in an 4-byte little-endian int | -| **`time`** | Stores microseconds from midnight in an 8-byte little-endian long | -| **`timestamp`** | Stores microseconds from 1970-01-01 00:00:00.000000 in an 8-byte little-endian long | -| **`timestamptz`** | Stores microseconds from 1970-01-01 00:00:00.000000 UTC in an 8-byte little-endian long | -| **`timestamp_ns`** | Stores nanoseconds from 1970-01-01 00:00:00.000000000 in an 8-byte little-endian long | -| **`timestamptz_ns`** | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC in an 8-byte little-endian long | -| **`string`** | UTF-8 bytes (without length) | -| **`uuid`** | 16-byte big-endian value, see example in Appendix B | -| **`fixed(L)`** | Binary value | -| **`binary`** | Binary value (without length) | -| **`decimal(P, S)`** | Stores unscaled value as two’s-complement big-endian binary, using the minimum number of bytes for the value | -| **`struct`** | Not supported | -| **`list`** | Not supported | -| **`map`** | Not supported | +| Type | Binary serialization | 
+|----------------------|--------------------------------------------------------------------------------------------------------------| +| **`unknown`** | Not supported | +| **`boolean`** | `0x00` for false, non-zero byte for true | +| **`int`** | Stored as 4-byte little-endian | +| **`long`** | Stored as 8-byte little-endian | +| **`float`** | Stored as 4-byte little-endian | +| **`double`** | Stored as 8-byte little-endian | +| **`date`** | Stores days from the 1970-01-01 in an 4-byte little-endian int | +| **`time`** | Stores microseconds from midnight in an 8-byte little-endian long | +| **`timestamp`** | Stores microseconds from 1970-01-01 00:00:00.000000 in an 8-byte little-endian long | +| **`timestamptz`** | Stores microseconds from 1970-01-01 00:00:00.000000 UTC in an 8-byte little-endian long | +| **`timestamp_ns`** | Stores nanoseconds from 1970-01-01 00:00:00.000000000 in an 8-byte little-endian long | +| **`timestamptz_ns`** | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC in an 8-byte little-endian long | +| **`string`** | UTF-8 bytes (without length) | +| **`uuid`** | 16-byte big-endian value, see example in Appendix B | +| **`fixed(L)`** | Binary value | +| **`binary`** | Binary value (without length) | +| **`decimal(P, S)`** | Stores unscaled value as two’s-complement big-endian binary, using the minimum number of bytes for the value | +| **`struct`** | Not supported | +| **`list`** | Not supported | +| **`map`** | Not supported | +| **`variant`** | Not supported | ### JSON single-value serialization Single values are serialized as JSON by type according to the following table: -| Type | JSON representation | Example | Description | -| ------------------ | ----------------------------------------- | ------------------------------------------ | -- | -| **`boolean`** | **`JSON boolean`** | `true` | | -| **`int`** | **`JSON int`** | `34` | | -| **`long`** | **`JSON long`** | `34` | | -| **`float`** | **`JSON number`** | `1.0` | | -| 
**`double`** | **`JSON number`** | `1.0` | | -| **`decimal(P,S)`** | **`JSON string`** | `"14.20"`, `"2E+20"` | Stores the string representation of the decimal value, specifically, for values with a positive scale, the number of digits to the right of the decimal point is used to indicate scale, for values with a negative scale, the scientific notation is used and the exponent must equal the negated scale | -| **`date`** | **`JSON string`** | `"2017-11-16"` | Stores ISO-8601 standard date | -| **`time`** | **`JSON string`** | `"22:31:08.123456"` | Stores ISO-8601 standard time with microsecond precision | -| **`timestamp`** | **`JSON string`** | `"2017-11-16T22:31:08.123456"` | Stores ISO-8601 standard timestamp with microsecond precision; must not include a zone offset | -| **`timestamptz`** | **`JSON string`** | `"2017-11-16T22:31:08.123456+00:00"` | Stores ISO-8601 standard timestamp with microsecond precision; must include a zone offset and it must be '+00:00' | -| **`timestamp_ns`** | **`JSON string`** | `"2017-11-16T22:31:08.123456789"` | Stores ISO-8601 standard timestamp with nanosecond precision; must not include a zone offset | -| **`timestamptz_ns`** | **`JSON string`** | `"2017-11-16T22:31:08.123456789+00:00"` | Stores ISO-8601 standard timestamp with nanosecond precision; must include a zone offset and it must be '+00:00' | -| **`string`** | **`JSON string`** | `"iceberg"` | | -| **`uuid`** | **`JSON string`** | `"f79c3e09-677c-4bbd-a479-3f349cb785e7"` | Stores the lowercase uuid string | -| **`fixed(L)`** | **`JSON string`** | `"000102ff"` | Stored as a hexadecimal string | -| **`binary`** | **`JSON string`** | `"000102ff"` | Stored as a hexadecimal string | -| **`struct`** | **`JSON object by field ID`** | `{"1": 1, "2": "bar"}` | Stores struct fields using the field ID as the JSON field name; field values are stored using this JSON single-value format | -| **`list`** | **`JSON array of values`** | `[1, 2, 3]` | Stores a JSON array of values that are 
serialized using this JSON single-value format | -| **`map`** | **`JSON object of key and value arrays`** | `{ "keys": ["a", "b"], "values": [1, 2] }` | Stores arrays of keys and values; individual keys and values are serialized using this JSON single-value format | +| Type | JSON representation | Example | Description | +|----------------------|--------------------------------------------------------------|--------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **`boolean`** | **`JSON boolean`** | `true` | | +| **`int`** | **`JSON int`** | `34` | | +| **`long`** | **`JSON long`** | `34` | | +| **`float`** | **`JSON number`** | `1.0` | | +| **`double`** | **`JSON number`** | `1.0` | | +| **`decimal(P,S)`** | **`JSON string`** | `"14.20"`, `"2E+20"` | Stores the string representation of the decimal value, specifically, for values with a positive scale, the number of digits to the right of the decimal point is used to indicate scale, for values with a negative scale, the scientific notation is used and the exponent must equal the negated scale | +| **`date`** | **`JSON string`** | `"2017-11-16"` | Stores ISO-8601 standard date | +| **`time`** | **`JSON string`** | `"22:31:08.123456"` | Stores ISO-8601 standard time with microsecond precision | +| **`timestamp`** | **`JSON string`** | `"2017-11-16T22:31:08.123456"` | Stores ISO-8601 standard timestamp with microsecond precision; must not include a zone offset | +| **`timestamptz`** | **`JSON string`** | `"2017-11-16T22:31:08.123456+00:00"` | Stores ISO-8601 standard timestamp with microsecond precision; must include a zone offset and it must be '+00:00' | +| **`timestamp_ns`** | **`JSON string`** | `"2017-11-16T22:31:08.123456789"` 
| Stores ISO-8601 standard timestamp with nanosecond precision; must not include a zone offset | +| **`timestamptz_ns`** | **`JSON string`** | `"2017-11-16T22:31:08.123456789+00:00"` | Stores ISO-8601 standard timestamp with nanosecond precision; must include a zone offset and it must be '+00:00' | +| **`string`** | **`JSON string`** | `"iceberg"` | | +| **`uuid`** | **`JSON string`** | `"f79c3e09-677c-4bbd-a479-3f349cb785e7"` | Stores the lowercase uuid string | +| **`fixed(L)`** | **`JSON string`** | `"000102ff"` | Stored as a hexadecimal string | +| **`binary`** | **`JSON string`** | `"000102ff"` | Stored as a hexadecimal string | +| **`struct`** | **`JSON object by field ID`** | `{"1": 1, "2": "bar"}` | Stores struct fields using the field ID as the JSON field name; field values are stored using this JSON single-value format | +| **`list`** | **`JSON array of values`** | `[1, 2, 3]` | Stores a JSON array of values that are serialized using this JSON single-value format | +| **`map`** | **`JSON object of key and value arrays`** | `{ "keys": ["a", "b"], "values": [1, 2] }` | Stores arrays of keys and values; individual keys and values are serialized using this JSON single-value format | +| **`variant`** | **`JSON representation of the stored type`** | `null`, `true`, `{"1": 1, "2": "bar"}` | Uses the JSON representation given in this table for the type of the value stored in the variant. | ## Appendix E: Format version changes