[C++][Parquet] Should we support PARQUET_2_8 version? #35776
Comments
To provide some facts: It seems that the arrow/cpp/src/parquet/column_writer.cc Lines 2062 to 2072 in 2d32efe
arrow/cpp/src/parquet/metadata.cc Lines 1467 to 1470 in 2d32efe
arrow/cpp/src/parquet/arrow/schema.cc Lines 319 to 323 in 2d32efe
parquet-mr does not recognize any 2.x format version and always hardcodes version 1 in the footer metadata.
My question is: should we check whether any enabled feature is beyond what the specified format version supports? If yes, should we support deducing the version from the enabled feature set? It is not easy for a user to know which format version to set, but it is much easier to know which features are needed.
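To make the "deduce the version from the enabled feature set" idea concrete, here is a minimal sketch. The `FormatVersion` and `Feature` enums and the `MinVersionFor` / `DeduceVersion` helpers are hypothetical names invented for illustration and are not part of parquet-cpp; the feature-to-version mapping is an assumption based on the parquet-format changelog and would need to be checked against it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the parquet-cpp types.
// Enumerators are declared oldest to newest so that operator< orders versions.
enum class FormatVersion { V1_0, V2_4, V2_6, V2_8 };
enum class Feature { kPlainEncoding, kUInt32LogicalType, kNanosecondTimestamp, kByteStreamSplit };

// Assumed mapping from a feature to the earliest format version that allows it.
constexpr FormatVersion MinVersionFor(Feature feature) {
  switch (feature) {
    case Feature::kPlainEncoding:       return FormatVersion::V1_0;
    case Feature::kUInt32LogicalType:   return FormatVersion::V2_4;
    case Feature::kNanosecondTimestamp: return FormatVersion::V2_6;
    case Feature::kByteStreamSplit:     return FormatVersion::V2_8;
  }
  return FormatVersion::V1_0;
}

// The deduced version is the newest minimum version among the enabled features.
FormatVersion DeduceVersion(const std::vector<Feature>& enabled) {
  FormatVersion result = FormatVersion::V1_0;
  for (Feature feature : enabled) {
    result = std::max(result, MinVersionFor(feature));
  }
  return result;
}
```

With this shape, a user would only list the features they need, and the writer would pick the lowest version that covers all of them.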
Hmmm, I went through the Rust implementation and found that it just uses "1.0" or "2.0". Every implementation uses a different ad hoc way of setting this...
How about adding a |
cc @emkornfield
Yeah, this "version" field in the footer metadata is not very well specified. See also the related discussion at apache/parquet-format#164 (comment)
Something like that could be useful, yes.
I guess it's a bit hard, because the checking is spread across different places...
    case ArrowTypeId::TIMESTAMP:
      RETURN_NOT_OK(
          GetTimestampMetadata(static_cast<::arrow::TimestampType&>(*field->type()),
                               properties, arrow_properties, &type, &logical_type));
      break;
I guess we need a validate_format(const WriterProperties& properties, const ArrowWriterProperties& arrow_properties, Schema);
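A rough sketch of what such a single validation entry point could look like, using simplified stand-ins rather than the real `WriterProperties`, `ArrowWriterProperties`, or schema classes. The `FeatureUse` struct and the `ValidateFormat` signature are assumptions for illustration only; a real version would have to collect the feature uses from the properties and the schema.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the real parquet-cpp types.
// Enumerators are declared oldest to newest so that operator> orders versions.
enum class FormatVersion { V1_0, V2_4, V2_6, V2_8 };

// One feature requested by the writer properties or implied by the schema.
struct FeatureUse {
  std::string description;    // e.g. "column 'ts' uses nanosecond timestamps"
  FormatVersion min_version;  // assumed earliest format version that allows it
};

// Returns an error message for the first feature that the requested version
// cannot express, or std::nullopt when every feature is compatible.
std::optional<std::string> ValidateFormat(FormatVersion requested,
                                          const std::vector<FeatureUse>& uses) {
  for (const FeatureUse& use : uses) {
    if (use.min_version > requested) {
      return "requested format version is too old: " + use.description;
    }
  }
  return std::nullopt;
}
```

Gathering all feature uses into one list before checking keeps the version logic in a single place, instead of spreading it across the column writer, metadata, and schema conversion code.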
I see the problem. IMO, we can add a new option in the |
I'll try to add |
Describe the enhancement requested
Nowadays, we support the BYTE_STREAM_SPLIT encoding in Parquet. However, when writing, the highest format version we can select is PARQUET_2_6. So, do we need to support Parquet 2.8 or a higher version?
Changelogs: https://github.com/apache/parquet-format/blob/master/CHANGES.md#version-280
Component(s)
C++, Parquet