Iceberg Metadata for Variant Shredding #11958

Open
6 tasks
aihuaxu opened this issue Jan 13, 2025 · 0 comments
Labels
proposal Iceberg Improvement Proposal (spec/major changes/etc)

aihuaxu commented Jan 13, 2025

Proposed Change

To support file pruning for shredded subcolumns, it is necessary to collect subcolumn metadata in the manifest files. The proposal includes the following key points:

  • Lower and upper bounds for subcolumns of a Variant column are encoded as Variant values.

  • Other metadata, such as value_counts and null_value_counts, will not be collected for Variant columns.

  • Type promotion from primitive types to Variant will not be supported.

  • Bounds will be stored only if the subcolumn values match the shredded types.

This approach enables file pruning on shredded subcolumns while minimizing changes to the metadata format.
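The bullet points above can be illustrated with a minimal sketch. It is not the Iceberg implementation: `DataFileStats`, `variant_bounds`, `collect_bounds`, `may_contain_equal`, and the `$.user.age` path syntax are all hypothetical names chosen for illustration, and plain Python values stand in for Variant-encoded bounds. It also assumes that "match the shredded types" means every non-null value in the file has the shredded type.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Bounds = Tuple[Any, Any]


def collect_bounds(values: List[Any], shredded_type: type) -> Optional[Bounds]:
    """Return (lower, upper) only if every non-null value has the shredded type.

    Assumption: a single mismatching value means no bounds are stored.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    if any(type(v) is not shredded_type for v in non_null):
        return None
    return (min(non_null), max(non_null))


@dataclass
class DataFileStats:
    """Hypothetical per-file manifest entry holding subcolumn bounds."""
    path: str
    # Maps a subcolumn path to its bounds; absent key means no bounds stored.
    variant_bounds: Dict[str, Bounds] = field(default_factory=dict)


def may_contain_equal(stats: DataFileStats, subcolumn: str, value: Any) -> bool:
    """Conservative check: False only when the bounds rule the value out."""
    bounds = stats.variant_bounds.get(subcolumn)
    if bounds is None:
        return True  # no bounds, so the file cannot be pruned
    lower, upper = bounds
    if type(value) is not type(lower):
        return True  # type mismatch: keep the file rather than guess
    return lower <= value <= upper


def prune(files: List[DataFileStats], subcolumn: str, value: Any) -> List[DataFileStats]:
    """Keep only files that might contain rows where subcolumn == value."""
    return [f for f in files if may_contain_equal(f, subcolumn, value)]


files = [
    DataFileStats("a.parquet", {"$.user.age": (18, 30)}),
    DataFileStats("b.parquet", {"$.user.age": (40, 65)}),
    DataFileStats("c.parquet"),  # values did not match the shredded type
]
kept = prune(files, "$.user.age", 25)
```

In this sketch a file without bounds is always kept, which is the conservative choice: pruning may skip a file only when stored bounds prove that no matching row exists.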

Proposal document

https://docs.google.com/document/d/1gAvt0x_ez89O8y-YqkCdMnTEykb-583YslYOgzf5sPg/edit?tab=t.0#heading=h.escoiuuiw331

Specifications

  • Table
  • View
  • REST
  • Puffin
  • Encryption
  • Other

Part of #10392

@aihuaxu aihuaxu added the proposal Iceberg Improvement Proposal (spec/major changes/etc) label Jan 13, 2025