Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(types): remove type_name and field_descs from ColumnDesc #20497

Merged
merged 10 commits into from
Feb 20, 2025

Conversation

BugenZhao
Copy link
Member

@BugenZhao BugenZhao commented Feb 14, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

The follow-up to #20496. ColumnDesc is simply Field with some extra information.

field_descs will be generated on the fly based on the StructType when it's used in DESCRIBE. Resolve #17128.

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

Comment on lines +66 to +69
// TODO(struct): since we deprecated `field_descs` in `ColumnDesc`, there's no need to
// traverse the fields here except for maintaining the same column id (index) as the
// original implementation.
let _field_descs = if let DataType::List { .. } = field_type {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a result, there's no information for the "column" id of the nested fields, which seems to make resolving #19755 even more challenging. I may reintroduce it in future PRs, but in a clearer manner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, does it mean that we will only be able to alter subfields for newly created structs? 🤔

Copy link
Member Author

@BugenZhao BugenZhao Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that true as if we really want this to be available on existing tables, we can introduce a "mini" multi-versioned schema for the STRUCT type when we are going to upgrade its storage representation for compatibility purposes.

@@ -128,27 +126,14 @@ pub struct ColumnDesc {

impl ColumnDesc {
pub fn unnamed(column_id: ColumnId, data_type: DataType) -> ColumnDesc {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could rename it to unnamed_for_test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just for_test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused when we have unnamed column, and found it's just in tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counting on @xiangjinwu! 😁

Comment on lines 210 to 211
/// Should only be used for observability/debugging purposes.
pub fn flatten(&self) -> Vec<ColumnDesc> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should only be used for observability/debugging purposes.

I think for these we should always name it like xxx_for_debug to avoid misuse. Because developers don't always read docs. 🤣

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, there is only one caller in the handler for DESCRIBE table. I might update it in the future to stop returning ColumnDesc, which would prevent potential misuse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update it in the future to stop returning ColumnDesc, which would prevent potential misuse.

Agree. Or it can be moved as part of describe's implementation, rather than a reusable method on common::catalog::column::ColumnDesc.

Copy link
Member Author

@BugenZhao BugenZhao Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM

Comment on lines 210 to 211
/// Should only be used for observability/debugging purposes.
pub fn flatten(&self) -> Vec<ColumnDesc> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update it in the future to stop returning ColumnDesc, which would prevent potential misuse.

Agree. Or it can be moved as part of describe's implementation, rather than a reusable method on common::catalog::column::ColumnDesc.

Comment on lines 233 to +220
pub fn new_struct(
name: &str,
column_id: i32,
type_name: &str,
_type_name: &str,
fields: Vec<ColumnDesc>,
) -> Self {
// TODO(struct): assign type name to the struct once supported
let data_type =
StructType::new(fields.iter().map(|f| (&f.name, f.data_type.clone()))).into();
Self {
data_type,
column_id: ColumnId::new(column_id),
name: name.to_owned(),
field_descs: fields,
type_name: type_name.to_owned(),
generated_or_default_column: None,
description: None,
additional_column: AdditionalColumn { column_type: None },
version: ColumnDescVersion::LATEST,
system_column: None,
}

Self::named(name, ColumnId::new(column_id), data_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will deprecate this in favor of named as well.

additional_column: c.additional_column.clone().into(),
version: c.version as i32,
}
c.to_protobuf()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not notice such duplication 🤯

Comment on lines +132 to +134
// TODO(struct): since we deprecated `field_descs` in `ColumnDesc`, there's no need to
// traverse the fields here except for maintaining the same column id (index) as the
// original implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the immediate motivation that I restarted #19538 after putting #13104 on hold.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, because we always reassign the ColumnId in the final step of creating a source or table, there's no need to allocate a column ID here (but simply specifying ColumnId::placeholder()), unless we need the uniqueness property of ColumnId before that step.

Base automatically changed from bz/struct-part-2 to main February 20, 2025 09:42
Copy link

graphite-app bot commented Feb 20, 2025

Merge activity

  • Feb 20, 4:44 AM EST: Graphite rebased this pull request after merging its parent, because this pull request is set to merge when ready.

Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
@BugenZhao BugenZhao added this pull request to the merge queue Feb 20, 2025
Merged via the queue into main with commit c632835 Feb 20, 2025
31 checks passed
@BugenZhao BugenZhao deleted the bz/struct-part-3 branch February 20, 2025 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

when using external schema, struct's fields are not shown in describe <table>
3 participants