-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support writing to not only 3/7 protocol #693
base: main
Are you sure you want to change the base?
Support writing to not only 3/7 protocol #693
Conversation
@@ -139,136 +149,174 @@ fn new_commit_info() -> DeltaResult<Box<ArrowEngineData>> { | |||
Ok(Box::new(ArrowEngineData::new(commit_info_batch))) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there are a lot of changes in this file, but mainly it's just:
- Allowing
setup
to create a 3/7 or 1/1 table - adding a new function to generate both types of tables and associated store/engine/location
- making each test iterate over those tables instead of just one
"operationParameters": {}, | ||
"engineCommitInfo": { | ||
"engineInfo": "default engine" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for basically every test I just:
- removed the line like:
let table = create_table(store.clone(), table_location, schema, &[]).await?;
and replaced it withfor (table, engine, store, table_name) in setup_tables(schema, &[]).await? {
- updated any hard coded paths to rather use
table_name
It looks like more changes, but if you hide whitespace you'll see it's minimal.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #693 +/- ##
==========================================
- Coverage 84.09% 84.08% -0.02%
==========================================
Files 77 77
Lines 17805 17808 +3
Branches 17805 17808 +3
==========================================
Hits 14973 14973
- Misses 2117 2120 +3
Partials 715 715 ☔ View full report in Codecov by Sentry. |
"Tables with min writer version != 7 should not have table features.", | ||
)) | ||
} | ||
None => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need the default case (but with a different error message).
Every writer feature > 1 introduces new capabilities kernel must support (cumulative), with no way to know whether the table actually uses those features. See the full list here.
Versions 2, 3, and 4 are especially problematic because they respectively require column invariants, check constraints, and generated cols -- none of which we could implement because they are underspecified and their implementation is highly spark-specific.
Writer version 6 is also problematic due to identity cols. We maybe could implement support for those, but it would be an annoying time sink -- even blind appends are affected.
So basically, we can only support writer versions 1 and 7.
Meanwhile, we can actually support all three reader versions, because 2 just requires column mapping which we already support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to partly work around this because some features have an obvious effect on the table's Metadata when they're active. We can probably get away with allowing the protocol version, as long as we can prove that the table doesn't use any unsupported features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this does cover all cases. Note it's checking what writerFeatures
is set to. So in the top two cases there is a writerFeatures
in the protocol. If we're not on version 7, that's an error, if we are on 7, we validate the features as before.
The None
case is then interesting. We could actually be on any version:
- 7 is invalid because then there should be writer features, but there aren't.
- 1 is valid, and we can write to it
- all the rest are currently invalid.
I've updated the code to enforce that, but I got the sense DBR might go to (1,2) sometimes, so perhaps we should check and if it's 2 we can look in the metadata configuration and make sure it doesn't contain delta.invariants
before continuing.
thoughts?
kernel/src/actions/mod.rs
Outdated
require!( | ||
self.min_writer_version != 7, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the first match arm guarantee this assertion can never fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. This is the case where there was no writerFeatures
specified in the protocol. So I've switched this to be a check that we're on version 1 only (but maybe we need 2, see other comment)
let (store_37, engine_37, table_location_37) = setup("test_table_37", true); | ||
let (store_11, engine_11, table_location_11) = setup("test_table_11", true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do the different tables need different engines, out of curiosity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We pass the storage to the engine, so because we want disjoint storage, we need to have two engines
What changes are proposed in this pull request?
Our previous check was too strict. Now we just ensure that the protocol makes sense given what features are present/specified.
How was this change tested?
Made all existing
write.rs
tests also write to a protocol 1/1 table, and they all work.