-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add relevant NOTICE portions from ALv2 bundled dependencies #12095
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jbonofre for cleaning this up! 🙌 I left some comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jbonofre , I had to go through https://infra.apache.org/licensing-howto.htm a few times but these changes seem right to me now. I'll wait for @rdblue @Fokko input , especially since we need to make sure this is correct for the release.
fc5395d
to
b88d315
Compare
@amogh-jahagirdar @rdblue @Fokko I fixed the versions in the bundle jar files. I added fixed on flink-runtime. I'm checking/updating LICENSE/NOTICE in spark-runtime now. |
This binary artifact includes Apache Parquet 1.15.0 with the following in its NOTICE file: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do the same update for the older Flink versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to do a pass on latest versions (for Flink and Spark), and if we are all good with the changes, I will check deps versions in previous Flink/Spark versions.
Does it work for you ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I just wanted to make sure we didn't forget
FYI, I checked/fixed/updates spark-runtime |
This binary artifact includes Apache Parquet 1.15.0 with the following in its NOTICE file: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I just wanted to make sure we didn't forget
Group: com.google.flatbuffers Name: flatbuffers-java Version: 23.5.26 | ||
Copyright: 2013-2020 Google Inc. | ||
Home page: https://google.github.io/flatbuffers/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually bundled in the release? I know our LICENSE mentioned it prior to this change but it looks like this is part of the test runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's in the jar (I checked the deps in the jar directly).
Copyright: 2014-2020 The Apache Software Foundation. | ||
Home page: https://parquet.apache.org/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🫠
| src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java | ||
| contains test data from http://aspell.net/test/orig/batch0.tab. | ||
| Copyright (C) 2002 Kevin Atkinson ([email protected]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we ship this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the NOTICE
from commons-codec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But shouldn't we include stuff in the notice when is it strickly required? This is pointing at a class in test/org/apache/commons/
| Apache Avro | ||
| Copyright 2010-2015 The Apache Software Foundation | ||
| | ||
| This product includes software developed at | ||
| The Apache Software Foundation (http://www.apache.org/). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also mention Avro above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's the NOTICE
from Parquet.
| | This product includes software developed at | ||
| | The Apache Software Foundation (https://www.apache.org/). | ||
| | ||
| csharp reflect serializers were contributed by Pitney Bowes Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is C# related
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied the NOTICE as it is
| | ||
| -------------------------------------------------------------------------------- | ||
| | ||
| This product includes parquet-tools, initially developed at ARRIS, Inc. with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parquet-tools hasn't been released since 1.11.2: https://mvnrepository.com/artifact/org.apache.parquet/parquet-tools
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's fix this upstream as well: apache/parquet-java#3140
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything starting with | is the content of the dependency NOTICE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think for now we should just copy the content of the NOTICE as is for the particular version of the dependency we are bundling. If there's some issue in the upstream project's notice, we can fix that there and update our NOTICE after the dependency is released?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that, just pointing out that we don't strictly follow the rules of how to compose a NOTICE. For the bundled products, I think it is less error-prone to have scripts copy the whole NOTICE, instead of having to curate the NOTICES ourselves. Previously we did everything by hand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great improvement! Thanks @jbonofre 🙌
| * Eclipse Distribution License, Version 1.0 | ||
| * Eclipse Public License, Version 1.0 | ||
| * Eclipse Public License, Version 2.0 | ||
| * GNU General Public License, Version 2 with the GNU Classpath Exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a release blocker. If there is GPL code in Nessie, it must be removed as a dependency and we can no longer ship Nessie in our runtime bundles. @jbonofre, @amogh-jahagirdar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a look. Does the fact that it's a "classpath exception" matter or not? https://softwareengineering.stackexchange.com/questions/119436/what-does-gpl-with-classpath-exception-mean-in-practice
From my reading of this, the classpath exception has the effect of not having the typical incompatibility between Apache and GPL licenses but I could be oversimplifying/misunderstanding that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked in Nessie and according to https://github.com/projectnessie/nessie/releases/download/nessie-0.102.2/nessie-aggregated-license-report-0.102.2.zip
It's dual license CDDL + GPL or EPL + GPL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.apache.org/legal/resolved.html#category-x
"GNU Classpath Exception" can be included in ASF project (under special exception)
Places restrictions on larger works:
GNU GPL 1, 2, 3
Special exceptions to the GNU GPL (e.g. GNU Classpath) unless otherwise permitted elsewhere on this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no: Classpath Extension allows you to add GPL dependency in your classpath (that's the name). But you can not bundle GPL dependency in ASF project.
@@ -244,8 +372,63 @@ This binary artifact includes Apache Arrow with the following in its NOTICE file | |||
|
|||
-------------------------------------------------------------------------------- | |||
|
|||
This binary artifact includes Netty buffers with the following in its NOTICE | |||
file: | |||
This binary artifact includes Apache Spark 3.5.4 with the following in its NOTICE file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbonofre, Spark is not included in the runtime bundle -- I just double checked the Jar. It is provided by the Spark installation where the runtime bundle is used. I don't think this is accurate.
For the Spark runtime bundle, we shade the runtimeClasspath
dependency configuration and excluded dependencies that are coming from the Spark installation. Dependencies are excluded from the implementation
configuration:
configurations {
implementation {
exclude group: 'org.apache.spark'
// included in Spark
exclude group: 'org.slf4j'
exclude group: 'org.apache.commons'
exclude group: 'commons-pool'
exclude group: 'commons-codec'
exclude group: 'org.xerial.snappy'
exclude group: 'javax.xml.bind'
exclude group: 'javax.annotation'
exclude group: 'com.github.luben'
exclude group: 'com.ibm.icu'
exclude group: 'org.glassfish'
exclude group: 'org.abego.treelayout'
exclude group: 'org.antlr'
exclude group: 'org.scala-lang'
exclude group: 'org.scala-lang.modules'
}
}
Here's the expected set of dependencies from the runtimeClasspath
configuration:
runtimeClasspath - Runtime classpath of source set 'main'.
+--- project :iceberg-api
| \--- project :iceberg-bundled-guava
+--- project :iceberg-spark:iceberg-spark-3.5_2.12
| +--- project :iceberg-api (*)
| +--- project :iceberg-bundled-guava
| +--- project :iceberg-common
| | \--- project :iceberg-bundled-guava
| +--- project :iceberg-core
| | +--- project :iceberg-api (*)
| | +--- org.apache.avro:avro:1.12.0
| | | +--- com.fasterxml.jackson.core:jackson-core:2.17.2 -> 2.15.2
| | | | \--- com.fasterxml.jackson:jackson-bom:2.15.2 -> 2.18.2
| | | | +--- com.fasterxml.jackson.core:jackson-annotations:2.18.2 (c)
| | | | +--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (c)
| | | | +--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (c)
| | | | \--- com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.18.2 (c)
| | | \--- com.fasterxml.jackson.core:jackson-databind:2.17.2 -> 2.15.2
| | | +--- com.fasterxml.jackson.core:jackson-annotations:2.15.2 -> 2.18.2
| | | | \--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
| | | +--- com.fasterxml.jackson.core:jackson-core:2.15.2 (*)
| | | \--- com.fasterxml.jackson:jackson-bom:2.15.2 -> 2.18.2 (*)
| | +--- project :iceberg-common (*)
| | +--- project :iceberg-bundled-guava
| | +--- io.airlift:aircompressor:0.27 -> 2.0.2
| | +--- org.apache.httpcomponents.client5:httpclient5:5.4.1
| | | +--- org.apache.httpcomponents.core5:httpcore5:5.3.1
| | | \--- org.apache.httpcomponents.core5:httpcore5-h2:5.3.1
| | | \--- org.apache.httpcomponents.core5:httpcore5:5.3.1
| | +--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
| | +--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (*)
| | +--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (*)
| | +--- com.github.ben-manes.caffeine:caffeine:2.9.3
| | | +--- org.checkerframework:checker-qual:3.19.0
| | | \--- com.google.errorprone:error_prone_annotations:2.10.0
| | \--- org.roaringbitmap:RoaringBitmap:1.3.0
| +--- project :iceberg-data
| | +--- project :iceberg-api (*)
| | +--- project :iceberg-bundled-guava
| | +--- project :iceberg-core (*)
| | +--- org.apache.orc:orc-core:1.9.5
| | | +--- org.apache.orc:orc-shims:1.9.5
| | | +--- io.airlift:aircompressor:0.27 -> 2.0.2
| | | +--- org.jetbrains:annotations:17.0.0
| | | \--- org.threeten:threeten-extra:1.7.1
| | \--- org.apache.parquet:parquet-avro:1.15.0
| | +--- org.apache.parquet:parquet-column:1.15.0
| | | +--- org.apache.parquet:parquet-common:1.15.0
| | | | \--- org.apache.parquet:parquet-format-structures:1.15.0
| | | \--- org.apache.parquet:parquet-encoding:1.15.0
| | | \--- org.apache.parquet:parquet-common:1.15.0 (*)
| | +--- org.apache.parquet:parquet-hadoop:1.15.0
| | | +--- org.apache.parquet:parquet-column:1.15.0 (*)
| | | +--- org.apache.parquet:parquet-format-structures:1.15.0
| | | +--- org.apache.parquet:parquet-common:1.15.0 (*)
| | | +--- org.apache.parquet:parquet-jackson:1.15.0
| | | \--- io.airlift:aircompressor:2.0.2
| | \--- org.apache.parquet:parquet-common:1.15.0 (*)
| +--- project :iceberg-orc
| | +--- project :iceberg-api (*)
| | +--- project :iceberg-bundled-guava
| | +--- project :iceberg-common (*)
| | +--- project :iceberg-core (*)
| | +--- org.apache.avro:avro:1.12.0 (*)
| | \--- org.apache.orc:orc-core:1.9.5 (*)
| +--- project :iceberg-parquet
| | +--- project :iceberg-api (*)
| | +--- project :iceberg-bundled-guava
| | +--- project :iceberg-core (*)
| | +--- project :iceberg-common (*)
| | \--- org.apache.parquet:parquet-avro:1.15.0 (*)
| +--- project :iceberg-arrow
| | +--- project :iceberg-api (*)
| | +--- project :iceberg-bundled-guava
| | +--- project :iceberg-core (*)
| | +--- project :iceberg-parquet (*)
| | +--- org.apache.arrow:arrow-vector:15.0.2
| | | +--- org.apache.arrow:arrow-format:15.0.2
| | | | \--- com.google.flatbuffers:flatbuffers-java:23.5.26
| | | +--- org.apache.arrow:arrow-memory-core:15.0.2
| | | +--- com.fasterxml.jackson.core:jackson-core:2.16.0 -> 2.15.2 (*)
| | | +--- com.fasterxml.jackson.core:jackson-annotations:2.16.0 -> 2.18.2 (*)
| | | +--- com.fasterxml.jackson.core:jackson-databind:2.16.0 -> 2.15.2 (*)
| | | +--- com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.16.0 -> 2.18.2
| | | | +--- com.fasterxml.jackson.core:jackson-annotations:2.18.2 (*)
| | | | +--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (*)
| | | | +--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (*)
| | | | \--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
| | | +--- com.google.flatbuffers:flatbuffers-java:23.5.26
| | | \--- org.eclipse.collections:eclipse-collections:11.1.0
| | | \--- org.eclipse.collections:eclipse-collections-api:11.1.0
| | +--- org.apache.arrow:arrow-memory-netty:15.0.2
| | | \--- org.apache.arrow:arrow-memory-core:15.0.2
| | +--- org.apache.parquet:parquet-avro:1.15.0 (*)
| | \--- io.netty:netty-buffer:4.1.117.Final
| | \--- io.netty:netty-common:4.1.117.Final
| +--- org.apache.datasketches:datasketches-java:6.2.0
| | \--- org.apache.datasketches:datasketches-memory:3.0.2
| +--- org.apache.parquet:parquet-column:1.15.0 (*)
| +--- org.apache.parquet:parquet-hadoop:1.15.0 (*)
| +--- org.apache.orc:orc-core:1.9.5 (*)
| +--- org.apache.arrow:arrow-vector:15.0.2 (*)
| \--- com.github.ben-manes.caffeine:caffeine:2.9.3 (*)
+--- project :iceberg-spark:iceberg-spark-extensions-3.5_2.12
| \--- org.roaringbitmap:RoaringBitmap:1.3.0
+--- project :iceberg-aws
| +--- project :iceberg-api (*)
| +--- project :iceberg-bundled-guava
| +--- project :iceberg-common (*)
| +--- project :iceberg-core (*)
| +--- com.github.ben-manes.caffeine:caffeine:2.9.3 (*)
| +--- dev.failsafe:failsafe:3.3.2
| +--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
| +--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (*)
| \--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (*)
+--- project :iceberg-azure
| +--- project :iceberg-api (*)
| +--- project :iceberg-bundled-guava
| +--- project :iceberg-common (*)
| \--- project :iceberg-core (*)
+--- project :iceberg-aliyun
| +--- project :iceberg-api (*)
| +--- project :iceberg-bundled-guava
| +--- project :iceberg-core (*)
| \--- project :iceberg-common (*)
+--- project :iceberg-gcp
| +--- project :iceberg-api (*)
| +--- project :iceberg-bundled-guava
| +--- project :iceberg-common (*)
| \--- project :iceberg-core (*)
+--- project :iceberg-hive-metastore
| +--- project :iceberg-api (*)
| +--- project :iceberg-bundled-guava
| +--- project :iceberg-core (*)
| +--- project :iceberg-common (*)
| \--- com.github.ben-manes.caffeine:caffeine:2.9.3 (*)
+--- project :iceberg-nessie
| +--- project :iceberg-api (*)
| +--- project :iceberg-common (*)
| +--- project :iceberg-core (*)
| +--- project :iceberg-bundled-guava
| +--- org.projectnessie.nessie:nessie-client:0.102.2
| | +--- com.fasterxml.jackson.core:jackson-core -> 2.15.2 (*)
| | +--- com.fasterxml.jackson.core:jackson-databind -> 2.15.2 (*)
| | +--- com.fasterxml.jackson.core:jackson-annotations -> 2.18.2 (*)
| | +--- org.eclipse.microprofile.openapi:microprofile-openapi-api:4.0.2
| | \--- org.projectnessie.nessie:nessie-model:0.102.2
| | +--- com.fasterxml.jackson.core:jackson-databind -> 2.15.2 (*)
| | \--- com.fasterxml.jackson.core:jackson-annotations -> 2.18.2 (*)
| +--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
| +--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (*)
| \--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (*)
\--- project :iceberg-snowflake
+--- project :iceberg-core (*)
+--- project :iceberg-common (*)
+--- project :iceberg-bundled-guava
+--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
+--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (*)
\--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (*)
The bundled dependencies that can be are also relocated:
// Relocate dependencies to avoid conflicts
relocate 'com.google.errorprone', 'org.apache.iceberg.shaded.com.google.errorprone'
relocate 'com.google.flatbuffers', 'org.apache.iceberg.shaded.com.google.flatbuffers'
relocate 'com.fasterxml', 'org.apache.iceberg.shaded.com.fasterxml'
relocate 'com.github.benmanes', 'org.apache.iceberg.shaded.com.github.benmanes'
relocate 'org.checkerframework', 'org.apache.iceberg.shaded.org.checkerframework'
relocate 'org.apache.avro', 'org.apache.iceberg.shaded.org.apache.avro'
relocate 'avro.shaded', 'org.apache.iceberg.shaded.org.apache.avro.shaded'
relocate 'com.thoughtworks.paranamer', 'org.apache.iceberg.shaded.com.thoughtworks.paranamer'
relocate 'org.apache.parquet', 'org.apache.iceberg.shaded.org.apache.parquet'
relocate 'shaded.parquet', 'org.apache.iceberg.shaded.org.apache.parquet.shaded'
relocate 'org.apache.orc', 'org.apache.iceberg.shaded.org.apache.orc'
relocate 'io.airlift', 'org.apache.iceberg.shaded.io.airlift'
relocate 'org.apache.hc.client5', 'org.apache.iceberg.shaded.org.apache.hc.client5'
relocate 'org.apache.hc.core5', 'org.apache.iceberg.shaded.org.apache.hc.core5'
// relocate Arrow and related deps to shade Iceberg specific version
relocate 'io.netty', 'org.apache.iceberg.shaded.io.netty'
relocate 'org.apache.arrow', 'org.apache.iceberg.shaded.org.apache.arrow'
relocate 'com.carrotsearch', 'org.apache.iceberg.shaded.com.carrotsearch'
relocate 'org.threeten.extra', 'org.apache.iceberg.shaded.org.threeten.extra'
relocate 'org.roaringbitmap', 'org.apache.iceberg.shaded.org.roaringbitmap'
relocate 'org.apache.datasketches', 'org.apache.iceberg.shaded.org.apache.datasketches'
The result is that the Jar should have very few classes outside of the org.apache.iceberg
namespace. When building a LICENSE and NOTICE, we need to use the classes actually in the Jar file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue is it not spark sql:
$ jar tvf iceberg-spark-runtime-3.5_2.12-1.8.0-SNAPSHOT.jar|grep -i spark
...
17709 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/CreateIcebergView.class
2628 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/DropIcebergView$.class
9049 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/DropIcebergView.class
2772 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/ResolvedV2View$.class
8309 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/ResolvedV2View.class
3667 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/ShowIcebergViews$.class
10910 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/ShowIcebergViews.class
0 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/
3823 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AddPartitionFieldExec$.class
13747 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AddPartitionFieldExec.class
3550 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AlterV2ViewSetPropertiesExec$.class
12242 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AlterV2ViewSetPropertiesExec.class
3726 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AlterV2ViewUnsetPropertiesExec$.class
13921 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AlterV2ViewUnsetPropertiesExec.class
3283 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CallExec$.class
9458 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CallExec.class
4164 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceBranchExec$.class
16977 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceBranchExec.class
4107 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceTagExec$.class
16254 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceTagExec.class
5761 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec$.class
22816 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.class
3118 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DescribeV2ViewExec$.class
16744 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DescribeV2ViewExec.class
3239 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropBranchExec$.class
11002 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropBranchExec.class
3376 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropIdentifierFieldsExec$.class
13778 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropIdentifierFieldsExec.class
3312 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropPartitionFieldExec$.class
12029 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropPartitionFieldExec.class
3206 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropTagExec$.class
10941 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropTagExec.class
3018 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropV2ViewExec$.class
9067 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropV2ViewExec.class
2214 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy$.class
2696 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy$IcebergCatalogAndIdentifier$.class
36410 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.class
3061 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/RenameV2ViewExec$.class
9166 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/RenameV2ViewExec.class
4245 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ReplacePartitionFieldExec$.class
15925 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ReplacePartitionFieldExec.class
3365 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetIdentifierFieldsExec$.class
11569 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetIdentifierFieldsExec.class
4147 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetWriteDistributionAndOrderingExec$.class
17208 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetWriteDistributionAndOrderingExec.class
2854 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowCreateV2ViewExec$.class
16975 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowCreateV2ViewExec.class
3184 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewPropertiesExec$.class
14011 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewPropertiesExec.class
3412 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewsExec$.class
18726 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewsExec.class
It looks like Spark SQL here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are Iceberg classes to build Spark SQL extensions in Scala. The package name is a trick to access Spark SQL internal interfaces, methods, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, it's these classes: https://github.com/apache/iceberg/tree/main/spark/v3.5/spark/src/jmh/java/org/apache/iceberg/spark
I was surprised by the number of classes.
Thanks, it helps.
@@ -82,42 +245,7 @@ NOTICE file: | |||
|
|||
-------------------------------------------------------------------------------- | |||
|
|||
This binary artifact includes Carrot Search Labs HPPC with the following in its |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is no longer present, can you verify there are no classes in the Jar and remove relocations for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ jar tvf iceberg-spark-runtime-3.5_2.12-1.8.0-SNAPSHOT.jar|grep -i carrot
$
I don't see any carrot class in the jar.
Copyright: 2020 The Apache Software Foundation | ||
Home page: https://commons.apache.org/ | ||
License: https://www.apache.org/licenses/LICENSE-2.0 | ||
Group: dev.failsafe Name: failsafe Version: 3.3.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please roll back the reformatting? It makes this much harder to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree.
* casting logic in AssignmentAlignmentSupport | ||
* implementation of SetAccumulator. | ||
|
||
Group: org.apache.spark Name: spark-sql_2.12 Version: 3.5.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This artifact (spark-sql_2.12) is not included in the Jar. This section used to document only what was used from Spark. I don't think it is a good idea to remove the list of what was copied from the ASF project because the result is confusing: now it is far more broad and says that spark-sql_2.12 is included when it was not. In addition, I think this is what resulted in the addition of a Spark section to the NOTICE file, which is not needed because only small sections of contributed code (as were previously documented) are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment: I see spark sql classes in the jar:
$ jar tvf iceberg-spark-runtime-3.5_2.12-1.8.0-SNAPSHOT.jar|grep -i spark
...
4147 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetWriteDistributionAndOrderingExec$.class
17208 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetWriteDistributionAndOrderingExec.class
2854 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowCreateV2ViewExec$.class
16975 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowCreateV2ViewExec.class
3184 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewPropertiesExec$.class
14011 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewPropertiesExec.class
3412 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewsExec$.class
18726 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewsExec.class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, it's the spark-sql classes from Iceberg.
@@ -203,109 +203,78 @@ | |||
|
|||
-------------------------------------------------------------------------------- | |||
|
|||
This binary artifact contains Apache Avro. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please roll back formatting changes so that this can be reviewed more easily. There are a ton of changes here that make maintenance much harder. For example, if a dependency is moved and reformatted here, it takes time to validate that it is or is not still present in order to see if any action needs to be taken for NOTICE content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it's what you asked using |
.
|
||
-------------------------------------------------------------------------------- | ||
|
||
This binary artifact contains fastutil. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking through this to review, I am not very confident that it is correct. There are a lot of unnecessary changes that make it harder to review (like moving and/or reformatting LICENSE entries). Many changes also don't have justification, like why JavaFastPFOR
is no longer included -- was it removed from Parquet?
I think that we need a stable diff in order to see what actually changed in LICENSE. Then we need comments that justify the changes (i.e. "Depedency X was previously included because Y which is no longer the case") and then we can look for the corresponding changes that are needed for NOTICE.
@rdblue ok fair. So let me use another approach:
|
So, let me take a step back on this PR, overriding with the minimum required changes (formatting can come later). |
To facilitate the review, I close this PR, and I will open one PR per module. |
Bundle jar files actually bundle a few ALv2 dependencies. These dependencies are correctly listed in the
LICENSE
file, but theNOTICE
file don't contain relevant portions when the dependency provides aNOTICE
.I think it would be great to have both copyright and modified code from the dependencies in the
NOTICE
.@Fokko @rdblue thoughts ?