Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add relevant NOTICE portions from ALv2 bundled dependencies #12095

Closed
wants to merge 1 commit into from

Conversation

jbonofre
Copy link
Member

Bundle jar files actually bundle a few ALv2 dependencies. These dependencies are correctly listed in the LICENSE file, but the NOTICE file don't contain relevant portions when the dependency provides a NOTICE.
I think it would be great to have both copyright and modified code from the dependencies in the NOTICE.

@Fokko @rdblue thoughts ?

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jbonofre for cleaning this up! 🙌 I left some comments

aws-bundle/NOTICE Outdated Show resolved Hide resolved
aws-bundle/NOTICE Outdated Show resolved Hide resolved
aws-bundle/NOTICE Outdated Show resolved Hide resolved
aws-bundle/NOTICE Outdated Show resolved Hide resolved
azure-bundle/NOTICE Outdated Show resolved Hide resolved
gcp-bundle/NOTICE Outdated Show resolved Hide resolved
gcp-bundle/NOTICE Outdated Show resolved Hide resolved
gcp-bundle/NOTICE Outdated Show resolved Hide resolved
gcp-bundle/NOTICE Outdated Show resolved Hide resolved
gcp-bundle/NOTICE Outdated Show resolved Hide resolved
gcp-bundle/NOTICE Show resolved Hide resolved
gcp-bundle/NOTICE Show resolved Hide resolved
aws-bundle/NOTICE Outdated Show resolved Hide resolved
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jbonofre , I had to go through https://infra.apache.org/licensing-howto.htm a few times but these changes seem right to me now. I'll wait for @rdblue @Fokko input , especially since we need to make sure this is correct for the release.

@jbonofre jbonofre force-pushed the notice-fix branch 3 times, most recently from fc5395d to b88d315 Compare January 29, 2025 16:11
@github-actions github-actions bot added the flink label Jan 29, 2025
@jbonofre
Copy link
Member Author

@amogh-jahagirdar @rdblue @Fokko I fixed the versions in the bundle jar files. I added fixed on flink-runtime. I'm checking/updating LICENSE/NOTICE in spark-runtime now.

Comment on lines +61 to +107
This binary artifact includes Apache Parquet 1.15.0 with the following in its NOTICE file:

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do the same update for the older Flink versions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to do a pass on latest versions (for Flink and Spark), and if we are all good with the changes, I will check deps versions in previous Flink/Spark versions.

Does it work for you ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I just wanted to make sure we didn't forget

@jbonofre
Copy link
Member Author

FYI, I checked/fixed/updates spark-runtime LICENSE and NOTICE.

flink/v1.20/flink-runtime/NOTICE Outdated Show resolved Hide resolved
Comment on lines +61 to +107
This binary artifact includes Apache Parquet 1.15.0 with the following in its NOTICE file:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I just wanted to make sure we didn't forget

flink/v1.20/flink-runtime/NOTICE Outdated Show resolved Hide resolved
flink/v1.20/flink-runtime/NOTICE Outdated Show resolved Hide resolved
flink/v1.20/flink-runtime/LICENSE Outdated Show resolved Hide resolved
Comment on lines +418 to +420
Group: com.google.flatbuffers Name: flatbuffers-java Version: 23.5.26
Copyright: 2013-2020 Google Inc.
Home page: https://google.github.io/flatbuffers/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually bundled in the release? I know our LICENSE mentioned it prior to this change but it looks like this is part of the test runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's in the jar (I checked the deps in the jar directly).

Copyright: 2014-2020 The Apache Software Foundation.
Home page: https://parquet.apache.org/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🫠

azure-bundle/NOTICE Outdated Show resolved Hide resolved
Comment on lines +31 to +79
| src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java
| contains test data from http://aspell.net/test/orig/batch0.tab.
| Copyright (C) 2002 Kevin Atkinson ([email protected])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we ship this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the NOTICE from commons-codec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But shouldn't we include stuff in the notice when is it strickly required? This is pointing at a class in test/org/apache/commons/

Comment on lines +111 to +161
| Apache Avro
| Copyright 2010-2015 The Apache Software Foundation
|
| This product includes software developed at
| The Apache Software Foundation (http://www.apache.org/).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also mention Avro above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the NOTICE from Parquet.

| | This product includes software developed at
| | The Apache Software Foundation (https://www.apache.org/).
|
| csharp reflect serializers were contributed by Pitney Bowes Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is C# related

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied the NOTICE as it is

|
| --------------------------------------------------------------------------------
|
| This product includes parquet-tools, initially developed at ARRIS, Inc. with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parquet-tools hasn't been released since 1.11.2: https://mvnrepository.com/artifact/org.apache.parquet/parquet-tools

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's fix this upstream as well: apache/parquet-java#3140

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything starting with | is the content of the dependency NOTICE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think for now we should just copy the content of the NOTICE as is for the particular version of the dependency we are bundling. If there's some issue in the upstream project's notice, we can fix that there and update our NOTICE after the dependency is released?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that, just pointing out that we don't strictly follow the rules of how to compose a NOTICE. For the bundled products, I think it is less error-prone to have scripts copy the whole NOTICE, instead of having to curate the NOTICES ourselves. Previously we did everything by hand

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great improvement! Thanks @jbonofre 🙌

| * Eclipse Distribution License, Version 1.0
| * Eclipse Public License, Version 1.0
| * Eclipse Public License, Version 2.0
| * GNU General Public License, Version 2 with the GNU Classpath Exception
Copy link
Contributor

@rdblue rdblue Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a release blocker. If there is GPL code in Nessie, it must be removed as a dependency and we can no longer ship Nessie in our runtime bundles. @jbonofre, @amogh-jahagirdar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a look. Does the fact that it's a "classpath exception" matter or not? https://softwareengineering.stackexchange.com/questions/119436/what-does-gpl-with-classpath-exception-mean-in-practice

From my reading of this, the classpath exception has the effect of not having the typical incompatibility between Apache and GPL licenses but I could be oversimplifying/misunderstanding that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked in Nessie and according to https://github.com/projectnessie/nessie/releases/download/nessie-0.102.2/nessie-aggregated-license-report-0.102.2.zip
It's dual license CDDL + GPL or EPL + GPL.

Copy link
Member

@ajantha-bhat ajantha-bhat Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.apache.org/legal/resolved.html#category-x

"GNU Classpath Exception" can be included in ASF project (under special exception)

Places restrictions on larger works:
GNU GPL 1, 2, 3
Special exceptions to the GNU GPL (e.g. GNU Classpath) unless otherwise permitted elsewhere on this page.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no: Classpath Extension allows you to add GPL dependency in your classpath (that's the name). But you can not bundle GPL dependency in ASF project.

@@ -244,8 +372,63 @@ This binary artifact includes Apache Arrow with the following in its NOTICE file

--------------------------------------------------------------------------------

This binary artifact includes Netty buffers with the following in its NOTICE
file:
This binary artifact includes Apache Spark 3.5.4 with the following in its NOTICE file:
Copy link
Contributor

@rdblue rdblue Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbonofre, Spark is not included in the runtime bundle -- I just double checked the Jar. It is provided by the Spark installation where the runtime bundle is used. I don't think this is accurate.

For the Spark runtime bundle, we shade the runtimeClasspath dependency configuration and excluded dependencies that are coming from the Spark installation. Dependencies are excluded from the implementation configuration:

  configurations {
    implementation {
      exclude group: 'org.apache.spark'
      // included in Spark
      exclude group: 'org.slf4j'
      exclude group: 'org.apache.commons'
      exclude group: 'commons-pool'
      exclude group: 'commons-codec'
      exclude group: 'org.xerial.snappy'
      exclude group: 'javax.xml.bind'
      exclude group: 'javax.annotation'
      exclude group: 'com.github.luben'
      exclude group: 'com.ibm.icu'
      exclude group: 'org.glassfish'
      exclude group: 'org.abego.treelayout'
      exclude group: 'org.antlr'
      exclude group: 'org.scala-lang'
      exclude group: 'org.scala-lang.modules'
    }
  }

Here's the expected set of dependencies from the runtimeClasspath configuration:

runtimeClasspath - Runtime classpath of source set 'main'.
+--- project :iceberg-api
|    \--- project :iceberg-bundled-guava
+--- project :iceberg-spark:iceberg-spark-3.5_2.12
|    +--- project :iceberg-api (*)
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-common
|    |    \--- project :iceberg-bundled-guava
|    +--- project :iceberg-core
|    |    +--- project :iceberg-api (*)
|    |    +--- org.apache.avro:avro:1.12.0
|    |    |    +--- com.fasterxml.jackson.core:jackson-core:2.17.2 -> 2.15.2
|    |    |    |    \--- com.fasterxml.jackson:jackson-bom:2.15.2 -> 2.18.2
|    |    |    |         +--- com.fasterxml.jackson.core:jackson-annotations:2.18.2 (c)
|    |    |    |         +--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (c)
|    |    |    |         +--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (c)
|    |    |    |         \--- com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.18.2 (c)
|    |    |    \--- com.fasterxml.jackson.core:jackson-databind:2.17.2 -> 2.15.2
|    |    |         +--- com.fasterxml.jackson.core:jackson-annotations:2.15.2 -> 2.18.2
|    |    |         |    \--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
|    |    |         +--- com.fasterxml.jackson.core:jackson-core:2.15.2 (*)
|    |    |         \--- com.fasterxml.jackson:jackson-bom:2.15.2 -> 2.18.2 (*)
|    |    +--- project :iceberg-common (*)
|    |    +--- project :iceberg-bundled-guava
|    |    +--- io.airlift:aircompressor:0.27 -> 2.0.2
|    |    +--- org.apache.httpcomponents.client5:httpclient5:5.4.1
|    |    |    +--- org.apache.httpcomponents.core5:httpcore5:5.3.1
|    |    |    \--- org.apache.httpcomponents.core5:httpcore5-h2:5.3.1
|    |    |         \--- org.apache.httpcomponents.core5:httpcore5:5.3.1
|    |    +--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
|    |    +--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (*)
|    |    +--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (*)
|    |    +--- com.github.ben-manes.caffeine:caffeine:2.9.3
|    |    |    +--- org.checkerframework:checker-qual:3.19.0
|    |    |    \--- com.google.errorprone:error_prone_annotations:2.10.0
|    |    \--- org.roaringbitmap:RoaringBitmap:1.3.0
|    +--- project :iceberg-data
|    |    +--- project :iceberg-api (*)
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-core (*)
|    |    +--- org.apache.orc:orc-core:1.9.5
|    |    |    +--- org.apache.orc:orc-shims:1.9.5
|    |    |    +--- io.airlift:aircompressor:0.27 -> 2.0.2
|    |    |    +--- org.jetbrains:annotations:17.0.0
|    |    |    \--- org.threeten:threeten-extra:1.7.1
|    |    \--- org.apache.parquet:parquet-avro:1.15.0
|    |         +--- org.apache.parquet:parquet-column:1.15.0
|    |         |    +--- org.apache.parquet:parquet-common:1.15.0
|    |         |    |    \--- org.apache.parquet:parquet-format-structures:1.15.0
|    |         |    \--- org.apache.parquet:parquet-encoding:1.15.0
|    |         |         \--- org.apache.parquet:parquet-common:1.15.0 (*)
|    |         +--- org.apache.parquet:parquet-hadoop:1.15.0
|    |         |    +--- org.apache.parquet:parquet-column:1.15.0 (*)
|    |         |    +--- org.apache.parquet:parquet-format-structures:1.15.0
|    |         |    +--- org.apache.parquet:parquet-common:1.15.0 (*)
|    |         |    +--- org.apache.parquet:parquet-jackson:1.15.0
|    |         |    \--- io.airlift:aircompressor:2.0.2
|    |         \--- org.apache.parquet:parquet-common:1.15.0 (*)
|    +--- project :iceberg-orc
|    |    +--- project :iceberg-api (*)
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-common (*)
|    |    +--- project :iceberg-core (*)
|    |    +--- org.apache.avro:avro:1.12.0 (*)
|    |    \--- org.apache.orc:orc-core:1.9.5 (*)
|    +--- project :iceberg-parquet
|    |    +--- project :iceberg-api (*)
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-core (*)
|    |    +--- project :iceberg-common (*)
|    |    \--- org.apache.parquet:parquet-avro:1.15.0 (*)
|    +--- project :iceberg-arrow
|    |    +--- project :iceberg-api (*)
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-core (*)
|    |    +--- project :iceberg-parquet (*)
|    |    +--- org.apache.arrow:arrow-vector:15.0.2
|    |    |    +--- org.apache.arrow:arrow-format:15.0.2
|    |    |    |    \--- com.google.flatbuffers:flatbuffers-java:23.5.26
|    |    |    +--- org.apache.arrow:arrow-memory-core:15.0.2
|    |    |    +--- com.fasterxml.jackson.core:jackson-core:2.16.0 -> 2.15.2 (*)
|    |    |    +--- com.fasterxml.jackson.core:jackson-annotations:2.16.0 -> 2.18.2 (*)
|    |    |    +--- com.fasterxml.jackson.core:jackson-databind:2.16.0 -> 2.15.2 (*)
|    |    |    +--- com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.16.0 -> 2.18.2
|    |    |    |    +--- com.fasterxml.jackson.core:jackson-annotations:2.18.2 (*)
|    |    |    |    +--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (*)
|    |    |    |    +--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (*)
|    |    |    |    \--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
|    |    |    +--- com.google.flatbuffers:flatbuffers-java:23.5.26
|    |    |    \--- org.eclipse.collections:eclipse-collections:11.1.0
|    |    |         \--- org.eclipse.collections:eclipse-collections-api:11.1.0
|    |    +--- org.apache.arrow:arrow-memory-netty:15.0.2
|    |    |    \--- org.apache.arrow:arrow-memory-core:15.0.2
|    |    +--- org.apache.parquet:parquet-avro:1.15.0 (*)
|    |    \--- io.netty:netty-buffer:4.1.117.Final
|    |         \--- io.netty:netty-common:4.1.117.Final
|    +--- org.apache.datasketches:datasketches-java:6.2.0
|    |    \--- org.apache.datasketches:datasketches-memory:3.0.2
|    +--- org.apache.parquet:parquet-column:1.15.0 (*)
|    +--- org.apache.parquet:parquet-hadoop:1.15.0 (*)
|    +--- org.apache.orc:orc-core:1.9.5 (*)
|    +--- org.apache.arrow:arrow-vector:15.0.2 (*)
|    \--- com.github.ben-manes.caffeine:caffeine:2.9.3 (*)
+--- project :iceberg-spark:iceberg-spark-extensions-3.5_2.12
|    \--- org.roaringbitmap:RoaringBitmap:1.3.0
+--- project :iceberg-aws
|    +--- project :iceberg-api (*)
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-common (*)
|    +--- project :iceberg-core (*)
|    +--- com.github.ben-manes.caffeine:caffeine:2.9.3 (*)
|    +--- dev.failsafe:failsafe:3.3.2
|    +--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
|    +--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (*)
|    \--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (*)
+--- project :iceberg-azure
|    +--- project :iceberg-api (*)
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-common (*)
|    \--- project :iceberg-core (*)
+--- project :iceberg-aliyun
|    +--- project :iceberg-api (*)
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-core (*)
|    \--- project :iceberg-common (*)
+--- project :iceberg-gcp
|    +--- project :iceberg-api (*)
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-common (*)
|    \--- project :iceberg-core (*)
+--- project :iceberg-hive-metastore
|    +--- project :iceberg-api (*)
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-core (*)
|    +--- project :iceberg-common (*)
|    \--- com.github.ben-manes.caffeine:caffeine:2.9.3 (*)
+--- project :iceberg-nessie
|    +--- project :iceberg-api (*)
|    +--- project :iceberg-common (*)
|    +--- project :iceberg-core (*)
|    +--- project :iceberg-bundled-guava
|    +--- org.projectnessie.nessie:nessie-client:0.102.2
|    |    +--- com.fasterxml.jackson.core:jackson-core -> 2.15.2 (*)
|    |    +--- com.fasterxml.jackson.core:jackson-databind -> 2.15.2 (*)
|    |    +--- com.fasterxml.jackson.core:jackson-annotations -> 2.18.2 (*)
|    |    +--- org.eclipse.microprofile.openapi:microprofile-openapi-api:4.0.2
|    |    \--- org.projectnessie.nessie:nessie-model:0.102.2
|    |         +--- com.fasterxml.jackson.core:jackson-databind -> 2.15.2 (*)
|    |         \--- com.fasterxml.jackson.core:jackson-annotations -> 2.18.2 (*)
|    +--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
|    +--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (*)
|    \--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (*)
\--- project :iceberg-snowflake
     +--- project :iceberg-core (*)
     +--- project :iceberg-common (*)
     +--- project :iceberg-bundled-guava
     +--- com.fasterxml.jackson:jackson-bom:2.18.2 (*)
     +--- com.fasterxml.jackson.core:jackson-core:2.18.2 -> 2.15.2 (*)
     \--- com.fasterxml.jackson.core:jackson-databind:2.18.2 -> 2.15.2 (*)

The bundled dependencies that can be are also relocated:

    // Relocate dependencies to avoid conflicts
    relocate 'com.google.errorprone', 'org.apache.iceberg.shaded.com.google.errorprone'
    relocate 'com.google.flatbuffers', 'org.apache.iceberg.shaded.com.google.flatbuffers'
    relocate 'com.fasterxml', 'org.apache.iceberg.shaded.com.fasterxml'
    relocate 'com.github.benmanes', 'org.apache.iceberg.shaded.com.github.benmanes'
    relocate 'org.checkerframework', 'org.apache.iceberg.shaded.org.checkerframework'
    relocate 'org.apache.avro', 'org.apache.iceberg.shaded.org.apache.avro'
    relocate 'avro.shaded', 'org.apache.iceberg.shaded.org.apache.avro.shaded'
    relocate 'com.thoughtworks.paranamer', 'org.apache.iceberg.shaded.com.thoughtworks.paranamer'
    relocate 'org.apache.parquet', 'org.apache.iceberg.shaded.org.apache.parquet'
    relocate 'shaded.parquet', 'org.apache.iceberg.shaded.org.apache.parquet.shaded'
    relocate 'org.apache.orc', 'org.apache.iceberg.shaded.org.apache.orc'
    relocate 'io.airlift', 'org.apache.iceberg.shaded.io.airlift'
    relocate 'org.apache.hc.client5', 'org.apache.iceberg.shaded.org.apache.hc.client5'
    relocate 'org.apache.hc.core5', 'org.apache.iceberg.shaded.org.apache.hc.core5'
    // relocate Arrow and related deps to shade Iceberg specific version
    relocate 'io.netty', 'org.apache.iceberg.shaded.io.netty'
    relocate 'org.apache.arrow', 'org.apache.iceberg.shaded.org.apache.arrow'
    relocate 'com.carrotsearch', 'org.apache.iceberg.shaded.com.carrotsearch'
    relocate 'org.threeten.extra', 'org.apache.iceberg.shaded.org.threeten.extra'
    relocate 'org.roaringbitmap', 'org.apache.iceberg.shaded.org.roaringbitmap'
    relocate 'org.apache.datasketches', 'org.apache.iceberg.shaded.org.apache.datasketches'

The result is that the Jar should have very few classes outside of the org.apache.iceberg namespace. When building a LICENSE and NOTICE, we need to use the classes actually in the Jar file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue is it not spark sql:

$ jar tvf iceberg-spark-runtime-3.5_2.12-1.8.0-SNAPSHOT.jar|grep -i spark
...
 17709 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/CreateIcebergView.class
  2628 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/DropIcebergView$.class
  9049 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/DropIcebergView.class
  2772 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/ResolvedV2View$.class
  8309 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/ResolvedV2View.class
  3667 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/ShowIcebergViews$.class
 10910 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/catalyst/plans/logical/views/ShowIcebergViews.class
     0 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/
  3823 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AddPartitionFieldExec$.class
 13747 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AddPartitionFieldExec.class
  3550 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AlterV2ViewSetPropertiesExec$.class
 12242 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AlterV2ViewSetPropertiesExec.class
  3726 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AlterV2ViewUnsetPropertiesExec$.class
 13921 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/AlterV2ViewUnsetPropertiesExec.class
  3283 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CallExec$.class
  9458 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CallExec.class
  4164 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceBranchExec$.class
 16977 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceBranchExec.class
  4107 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceTagExec$.class
 16254 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceTagExec.class
  5761 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec$.class
 22816 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.class
  3118 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DescribeV2ViewExec$.class
 16744 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DescribeV2ViewExec.class
  3239 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropBranchExec$.class
 11002 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropBranchExec.class
  3376 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropIdentifierFieldsExec$.class
 13778 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropIdentifierFieldsExec.class
  3312 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropPartitionFieldExec$.class
 12029 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropPartitionFieldExec.class
  3206 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropTagExec$.class
 10941 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropTagExec.class
  3018 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropV2ViewExec$.class
  9067 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/DropV2ViewExec.class
  2214 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy$.class
  2696 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy$IcebergCatalogAndIdentifier$.class
 36410 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.class
  3061 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/RenameV2ViewExec$.class
  9166 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/RenameV2ViewExec.class
  4245 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ReplacePartitionFieldExec$.class
 15925 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ReplacePartitionFieldExec.class
  3365 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetIdentifierFieldsExec$.class
 11569 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetIdentifierFieldsExec.class
  4147 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetWriteDistributionAndOrderingExec$.class
 17208 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetWriteDistributionAndOrderingExec.class
  2854 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowCreateV2ViewExec$.class
 16975 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowCreateV2ViewExec.class
  3184 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewPropertiesExec$.class
 14011 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewPropertiesExec.class
  3412 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewsExec$.class
 18726 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewsExec.class

It looks like Spark SQL here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are Iceberg classes to build Spark SQL extensions in Scala. The package name is a trick to access Spark SQL internal interfaces, methods, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, it's these classes: https://github.com/apache/iceberg/tree/main/spark/v3.5/spark/src/jmh/java/org/apache/iceberg/spark

I was surprised by the number of classes.

Thanks, it helps.

@@ -82,42 +245,7 @@ NOTICE file:

--------------------------------------------------------------------------------

This binary artifact includes Carrot Search Labs HPPC with the following in its
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is no longer present, can you verify there are no classes in the Jar and remove relocations for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ jar tvf iceberg-spark-runtime-3.5_2.12-1.8.0-SNAPSHOT.jar|grep -i carrot
$

I don't see any carrot class in the jar.

Copyright: 2020 The Apache Software Foundation
Home page: https://commons.apache.org/
License: https://www.apache.org/licenses/LICENSE-2.0
Group: dev.failsafe Name: failsafe Version: 3.3.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please roll back the reformatting? It makes this much harder to review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree.

* casting logic in AssignmentAlignmentSupport
* implementation of SetAccumulator.

Group: org.apache.spark Name: spark-sql_2.12 Version: 3.5.4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This artifact (spark-sql_2.12) is not included in the Jar. This section used to document only what was used from Spark. I don't think it is a good idea to remove the list of what was copied from the ASF project because the result is confusing: now it is far more broad and says that spark-sql_2.12 is included when it was not. In addition, I think this is what resulted in the addition of a Spark section to the NOTICE file, which is not needed because only small sections of contributed code (as were previously documented) are used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment: I see spark sql classes in the jar:

$ jar tvf iceberg-spark-runtime-3.5_2.12-1.8.0-SNAPSHOT.jar|grep -i spark
...
  4147 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetWriteDistributionAndOrderingExec$.class
 17208 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/SetWriteDistributionAndOrderingExec.class
  2854 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowCreateV2ViewExec$.class
 16975 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowCreateV2ViewExec.class
  3184 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewPropertiesExec$.class
 14011 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewPropertiesExec.class
  3412 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewsExec$.class
 18726 Fri Feb 01 00:00:00 CET 1980 org/apache/spark/sql/execution/datasources/v2/ShowV2ViewsExec.class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, it's the spark-sql classes from Iceberg.

@@ -203,109 +203,78 @@

--------------------------------------------------------------------------------

This binary artifact contains Apache Avro.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please roll back formatting changes so that this can be reviewed more easily. There are a ton of changes here that make maintenance much harder. For example, if a dependency is moved and reformatted here, it takes time to validate that it is or is not still present in order to see if any action needs to be taken for NOTICE content.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it's what you asked using |.


--------------------------------------------------------------------------------

This binary artifact contains fastutil.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change necessary?

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking through this to review, I am not very confident that it is correct. There are a lot of unnecessary changes that make it harder to review (like moving and/or reformatting LICENSE entries). Many changes also don't have justification, like why JavaFastPFOR is no longer included -- was it removed from Parquet?

I think that we need a stable diff in order to see what actually changed in LICENSE. Then we need comments that justify the changes (i.e. "Depedency X was previously included because Y which is no longer the case") and then we can look for the corresponding changes that are needed for NOTICE.

@jbonofre
Copy link
Member Author

@rdblue ok fair. So let me use another approach:

  1. Let me create a summary of my findings and share with you.
  2. I agree the reformating is not required (I thought it's what you asked by using |).

@jbonofre
Copy link
Member Author

So, let me take a step back on this PR, overriding with the minimum required changes (formatting can come later).

@jbonofre
Copy link
Member Author

To facilitate the review, I close this PR, and I will open one PR per module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants