Feature/tidy #4

Merged
merged 8 commits into from
Apr 24, 2024
6 changes: 2 additions & 4 deletions .github/workflows/dotnet-desktop.yml
@@ -1,8 +1,6 @@
name: Spark Dotnet

on:
push:
branches: [ "feature/*" ]
pull_request:
branches: [ "main" ]

@@ -61,11 +59,11 @@ jobs:
- run: $SPARK_HOME/sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.1

- name: Install dependencies
working-directory: ./src/Spark.Connect.Dotnet/
working-directory: ./src
run: dotnet restore

- name: Build the project
working-directory: ./src/Spark.Connect.Dotnet/
working-directory: ./src
run: dotnet build --configuration Release --no-restore


4 changes: 2 additions & 2 deletions .github/workflows/dotnet-release.yml
@@ -56,11 +56,11 @@ jobs:
- run: $SPARK_HOME/sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.1

- name: Install dependencies
working-directory: ./src/Spark.Connect.Dotnet/
working-directory: ./src/
run: dotnet restore

- name: Build the project
working-directory: ./src/Spark.Connect.Dotnet/
working-directory: ./src/
run: dotnet build --configuration Release --no-restore


7 changes: 7 additions & 0 deletions docs/adding-functions.md
@@ -2,3 +2,10 @@

The `spark.sql.functions.*` functions aren't completely implemented yet, so if you would like to use one that is not available, you can follow this process to implement it (contributions are welcome if you feel like creating a PR).

1. Find the PySpark version of the function you want to implement
1. Decide on the types of parameters you want to accept. If the Python definition takes a `ColumnOrName`, consider adding an overload for each, and in the string version call `Col(col)` to map the name to a column.
1. I have written a generator that created most of the functions, so the functions are split across the [Generated Functions](../src/Spark.Connect.Dotnet/Spark.Connect.Dotnet/Sql/Functions.cs) partial class and the [Manually implemented Functions](../src/Spark.Connect.Dotnet/Spark.Connect.Dotnet/Sql/ManualFunctions.cs) partial class. Add your new function to the `ManualFunctions` file.
1. If you find another function with a similar signature, it should be fairly easy to replicate.
1. For the test, I make sure the function is called and that `DataFrame.Show()` is called, which confirms our logic does not cause an error. I then check the output to make sure it looks correct; we can also do a collect or count to verify the new function.
1. Test your function and, when you are happy with it, raise a PR (or don't, whatever!)
1. I am not completely sure what we should put in the docs; it is a bit random at the moment. At a minimum we should probably copy what the Python function docs say, but I am not sure, to be honest.
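The overload advice for `ColumnOrName` parameters can be sketched as follows. This is a minimal, self-contained illustration, not the library's code: `Column`, `Col` and `SketchFunctions.Upper` are hypothetical stand-ins that only build a text expression, so the real types in the `Functions`/`ManualFunctions` partial class will look different.

```csharp
using System;

// Hypothetical stand-in for the library's Column type: it only records
// a textual expression so the overload pattern can run on its own.
public sealed record Column(string Expression);

public static class SketchFunctions
{
    // Hypothetical stand-in for Col(): maps a column name to a Column.
    public static Column Col(string name) => new Column(name);

    // Column overload: the one place that builds the function call.
    public static Column Upper(Column col) => new Column($"upper({col.Expression})");

    // String overload: mirrors PySpark's ColumnOrName by converting the
    // name with Col() and delegating to the Column overload.
    public static Column Upper(string col) => Upper(Col(col));
}

public static class Program
{
    public static void Main()
    {
        var fromName = SketchFunctions.Upper("name");
        var fromColumn = SketchFunctions.Upper(SketchFunctions.Col("name"));
        Console.WriteLine(fromName.Expression);   // upper(name)
        Console.WriteLine(fromName == fromColumn); // True (records compare by value)
    }
}
```

Keeping the logic in the `Column` overload and having the `string` overload delegate means there is only one implementation to test.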
3 changes: 3 additions & 0 deletions docs/contributing.md
@@ -0,0 +1,3 @@
# Contributing

Contributions are welcome: docs, functions, core classes. If you are planning something large, feel free to open an issue to discuss it first.
2 changes: 1 addition & 1 deletion docs/versioning.md
@@ -2,4 +2,4 @@

The nuget packages will follow this schema:

SparkVersion-OurReleaseVersion: for example, "3.5.1-1" is the first release of this package, and it supports up to Spark 3.5.1 (backwards compatible to 3.4.0).
SparkVersion-OurReleaseVersion: for example, "3.5.1-build.1" is the first release of this package, and it supports up to Spark 3.5.1 (backwards compatible to 3.4.0).
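A minimal sketch of reading a version under this schema, assuming every package version is the Spark version, a single `-`, then our release label. `PackageVersionSketch`, `Parse` and `Supports` are hypothetical helpers, not part of the package; `Supports` simply encodes the "3.4.0 up to the Spark version" range stated above.

```csharp
using System;

public static class PackageVersionSketch
{
    // Splits "SparkVersion-OurReleaseVersion" at the first '-'.
    public static (Version Spark, string Release) Parse(string packageVersion)
    {
        int dash = packageVersion.IndexOf('-');
        if (dash < 0)
            throw new FormatException($"Expected SparkVersion-OurReleaseVersion, got '{packageVersion}'");

        var spark = Version.Parse(packageVersion.Substring(0, dash));
        var release = packageVersion.Substring(dash + 1);
        return (spark, release);
    }

    // Assumed compatibility rule: Spark 3.4.0 up to and including the
    // Spark version embedded in the package version.
    public static bool Supports(string packageVersion, Version cluster)
    {
        var (spark, _) = Parse(packageVersion);
        return cluster >= new Version(3, 4, 0) && cluster <= spark;
    }

    public static void Main()
    {
        var (spark, release) = Parse("3.5.1-build.1");
        Console.WriteLine(spark);   // 3.5.1
        Console.WriteLine(release); // build.1
        Console.WriteLine(Supports("3.5.1-build.1", new Version(3, 4, 1))); // True
    }
}
```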
39 changes: 39 additions & 0 deletions src/Spark.Connect.Dotnet.sln
@@ -0,0 +1,39 @@

Microsoft Visual Studio Solution File, Format Version 12.00
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Spark.Connect.Dotnet", "Spark.Connect.Dotnet\Spark.Connect.Dotnet\Spark.Connect.Dotnet.csproj", "{8F30FBDE-3659-4703-86C6-2C324FC097D0}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Test", "Test", "{3F60E0E6-5620-4095-85C0-574C780816BF}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Source", "Source", "{B8FF5C3A-A7DD-4B36-AF8A-ECA059A691BE}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Xamples", "Xamples", "{EB76790E-E1F6-48FA-972A-AC1848A549E9}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Spark.Connect.Dotnet.Tests", "test\Spark.Connect.Dotnet.Tests\Spark.Connect.Dotnet.Tests.csproj", "{3757182D-374B-40B1-B920-88F8FC48D988}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "databricks_example", "example\databricks_example\databricks_example.csproj", "{C6F3754C-A5C3-4CB5-B91D-CC2DB2D5FB32}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{8F30FBDE-3659-4703-86C6-2C324FC097D0}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{8F30FBDE-3659-4703-86C6-2C324FC097D0}.Debug|Any CPU.Build.0 = Debug|Any CPU
{8F30FBDE-3659-4703-86C6-2C324FC097D0}.Release|Any CPU.ActiveCfg = Release|Any CPU
{8F30FBDE-3659-4703-86C6-2C324FC097D0}.Release|Any CPU.Build.0 = Release|Any CPU
{3757182D-374B-40B1-B920-88F8FC48D988}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{3757182D-374B-40B1-B920-88F8FC48D988}.Debug|Any CPU.Build.0 = Debug|Any CPU
{3757182D-374B-40B1-B920-88F8FC48D988}.Release|Any CPU.ActiveCfg = Release|Any CPU
{3757182D-374B-40B1-B920-88F8FC48D988}.Release|Any CPU.Build.0 = Release|Any CPU
{C6F3754C-A5C3-4CB5-B91D-CC2DB2D5FB32}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{C6F3754C-A5C3-4CB5-B91D-CC2DB2D5FB32}.Debug|Any CPU.Build.0 = Debug|Any CPU
{C6F3754C-A5C3-4CB5-B91D-CC2DB2D5FB32}.Release|Any CPU.ActiveCfg = Release|Any CPU
{C6F3754C-A5C3-4CB5-B91D-CC2DB2D5FB32}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(NestedProjects) = preSolution
{8F30FBDE-3659-4703-86C6-2C324FC097D0} = {B8FF5C3A-A7DD-4B36-AF8A-ECA059A691BE}
{3757182D-374B-40B1-B920-88F8FC48D988} = {3F60E0E6-5620-4095-85C0-574C780816BF}
{C6F3754C-A5C3-4CB5-B91D-CC2DB2D5FB32} = {EB76790E-E1F6-48FA-972A-AC1848A549E9}
EndGlobalSection
EndGlobal
@@ -0,0 +1,7 @@
Copyright 2024 GOEddie (Ed Elliott [email protected])

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,5 @@
# Spark Connect Dotnet

Wrapper around the Spark Connect proto files/gRPC client.

See [Repo README](https://github.com/GoEddie/spark-connect-dotnet/blob/main/README.md)
@@ -67,4 +67,23 @@
<Protobuf Include="spark\connect\types.proto"/>
</ItemGroup>

<PropertyGroup>
<PackageId>GOEddie.Spark.Dotnet.GrpcClient</PackageId>
<Version>3.5.1-build.5</Version>
<Authors>GOEddie (Ed Elliott)</Authors>
<Product>GOEddie Spark Dotnet</Product>
<Description>Wrap Apache Spark Connect proto files in Autogenerated .NET classes</Description>
<PackageLicenseFile>LICENSE.txt</PackageLicenseFile>
<PackageReadmeFile>README.md</PackageReadmeFile>
<RepositoryType>git</RepositoryType>
<RepositoryBranch>main</RepositoryBranch>
<RepositoryUrl>https://github.com/GoEddie/spark-connect-dotnet</RepositoryUrl>
<PackageIcon>logo.png</PackageIcon>
</PropertyGroup>

<ItemGroup>
<None Include="LICENSE.txt" Pack="true" PackagePath=""/>
<None Include="README.md" Pack="true" PackagePath=""/>
<None Include="logo.png" Pack="true" PackagePath="" />
</ItemGroup>
</Project>
63 changes: 0 additions & 63 deletions src/Spark.Connect.Dotnet/Spark.Connect.Dotnet.sln

This file was deleted.

2 changes: 1 addition & 1 deletion src/Spark.Connect.Dotnet/Spark.Connect.Dotnet/LICENSE.txt
@@ -1,4 +1,4 @@
Copyright 2024 GOEddie (Ed Elliott)
Copyright 2024 GOEddie (Ed Elliott [email protected])

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

@@ -8,7 +8,7 @@

<ItemGroup>
<PackageReference Include="Apache.Arrow" Version="16.0.0" />
<PackageReference Include="Google.Protobuf" Version="3.24.4"/>
<PackageReference Include="Google.Protobuf" Version="3.24.4" />
<PackageReference Include="Grpc.Core.Api" Version="2.62.0" />
<PackageReference Include="Grpc.Net.Client" Version="2.62.0" />
<PackageReference Include="Grpc.Tools" Version="2.62.0">
@@ -43,13 +43,13 @@
</ItemGroup>

<ItemGroup>
<None Remove="spark\connect\catalog.proto"/>
<None Remove="spark\connect\commands.proto"/>
<None Remove="spark\connect\common.proto"/>
<None Remove="spark\connect\example_plugins.proto"/>
<None Remove="spark\connect\expressions.proto"/>
<None Remove="spark\connect\relations.proto"/>
<None Remove="spark\connect\types.proto"/>
<None Remove="spark\connect\catalog.proto" />
<None Remove="spark\connect\commands.proto" />
<None Remove="spark\connect\common.proto" />
<None Remove="spark\connect\example_plugins.proto" />
<None Remove="spark\connect\expressions.proto" />
<None Remove="spark\connect\relations.proto" />
<None Remove="spark\connect\types.proto" />
<None Update="LICENSE.txt">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
@@ -58,26 +58,29 @@
</None>
</ItemGroup>

<ItemGroup>
<ProjectReference Include="..\Spark.Connect.Dotnet.GrpcClient\Spark.Connect.Dotnet.GrpcClient.csproj"/>
</ItemGroup>


<PropertyGroup>
<PackageId>GOEddie.Spark.Dotnet</PackageId>
<Version>3.5.1-1</Version>
<Version>3.5.1-build.5</Version>
<Authors>GOEddie (Ed Elliott)</Authors>
<Company>OSS</Company>
<Product>GOEddie Spark Dotnet</Product>
<Description>DataFrame API over the Apache Spark gRPC API</Description>
<Description>DataFrame API over the Apache Spark Connect gRPC API</Description>
<PackageLicenseFile>LICENSE.txt</PackageLicenseFile>
<PackageReadmeFile>README.md</PackageReadmeFile>
<RepositoryType>git</RepositoryType>
<RepositoryBranch>main</RepositoryBranch>
<RepositoryUrl>https://github.com/GoEddie/spark-connect-dotnet</RepositoryUrl>
<PackageIcon>logo.png</PackageIcon>
</PropertyGroup>

<ItemGroup>
<None Include="LICENSE.txt" Pack="true" PackagePath=""/>
<None Include="README.md" Pack="true" PackagePath=""/>
<None Include="LICENSE.txt" Pack="true" PackagePath="" />
<None Include="README.md" Pack="true" PackagePath="" />
<None Include="logo.png" Pack="true" PackagePath="" />
</ItemGroup>

<ItemGroup>
<ProjectReference Include="..\Spark.Connect.Dotnet.GrpcClient\Spark.Connect.Dotnet.GrpcClient.csproj" />
</ItemGroup>

</Project>