Tensor contractions #105
Conversation
Can you rebase against …?
See JuliaGPU/CUDA.jl#1960, both for CI and for any impact it may have on your work.
If you want to use a Manifest, it'll have to be one generated by the oldest version of Julia you want to test, i.e., 1.6 (should be easy enough using …
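For example, a minimal sketch of regenerating the Manifest under Julia 1.6 (assuming a 1.6 binary is available, e.g. via juliaup; the exact invocation is an assumption, not part of this PR):

```julia
# Run this under Julia 1.6 (e.g. `julia +1.6 --project=.` with juliaup installed);
# it regenerates Manifest.toml against the oldest supported Julia version.
using Pkg
Pkg.activate(".")
Pkg.resolve()      # rewrite Manifest.toml for this Julia version
Pkg.instantiate()  # check that all dependencies resolve and install
```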
src/GettContractions.jl
@@ -0,0 +1,56 @@
export GETT
What is this file for? It's included nowhere.
extent::Vector{Int}
stride::Vector{Int}
dataType::DataType
unaryOp
Why is this part of the tensor, and not the contraction plan?
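To illustrate the alternative the question hints at, here is a hypothetical sketch (all type and field names are illustrative, not the PR's actual definitions) where the tensor descriptor stays purely structural and the unary ops live in the contraction plan instead:

```julia
# Tensor descriptor holds only layout information.
struct TensorDescriptor
    extent::Vector{Int}
    stride::Vector{Int}
    dataType::DataType
end

# The contraction plan carries the element-wise unary ops,
# since they are a property of a particular contraction, not of the tensor.
struct ContractionPlan
    descA::TensorDescriptor
    descB::TensorDescriptor
    descC::TensorDescriptor
    unaryOpA    # applied to A's elements while loading
    unaryOpB    # applied to B's elements while loading
    unaryOpC    # applied to C's elements before accumulation
end
```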
E.g. in case we do not find a configuration for a certain TC, the ratio is cuTENSOR_time / Inf = 0, leading to a 0 geomean. While we should fix such cases, let's calculate the geomean without taking them into account, to get an idea of the average performance for now.
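A minimal sketch of that calculation (the `results` container and its field names are hypothetical): compute the geometric mean over the finite, non-zero ratios only, so a missing configuration does not drag the geomean to 0.

```julia
# Ratio of cuTENSOR time to GemmKernels time per tensor contraction;
# if no configuration was found, gemmkernels_time == Inf and the ratio is 0.
ratios = [r.cutensor_time / r.gemmkernels_time for r in results]

# Skip those cases and take the geometric mean of the rest.
finite = filter(x -> isfinite(x) && x > 0, ratios)
geomean = exp(sum(log, finite) / length(finite))
```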
This PR adds tensor contraction functionality to GemmKernels.jl using the GEMM-like Tensor-Tensor (GETT) multiplication algorithm. The API mimics the cuTENSOR API. It is still a draft; the benchmark scripts need to be refined further. Because it is a draft, I have temporarily disabled the other tests. I also had to revert the CUDA runtime to v11.8 for cuTENSOR to work.
The following function of cuTENSOR is implemented:
contraction!
This works both with WMMAOp and with FPUOp from #101. As far as I can tell, cuTENSOR does not support operations other than multiplication and addition for its contractions; thanks to the FPU operator, that is a possibility here.
The following functions are not implemented:
permutation!
reduction!
elementwiseTrinary!
elementwiseBinary!
I think it would be interesting future work to create a permutation (i.e. transposition) kernel and a reduction kernel by reusing GemmKernels.jl building blocks. The same goes for the elementwise functions: you could probably do something very similar to GETT, but the kernel would need changes, since contractions are not allowed there.
The contraction! functionality is tested against the TCCG benchmark suite, using cuTENSOR to verify the results. Benchmarks for the Tesla P100, GeForce RTX 2080 Ti, and Tesla V100 will be added later.
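For illustration, a hypothetical usage sketch of a cuTENSOR-style mode-labelled contraction; the exact signature, argument order, and keyword arguments of the PR's contraction! are assumptions here, not confirmed by this description:

```julia
using CUDA, GemmKernels

# D[m, n] = Σₖ A[m, k] * B[k, n], written as a mode-labelled tensor contraction.
A = CUDA.rand(Float16, 128, 256)   # modes (m, k)
B = CUDA.rand(Float16, 256, 128)   # modes (k, n)
D = CUDA.zeros(Float32, 128, 128)  # modes (m, n)

# Hypothetical call: α = 1, β = 0, modes given as character labels per tensor.
GemmKernels.contraction!(1, A, ['m', 'k'], B, ['k', 'n'],
                         0, D, ['m', 'n'])
```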