Tensor contractions #105
Conversation
Can you rebase against …?
See JuliaGPU/CUDA.jl#1960, both for CI and for any impact it may have on your work.
If you want to use a Manifest, it'll have to be one generated by the oldest version of Julia you want to test, i.e., 1.6 (should be easy enough using …
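For example, a minimal sketch of regenerating the Manifest under Julia 1.6 (assuming a 1.6 binary is available, e.g. via juliaup; the exact invocation is an assumption, not part of this PR):

```julia
# Run this under Julia 1.6 (e.g. `julia +1.6 --project=.` with juliaup installed);
# it regenerates Manifest.toml against the oldest supported Julia version.
using Pkg
Pkg.activate(".")
Pkg.resolve()      # rewrite Manifest.toml for this Julia version
Pkg.instantiate()  # check that all dependencies resolve and install
```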
src/GettContractions.jl
@@ -0,0 +1,56 @@
export GETT
What is this file for? It's included nowhere.
extent::Vector{Int}
stride::Vector{Int}
dataType::DataType
unaryOp
Why is this part of the tensor, and not the contraction plan?
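To illustrate the alternative the question hints at, here is a hypothetical sketch (all type and field names are illustrative, not the PR's actual definitions) where the tensor descriptor stays purely structural and the unary ops live in the contraction plan instead:

```julia
# Tensor descriptor holds only layout information.
struct TensorDescriptor
    extent::Vector{Int}
    stride::Vector{Int}
    dataType::DataType
end

# The contraction plan carries the element-wise unary ops,
# since they are a property of a particular contraction, not of the tensor.
struct ContractionPlan
    descA::TensorDescriptor
    descB::TensorDescriptor
    descC::TensorDescriptor
    unaryOpA    # applied to A's elements while loading
    unaryOpB    # applied to B's elements while loading
    unaryOpC    # applied to C's elements before accumulation
end
```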
E.g. in case we do not find a configuration for a certain TC, the ratio is cuTENSOR_time / Inf = 0, leading to a 0 geomean. While we should fix such cases, let's calculate the geomean without taking them into account, to get an idea of the average performance for now.
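A minimal sketch of that calculation (the `results` container and its field names are hypothetical): compute the geometric mean over the finite, non-zero ratios only, so a missing configuration does not drag the geomean to 0.

```julia
# Ratio of cuTENSOR time to GemmKernels time per tensor contraction;
# if no configuration was found, gemmkernels_time == Inf and the ratio is 0.
ratios = [r.cutensor_time / r.gemmkernels_time for r in results]

# Skip those cases and take the geometric mean of the rest.
finite = filter(x -> isfinite(x) && x > 0, ratios)
geomean = exp(sum(log, finite) / length(finite))
```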
This PR adds tensor contraction functionality to GemmKernels.jl using the GEMM-like Tensor-Tensor (GETT) multiplication algorithm. The API mimics the cuTENSOR API. It is still a draft; the benchmark scripts need to be refined further. Because it is a draft, I have temporarily disabled the other tests. I also had to revert the CUDA runtime to v11.8 for cuTENSOR to work.
The following function of cuTENSOR is implemented:
contraction!
This works both with WMMAOp and with FPUOp from #101. As far as I can tell, cuTENSOR does not support operations other than multiplication and addition for its contractions; thanks to the FPU operator, that is a possibility here.
The following functions are not implemented:
permutation!
reduction!
elementwiseTrinary!
elementwiseBinary!
I think it would be interesting future work to create a permutation (i.e. transposition) kernel and a reduction kernel by reusing GemmKernels.jl building blocks. The same goes for the elementwise functions: you could probably do something very similar to GETT, but the kernel would need changes, since contractions are not allowed there.
The contraction! functionality is tested against the TCCG benchmark suite, using cuTENSOR to verify the results. Benchmarks for the Tesla P100, GeForce RTX 2080 Ti, and Tesla V100 will be added later.
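For illustration, a hypothetical usage sketch of a cuTENSOR-style mode-labelled contraction; the exact signature, argument order, and keyword arguments of the PR's contraction! are assumptions here, not confirmed by this description:

```julia
using CUDA, GemmKernels

# D[m, n] = Σₖ A[m, k] * B[k, n], written as a mode-labelled tensor contraction.
A = CUDA.rand(Float16, 128, 256)   # modes (m, k)
B = CUDA.rand(Float16, 256, 128)   # modes (k, n)
D = CUDA.zeros(Float32, 128, 128)  # modes (m, n)

# Hypothetical call: α = 1, β = 0, modes given as character labels per tensor.
GemmKernels.contraction!(1, A, ['m', 'k'], B, ['k', 'n'],
                         0, D, ['m', 'n'])
```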