update Embedding layer #1656
Conversation
src/outputsize.jl
```diff
@@ -168,3 +168,6 @@ for (fn, Dims) in ((:conv, DenseConvDims), (:depthwiseconv, DepthwiseConvDims))
     end
   end
 end
+
+(m::Embedding)(x::AbstractVector{<:Nil}) = fill(nil, size(m.weight, 1), length(x))
+(m::Embedding)(x::AbstractArray{<:Nil}) = fill(nil, size(m.weight, 1), last(size(x)))
```
These methods should not be here.
This is exactly where they should be (see the other code in the file).
I mean, these methods should not be defined at all, since this is what we would expect `outputsize` to already know how to do. If `outputsize` cannot find the expected size, then it should be fixed there.
`outputsize`'s underlying mechanism was designed for numerical array operations, which hits the ~95% case even with custom layers. There will always be some layers that we might need to special-case. A layer that uses indexing as the underlying operation is nonsensical with `outputsize`. How would you know the result of indexing an array with `Nil`?
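(For reference, a minimal usage sketch of that mechanism, assuming an ordinary feed-forward model; `outputsize` runs the model on placeholder values so only shapes are computed:)

```julia
using Flux

# outputsize propagates array shapes through the model without real numbers
m = Chain(Dense(10, 32, relu), Dense(32, 4))
Flux.outputsize(m, (10, 1))   # (4, 1)
```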
You only need the size, so maybe `outputsize` should know what to do with `getindex`.
I don't know whether it's useful, but the ordinary function above accepts any array. They should surely match.
Good to specialise on OneHot, but what happens if by some chance you collect it? Best case is that this gives you the same result, and it's just an optimisation. (Indexing often treats an array of Bools differently to other integers.)
Second best is for it to be an error, which is what I get now, but is this guaranteed? (Perhaps it is without offset arrays?)
```julia
julia> m(Flux.OneHotMatrix([6, 15, 15], 26))
2×3 Matrix{Float32}:
 4.01  22.01  22.01
 5.01  23.01  23.01

julia> m(collect(Flux.OneHotMatrix([6, 15, 15], 26)))
ERROR: BoundsError: attempt to access 2×26 Matrix{Float32} at index [1:2, 78-element Vector{Bool}]
```
> I don't know whether it's useful, but the ordinary function above accepts any array. They should surely match.

Yeah, but my intent was that the >2D case hits the reshape route in the embedding layer source. We only need to specify the `Nil` path for what finally gets called, which is `AbstractVecOrMat`.
> Good to specialise on OneHot, but what happens if by some chance you collect it?

This looks related to your other suggestions. We should constrain on `AbstractVector{<:Integer}` and `AbstractVecOrMat{<:Bool}` separately? In the latter case, should it be a matmul (mathematically) or `m.weight[:, idx]`?
Would something like this make more sense for `AbstractArray`?

```julia
(m::Embedding)(x::AbstractArray{<:Nil}) = fill(nil, size(m.weight, 1), size(x)...)
```
Personally, I would keep the `Nil` overrides to a minimum and only do the vector case. Let the rest go through the normal routing for all arrays.
We'll need some tests, but this looks good.
Needs tests

ref. #1516

I rebased ref. #1516 on master since there were some conflicts. So I guess you should do

Or maybe it is just easier if I merge #1516 and then this targets master.

Yeah, sure, that works.
Maybe labels other than integers starting at 1 deserve some thought. Flux's one-hot encoding lets you specify the set of labels, which are not stored. Should this do something similar? Then

Mapping to an index or one-hot space is a standard data transformation for using a layer like `Embedding`. If we did want to include it, I would store a "transform" function within the layer. PS: we do support
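(A purely hypothetical sketch of that "transform stored in the layer" idea, not part of this PR; `LabelledEmbedding` and `labels` are made-up names for illustration:)

```julia
# Hypothetical: an Embedding-like wrapper that also stores a label -> index transform.
struct LabelledEmbedding{W,F}
  weight::W
  transform::F                           # maps a raw label to an integer column index
end

LabelledEmbedding(labels, dim::Integer) =
  LabelledEmbedding(randn(Float32, dim, length(labels)), l -> findfirst(==(l), labels))

(m::LabelledEmbedding)(x) = m.weight[:, m.transform(x)]                          # one label
(m::LabelledEmbedding)(xs::AbstractVector) = m.weight[:, map(m.transform, xs)]   # many labels
```

For example, `LabelledEmbedding('a':'z', 2)(collect("foo"))` would return a 2×3 matrix.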
Ok, indeed this should probably be written something like

Should this be re-targeted for master?

Yes. Probably filing a new PR is easier.

I changed the base branch to master.

Looks like you need a rebase too.
src/layers/basic.jl
```julia
Embedding(dims::Pair{<:Integer, <:Integer}; init = randn32) = Embedding(init(last(dims), first(dims)))

(m::Embedding)(x::Union{OneHotVector, OneHotMatrix}) = m.weight * x # equivalent to m.weight[:,onecold(x)]
```
This is where I think treating `collect(onehot(...))` the same as `onehot()` would be nice. They are both arrays of booleans, so something like this may work:

```diff
-(m::Embedding)(x::Union{OneHotVector, OneHotMatrix}) = m.weight * x # equivalent to m.weight[:,onecold(x)]
+(m::Embedding)(x::AbstractVecOrMat{Bool}) = m.weight * x # handles OneHotVector, OneHotMatrix
```
(Or it may require a bit more thought about which method is more specific.)
I guess it comes down to whether the `Vector{Bool}` case should be a matmul or an index operation.
Those should agree when it's actually one-hot (if not `OneHot`) up to a `dropdims`:

```julia
julia> [1 2 3; 4 5 6] * [false, true, false]
2-element Vector{Int64}:
 2
 5

julia> [1 2 3; 4 5 6][:, [false, true, false]]
2×1 Matrix{Int64}:
 2
 5
```

But not when it's not:

```julia
julia> [1 2 3; 4 5 6] * [false, true, true]
2-element Vector{Int64}:
  5
 11

julia> [1 2 3; 4 5 6][:, [false, true, true]]
2×2 Matrix{Int64}:
 2  3
 5  6
```

Should the latter case be an error?

Maybe indexing would be faster, although I'm not sure we care about optimising this. Logical indexing can't give you aliasing problems, if I'm thinking correctly.

Another possible strategy would just be to make it an error on `AbstractArray{Bool}`. Only permit OneHot, or counting numbers.
I was mostly speaking from an optimization standpoint. If we don't care about this case being fast (it is rare), then the suggested code works for me.
Would it be better to change the current onehot paths from

```julia
function (m::Embedding)(x::Union{OneHotLikeVector{T,L}, OneHotLikeMatrix{T,L,I}}) where {T,L,I}
  size(m.weight, 2) == L || throw(DimensionMismatch("Matrix column must correspond with OneHot size: $(size(m.weight, 2)) != $L"))
  return m(onecold(x))
end
```

back to a matmul?
Yes definitely, the current implementation is just code duplication.
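(Sketched out, combining the earlier suggestion with the existing size check — an illustration rather than the final code in this PR:)

```julia
function (m::Embedding)(x::Union{OneHotLikeVector{T,L}, OneHotLikeMatrix{T,L,I}}) where {T,L,I}
  size(m.weight, 2) == L || throw(DimensionMismatch("Matrix column must correspond with OneHot size: $(size(m.weight, 2)) != $L"))
  return m.weight * x   # matmul instead of routing through onecold + gather
end
```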
src/layers/basic.jl
```julia
(m::Embedding)(x::AbstractVector) = NNlib.gather(m.weight, x)
(m::Embedding)(x::AbstractArray) = reshape(m(vec(x)), :, size(x)...)
```
I would also restrict these to integers:

```diff
-(m::Embedding)(x::AbstractVector) = NNlib.gather(m.weight, x)
-(m::Embedding)(x::AbstractArray) = reshape(m(vec(x)), :, size(x)...)
+(m::Embedding)(x::AbstractVector{<:Integer}) = NNlib.gather(m.weight, x)
+(m::Embedding)(x::AbstractArray{<:Integer}) = reshape(m(vec(x)), :, size(x)...)
```
Maybe only for `AbstractVector`, so that `Nil`-arrays can go through the reshape?
Oh, I wasn't thinking about Nil, good point. Do you want `Array{Nil}` to behave like `Array{Int}` or like `Array{Bool}`?

One other weird feature is that if you make a OneHotTensor, like a boolean 3-array, this will reshape it to a vector. Maybe that's too weird to worry about. Or maybe it should reshape `AbstractArray{Bool}` not with `vec` but with `m(reshape(x, size(x, 1), :))`.
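(A small illustration of that reshape point, using a plain `Bool` array as a stand-in for a one-hot 3-array; the shapes are hypothetical:)

```julia
x = rand(Bool, 26, 4, 3)            # stand-in for a one-hot 3-array with L = 26 classes
size(vec(x))                        # (312,)  — flattening also destroys the one-hot dimension
size(reshape(x, size(x, 1), :))     # (26, 12) — keeps one column per item, as suggested above
```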
Yes, I think you are correct. Reshaping should be different when the first dimension encodes one-hot information. This definitely presents a problem for `Nil`, because the `Nil`-array is agnostic of whether the original input is one-hot or indices. In other words, the output size is input-type dependent for this layer.
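(For concreteness, a sketch of why the two input types collide, assuming the `Pair` constructor added in this PR and a 26-class vocabulary:)

```julia
using Flux

m = Flux.Embedding(26 => 2)                    # weight is 2×26

size(m([6, 15, 15]))                           # (2, 3): integer indices, input size (3,)
size(m(Flux.onehotbatch("foo", 'a':'z')))      # (2, 3): one-hot input, input size (26, 3)
# A Nil array of size (26, 3) cannot tell these two interpretations apart.
```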
In that case, is the only way to get the correct output from `outputsize` just to ignore the first dimension of the one-hot input when passing in the `inputsizes` argument?
@manikyabard are you still up for rebasing this and moving forward? We ought to close up this loose end.

Yeah, I can continue working on this, although I am not sure about the approach we should take for

Yeah, sounds good!
Summarizing what was discussed on the call:
You mean something like this?

```julia
NNlib.gather!(dst::AbstractArray, ::AbstractArray, ::AbstractArray{<:Nil}) = fill(nil, size(dst)...)
(m::Embedding)(x::AbstractVector{<:Nil}) = NNlib.gather(m.weight, x)
```
That looks right, but you won't need to special-case for

This needs a rebase on master.
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #1656      +/-   ##
==========================================
- Coverage   84.58%   84.51%   -0.08%
==========================================
  Files          21       21
  Lines        1486     1485       -1
==========================================
- Hits         1257     1255       -2
- Misses        229      230       +1
```

Continue to review full report at Codecov.
Can you add a test for
```julia
  weight::W
end

@functor Embedding

Embedding(in::Integer, out::Integer; init = randn32) = Embedding(init(out, in))
Embedding(dims::Pair{<:Integer, <:Integer}; init = randn32) = Embedding(init(last(dims), first(dims)))
```
the old constructor should be deprecated
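(A hedged sketch of what such a deprecation might look like, assuming the positional method is replaced by a shim that forwards to the `Pair` form; this is not the actual change in the PR:)

```julia
# hypothetical deprecation shim for the old positional constructor
function Embedding(in::Integer, out::Integer; init = randn32)
  Base.depwarn("Embedding(in, out) is deprecated, use Embedding(in => out) instead", :Embedding)
  return Embedding(in => out; init)
end
```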
````diff
-julia> vocab_idxs = [1, 722, 53, 220, 3]
+```jldoctest
+julia> m = Embedding(reshape(-6:45, 2, 26) .+ 0.01f0)
````
The old example was much clearer.

Also, this constructor (`Embed(weight)`) is not even part of the docstring; we should add it.
Indeed on the constructor.

The virtue of this example is that it doesn't have random numbers, so it can be a doctest. My hope is that `onehotbatch("foo", 'a':'z')` might connect with `26` well enough to be easy to follow. Maybe it can be made clearer somehow?
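(For instance, a sketch of how the doctest could read with that deterministic weight matrix; the values follow from `reshape(-6:45, 2, 26) .+ 0.01f0` and match the REPL output quoted earlier in this thread:)

```julia
julia> m = Embedding(reshape(-6:45, 2, 26) .+ 0.01f0);

julia> m(Flux.onehotbatch("foo", 'a':'z'))   # 'f' and 'o' are columns 6 and 15
2×3 Matrix{Float32}:
 4.01  22.01  22.01
 5.01  23.01  23.01
```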
test/cuda/layers.jl
```julia
gpu_gradtest("Embedding OneHotMatrix index", embedding, OneHotMatrix([1,2,3], 5), 5, 2)
gpu_gradtest("Embedding OneHotMatrix repeated indices", embedding, OneHotMatrix([1,2,2], 5), 5, 2)
gpu_gradtest("Embedding", embedding, [1,3,5], 5 => 2)
gpu_gradtest("Embedding repeated indices", embedding, rand(1:50, 10^6), 50 => 2)
```
No need to test such large arrays; CI already takes a lot of time. The previous test was fine.
`10^6` seems quite big, but maybe there's some value to something bigger than 3, in case e.g. 3 is never parallelised. No idea if there's a particular cutoff.
I have updated it to be `10^3`. Would this also add a lot of time to CI?
test/cuda/layers.jl
```julia
gpu_gradtest("Embedding 2d index", embedding, [1 2; 3 4], 5 => 2)
gpu_gradtest("Embedding OneHotVec index", embedding, OneHotVector(1, 5), 5 => 2)
gpu_gradtest("Embedding OneHotMatrix index", embedding, OneHotMatrix([1,2,3], 5), 5 => 2)
gpu_gradtest("Embedding OneHotMatrix repeated indices", embedding, OneHotMatrix(rand(1:50, 10^6), 50), 50 => 2)
```
same as above
src/outputsize.jl
```diff
@@ -168,3 +168,5 @@ for (fn, Dims) in ((:conv, DenseConvDims), (:depthwiseconv, DepthwiseConvDims))
     end
   end
 end
+
+NNlib.gather!(dst::AbstractArray, ::AbstractArray, ::AbstractArray{<:Nil}) = fill(nil, size(dst)...)
```
To understand what's going on, this is using `Nil` to stand in as an index, right? Maybe that warrants a comment. This is not otherwise handled:

```julia
julia> rand(3)[Flux.nil]
ERROR: ArgumentError: invalid index: Flux.NilNumber.Nil() of type Flux.NilNumber.Nil
Stacktrace:
 [1] to_index(i::Flux.NilNumber.Nil)
   @ Base ./indices.jl:300
 [2] to_index(A::Vector{Float64}, i::Flux.NilNumber.Nil)
```

Also, is the fact that this doesn't mutate `dst` and returns something else going to bite us someday?
Probably. We can modify the non-mutating variation, `NNlib.gather`, which is what Flux layers will end up calling.
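(A hedged sketch of that alternative — overloading the non-mutating `NNlib.gather` for `Nil` indices so nothing pretends to mutate; the output shape follows `gather`'s usual rule of `(size(src)[1:end-1]..., size(idx)...)` for scalar indices:)

```julia
# sketch: non-mutating gather on Nil indices just returns a Nil array of the right size
NNlib.gather(src::AbstractArray, idx::AbstractArray{<:Nil}) =
  fill(nil, size(src)[1:end-1]..., size(idx)...)
```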
Updated Embedding constructor to use `=>` and added OneHotLikeVector and OneHotLikeMatrix consts.
Updates the Embedding layer to use `gather` for `AbstractVector`, which earlier had an issue with repeated indices. Also adds a special case for `outputsize`.
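(Finally, a rough usage sketch of the behaviour this PR targets, assuming the `Pair` constructor it adds; repeated indices now go through `gather`:)

```julia
using Flux

m = Flux.Embedding(26 => 2)

y = m([6, 15, 15])              # 2×3 matrix; columns 2 and 3 are identical
size(y) == (2, 3)

Flux.outputsize(m, (3,))        # (2, 3), via the Nil special case discussed above
```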