-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CUDA][HIP] Fix kernel arguments being overwritten when added out of order #2559
base: main
Are you sure you want to change the base?
Conversation
4400095
to
7ca2133
Compare
ccccf85
to
0a071be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add CTS tests for this? Probably in test/conformance/exp_command_buffer/update/local_memory_update.cpp
& test/conformance/kernel/urKernelSetArgLocal.cpp
Might make it easier if we want to merge this fix earlier than the intel/llvm#16573 feature
86885a1
to
bf6b6f9
Compare
source/adapters/cuda/kernel.hpp
Outdated
if (ParamSizes[Index] == 0) { | ||
ParamSizes[Index] = Size; | ||
std::memcpy(&Storage[InsertPos], Arg, Size); | ||
Indices[Index] = &Storage[InsertPos]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't an issue introduced by this PR, but the description of Indices
as Byte offset into /p Storage allocation for each parameter.
is a little confusing, isn't it? An index, to me, implies a relative value from the beginning of the storage, not an absolute address. Do you think we should do something about this? I note this PR is embedding the use of Indices
more deeply into the code so it might be best to fix now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I don't mind changing the variable name and related getters. Do you have a preference for the name? Perhaps ParamPointers
or ParamOffsets
`?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing getter was called getArgIndices()
so I decided to rename the variable to ArgPointers
and the getter to getArgPointers()
. Hopefully that name makes more sense.
source/adapters/hip/kernel.hpp
Outdated
// For every existing local argument which follows at later argument | ||
// indices, update the offset and pointer into the kernel local memory. | ||
// Required as padding will need to be recalculated. | ||
// Iterate over all existing local argument which follows StartIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Iterate over all existing local argument which follows StartIndex | |
// Iterate over all existing local arguments which follow StartIndex |
// Iterate over all existing local argument which follows StartIndex | |
// Iterate over each existing local argument which follows StartIndex |
either works
void TearDown() override { | ||
if (command_handle) { | ||
EXPECT_SUCCESS(urCommandBufferReleaseCommandExp(command_handle)); | ||
} | ||
|
||
UUR_RETURN_ON_FATAL_FAILURE( | ||
LocalMemoryUpdateTestBaseOutOfOrder::TearDown()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing to do right now, but heads up this will conflict with #2578
In the Cuda and Hip adapter, when kernel arguments are added out of order (e.g. argument at index 1 is added before argument at index 0), the existing arguments are currently being overwritten. This happens because some of the argument sizes might not be known when adding them out of order and the code relies on those sizes to choose where to store the argument. This commit avoids this issue by storing the arguments in the same order that they are added and accessing them using pointer offsets.
27a3561
to
e3dcfc3
Compare
In the Cuda and Hip adapter, when kernel arguments are added out of order (e.g. argument at index 1 is added before argument at index 0), the existing arguments are currently being overwritten. This happens because some of the argument sizes might not be known when adding them out of order and the code relies on those sizes to choose where to store the argument.
This PR avoids this issue by storing the arguments in the same order that they are added and accessing them using pointer offsets.