Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CUDA][HIP] Fix kernel arguments being overwritten when added out of order #2559

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

fabiomestre
Copy link
Contributor

@fabiomestre fabiomestre commented Jan 14, 2025

In the Cuda and Hip adapter, when kernel arguments are added out of order (e.g. argument at index 1 is added before argument at index 0), the existing arguments are currently being overwritten. This happens because some of the argument sizes might not be known when adding them out of order and the code relies on those sizes to choose where to store the argument.

This PR avoids this issue by storing the arguments in the same order that they are added and accessing them using pointer offsets.

@fabiomestre fabiomestre force-pushed the fix_kernel_arg_indices branch from 4400095 to 7ca2133 Compare January 14, 2025 13:25
@github-actions github-actions bot added the cuda CUDA adapter specific issues label Jan 14, 2025
@fabiomestre fabiomestre force-pushed the fix_kernel_arg_indices branch 3 times, most recently from ccccf85 to 0a071be Compare January 14, 2025 15:25
@github-actions github-actions bot added the hip HIP adapter specific issues label Jan 14, 2025
@fabiomestre fabiomestre changed the title Fix kernel argument indices bug [CUDA][HIP] Fix kernel arguments being overriden when added out of order Jan 14, 2025
Copy link
Contributor

@EwanC EwanC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add CTS tests for this? Probably in test/conformance/exp_command_buffer/update/local_memory_update.cpp & test/conformance/kernel/urKernelSetArgLocal.cpp

Might make it easier if we want to merge this fix earlier than the intel/llvm#16573 feature

@github-actions github-actions bot added the conformance Conformance test suite issues. label Jan 15, 2025
@fabiomestre fabiomestre force-pushed the fix_kernel_arg_indices branch from 86885a1 to bf6b6f9 Compare January 15, 2025 14:57
@fabiomestre fabiomestre marked this pull request as ready for review January 15, 2025 14:58
@fabiomestre fabiomestre requested review from a team as code owners January 15, 2025 14:58
@fabiomestre fabiomestre requested review from reble and npmiller January 15, 2025 14:58
if (ParamSizes[Index] == 0) {
ParamSizes[Index] = Size;
std::memcpy(&Storage[InsertPos], Arg, Size);
Indices[Index] = &Storage[InsertPos];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't an issue introduced by this PR, but the description of Indices as Byte offset into /p Storage allocation for each parameter. is a little confusing, isn't it? An index, to me, implies a relative value from the beginning of the storage, not an absolute address. Do you think we should do something about this? I note this PR is embedding the use of Indices more deeply into the code so it might be best to fix now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I don't mind changing the variable name and related getters. Do you have a preference for the name? Perhaps ParamPointers or ParamOffsets `?

Copy link
Contributor Author

@fabiomestre fabiomestre Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing getter was called getArgIndices() so I decided to rename the variable to ArgPointers and the getter to getArgPointers(). Hopefully that name makes more sense.

// For every existing local argument which follows at later argument
// indices, update the offset and pointer into the kernel local memory.
// Required as padding will need to be recalculated.
// Iterate over all existing local argument which follows StartIndex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Iterate over all existing local argument which follows StartIndex
// Iterate over all existing local arguments which follow StartIndex
Suggested change
// Iterate over all existing local argument which follows StartIndex
// Iterate over each existing local argument which follows StartIndex

either works

@fabiomestre fabiomestre changed the title [CUDA][HIP] Fix kernel arguments being overriden when added out of order [CUDA][HIP] Fix kernel arguments being overwritten when added out of order Jan 15, 2025
Comment on lines +1213 to +1209
void TearDown() override {
if (command_handle) {
EXPECT_SUCCESS(urCommandBufferReleaseCommandExp(command_handle));
}

UUR_RETURN_ON_FATAL_FAILURE(
LocalMemoryUpdateTestBaseOutOfOrder::TearDown());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to do right now, but heads up this will conflict with #2578

In the Cuda and Hip adapter, when kernel arguments are added out of order
(e.g. argument at index 1 is added before argument at index 0), the
existing arguments are currently being overwritten. This happens because
some of the argument sizes might not be known when adding them out of
order and the code relies on those sizes to choose where to store the
argument.

This commit avoids this issue by storing the arguments in the same order that
they are added and accessing them using pointer offsets.
@github-actions github-actions bot added the command-buffer Command Buffer feature addition/changes/specification label Jan 21, 2025
@fabiomestre fabiomestre force-pushed the fix_kernel_arg_indices branch from 27a3561 to e3dcfc3 Compare January 21, 2025 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
command-buffer Command Buffer feature addition/changes/specification conformance Conformance test suite issues. cuda CUDA adapter specific issues hip HIP adapter specific issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants