Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[L0] Enable Copy engine support with in-order command lists #2636

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
177 changes: 163 additions & 14 deletions source/adapters/level_zero/command_buffer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,64 @@ ur_result_t getEventsFromSyncPoints(
return UR_RESULT_SUCCESS;
}

/**
* If necessary, it creates a signal event and appends it to the previous
* command list (copy or compute), to indicate when it's finished executing.
* @param[in] CommandType The type of the command.
* @param[in] CommandBuffer The CommandBuffer where the command is appended.
* @param[in] IsFirstNode A boolean inidicating if the current node is at the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param[in] IsFirstNode A boolean inidicating if the current node is at the
* @param[in] IsFirstNode A boolean indicating if the current node is at the

* beginning of the command list.
* @param[in] isCopy A boolean indicating if the current command uses copy
* engine.
* @param[out] ZeSignalPrevCommandEvent The event which signals when the
* previous command appended to a different command list is finished executing.
* @return UR_RESULT_SUCCESS or an error code on failure
*/
ur_result_t createSyncPointBetweenCopyAndCompute(
ur_command_t CommandType, ur_exp_command_buffer_handle_t CommandBuffer,
bool IsFirstNode, bool IsCopy,
ze_event_handle_t &ZeSignalPrevCommandEvent) {
if (IsFirstNode) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be CommandBuffer->IsFirstNode() now? and we drop the function parameter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid using IsFirstNode completely. It seems to only be used in this check and if I understand it correctly, initializing MWasPrevCopyCommandList to true in the CommandBuffer constructor would do the same. I could be missing something though

CommandBuffer->MCopyCommandListEmpty = !IsCopy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this bit of code needed? Isn't chooseCommandList() already assigning a value to MCopyCommandListEmpty and MComputeCommandListEmtpy?

CommandBuffer->MComputeCommandListEmpty = IsCopy;
CommandBuffer->MWasPrevCopyCommandList = IsCopy;
return UR_RESULT_SUCCESS;
} else if (CommandBuffer->MWasPrevCopyCommandList == IsCopy) {
return UR_RESULT_SUCCESS;
}

ur_event_handle_t SignalPrevCommandEvent = nullptr;
UR_CALL(EventCreate(CommandBuffer->Context, nullptr /*Queue*/,
false /*IsMultiDevice*/, false, &SignalPrevCommandEvent,
false /*CounterBasedEventEnabled*/,
!CommandBuffer->IsProfilingEnabled,
false /*InterruptBasedEventEnabled*/));
SignalPrevCommandEvent->CommandType = CommandType;

/*
* If the current CommandType is different from the one used lastly, we
* need to append ZeSignalPrevCommandEvent to the latter to know when it's
* finished executing.
*/
ZeSignalPrevCommandEvent = SignalPrevCommandEvent->ZeEvent;
if (CommandBuffer->MWasPrevCopyCommandList && !IsCopy) {
ZE2UR_CALL(zeCommandListAppendSignalEvent,
(CommandBuffer->ZeCopyCommandList, ZeSignalPrevCommandEvent));
CommandBuffer->MWasPrevCopyCommandList = false;
} else {
ZE2UR_CALL(zeCommandListAppendSignalEvent,
(CommandBuffer->ZeComputeCommandList, ZeSignalPrevCommandEvent));
CommandBuffer->MWasPrevCopyCommandList = true;
}

// Get sync point and register the event with it.
ur_exp_command_buffer_sync_point_t SyncPoint =
CommandBuffer->getNextSyncPoint();
CommandBuffer->registerSyncPoint(SyncPoint, SignalPrevCommandEvent);
Comment on lines +223 to +226
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is every call to this function inside a check for CommandBuffer->IsInOrderCmdList ? If so I don't think we need to care about sync points (and should also stick that in function comment) Also preference for renaming the function to remove "syncpoint" in that case.

Copy link
Contributor Author

@konradkusiak97 konradkusiak97 Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. createSyncPointBetweenCopyAndCompute is only ever called inside CommandBuffer->IsInOrderCmdList check to synchronize between the potential in-orderCopy vs in-orderCompute race.

AFAIK Registering SignalPrevCommandEvent adds it to the event list which is later being reset. If we managed to use counter-based events, lines 223-226 wouldn't be necessary.

I might be misunderstanding the purpose of getNextSyncPoint() and registerSyncPoint(), maybe there is a more straightforward way to cache this signal event and later reset it but I thought that's the current way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misunderstanding the purpose of getNextSyncPoint() and registerSyncPoint(), maybe there is a more straightforward way to cache this signal event and later reset it but I thought that's the current way.

This is more what I was getting at. As was trying to ask if we need this sync point if the UR user doesn't care about sync points. However, you've answered my question by saying that we need to do this to cache the L0 event and reset it later. So I guess my feedback is to update the code comment to append ", so that the event is reset between executions"


return UR_RESULT_SUCCESS;
}

/**
* If needed, creates a sync point for a given command and returns the L0
* events associated with the sync point.
Expand Down Expand Up @@ -225,7 +283,7 @@ ur_result_t createSyncPointAndGetZeEvents(
return UR_RESULT_SUCCESS;
}

// Shared by all memory read/write/copy PI interfaces.
// Shared by all memory read/write/copy UR interfaces.
// Helper function for common code when enqueuing memory operations to a command
// buffer.
ur_result_t enqueueCommandBufferMemCopyHelper(
Expand All @@ -241,9 +299,25 @@ ur_result_t enqueueCommandBufferMemCopyHelper(
CommandType, CommandBuffer, NumSyncPointsInWaitList, SyncPointWaitList,
false, RetSyncPoint, ZeEventList, ZeLaunchEvent));

// The chooseCommandList() will modify MComputeCommandListEmpty and
konradkusiak97 marked this conversation as resolved.
Show resolved Hide resolved
// MCopyCommandListEmpty so isFirstNode needs to be called beforehand.
bool IsFirstNode{CommandBuffer->isFirstNode()};

ze_command_list_handle_t ZeCommandList =
CommandBuffer->chooseCommandList(PreferCopyEngine);

// Create a sync point if both Copy and Compute in-order command lists are
// used.
if (CommandBuffer->IsInOrderCmdList &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit of code is copy pasted a lot. It can probably be refactored into its own function.

ZeCommandList == CommandBuffer->ZeCopyCommandList) {
ze_event_handle_t ZeSignalPrevCommandEvent = nullptr;
UR_CALL(createSyncPointBetweenCopyAndCompute(CommandType, CommandBuffer,
IsFirstNode, true /*isCopy=*/,
ZeSignalPrevCommandEvent));
if (ZeSignalPrevCommandEvent) {
ZeEventList.push_back(ZeSignalPrevCommandEvent);
}
}
ZE2UR_CALL(zeCommandListAppendMemoryCopy,
(ZeCommandList, Dst, Src, Size, ZeLaunchEvent, ZeEventList.size(),
getPointerFromVector(ZeEventList)));
Expand Down Expand Up @@ -299,9 +373,26 @@ ur_result_t enqueueCommandBufferMemCopyRectHelper(
CommandType, CommandBuffer, NumSyncPointsInWaitList, SyncPointWaitList,
false, RetSyncPoint, ZeEventList, ZeLaunchEvent));

// The chooseCommandList() will modify MComputeCommandListEmpty and
// MCopyCommandListEmpty so isFirstNode needs to be called beforehand.
bool IsFirstNode{CommandBuffer->isFirstNode()};

ze_command_list_handle_t ZeCommandList =
CommandBuffer->chooseCommandList(PreferCopyEngine);

// Create a sync point if both Copy and Compute in-order command lists are
// used.
if (CommandBuffer->IsInOrderCmdList &&
ZeCommandList == CommandBuffer->ZeCopyCommandList) {
ze_event_handle_t ZeSignalPrevCommandEvent = nullptr;
UR_CALL(createSyncPointBetweenCopyAndCompute(CommandType, CommandBuffer,
IsFirstNode, true /*isCopy=*/,
ZeSignalPrevCommandEvent));
if (ZeSignalPrevCommandEvent) {
ZeEventList.push_back(ZeSignalPrevCommandEvent);
}
}

ZE2UR_CALL(zeCommandListAppendMemoryCopyRegion,
(ZeCommandList, Dst, &ZeDstRegion, DstPitch, DstSlicePitch, Src,
&ZeSrcRegion, SrcPitch, SrcSlicePitch, ZeLaunchEvent,
Expand Down Expand Up @@ -331,9 +422,26 @@ ur_result_t enqueueCommandBufferFillHelper(
UR_CALL(
preferCopyEngineForFill(CommandBuffer, PatternSize, PreferCopyEngine));

// The chooseCommandList() will modify MComputeCommandListEmpty and
// MCopyCommandListEmpty so isFirstNode needs to be called beforehand.
bool IsFirstNode{CommandBuffer->isFirstNode()};

ze_command_list_handle_t ZeCommandList =
CommandBuffer->chooseCommandList(PreferCopyEngine);

// Create a sync point if both Copy and Compute in-order command lists are
// used.
if (CommandBuffer->IsInOrderCmdList &&
ZeCommandList == CommandBuffer->ZeCopyCommandList) {
ze_event_handle_t ZeSignalPrevCommandEvent = nullptr;
UR_CALL(createSyncPointBetweenCopyAndCompute(CommandType, CommandBuffer,
IsFirstNode, true /*isCopy=*/,
ZeSignalPrevCommandEvent));
if (ZeSignalPrevCommandEvent) {
ZeEventList.push_back(ZeSignalPrevCommandEvent);
}
}

ZE2UR_CALL(zeCommandListAppendMemoryFill,
(ZeCommandList, Ptr, Pattern, PatternSize, Size, ZeLaunchEvent,
ZeEventList.size(), getPointerFromVector(ZeEventList)));
Expand Down Expand Up @@ -489,12 +597,13 @@ void ur_exp_command_buffer_handle_t_::registerSyncPoint(

ze_command_list_handle_t
ur_exp_command_buffer_handle_t_::chooseCommandList(bool PreferCopyEngine) {
if (PreferCopyEngine && this->useCopyEngine() && !this->IsInOrderCmdList) {
if (PreferCopyEngine && useCopyEngine()) {
// We indicate that ZeCopyCommandList contains commands to be submitted.
this->MCopyCommandListEmpty = false;
return this->ZeCopyCommandList;
MCopyCommandListEmpty = false;
return ZeCopyCommandList;
}
return this->ZeComputeCommandList;
MComputeCommandListEmpty = false;
return ZeComputeCommandList;
}

ur_result_t ur_exp_command_buffer_handle_t_::getFenceForQueue(
Expand Down Expand Up @@ -661,7 +770,7 @@ urCommandBufferCreateExp(ur_context_handle_t Context, ur_device_handle_t Device,
// the current implementation only uses the main copy engine and does not use
// the link engine even if available.
if (Device->hasMainCopyEngine()) {
UR_CALL(createMainCommandList(Context, Device, false, false, true,
UR_CALL(createMainCommandList(Context, Device, IsInOrder, false, true,
ZeCopyCommandList));
}

Expand Down Expand Up @@ -827,18 +936,18 @@ finalizeWaitEventPath(ur_exp_command_buffer_handle_t CommandBuffer) {
(CommandBuffer->ZeCommandListResetEvents,
CommandBuffer->ExecutionFinishedEvent->ZeEvent));

// Reset the L0 events we use for command-buffer sync-points to the
// non-signaled state. This is required for multiple submissions.
for (auto &Event : CommandBuffer->ZeEventsList) {
ZE2UR_CALL(zeCommandListAppendEventReset,
(CommandBuffer->ZeCommandListResetEvents, Event));
}

if (CommandBuffer->IsInOrderCmdList) {
ZE2UR_CALL(zeCommandListAppendSignalEvent,
(CommandBuffer->ZeComputeCommandList,
CommandBuffer->ExecutionFinishedEvent->ZeEvent));
} else {
// Reset the L0 events we use for command-buffer sync-points to the
// non-signaled state. This is required for multiple submissions.
for (auto &Event : CommandBuffer->ZeEventsList) {
ZE2UR_CALL(zeCommandListAppendEventReset,
(CommandBuffer->ZeCommandListResetEvents, Event));
}

// Wait for all the user added commands to complete, and signal the
// command-buffer signal-event when they are done.
ZE2UR_CALL(zeCommandListAppendBarrier,
Expand Down Expand Up @@ -1084,6 +1193,18 @@ ur_result_t urCommandBufferAppendKernelLaunchExp(
UR_COMMAND_KERNEL_LAUNCH, CommandBuffer, NumSyncPointsInWaitList,
SyncPointWaitList, false, RetSyncPoint, ZeEventList, ZeLaunchEvent));

// Create a sync point if both Copy and Compute in-order command lists are
// used.
if (CommandBuffer->IsInOrderCmdList && CommandBuffer->ZeCopyCommandList) {
ze_event_handle_t ZeSignalPrevCommandEvent = nullptr;
UR_CALL(createSyncPointBetweenCopyAndCompute(
UR_COMMAND_KERNEL_LAUNCH, CommandBuffer, CommandBuffer->isFirstNode(),
false /*isCopy=*/, ZeSignalPrevCommandEvent));
if (ZeSignalPrevCommandEvent) {
ZeEventList.push_back(ZeSignalPrevCommandEvent);
}
}

ZE2UR_CALL(zeCommandListAppendLaunchKernel,
(CommandBuffer->ZeComputeCommandList, ZeKernel,
&ZeThreadGroupDimensions, ZeLaunchEvent, ZeEventList.size(),
Expand Down Expand Up @@ -1315,7 +1436,20 @@ ur_result_t urCommandBufferAppendUSMPrefetchExp(
std::ignore = Flags;

if (CommandBuffer->IsInOrderCmdList) {
// Add the prefetch command to the command-buffer.
// Create a sync point if both Copy and Compute in-order command lists are
// used.
if (CommandBuffer->ZeCopyCommandList) {
ze_event_handle_t ZeSignalPrevCommandEvent = nullptr;
UR_CALL(createSyncPointBetweenCopyAndCompute(
UR_COMMAND_KERNEL_LAUNCH, CommandBuffer, CommandBuffer->isFirstNode(),
false /*isCopy=*/, ZeSignalPrevCommandEvent));
if (ZeSignalPrevCommandEvent) {
ZE2UR_CALL(zeCommandListAppendWaitOnEvents,
(CommandBuffer->ZeComputeCommandList, 1,
&ZeSignalPrevCommandEvent));
}
}
// Add the prefetch command to the command buffer.
// Note that L0 does not handle migration flags.
ZE2UR_CALL(zeCommandListAppendMemoryPrefetch,
(CommandBuffer->ZeComputeCommandList, Mem, Size));
Expand All @@ -1342,6 +1476,7 @@ ur_result_t urCommandBufferAppendUSMPrefetchExp(
ZE2UR_CALL(zeCommandListAppendSignalEvent,
(CommandBuffer->ZeComputeCommandList, ZeLaunchEvent));
}
CommandBuffer->MComputeCommandListEmpty = false;

return UR_RESULT_SUCCESS;
}
Expand Down Expand Up @@ -1385,6 +1520,19 @@ ur_result_t urCommandBufferAppendUSMAdviseExp(
ze_memory_advice_t ZeAdvice = static_cast<ze_memory_advice_t>(Value);

if (CommandBuffer->IsInOrderCmdList) {
// Create a sync point if both Copy and Compute in-order command lists are
// used.
if (CommandBuffer->ZeCopyCommandList) {
ze_event_handle_t ZeSignalPrevCommandEvent = nullptr;
UR_CALL(createSyncPointBetweenCopyAndCompute(
UR_COMMAND_KERNEL_LAUNCH, CommandBuffer, CommandBuffer->isFirstNode(),
false /*isCopy=*/, ZeSignalPrevCommandEvent));
if (ZeSignalPrevCommandEvent) {
ZE2UR_CALL(zeCommandListAppendWaitOnEvents,
(CommandBuffer->ZeComputeCommandList, 1,
&ZeSignalPrevCommandEvent));
}
}
ZE2UR_CALL(zeCommandListAppendMemAdvise,
(CommandBuffer->ZeComputeCommandList,
CommandBuffer->Device->ZeDevice, Mem, Size, ZeAdvice));
Expand All @@ -1410,6 +1558,7 @@ ur_result_t urCommandBufferAppendUSMAdviseExp(
ZE2UR_CALL(zeCommandListAppendSignalEvent,
(CommandBuffer->ZeComputeCommandList, ZeLaunchEvent));
}
CommandBuffer->MComputeCommandListEmpty = false;

return UR_RESULT_SUCCESS;
}
Expand Down
9 changes: 9 additions & 0 deletions source/adapters/level_zero/command_buffer.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,10 @@ struct ur_exp_command_buffer_handle_t_ : public _ur_object {
return NextSyncPoint;
}

bool isFirstNode() const {
return MComputeCommandListEmpty && MCopyCommandListEmpty;
}

// Indicates if a copy engine is available for use
bool useCopyEngine() const { return ZeCopyCommandList != nullptr; }

Expand Down Expand Up @@ -110,6 +114,11 @@ struct ur_exp_command_buffer_handle_t_ : public _ur_object {
// This flag must be set to false if at least one copy command has been
// added to `ZeCopyCommandList`
bool MCopyCommandListEmpty = true;
// This flag must be set to false if at least one compute command has been
// added to `ZeComputeCommandList`
bool MComputeCommandListEmpty = true;
// This flag tracks if the previous node submission was compute or copy type.
bool MWasPrevCopyCommandList = false;
// [WaitEvent Path only] Level Zero fences for each queue the command-buffer
// has been enqueued to. These should be destroyed when the command-buffer is
// released.
Expand Down
Loading