Proposal for support of dmabuf with CRIU #2581

Draft: wants to merge 1 commit into criu-dev

@Snorch (Member) commented Feb 3, 2025

@fdavid-amd I've converted your posts on the CRIU mailing list (https://lists.openvz.org/pipermail/criu/2025-January/045531.html and https://lists.openvz.org/pipermail/criu/2025-January/045532.html) into this PR, because the mailing list is currently having problems delivering email to Gmail accounts and people might miss them there.

Below are the original patch descriptions:

[CRIU] Proposal for support of dmabuf with CRIU

We are working on extending support for CRIU checkpoint/restore in the amdgpu
driver to support new use cases and ROCm applications increasingly using
render node ioctls for memory management. In the longer term this may also
allow checkpoint/restore of graphics applications. With this patch series, we
are soliciting feedback about our design and proof-of-concept implementation.
Our intention is to upstream a future version of these changes in DRM, amdgpu
and CRIU.

Following the philosophy of CRIU (where “IU” stands for “in usermode”), much of
the complexity of saving and restoring memory sharing relationships is handled
in user mode. We believe our work can serve as a blueprint for supporting CRIU
in other DRM and dmabuf drivers in the future.

Motivation

Applications managing GPU/Compute resources on AMD platforms use the
amdgpu driver. This driver exposes one DRM /dev/dri/renderD interface per
AMD device and one KFD /dev/kfd device file per machine for all devices.

AMD's CRIU plugin currently deals with both /dev/kfd and renderD device files.
For /dev/kfd it uses a CRIU-exclusive ioctl on the device file to dump and
restore device data. For the renderD device files, it only records the
device ID and topology.

dmabuf is a kernel component that allows users to share memory between
devices. A user can request export of a file descriptor (a dmabuf_fd)
representing a region of memory allocated on a device from that device's
driver. That dmabuf_fd can be sent to another device and imported there (by
the same process or a different one) to create shared memory between the two
devices. The exporting and importing sides then hold handles on the same
physical memory.

Further details about dmabuf can be found at
https://docs.kernel.org/driver-api/dma-buf.html

AMD is moving towards dmabuf-based buffer object management to support
inter-process communication, as well as other features.

AMD's CRIU plugin needs to support processes that have shared memory created
with dmabuf. Currently, all applications that use this feature allocate
memory in /dev/kfd, export it from that device, and then import it into a renderD
device. This CRIU solution should be able to support future use cases
including allocating and exporting from renderD, importing on /dev/kfd, and
even sharing memory with non-AMD devices. As much as possible, we would like to
contain these changes in the CRIU plugin, although changes in core CRIU,
amdgpu, amdkfd, and core drm are also necessary.

Please find attached the relevant kernel and CRIU patches.

Access relevant data

The amdgpu CRIU plugin already uses CRIU ioctls of the kfd device to
extract driver state from /dev/kfd, and to recreate that state during restore.
The renderD devices have their own driver state, and to access it we need
CRIU renderD ioctls.

The amdgpu data that needs to be stored during dump and restored during
restore is all BO-related: the BOs' addresses, sizes, and flags, and the
similar data for associated virtual memory mappings.

Typically, applications access renderD DRM ioctls through a wrapper library,
libdrm. However, since these ioctls will only ever be used by CRIU
itself, we have chosen to have CRIU call them directly. This means that the
relevant header file, amdgpu_drm.h, must be added to CRIU.

Identifying shared BOs (buffer objects)

During checkpoint, the amdgpu CRIU plugin must identify which buffer objects
across the renderD and /dev/kfd devices in all checkpointed processes
correspond to shared data, so that the sharing relationship can later be
restored. An easy way to do this is to use the memory’s GEM handle. Because all
the checkpointing happens in a single CRIU process, shared memory will always
share a GEM handle. GEM handles are intended to identify memory and require no
additional code to use.
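As a rough illustration of the grouping step, the sketch below sorts hypothetical per-BO dump records by GEM handle, marks every handle held more than once as shared, and picks one holder per shared BO to act as exporter at restore time. The `struct bo_record` layout, the function names, and the "lowest pid exports" rule are assumptions made for illustration, not the plugin's actual data structures or policy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical dump-time record: one per (process, BO) pair. */
struct bo_record {
	int pid;             /* checkpointed process that holds the BO */
	uint32_t gem_handle; /* handle as seen through CRIU's own DRM fd */
	bool shared;         /* another record carries the same handle */
	bool exporter;       /* this holder will export the BO on restore */
};

/* Order by GEM handle, then by pid, so equal handles become adjacent. */
static int cmp_bo_record(const void *a, const void *b)
{
	const struct bo_record *x = a, *y = b;

	if (x->gem_handle != y->gem_handle)
		return x->gem_handle < y->gem_handle ? -1 : 1;
	return (x->pid > y->pid) - (x->pid < y->pid);
}

/* Mark shared BOs and designate the lowest-pid holder as exporter. */
static void mark_shared_bos(struct bo_record *bos, size_t n)
{
	size_t i = 0;

	qsort(bos, n, sizeof(*bos), cmp_bo_record);
	while (i < n) {
		size_t j = i + 1;

		/* Find the end of the run of records with this handle. */
		while (j < n && bos[j].gem_handle == bos[i].gem_handle)
			j++;
		for (size_t k = i; k < j; k++) {
			bos[k].shared = (j - i) > 1;
			bos[k].exporter = bos[k].shared && k == i;
		}
		i = j;
	}
}
```

A fuller version would probably key holders by (process, device file) rather than by pid alone, since the kfd-to-renderD case described above can share a BO within a single process.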

Extracting the GEM handles requires the libdrm function amdgpu_device_get_fd,
which was added in libdrm 2.4.109 in November 2021. This makes CRIU depend on
libdrm 2.4.109 or newer.

Restoring shared BOs - The Mechanism

During restore, CRIU must recreate the shared memory relationships that existed
before checkpoint. The plan is to mimic the means by which the memory was
shared in the first place. During checkpoint, one of the processes which holds
a shared piece of memory will be designated the exporter and the rest
importers. The exporter process will get a dmabuf_fd from the driver and
transfer it to the importer processes. Those processes will send the dmabuf_fd
to the driver, where it will be imported. This approach minimizes kernel
changes.

The renderD dmabuf import code cannot be reused as is. When a dmabuf_fd is
imported normally, it is assigned an arbitrary GEM handle. However, the
checkpointed process uses the GEM handle to identify this data, so the
restored memory must have the same GEM handle as before. This necessitates
adding two new DRM interfaces for creating and importing buffer objects with
a specified GEM handle.
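The constraint can be seen with a toy model of a per-file GEM handle table. An ordinary import takes whatever handle is free (the lowest one, in this model), so nothing guarantees the old value; a restore-specific import has to request an exact handle and fail if it is already taken. The table, function names, and size below are hypothetical; in the kernel, something like `idr_alloc()` over the range `[handle, handle + 1)` would be one plausible way to claim a specific handle.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_HANDLES 64 /* arbitrary size for the toy model */

/* Toy per-DRM-file handle table; GEM handles start at 1, slot 0 is unused. */
struct handle_table {
	bool used[MAX_HANDLES];
};

/* Ordinary import: take the lowest free handle, whatever it happens to be. */
static int handle_alloc_any(struct handle_table *t)
{
	for (int h = 1; h < MAX_HANDLES; h++) {
		if (!t->used[h]) {
			t->used[h] = true;
			return h;
		}
	}
	return -ENOSPC;
}

/* Restore-time import: claim exactly the handle recorded at checkpoint. */
static int handle_alloc_exact(struct handle_table *t, int h)
{
	if (h < 1 || h >= MAX_HANDLES)
		return -EINVAL;
	if (t->used[h])
		return -EEXIST; /* already taken: the old handle cannot be restored */
	t->used[h] = true;
	return h;
}
```

The model also shows why ordering matters: if an arbitrary allocation grabbed handle 5 first, a later exact request for 5 would fail, so restored handles presumably need to be claimed before anything else allocates on that file.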

Transferring dmabuf_fds between CRIU processes requires a socket.

Each process creates its own socket and registers it to a name containing its
pid. When a process exports a dmabuf_fd, it sends it to the sockets of all
other processes. Before a process begins restoring a device file, it checks its
socket for incoming messages. In the current version of the patchset, this is
done in core CRIU, parallel to the existing socket for exchanging file
descriptors for shared files. It could be done within the amdgpu plugin
instead, but this would require an additional plugin callback to ensure that
all sockets are created before any plugin files are restored.
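Passing an fd between processes over an AF_UNIX socket is normally done with an SCM_RIGHTS control message; the receiver gets a new fd that refers to the same open file. The helpers below are a generic sketch of that technique, not CRIU's own transport code, and `send_fd()`/`recv_fd()` are hypothetical names. Nothing in them is specific to dmabuf, so a pipe can stand in for a dmabuf_fd when trying them out.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/un.h>

/* Control buffer sized and aligned for exactly one fd. */
union fd_cmsg {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/*
 * Send fd over sock. dst may be NULL for a connected socket; for an
 * unconnected datagram socket it names the peer's (e.g. per-pid) address.
 */
static int send_fd(int sock, const struct sockaddr_un *dst, socklen_t dst_len, int fd)
{
	char byte = 0; /* one byte of real data; required on stream sockets */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg ctrl;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_name = (void *)dst;
	msg.msg_namelen = dst ? dst_len : 0;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the new fd number, or -1 on error. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg ctrl;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

In the scheme described above, the exporter would call something like `send_fd()` once per importer, addressing each importer's per-pid socket, and each importer would drain its own socket with `recv_fd()` before restoring its device files.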


The sockets themselves and the dmabuf_fds received over sockets both must be
assigned fds. These fds cannot conflict with each other or with fds of other
restored files. We intend to alter find_unused_fd_pid to allow the allocation
of multiple regions of unused fds; this work is not yet complete.
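One simple way to reserve several non-overlapping fd regions is to scan a sorted list of fds already claimed by the restore for the first gap of the required size, then repeat for the next region starting above the previous one. The helper below is a hypothetical sketch of that idea; it is not how find_unused_fd_pid is implemented, and the parameter names `used`, `floor`, and `limit` are illustrative.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the lowest fd >= floor such that
 * [fd, fd + count) contains no entry of used[] and fd + count <= limit.
 * used[] must be sorted ascending without duplicates. Returns -1 if no
 * such run exists below limit.
 */
static int find_free_fd_run(const int *used, size_t n_used, int floor,
			    int count, int limit)
{
	int start = floor;

	for (size_t i = 0; i < n_used; i++) {
		if (used[i] < start)
			continue;          /* below the candidate run */
		if (used[i] - start >= count)
			break;             /* the gap before used[i] is big enough */
		start = used[i] + 1;       /* collision: retry just past it */
	}
	return start + count <= limit ? start : -1;
}
```

For example, with fds 0 to 5, 10 and 12 taken, a two-fd region for sockets starting at or above 3 lands at 6 and 7, and a three-fd region for dmabuf fds placed above it lands at 13 to 15.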

Restoring shared BOs - The Logic

CRIU already has a mechanism for handling files whose restore has
prerequisites. A file that cannot be fully restored on the first attempt can
return 1, indicating it is incomplete and must be retried. We extend this
mechanism to plugin files to handle the shared BO case: any device file that
needs to restore imported BOs will retry if the corresponding exports haven't
happened yet.
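The retry convention can be illustrated with a single-process toy model: a file whose BO is an import returns 1 until the matching export has been restored, and the caller keeps making passes over the pending files. The types, names, and the no-progress check below are hypothetical simplifications, not CRIU's files-ext.c logic; in real CRIU the export may come from another process, so a pass without local progress would mean waiting rather than failing.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BOS 16 /* arbitrary bound for the toy model */

/* Hypothetical per-process record of which BOs exist again. */
struct restore_state {
	bool restored[MAX_BOS];
};

/* Hypothetical plugin file: restores one BO as exporter or importer. */
struct plugin_file {
	int bo_id;
	bool is_import;
	bool done;
};

/* 0 = restored, 1 = retry later (export not restored yet). */
static int restore_plugin_file(struct plugin_file *f, struct restore_state *st)
{
	if (f->is_import && !st->restored[f->bo_id])
		return 1;
	st->restored[f->bo_id] = true;
	f->done = true;
	return 0;
}

/* Keep passing over pending files; -1 if a whole pass makes no progress. */
static int restore_all(struct plugin_file *files, size_t n,
		       struct restore_state *st, int *passes)
{
	size_t remaining = n;

	*passes = 0;
	while (remaining > 0) {
		size_t progress = 0;

		(*passes)++;
		for (size_t i = 0; i < n; i++) {
			if (files[i].done)
				continue;
			if (restore_plugin_file(&files[i], st) == 0) {
				progress++;
				remaining--;
			}
		}
		if (progress == 0)
			return -1; /* every pending import lacks its export */
	}
	return 0;
}
```

An import listed before its export simply costs one extra pass, which is the inefficiency the two-phase alternative below would avoid.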

Another option would be to do the plugin restore in two phases, with all the
exported bos in the first phase and all the imports in the second phase. That
would be more efficient but would require more core CRIU changes, in particular
additional synchronization between restoring processes.

To support this, files-ext.c is modified to allow plugin files to retry.

TODO

  1. Backwards compatibility

It is important to us that older versions of CRIU continue to work with new
kernels that include these changes. The existing amdkfd CRIU ioctl UAPI must be extended
in a way that does not break the ABI.
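A common way to extend an ioctl argument struct without breaking the ABI is to only append fields, have the kernel zero-fill whatever an older, shorter userspace struct did not supply, and reject unknown non-zero bytes from a newer, longer one. The sketch below models those rules in userspace, loosely following the semantics of the kernel's `copy_struct_from_user()` helper; the `criu_args_v1`/`criu_args_v2` structs and `copy_args()` are hypothetical and do not reflect the real kfd_ioctl.h layout.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument struct as originally shipped... */
struct criu_args_v1 {
	uint32_t op;
	uint32_t num_bos;
};

/* ...and after a field is appended (never inserted or reordered). */
struct criu_args_v2 {
	uint32_t op;
	uint32_t num_bos;
	uint64_t dmabuf_info_ptr; /* hypothetical new field; 0 = not supplied */
};

/* Copy a usize-byte user struct into a ksize-byte kernel struct. */
static int copy_args(void *dst, size_t ksize, const void *src, size_t usize)
{
	const unsigned char *u = src;

	if (usize > ksize) {
		/* Newer userspace: bytes this kernel does not know must be zero. */
		for (size_t i = ksize; i < usize; i++)
			if (u[i])
				return -E2BIG;
	}
	memset(dst, 0, ksize); /* older userspace: new fields read as 0 */
	memcpy(dst, src, usize < ksize ? usize : ksize);
	return 0;
}
```

With these rules, an old CRIU running on a new kernel sees the appended field as 0, which the kernel can treat as "feature not requested".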

  2. Checkpoint and restore of dmabuf fds themselves

The usual approach to IPC taken by AMD libraries is to export a dmabuf,
immediately send it to another process, and then close it. Therefore, it is
unlikely that a process will have a dmabuf_fd open during checkpoint. We
would still like to handle this case, however; that work is not yet done.

This would require handling of dmabuf file descriptors in the amdgpu CRIU
plugin. It would support only dmabuf fds that were exported by amdgpu. For
each such dmabuf fd, it would have to store an identifier of the underlying
amdgpu BO that can be used during restore to recreate the dmabuf fd from the
restored BO.

The CRIU patch can also be found at
https://github.com/fdavid-amd/criu/tree/dmabuf-post


CRIU mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/criu


[CRIU] [PATCH] Support dmabuf with amdgpu

This patch, in combination with an accompanying kernel patch, adds support for amdgpu dmabuf IPC with CRIU.

It includes:

  • Updates to the amdgpu_drm.h header file to match the kernel.
  • Inclusion of the kfd_ioctl.h header file to allow the plugin to call the new amdgpu CRIU ioctl.
  • Plugin file restore can now retry; new files-ext.c code supports this case.
  • The amdgpu plugin now checks for shared BOs during checkpoint by finding the GEM handle for each BO; this introduces a dependency on a three-year-old libdrm change.
  • The unpause step is now its own callback (DUMP_DEVICE_LATE) instead of being triggered by counting DRM files.
  • The amdgpu plugin restores BOs by designating one process as the exporter; that process acquires a dmabuf_fd for the BO and sends it to the other processes to be imported.
  • The amdgpu plugin tracks which BOs it has restored; it signals a retry if it needs to restore an imported BO whose corresponding exported BO has not been restored yet.
  • New service_fd for transferring dmabuf fds. There is a function that plugins can call to send out a dmabuf fd, and a callback that notifies plugins that a dmabuf fd has been received over the socket.
  • New mechanism for finding target fds for the received dmabuf fds to be dup'ed onto, so that they don't conflict with other restore-time fds. We would like to unify this with find_unused_fd_pid.

Signed-off-by: David Francis <[email protected]>

static void dmabuf_socket_name_gen(struct sockaddr_un *addr, int *len, int pid)
{
	addr->sun_family = AF_UNIX;
	snprintf(addr->sun_path, UNIX_PATH_MAX, "x/crtools-dmabuf-%d-%" PRIx64, pid, criu_run_id);

Check failure (Code scanning / CodeQL, severity High): Wrong type of arguments to formatting function. This format specifier for type 'unsigned long' does not match the argument type 'char *'.
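For reference, a self-contained version of the naming helper above with matching format types might look like the sketch below. It assumes the run id is meant to be a `uint64_t` (which is what `PRIx64` expects) and passes it as a parameter instead of reading a global. It also assumes the leading `x` is a placeholder that gets overwritten with NUL after the length is computed, placing the socket in Linux's abstract namespace. These are assumptions about the intent, not the patch's actual code.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Build "\0/crtools-dmabuf-<pid>-<run_id hex>" in addr and its length in *len.
 * run_id is assumed to be a uint64_t, so PRIx64 matches the argument type.
 */
static void dmabuf_socket_name_gen(struct sockaddr_un *addr, int *len, int pid,
				   uint64_t run_id)
{
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	snprintf(addr->sun_path, sizeof(addr->sun_path),
		 "x/crtools-dmabuf-%d-%" PRIx64, pid, run_id);
	/* Length must be taken while the name is still a C string. */
	*len = (int)(offsetof(struct sockaddr_un, sun_path) + strlen(addr->sun_path));
	/* Leading NUL: abstract socket, nothing is created on disk. */
	addr->sun_path[0] = '\0';
}
```

The resulting `len` is what would be passed to `bind()` or `sendto()`. Because abstract names are not NUL-terminated, passing `sizeof(addr)` instead would yield a different name that includes the trailing zero bytes.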