[Issue]: Unable to map memory regions to virtual address space #287
Comments
Experiencing the same.
Operating System: Arch Linux 6.13.0-arch1-1
CPU: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
GPU: RX 7700S, Radeon 780M
ROCm Version: 6.2.41134-0
Output of /opt/rocm/bin/rocminfo --support
|
Hi @daniandtheweb. Internal ticket has been created to investigate this issue. Thanks! |
Hi @daniandtheweb, thanks for reporting the issue. As the error log hints, it is likely caused by the system running out of memory: the script provided is trying to allocate 32 GB of memory. Would you be able to try changing the following line (L16) to
and see if the issue persists? Thanks! |
Changing the line to 32 fixes the issue, thanks. However, the program still doesn't complete correctly.
Here's the systemd-coredump:
Using the second reproducer file mentioned in the other issue, the program doesn't even manage to unmap the memory.
If this is unrelated to ROCR I can close the issue. |
Hi @daniandtheweb, thanks for the update! I am not quite sure about the cause of your second error. If I had to guess, it is probably an incompatibility between gfx1010 and clr. Unfortunately, we don't have a system at hand where I can reproduce your issue. I was only able to try this on a system with gfx1100 and the latest ROCm 6.3.1, where everything seems to work. |
My main system currently runs ROCm 6.1.2. I'll update it to a more recent version and test again. Thanks for the help. |
The code is trying to allocate 32 GB of virtual address space, not memory, so the amount of physical memory on the card should be immaterial. Indeed, the same code on CUDA platforms allows one to allocate 32 GB of address space even when the card has only 4 GB or less of physical memory. |
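The difference between reserving address space and allocating memory has a host-side analogue that may help illustrate the point. The following is a minimal sketch, assuming a 64-bit Linux system and using POSIX `mmap` rather than the HIP virtual memory API. The helper names are hypothetical, and the sketch says nothing about how ROCR or clr implement reservations internally.

```cpp
// Host-side analogy only: POSIX mmap, not hipMemAddressReserve/hipMemMap.
// Helper names are hypothetical and chosen for illustration.
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Reserve `bytes` of virtual address space without backing memory.
// PROT_NONE makes the range inaccessible; MAP_NORESERVE asks the kernel
// not to set aside swap for it. Returns nullptr on failure.
void* reserve_address_space(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Make the first `bytes` of a reserved range readable and writable.
// Physical pages are assigned lazily, when the memory is first touched.
bool commit_prefix(void* base, std::size_t bytes) {
    return mprotect(base, bytes, PROT_READ | PROT_WRITE) == 0;
}

// Give the whole reservation back to the operating system.
bool release_address_space(void* base, std::size_t bytes) {
    return munmap(base, bytes) == 0;
}
```

With these helpers, a call such as `reserve_address_space(std::size_t{32} << 30)` would typically succeed on a machine with far less than 32 GB of RAM, because nothing is committed until `commit_prefix` is called and the pages are touched. Whether it succeeds in a given environment still depends on address-space limits such as `ulimit -v`.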
I can also confirm that the reproducer with 32 GB works fine for me on an RX 6800 XT (which of course also doesn't have 32 GB of physical memory) and an MI100 (which does). Others of our users report that the reproducer does not work on the RX 6700 XT and also the RX 7900 XTX. The commonality seems to be that the users with failing tests are on consumer platforms (AM4/AM5/LGA 1700), while I am on a server platform (EPYC Rome). |
@IMbackK Thanks for the additional info! This is actually a great point. Running the reproducer with HSAKMT_DEBUG_LEVEL=3 shows
which indicates that it is in fact the host that is running out of memory, not the device. This could be the reason why the code works on server platforms but not on consumer platforms. The message can be traced to this line of code: https://github.com/ROCm/ROCR-Runtime/blob/amd-staging/libhsakmt/src/memory.c#L188. I am not quite sure whether this is the intended behavior, but I will try to look into it. Thanks! |
Regarding the other issue I was having, I can confirm it was caused by the old ROCm version. Using ROCm 6.2.2, the reproducer works correctly.
|
@daniandtheweb Thanks for the update. Glad it is working! |
ROCR allocating a bunch of RAM via malloc on the host when virtual address space is requested is quite strange, and it makes using VMM to eventually fill the device impossible on machines with more VRAM than RAM. |
@IMbackK This is definitely a valid point. We are currently investigating the cause of this behavior. I will keep you posted on the progress. Thanks! |
Problem Description
When trying to map a memory region to a virtual address space via hipMemMap, ROCR reports the GPU to be out of memory and is unable to continue.
This issue can be reproduced using the same reproducer code that's been included in #285.
Here's the output:
Operating System
Arch Linux, Mainline Kernel
CPU
Intel(R) Core(TM) i7-9700K
GPU
AMD Radeon RX 5700 XT
ROCm Version
ROCm 6.3.0, ROCm 6.1.0
ROCm Component
ROCR-Runtime, clr
Steps to Reproduce
Run the reproducer
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
Additional Information
This issue has been reported here: ggerganov/llama.cpp#11405