You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
/usr/local/lib/python3.10/dist-packages/mmcv/init.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
{'sys.platform': 'linux', 'Python': '3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0]', 'CUDA available': True, 'GPU 0,1,2,3,4,5,6,7': 'NVIDIA H20', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Cuda compilation tools, release 12.1, V12.1.105', 'GCC': 'x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0', 'PyTorch': '2.1.2+cu121', 'PyTorch compiling details': 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX512\n - CUDA Runtime 12.1\n - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n - CuDNN 8.9.2\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n', 'TorchVision': '0.16.2+cu121', 'OpenCV': '4.8.1', 'MMCV': '1.7.2', 'MMCV Compiler': 'GCC 9.3', 'MMCV CUDA Compiler': '12.1'}
Reproduces the problem - code sample
the problem is on loss feedback progress.
Reproduces the problem - command or script
run the Sparse4D code with only one GPU
Reproduces the problem - error message
"
File "/usr/local/lib/python3.10/dist-packages/mmdet/models/losses/focal_loss.py", line 233, in forward
loss_cls = self.loss_weight * calculate_loss_func(
File "/usr/local/lib/python3.10/dist-packages/mmdet/models/losses/focal_loss.py", line 139, in sigmoid_focal_loss
loss = _sigmoid_focal_loss(pred.contiguous(), target.contiguous(), gamma,
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 539, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/usr/local/lib/python3.10/dist-packages/mmcv/ops/focal_loss.py", line 59, in forward
ext_module.sigmoid_focal_loss_forward(
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
"
Additional information
i tried use another focal loss replace mmcv/ops/focal loss , and it is work.
The text was updated successfully, but these errors were encountered:
---- Replied Message ----
| From | ***@***.***> |
| Date | 01/09/2025 22:41 |
| To | open-mmlab/mmcv ***@***.***> |
| Cc | Jenny0420 ***@***.***>,
Author ***@***.***> |
| Subject | Re: [open-mmlab/mmcv] [Bug] mmcv-full 1.7.2 with H20GPU and torch 2.1.X Focal loss error (Issue #3221) |
想问下8卡的H20训练效率怎么样,相较于A100性能怎么样
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID: ***@***.***>
Prerequisite
Environment
/usr/local/lib/python3.10/dist-packages/mmcv/init.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
{'sys.platform': 'linux', 'Python': '3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0]', 'CUDA available': True, 'GPU 0,1,2,3,4,5,6,7': 'NVIDIA H20', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Cuda compilation tools, release 12.1, V12.1.105', 'GCC': 'x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0', 'PyTorch': '2.1.2+cu121', 'PyTorch compiling details': 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX512\n - CUDA Runtime 12.1\n - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n - CuDNN 8.9.2\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n', 'TorchVision': '0.16.2+cu121', 'OpenCV': '4.8.1', 'MMCV': '1.7.2', 'MMCV Compiler': 'GCC 9.3', 'MMCV CUDA Compiler': '12.1'}
Reproduces the problem - code sample
the problem is on loss feedback progress.
Reproduces the problem - command or script
run the Sparse4D code with only one GPU
Reproduces the problem - error message
"
File "/usr/local/lib/python3.10/dist-packages/mmdet/models/losses/focal_loss.py", line 233, in forward
loss_cls = self.loss_weight * calculate_loss_func(
File "/usr/local/lib/python3.10/dist-packages/mmdet/models/losses/focal_loss.py", line 139, in sigmoid_focal_loss
loss = _sigmoid_focal_loss(pred.contiguous(), target.contiguous(), gamma,
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 539, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/usr/local/lib/python3.10/dist-packages/mmcv/ops/focal_loss.py", line 59, in forward
ext_module.sigmoid_focal_loss_forward(
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with
TORCH_USE_CUDA_DSA
to enable device-side assertions."
Additional information
i tried use another focal loss replace mmcv/ops/focal loss , and it is work.
The text was updated successfully, but these errors were encountered: