Llava BF16 and FP16 inference accuracy got out of memory #1277

Open
Tracked by #1223
mengfei25 opened this issue Jan 10, 2025 · 0 comments

🐛 Describe the bug

This test previously passed in July 2024.

python benchmarks/dynamo/torchbench.py --accuracy --bfloat16 -d xpu -n10 --inference --only llava --backend=inductor

xpu  eval  llava                              
Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 4886, in run
    ) = runner.load_model(
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 372, in load_model
    self.validate_model(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2747, in validate_model
    model = self.deepcopy_model(model)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2707, in deepcopy_model
    return copy.deepcopy(model)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 153, in deepcopy
    y = copier(memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/parameter.py", line 68, in __deepcopy__
    self.data.clone(memory_format=torch.preserve_format), self.requires_grad
torch.OutOfMemoryError: XPU out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacity of 48.00 GiB. Of the allocated memory 47.92 GiB is allocated by PyTorch, and 12.48 MiB is reserved by PyTorch but unallocated. Please use `empty_cache` to release all unoccupied cached memory.

eager_fail_to_run
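
The figures in the error message suggest the device was already nearly full when `validate_model` called `copy.deepcopy(model)`. The sketch below is only an illustration of that arithmetic: it parses the message text from the traceback above and compares the requested allocation with the remaining headroom. The helper `parse_oom` is hypothetical and is not part of PyTorch or the benchmark suite.

```python
import re

# Message text copied from the traceback above.
OOM_MESSAGE = (
    "XPU out of memory. Tried to allocate 172.00 MiB. "
    "GPU 0 has a total capacity of 48.00 GiB. "
    "Of the allocated memory 47.92 GiB is allocated by PyTorch, "
    "and 12.48 MiB is reserved by PyTorch but unallocated."
)

# Conversion factors to MiB for the two units that appear in the message.
_UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Hypothetical helper: extract the sizes in the message, in MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "capacity": r"total capacity of ([\d.]+) (MiB|GiB)",
        "allocated": r"([\d.]+) (MiB|GiB) is allocated by PyTorch",
        "reserved_unallocated": r"([\d.]+) (MiB|GiB) is reserved by PyTorch",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {name!r} in message")
        value, unit = match.groups()
        sizes[name] = float(value) * _UNIT_TO_MIB[unit]
    return sizes


sizes = parse_oom(OOM_MESSAGE)
# Headroom is what remains on the device after PyTorch's live allocations.
headroom_mib = sizes["capacity"] - sizes["allocated"]
fits = sizes["requested"] <= headroom_mib

print(f"requested: {sizes['requested']:.2f} MiB")
print(f"headroom:  {headroom_mib:.2f} MiB")
print(f"request fits: {fits}")
```

Under these numbers the 172 MiB request exceeds the remaining headroom, which is consistent with the failure occurring while a second copy of the model was being created on the same device.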


Versions

Environment:
Device: PVC 1100
torch-xpu-ops: 18bcd9a
python: 3.10
TRITON_COMMIT_ID: e98b6fcb8df5b44eb0d0addb6767c573d37ba024
TORCH_COMMIT_ID: b9fbd65dfd5e703bacbc6c25258d1215108b4faf
TORCHBENCH_COMMIT_ID: 766a5e3a189384659fd35a68c3b17b88c761aaac
TORCHVISION_COMMIT_ID: d23a6e1664d20707c11781299611436e1f0c104f
TORCHAUDIO_COMMIT_ID: b6d4675c7aedc53ba04f3f55786aac1de32be6b4
DRIVER_VERSION: 1.23.10.49.231129.50 (803.61)
KERNEL_VERSION: 5.15.0-73-generic #80 SMP Mon May 15 15:18:26 UTC 2023
BUNDLE_VERSION: 2025.0.1.20241113 (DL-Essential 2025.0.1)
OS_PRETTY_NAME: Ubuntu 22.04.2 LTS
GCC_VERSION: 11

github-merge-queue bot pushed a commit that referenced this issue Jan 16, 2025
The reference was last updated on 20240709.
Related issues: 

- [x] #1216
- [x] #1217
- [x] #1219
- [x] #1220
- [ ] #1221
- [x] #1222
- [ ] #1256
- [ ] #1260
- [ ] #1261
- [ ] #1262
- [ ] #1263
- [ ] #1264
- [ ] #1273
- [ ] #1274
- [ ] #1275
- [ ] #1276
- [ ] #1277
- [ ] #1278
- [ ] #508
- [ ] #509
- [ ] #510