Platform specific largest floating point type #214

Open · iyassou wants to merge 9 commits into main

Conversation

@iyassou commented Jan 20, 2025

Hello 👋 This library helped me with my uni coursework and I wanted to fix two issues I ran into.

Summary

Certain metrics have the torch.float64 type hardcoded, but not all backends support it, MPS in particular. The device the metric is meant to reside on is now used to determine the largest available floating-point type: either torch.float64 or torch.float32.
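
For reference, a minimal sketch of the idea behind the largest_float helper, assuming the check only needs to special-case MPS (the actual implementation in the PR may differ):

import torch

def largest_float(device: torch.device) -> torch.dtype:
    # Assumption: only MPS lacks float64 support; fall back to float32 there
    # and keep float64 for every other backend.
    return torch.float32 if device.type == "mps" else torch.float64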

Certain metrics use the deprecated Tensor.scatter_(dim, index, src, reduce="add") which I've replaced with the equivalent non-deprecated Tensor.scatter_add_(dim, index, src).
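
As a toy illustration of the equivalence (not taken from the torcheval sources):

>>> import torch
>>> counts = torch.zeros(5)
>>> index = torch.tensor([0, 1, 1, 3])
>>> src = torch.ones(4)
>>> # deprecated: counts.scatter_(0, index, src, reduce="add")
>>> counts.scatter_add_(0, index, src)
tensor([1., 2., 0., 1., 0.])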

I've made small fixes along the way in the form of correcting docstring typos, removing duplicated code, and fixing a minor bug in the generated documentation.

Test plan

Environment:

  • OS: macOS Sonoma 14.6.1
  • Machine: M3 MacBook Pro
  • RAM: 48GB
  • Python: 3.10.16 (within a virtual environment)
  • Environment Variables: PYTORCH_ENABLE_MPS_FALLBACK=1, set after PyTorch recommended it on an initial run without it

The main change is platform-specific and I don't have all the supported backends at my disposal, but I've tested largest_float with torch.device("cpu") and torch.device("mps") on my machine, and torch.device("xla") and torch.device("cuda") in Colab.
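
Since I can't exercise every backend, an empirical probe along these lines could be used to double-check a device I haven't tried (a sketch, not part of the PR):

import torch

def supports_float64(device: torch.device) -> bool:
    # Try to allocate a float64 tensor on the device; backends without
    # float64 support (e.g. MPS) raise an error here.
    try:
        torch.zeros(1, dtype=torch.float64, device=device)
        return True
    except (TypeError, RuntimeError):
        return False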

I've otherwise run the entire test suite on my machine; all but nine tests passed.


6 failures from tests/metrics/test_synclib.py

SynclibTest::test_sync_dtype_and_shape fails because MPS doesn't support torch.float64. The other 5 failing tests are:

  • test_tensor_sync_states
  • test_tensor_list_sync_states
  • test_tensor_dict_sync_states
  • test_complex_mixed_state_sync
  • test_empty_tensor_list_sync_state

which all yield torch.distributed.elastic.multiprocessing.errors.ChildFailedError errors that trace back to the following error:

RuntimeError: ProcessGroupGloo::allgather: invalid tensor type at index 0 (expected TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)), got TensorOptions(dtype=float, device=mps:0, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)))

indicating that a CPU tensor and an MPS tensor end up in the same allgather at some point during syncing. I'm guessing the tests weren't written with the CPU fallback in mind, but I haven't investigated further.
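
If the root cause is what it looks like, the workaround would be to gather CPU copies when the process group is gloo; a rough sketch of that idea (hypothetical helper, not the library's actual sync code):

import torch
import torch.distributed as dist

def all_gather_via_cpu(tensor: torch.Tensor) -> list[torch.Tensor]:
    # gloo only handles CPU tensors, so move the MPS tensor off-device first.
    cpu_tensor = tensor.cpu()
    gathered = [torch.empty_like(cpu_tensor) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, cpu_tensor)
    return gathered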

2 failures from tests/metrics/test_toolkit.py

  • MetricToolkitTest::test_metric_sync
  • MetricCollectionToolkitTest::test_metric_collection_sync

both yielding the same error message as the 5 tests from test_synclib.py.

1 failure from tests/metrics/image/test_fid.py

TestFrechetInceptionDistance::test_fid_random_data_default_model fails on an AssertionError:

AssertionError: Scalars are not close!

Expected 4.483039855957031 but got 4.410880088806152.
Absolute difference: 0.0721597671508789 (up to 0.01 allowed)
Relative difference: 0.016096168999031657 (up to 0.01 allowed)

I haven't made any changes to the FrechetInceptionDistance metric and have the image-requirements.txt dependencies installed, so was this test case always failing or have I encountered a platform-specific bug?

Questions

torchaudio requirement

I added the new audio-requirements.txt file, following the image-requirements.txt naming convention, for the torchaudio dependency needed by the audio metrics. In general, is there something that relies on these requirements living in separate text files and prevents them from being specified in pyproject.toml?

skimage requirement

scikit-image is now installed with pip install scikit-image instead of pip install skimage, so installing the image requirements with pip install -r image-requirements.txt raises an error: can this dependency be renamed without causing other conflicts?

Union type

When rebasing my fork I reverted changes to 3 files that had moved from typing.Union to the newer | union syntax for annotating function parameters. I noticed a large number of files use the | style, but the project README requires "Python >= 3.8" while the | syntax was only introduced in Python 3.10 (PEP 604), so maybe bump the required Python version?
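
For reference, the two spellings side by side (hypothetical signatures, purely for illustration); the first runs on Python 3.8, the second needs Python >= 3.10 when evaluated at runtime:

import torch
from typing import Optional, Union

# works on Python >= 3.8
def update_old(input: Union[torch.Tensor, float], weight: Optional[float] = None) -> None: ...

# needs Python >= 3.10 (PEP 604)
def update_new(input: torch.Tensor | float, weight: float | None = None) -> None: ...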

Failed to recreate usage example

When fixing docstring typos I tried to recreate the second usage example for the WindowedBinaryAUROC metric.

Expected:

>>> metric = WindowedBinaryAUROC(max_num_samples=5, num_tasks=2)
>>> metric.update(torch.tensor([[0.2, 0.3], [0.5, 0.1]]), torch.tensor([[1.0, 0.0], [0.0, 1.0]]))
>>> metric.update(torch.tensor([[0.8, 0.3], [0.6, 0.1]]), torch.tensor([[1.0, 1.0], [1.0, 0.0]]))
>>> metric.update(torch.tensor([[0.5, 0.1], [0.3, 0.9]]), torch.tensor([[0.0, 1.0], [0.0, 0.0]]))
>>> metric.inputs
tensor([[0.1000, 0.3000, 0.8000, 0.3000, 0.5000],
        [0.9000, 0.1000, 0.6000, 0.1000, 0.3000]])
>>> metric.targets
tensor([[1., 0., 1., 1., 0.],
        [0., 1., 1., 0., 0.]])
>>> metric.compute()
tensor([0.4167, 0.5000])

Actual:

>>> metric = WindowedBinaryAUROC(max_num_samples=5, num_tasks=2)
>>> metric.update(torch.tensor([[0.2, 0.3], [0.5, 0.1]]), torch.tensor([[1.0, 0.0], [0.0, 1.0]]))
>>> metric.update(torch.tensor([[0.8, 0.3], [0.6, 0.1]]), torch.tensor([[1.0, 1.0], [1.0, 0.0]]))
>>> metric.update(torch.tensor([[0.5, 0.1], [0.3, 0.9]]), torch.tensor([[0.0, 1.0], [0.0, 0.0]]))
>>> metric.inputs
tensor([[0.1000, 0.3000, 0.8000, 0.3000, 0.5000],
        [0.9000, 0.1000, 0.6000, 0.1000, 0.3000]])
>>> metric.targets
tensor([[1., 0., 1., 1., 0.],
        [0., 1., 1., 0., 0.]])
>>> metric.compute()
tensor([0.4167, 0.4167], dtype=torch.float64)

I'm not entirely sure why the metric inputs look the way they do in the first place (they seem to form a circular buffer of size max_num_samples, with the sixth update overwriting index 0), so I'm struggling to follow the logic: is the differing compute() result a bug? A sketch of that buffer behaviour follows.
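
My own reconstruction (not the library code) of the circular-buffer behaviour that would produce the stored inputs above for task 0:

import torch

max_num_samples = 5
buffer = torch.zeros(max_num_samples)
next_idx = 0
for value in [0.2, 0.3, 0.8, 0.3, 0.5, 0.1]:  # task 0's inputs across the three updates
    buffer[next_idx] = value                   # once full, the oldest slot is overwritten
    next_idx = (next_idx + 1) % max_num_samples
print(buffer)  # tensor([0.1000, 0.3000, 0.8000, 0.3000, 0.5000])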

MPS-specific bug

My system Python is quite new and not ideal for working on most projects, so I used uv to create a virtual environment with an older version of Python. I made a mistake setting it up and was inadvertently working with torch==2.2.0 (which still meets the project requirement of "PyTorch >= 1.11").

When testing the different metrics with the device set to the CPU and then the MPS backend, I encountered a bug with the BLEU score metric. On CPU the example highlighted in the docstring works as expected:

>>> import torch
>>> from torcheval.metrics import BLEUScore
>>> metric = BLEUScore(n_gram=4)
>>> candidates = ["the squirrel is eating the nut", "the cat is on the mat"]
>>> references = [["a squirrel is eating a nut", "the squirrel is eating a tasty nut"], ["there is a cat on the mat", "a cat is on the mat"]]
>>> metric.update(candidates, references)
>>> metric.compute()
tensor(0.65341892)

However the same example when run on the MPS backend yields

>>> metric = BLEUScore(n_gram=4, device=torch.device("mps"))
>>> metric.update(candidates, references)                                                                                        
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/iyassou/Desktop/torcheval/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/iyassou/Desktop/torcheval/torcheval/metrics/text/bleu.py", line 100, in update
    ) = _bleu_score_update(input, target, self.n_gram, self.device)
  File "/Users/iyassou/Desktop/torcheval/torcheval/metrics/functional/text/bleu.py", line 112, in _bleu_score_update
    raise ValueError(
ValueError: the input is too short to find all n-gram matches with n_gram=4

I tracked the issue down to lines 104-109 in torcheval/metrics/functional/text/bleu.py:

        for ngram in overlap:
            matches_by_order[len(ngram) - 1] += overlap[ngram]

        for i in range(n_gram):
            if len_candidate - i > 0:
                possible_matches_by_order[i] += len_candidate - i

The MPS implementation of in-place addition was buggy: the tensors matches_by_order and possible_matches_by_order were only ever modified at index 0 and nowhere else. Replacing the above with:

        for ngram in overlap:
            matches_by_order[len(ngram) - 1] = matches_by_order[len(ngram) - 1] + overlap[ngram]

        for i in range(n_gram):
            if len_candidate - i > 0:
                possible_matches_by_order[i] = possible_matches_by_order[i] + len_candidate - i

fixed the issue:

>>> metric = BLEUScore(n_gram=4, device=torch.device("mps"))
>>> metric.update(candidates, references)
<torcheval.metrics.text.bleu.BLEUScore object at 0x1031fcfa0>
>>> metric.compute()
tensor(0.6534, device='mps:0')

I didn't think this fix was worth the non-Pythonic code, so I left the in-place addition untouched and upgraded my version of PyTorch instead. Although I couldn't track down the exact commit that fixed it, the bug appears to have been resolved by torch==2.4.0 at the earliest. Is it worth adding a platform-specific PyTorch version requirement so that macOS users can use the MPS backend for the BLEU score metric?

PyTorch raises a `DeprecationWarning` when you call `Tensor.scatter_(dim, index, src, reduce="add")`. It's been replaced by the equivalent and non-deprecated call to `Tensor.scatter_add_(dim, index, src)`.
Added docstring links to objects, separate requirements file for developing audio metrics, updated documentation reST files, and fixed some typos.
The empty case now returns a tensor that's on the same device as the metric.
The function to do so was defined but never called.
Staying consistent with the return types expected by the text metrics suite, as seen in `test_word_information_lost.py` and `test_word_information_preserved.py`.
I accepted the wrong changes when rebasing my fork. I've reinstated the old style of typing in the modified functions because the project README requires "Python >= 3.8" and the `|` Union type was introduced in 3.10.