
Samplers: add support for edge-case policies #241

Merged

NicolasHug merged 20 commits into main from samplers_edge_case on Oct 7, 2024
Conversation

Member

@NicolasHug NicolasHug commented Oct 4, 2024

This PR addresses the first big item in #239, namely: supporting a "policy" to apply when the sampler tries to sample a frame beyond the end of the video. We don't automatically error anymore, which gives more flexibility. As discussed offline, the available policies are "repeat_last", "wrap", and "error".

Important points:

  • I factorized the samplers into this _abstract_sampler() helper function. The only difference between the two samplers is the call to torch.randint vs torch.linspace, and I wanted to make that clear.
  • The different policies are implemented as well-defined Python functions (see the sketch after this note). This leaves the door open for us to let users define their own policies by providing a function, should we want that in the future.
  • The crux of this PR is the _build_all_clips_indices() function. It's simple, but it does a lot of things. Look at it carefully.
  • The other important part is the _decode_all_clips_indices() function, which we plan to re-write in a much more efficient way, with a C++ core.

Note: This PR is already complex enough, so I have not implemented the "try to reduce num_indices_between_frames" strategy that I originally suggested in #221 (comment). It is orthogonal to the policies implemented here, so we can revisit it later if we want.
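For concreteness, here is a minimal sketch of what the three policies could look like, reusing the (frames_indices, num_frames_per_clip) signature visible in the _error_policy() fragment quoted later in this thread. The bodies are plausible illustrations, not necessarily the PR's exact code.

```python
# Sketch only: the signature follows the _error_policy() fragment quoted
# below; the bodies are plausible implementations, not the PR's exact code.
def _repeat_last_policy(frames_indices: list[int], num_frames_per_clip: int) -> list[int]:
    # Pad the clip with copies of its last valid frame index.
    frames_indices += [frames_indices[-1]] * (num_frames_per_clip - len(frames_indices))
    return frames_indices


def _wrap_policy(frames_indices: list[int], num_frames_per_clip: int) -> list[int]:
    # Wrap around: cycle through the clip's own indices until the clip is full.
    repetitions = num_frames_per_clip // len(frames_indices) + 1
    return (frames_indices * repetitions)[:num_frames_per_clip]


def _error_policy(frames_indices: list[int], num_frames_per_clip: int) -> list[int]:
    raise ValueError("Sampler tried to decode a frame beyond the end of the video.")
```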

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 4, 2024
@NicolasHug NicolasHug marked this pull request as ready for review October 7, 2024 10:41
Comment on lines +191 to +192
# Important note: sampling_range_end defines the upper bound of where a clip
# can *start*, not where a clip can end.
Member Author

@NicolasHug NicolasHug Oct 7, 2024

This is fairly important, and something we haven't made explicit yet in our discussions. The reason it has to be defined as the upper bound of where a clip can start is that if sampling_range_end defined the upper bound of where a clip can end, then the last few frames in that sampling range would always have a lower probability of being sampled, i.e. we would always be in the scenario that #221 (comment) pointed out.

If sampling_range_end defines the upper bound of where a clip can end, then technically there really is no need for a policy. Hope that makes sense.

Contributor

@scotts scotts Oct 7, 2024

Almost entirely agreed, but the wrinkle is that even if we defined sampling_range_end to be the upper bound of where the last clip can end, we'd still need a policy unless we also defined len(decoder) as the default last possible frame for the end of a clip. If we did that for both, we'd also ensure only whole clips, and then we would not need a policy.

Agreed that this is the right approach. We'll need to document it carefully. :) I also think there's no way around careful documentation: whatever behavior we choose will have subtleties.

Contributor

@ahmadsharif1 ahmadsharif1 Oct 7, 2024

I was trying to prove to myself that the comment about probabilities made sense, so I wrote a simple simulator:

[image: simulated per-frame selection frequencies]

It seems like the distribution is symmetric, in the sense that even the first few frames have a smaller chance of being selected.

I think that is because when you pick frame X you pick frame X + 1 as well, but you never pick frame -1, for example. So frames at the start also have a lower chance. Note that this is with 0 dilation, so that may change things too.

I don't know if it breaks your assumptions, but it's worth thinking about.
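For reference, here is a minimal simulator along the lines described above. It is a hypothetical re-creation, not the actual script behind the screenshot: it draws uniform random clip starts (with clips required to fit entirely in the video) and counts how often each frame index is selected.

```python
import random
from collections import Counter

# Hypothetical re-creation of the simulator described above (not the actual
# script): draw random clip starts, count how often each frame is selected.
num_frames, clip_size, num_trials = 100, 5, 100_000
counts = Counter()
for _ in range(num_trials):
    start = random.randrange(num_frames - clip_size + 1)  # clip fits entirely
    counts.update(range(start, start + clip_size))

# Both boundaries are under-represented: frame 0 is picked only when start == 0,
# whereas a middle frame is covered by clip_size different start positions.
print([counts[i] for i in range(8)])        # ramps up from the left edge
print([counts[i] for i in range(92, 100)])  # ramps down at the right edge
```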

Contributor

It may make sense to have a "slow" test with a simulator to see if the fair policy is within some threshold of an "even" distribution.

This comment isn't blocking or anything. It was mostly for my own curiosity, since we don't have a closed-form mathematical expression for the probability of a frame being selected.

Member Author

Ah, it's true that the first frames are also less likely to be sampled. Oh boy. I don't think we can or should do anything about that side though?

Contributor

You could make your code symmetric by picking those "negative frames" and then repeating them in the other direction (i.e. repeating the first frame) with your policy code.

Again, not blocking, but worth considering if you want to make it symmetric and fair.

Contributor

@ahmadsharif1 ahmadsharif1 Oct 7, 2024

It's kind of like how Conv1d adds padding to both sides:

https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html

You would have to do something symmetric, like repeating frames on both the left and right boundaries, to make it truly fair. Not sure if it's worth the effort.

Member Author

@NicolasHug NicolasHug Oct 7, 2024

> It may make sense to have a "slow" test with a simulator to see if the fair policy is within some threshold of an "even" distribution.

I did write a test to illustrate the difference in behavior depending on the value of sampling_range_end.

Writing an actual statistical test is challenging because, when sampling_range_end=len(decoder), the last frames tend to be over-represented (because they are repeated). I.e. the frames still don't follow a uniform distribution, so there's no known distribution to check against.

Contributor

Thinking about it a bit more: if you want all frames to be totally equal in terms of output probability, you have to pick the first and last few frames with higher probability. It's not as simple as repeating the last frame, because the second-to-last frame also needs to be weighted higher, and so on.

And that's why your test doesn't follow a uniform distribution.

For now this implementation is easy and simple. Later on if there is a need, we can implement something complex and "fair".

Member Author

Yeah, enforcing a uniform distribution across all frames is pretty hard.
That being said, we can already enforce a uniform distribution for clip starts: all the user has to do is set sampling_range_end=len(decoder). With that, all frames have an equal probability of being picked to start a clip.
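A hypothetical usage sketch of that setting follows. The sampler name and sampling_range_end come from this PR; the num_clips and policy keyword names are assumed for illustration, not taken from the diff.

```python
# Hypothetical usage sketch -- keyword names other than num_frames_per_clip
# and sampling_range_end are assumptions, not taken from the PR.
clips = clips_at_random_indices(
    decoder,
    num_clips=16,
    num_frames_per_clip=4,
    sampling_range_end=len(decoder),  # every frame can start a clip
    policy="repeat_last",             # pad clips that run past the end
)
```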


    return [to_framebatch(clip) for clip in all_clips]


Contributor

@scotts scotts Oct 7, 2024

Nit: I think a better name is _generic_sampler(). I tend to think of things called "abstract" as defining desired behavior without an actual implementation, and of things called "generic" as having an implementation that is generic over one or several parameters.

if sampling_range_end is None:
-    sampling_range_end = num_frames - clip_span + 1
+    sampling_range_end = max(num_frames_in_video - clip_span + 1, 1)
Contributor

Based on the discussion below, shouldn't this just be num_frames_in_video? That is, the default last index at which a clip can start is the final frame in the video. And if we're worried that num_frames_in_video is not at least 1, maybe we should explicitly check for (and error on?) that.

Even if we decide line 56 is correct, I can't reconcile it with line 67. There, it looks like we're making sure that sampling_range_end is not larger than the video itself, which makes sense to me. But there we don't subtract off the size of a clip, while here at line 56 we do.

Member Author

@NicolasHug NicolasHug Oct 7, 2024

I think the lack of documentation is making this significantly harder to review, sorry about that.

So, my reasoning is the following:

  • What happens at line 56 is the default behavior. At line 67, this isn't the default behavior anymore: sampling_range_end is passed by the user.
  • The default behavior (line 56) is: "automatically set the upper bound of where a clip can start to a value such that we never try to sample beyond the last frame." That's why we have to deduct clip_span from num_frames_in_video.
  • The reason this is the default behavior is that there is no way for users to explicitly set such a value, unless they do the math of computing clip_span themselves (and we don't want them to do that, I think?).
  • Now, at line 67, when the user explicitly passes sampling_range_end, we just make sure it doesn't go above num_frames_in_video, to avoid errors on short videos. If we were to subtract clip_span there, it would change the semantics of sampling_range_end from "upper bound of the start" to "upper bound of the end".

Basically:

  • The default behavior means that the last few frames are less likely to be sampled and the policy doesn't come into play, so there are no "degenerate" clips (degenerate = the policy had to be applied).
  • If the user explicitly passes a value, then the probability of sampling the last few frames may increase (depending on the value), and the policies become relevant. Clips may be degenerate. A worked example follows below.
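To make the two cases concrete, here is a small worked example with illustrative numbers (not from the PR), using the _get_clip_span() formula quoted later in this conversation:

```python
# Worked example (illustrative numbers) of line 56 vs. line 67, using the
# _get_clip_span() formula quoted later in this conversation.
num_frames_in_video = 10
num_frames_per_clip = 4
num_indices_between_frames = 1
clip_span = num_indices_between_frames * (num_frames_per_clip - 1) + 1  # == 4

# Default (line 56): the last allowed start is index 6, so the last clip is
# [6, 7, 8, 9] and never overruns the video -- the policy never applies.
default_end = max(num_frames_in_video - clip_span + 1, 1)  # == 7

# User passes sampling_range_end=10 (line 67): a clip may start at index 9,
# needing frames [9, 10, 11, 12] -- the policy fills in the three missing ones.
```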

Contributor

@scotts scotts Oct 7, 2024

Yes, that all tracks. I think that is consistent with what I surmised below, when I read the test. I do have a slight preference for what I explained, but it's only a slight preference and not blocking.

Member Author

SG, thanks. I don't have a super strong preference either. Let me merge now to unblock, but happy to revisit.

Comment on lines +166 to +168
# TODO:
# - sort the frames to avoid backward seeks, dedup, decode, and re-organize frames.
# - write most of this in C++
Member Author

@NicolasHug NicolasHug Oct 7, 2024

If you want a peek at what this will look like: 3dcbe1e (#245)

I'm leaving it for a follow-up PR to ease reviewing. A basic benchmark shows a 5X speedup.
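For the curious, here is a rough sketch of the strategy the TODO describes. It assumes the decoder exposes a get_frame_at(index) accessor (an assumption, not confirmed by this PR); the real version is planned as a C++ rewrite.

```python
# Rough sketch of the TODO above (assumes the decoder exposes a
# get_frame_at(index) accessor): decode each unique frame exactly once, in
# sorted order so the decoder never has to seek backwards, then rebuild clips.
def _decode_all_clips_indices(decoder, all_clips_indices):
    unique_sorted = sorted({i for clip in all_clips_indices for i in clip})
    decoded = {i: decoder.get_frame_at(i) for i in unique_sorted}
    return [[decoded[i] for i in clip] for clip in all_clips_indices]
```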


def _error_policy(frames_indices: list[int], num_frames_per_clip: int) -> list[int]:
    raise ValueError(
        "You set the 'error' policy, and the sampler tried the decode a frame "
Contributor

"tried the decode" -> "tried to decode"

@@ -74,130 +88,211 @@ def _get_clip_span(*, num_indices_between_frames, num_frames_per_clip):
     return num_indices_between_frames * (num_frames_per_clip - 1) + 1


-def clips_at_random_indices(
+def _repeat_last_policy(
Contributor

I ❤️ the elegance of these policies.

        num_indices_between_frames=num_indices_between_frames,
        num_frames_per_clip=num_frames_per_clip,
    )

Contributor

The fact that this loop is so easy to implement (we can just call Python's range() directly) and that how we're using our policy is both obvious and still generic makes me more confident we're doing the right thing.
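A hypothetical reconstruction of the loop in question, to show how range() and the policy slot together (not the PR's exact code):

```python
# Hypothetical reconstruction of the loop being discussed (not the PR's exact
# code): build each clip's indices with range(), and let the policy fill in
# whatever falls past the end of the video.
def _build_all_clips_indices(
    clip_start_indices, num_frames_per_clip, num_indices_between_frames,
    num_frames_in_video, policy_fun,
):
    all_clips_indices = []
    for start in clip_start_indices:
        upper_bound = min(
            start + num_indices_between_frames * num_frames_per_clip,
            num_frames_in_video,
        )
        frames_indices = list(range(start, upper_bound, num_indices_between_frames))
        if len(frames_indices) < num_frames_per_clip:
            frames_indices = policy_fun(frames_indices, num_frames_per_clip)
        all_clips_indices.append(frames_indices)
    return all_clips_indices
```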

@@ -131,6 +134,80 @@ def test_sampling_range_negative(sampler):
    assert_tensor_equal(clip.data, clips_1[0].data)


@pytest.mark.parametrize("sampler", (clips_at_random_indices, clips_at_regular_indices))
def test_sampling_range_default_behavior(sampler):
Contributor

Okay, based on this comment, I think that explains the behavior I was confused about in my comment above.

This means that effectively, the default values for the sampler are:

  sampling_range_start = 0,
  sampling_range_end = max(len(decoder) - clip_span + 1, 1),

I do think it may be simpler to just make them:

  sampling_range_start = 0,
  sampling_range_end = len(decoder),

And then let the policy deal with the case where we start a clip in the range [len(decoder) - clip_span, len(decoder)). (I may be off by 1 somewhere.) I don't have a strong preference for this, but I think it will make our code more straightforward, and I think it's easier to explain the behavior. I'm unsure if it's actually a better result for the user, though.

# on the default behavior.
# Note: ideally we would directly assert the number of occurrences of the
# last frame based on its index, but our APIs don't return indices, only pts
# (yet?)
Contributor

Yes, we should do that. Created #246.

@NicolasHug NicolasHug merged commit 5a69001 into main Oct 7, 2024
17 checks passed
@NicolasHug NicolasHug deleted the samplers_edge_case branch October 7, 2024 16:11