
Speed-up samplers by avoiding backwards seeks #245

Merged — 29 commits into main, Oct 7, 2024
Conversation

@NicolasHug (Member) commented Oct 7, 2024

We now sort and dedup the frame indices to be decoded within the sampler (see code for details). We could still re-implement this in C++ to optimize it further, but this alone already leads to strong speed-ups.
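The strategy can be sketched as follows (a minimal illustration of the idea, not the actual torchcodec implementation; all names here are hypothetical):

```python
# Sketch of the sort-and-dedup strategy: decode each unique frame index once,
# in increasing order (so the decoder never seeks backwards), then scatter the
# decoded frames back to their originally requested positions.

def decode_sorted_dedup(decode_frame, indices):
    # Visit unique indices in increasing order so decoding is forward-only.
    unique_sorted = sorted(set(indices))
    cache = {i: decode_frame(i) for i in unique_sorted}
    # Restore the caller's original ordering; duplicates share one decode.
    return [cache[i] for i in indices]

# Toy "decoder" that records the order in which frames are decoded.
decode_order = []
def fake_decode(i):
    decode_order.append(i)
    return f"frame{i}"

frames = decode_sorted_dedup(fake_decode, [7, 2, 7, 5])
# decode_order is [2, 5, 7]: forward-only, and frame 7 is decoded only once.
```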

Benchmark results - TL;DR: 5X faster when num_clips is large.

Using the following values:
sampler = clips_at_random_indices
num_frames_per_clip = 10
num_indices_between_frames = 2


num_clips = 1
When num_clips=1 there should be no speed-up; we just need to make sure the new logic didn't add any overhead.
main: med = 92.16ms +- 9.36
PR:   med = 89.27ms +- 7.90


num_clips = 50
With num_clips = 50 there is potentially a lot of overlap and many backwards seeks, so we expect significant speed-ups.
main: med = 1527.26ms +- 839.48
PR:   med = 331.37ms +- 170.58

Benchmark code:

from torchcodec.decoders import VideoDecoder
from torchcodec.samplers import clips_at_random_indices
import torch
from time import perf_counter_ns

def bench(f, *args, num_exp=100, warmup=0, **kwargs):

    for _ in range(warmup):
        f(*args, **kwargs)

    times = []
    for _ in range(num_exp):
        start = perf_counter_ns()
        f(*args, **kwargs)
        end = perf_counter_ns()
        times.append(end - start)
    return torch.tensor(times).float()

def report_stats(times, unit="ms"):
    mul = {
        "ns": 1,
        "µs": 1e-3,
        "ms": 1e-6,
        "s": 1e-9,
    }[unit]
    times = times * mul
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}{unit} +- {std:.2f}")
    return med


def sample():
    decoder = VideoDecoder("test/resources/nasa_13013.mp4")
    clips_at_random_indices(
        decoder,
        num_clips=1,  # varied between runs (1 and 50 in the results above)
        num_frames_per_clip=10,
        num_indices_between_frames=2,
    )

times = bench(sample, num_exp=30, warmup=2)
report_stats(times, unit="ms")

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 7, 2024
@NicolasHug NicolasHug marked this pull request as ready for review October 7, 2024 16:15
decoded_frame = decoder.get_frame_at(index=frame_index)
previous_decoded_frame = decoded_frame
all_decoded_frames[j] = decoded_frame

all_clips: list[list[Frame]] = chunk_list(
    all_decoded_frames, chunk_size=num_frames_per_clip
)
@NicolasHug (Member, Author) commented:
Note that we don't have to chunk the clips. The implementation already allows us to return a single 5D FrameBatch instead of a list[4D FrameBatch]. I'll just leave this for another PR so we can discuss.
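For reference, the chunking helper referred to above can be sketched like this (a hypothetical re-implementation of a `chunk_list`-style function, not torchcodec's actual code):

```python
def chunk_list(lst, chunk_size):
    # Split a flat list of decoded frames into consecutive fixed-size clips.
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]

frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
clips = chunk_list(frames, chunk_size=3)
# Two clips of 3 frames each, in decode order.
```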

and frame_index == all_clips_indices_sorted[i - 1]
):
# Avoid decoding the same frame twice.
decoded_frame = previous_decoded_frame
Contributor commented:
This is setting it to the same python object, right?

Will there be any issues with that? Example, if the user modifies that tensor or something else in FrameBatch -- they will modify both entries in the list, right?
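The aliasing concern can be demonstrated with plain tensors (illustrative only; this is not torchcodec code):

```python
import torch

frame = torch.zeros(3)
# Reusing the same Python object for two entries means both alias one tensor.
frames = [frame, frame]
frames[0] += 1  # in-place modification through the first entry
# frames[1] sees the change too, because both entries are the same object.
```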

decoder.get_frame_at(index) for index in all_clips_indices
]
all_clips_indices_sorted, argsort = zip(
*sorted((j, i) for (i, j) in enumerate(all_clips_indices))
Contributor commented:
Nit: i, j makes both look like indices in the same range. Maybe call them batch_index, frame_index?
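The `zip(*sorted(...))` idiom in the snippet above is a common pure-Python argsort: sorting (value, position) pairs yields both the sorted values and the permutation needed to scatter results back. A standalone illustration:

```python
all_clips_indices = [7, 2, 7, 5]

# Sort the frame indices while remembering each one's original position.
indices_sorted, argsort = zip(
    *sorted((value, position) for (position, value) in enumerate(all_clips_indices))
)
# indices_sorted == (2, 5, 7, 7); argsort == (1, 3, 0, 2)

# Decode in sorted (forward-only) order, then restore the original order.
decoded_sorted = [f"frame{i}" for i in indices_sorted]
restored = [None] * len(all_clips_indices)
for sorted_pos, original_pos in enumerate(argsort):
    restored[original_pos] = decoded_sorted[sorted_pos]
```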

and frame_index == all_clips_indices_sorted[i - 1]
):
# Avoid decoding the same frame twice.
decoded_frame = previous_decoded_frame
@NicolasHug (Member, Author) commented:
To be slightly safer w.r.t. future changes this should be

decoded_frame = copy(previous_decoded_frame)

but we don't implement __copy__ on Frame.

Note that a copy still happens within to_framebatch, so this is currently safe, but admittedly subject to an implementation detail that will change.

We can either:

  • be OK with this since we'll re-implement it in C++ anyway
  • implement __copy__.

LMK.
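If the second option were chosen, `__copy__` could look roughly like this (a hypothetical sketch; the field names on this stand-in `Frame` are assumptions, not torchcodec's actual definition):

```python
from copy import copy
from dataclasses import dataclass

import torch

# Hypothetical stand-in for torchcodec's Frame; field names are assumptions.
@dataclass
class Frame:
    data: torch.Tensor
    pts_seconds: float
    duration_seconds: float

    def __copy__(self):
        # Clone the tensor so the copy can be mutated independently.
        return Frame(self.data.clone(), self.pts_seconds, self.duration_seconds)

original = Frame(torch.zeros(3), pts_seconds=0.0, duration_seconds=0.04)
duplicate = copy(original)
duplicate.data += 1
# original.data is unchanged: the two frames no longer share storage.
```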

Contributor replied:
I am OK with this as-is.

@NicolasHug (Member, Author) replied:
OK, I'll add a comment in to_framebatch() so we don't accidentally mess it up.

@ahmadsharif1 (Contributor) commented:

Optionally, you could also check in the benchmark code, because we probably want to track its performance and make sure it doesn't regress.

@NicolasHug (Member, Author) replied:

Sounds good, let me do that in another PR. I tried doing it here, but it created a lot of undesirable changes since we already have a benchmark_samplers.py file, which I think we should remove.

@NicolasHug NicolasHug merged commit b65882e into main Oct 7, 2024
22 checks passed
@NicolasHug NicolasHug deleted the samplers_fast branch October 7, 2024 17:25
@NicolasHug NicolasHug mentioned this pull request Oct 8, 2024