-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed-up samplers by avoiding backwards seeks #245
Conversation
decoded_frame = decoder.get_frame_at(index=frame_index) | ||
previous_decoded_frame = decoded_frame | ||
all_decoded_frames[j] = decoded_frame | ||
|
||
all_clips: list[list[Frame]] = chunk_list( | ||
all_decoded_frames, chunk_size=num_frames_per_clip | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we don't have to chunk the clips. The implementation already allows us to return a single 5D FrameBatch
instead of a list[4D FrameBatch]
. I'll just leave this for another PR so we can discuss.
and frame_index == all_clips_indices_sorted[i - 1] | ||
): | ||
# Avoid decoding the same frame twice. | ||
decoded_frame = previous_decoded_frame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is setting it to the same python object, right?
Will there be any issues with that? Example, if the user modifies that tensor or something else in FrameBatch -- they will modify both entries in the list, right?
src/torchcodec/samplers/_implem.py
Outdated
decoder.get_frame_at(index) for index in all_clips_indices | ||
] | ||
all_clips_indices_sorted, argsort = zip( | ||
*sorted((j, i) for (i, j) in enumerate(all_clips_indices)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: i, j makes both look like indexes in the same range. Maybe call them batch_index, frame_index?
and frame_index == all_clips_indices_sorted[i - 1] | ||
): | ||
# Avoid decoding the same frame twice. | ||
decoded_frame = previous_decoded_frame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be slightly safer w.r.t. future changes this should be
decoded_frame = copy(previous_decoded_frame)
but we don't implement __copy__
on Frame.
Note that a copy still happens within to_framebatch
, so this is currently safe, but admittedly subject to an implementation detail that will change.
We can either:
- be OK with this since we'll re-implement it in C++ anyway
- implement
__copy__
.
LMK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am OK with this as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll add a comment in to_framebatch()
so we don't accidentally mess it up.
optionally you could also check in the benchmark code because we probably want to track the performance of this and make sure it doesn't regress. |
Sounds good, let me do that in another PR. I tried doing it here but this created a lot of undesirable changes since we already have a |
We now first sort and dedup the frame indices to be decoded within the sampler (see code for details). We can still implement this in C++ to optimize this further but this already leads to strong speed-ups.
Benchmark results - TL;DR: 5X faster when num_clips is large.
Benchmark code: