v7 #9

anothermartz · 2023-10-17T00:04:46Z

anothermartz
Oct 17, 2023
Maintainer

v7 is here!

What’s new:

[Faster] Processing runs a bit faster! 📈
[Faster] Particularly the ‘Improved’ quality option, which has a knock-on effect for ‘Enhanced’. ⏩
[New] ‘Experimental’ quality option that only upscales the mouth area. 🥼 Except it doesn’t work very well. 👎
[Changed] Ported code over to this repository instead of relying on another repository. 📦
[Removed] Removed redundant code and folders. 🗑️

Speed ups:

I figured out an optimization for my mask-creating function: why was I tracking where the mouth was on every frame when it was already looking at a cropped image of a face? The mouth is basically going to be in the same place on each frame within that crop! So now I only detect the mouth and create the mask on the first frame, and from then on it just reuses the same mask, saving time! 🚀

So much time, actually, that it's almost the same speed as "Fast" - I'll likely just drop that option in the next version!

If you find a clip where the mask isn’t following the subject’s mouth properly, you can revert this optimization by ticking the “mouth_tracking” box.
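
The first-frame mask caching described above can be sketched roughly as follows. This is a minimal illustration and not the actual Easy-Wav2Lip code: `masks_for_frames` and the `make_mask` callable are hypothetical names, and the `mouth_tracking` argument is only assumed to behave like the checkbox of the same name.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def masks_for_frames(
    frames: Iterable[Any],
    make_mask: Callable[[Any], Any],  # hypothetical: detects the mouth and builds a mask
    mouth_tracking: bool = False,     # assumed to mirror the "mouth_tracking" box
) -> Iterator[Tuple[Any, Any]]:
    """Yield (frame, mask) pairs, building the mask once unless tracking is on."""
    cached_mask = None
    have_mask = False
    for frame in frames:
        # With tracking off, only the first frame pays for detection;
        # every later frame reuses the cached mask.
        if mouth_tracking or not have_mask:
            cached_mask = make_mask(frame)
            have_mask = True
        yield frame, cached_mask
```

With `mouth_tracking=False`, `make_mask` is called once per clip; with `mouth_tracking=True`, it is called on every frame, which is the slower pre-v7 behaviour.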

I also improved the overall processing speed by writing directly to .mp4 instead of writing to .avi first. Strangely, I remember changing it from .mp4 to .avi and noticing a speed increase, but checking again now I see that it’s faster to write to .mp4, which is also more intuitive, so I don’t know what happened before. Well done me for undoing what I previously made worse. 🥇

Experimental quality:

This version was intended to introduce a new way of upscaling by only applying gfpgan to the mouth area instead of the whole face, saving time. (suggested here: #8)

However, I discovered that some frames were not being upscaled due to gfpgan not recognizing the cropped image as a mouth. In order to rectify this, I had to increase the size of the crop to include more of the face, to the point where the difference between the larger crop and the full face was negligible. At this size, the increase in gfpgan processing speed was offset so much by the time it took to detect and crop the mouth that it resulted in an overall slower processing time than when I just upscaled the whole face! 😔

I also guessed that processing gfpgan outside the wav2lip bounding box would smooth out the harsh lines typically found on the chin, but unfortunately, that too was a false prediction. 😞

Still, I have left this failed method in as the “Experimental” quality - feel free to try it but personally I think it’s a bust! 💥

Other:

I finally merged the code into this repository and removed a bunch of code, folders and imports that didn’t need to be there.

Theoretically this should reduce setup and inference time, but practically speaking it doesn’t make a noticeable difference. Still, it’s better than having random unused code lying around. 🧹

I also made the video player scale to your video size up to 1280 pixels wide. 🎞️
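
As a rough illustration of that sizing rule, here is a small sketch. It assumes the player keeps the video's aspect ratio and never upscales small videos; `player_size` is a hypothetical helper, not a function from this repository.

```python
def player_size(video_width: int, video_height: int, max_width: int = 1280):
    """Return (width, height) for the player, capping the width at max_width."""
    if video_width <= 0 or video_height <= 0:
        raise ValueError("video dimensions must be positive")
    if video_width <= max_width:
        # Small enough already: show the video at its native size.
        return video_width, video_height
    # Too wide: shrink to max_width and scale the height by the same factor.
    scale = max_width / video_width
    return max_width, round(video_height * scale)
```

For example, a 1920x1080 clip would be shown at 1280x720, while a 640x360 clip would stay at 640x360.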

What’s next?

I intend to make it possible to run this code locally. Supposedly someone on the Discord figured this out already, but I’d like to make it easy. It is, after all, Easy-Wav2Lip!

Feel free to leave me some feedback or suggestions here on GitHub or Discord! 💖

This discussion was created from the release v7.
