HD Instant Replay Performance Benchmarking
- 2x Opteron 4180 2.6GHz
- 16GB RAM (8GB per CPU).
- Asus KCMA-D8 motherboard.
- Testing with the complete sports intro (1,978 frames) at 1920x1080.
- Sending RGB data as YUV data (because FFmpeg is stupid). The UYVY (4:2:2) -> YUV (4:4:4) conversion was bypassed, so its cost is not included in these numbers. Used libjpeg-turbo for testing.
- Single thread: avg 45.0 fps.
- Two threads (both CPUs saturated): 47.0 fps, 45.0 fps
- Eight threads: 38.9, 39.3, 39.3, 39.4, 39.4, 38.8, 39.7, 39.0 fps
- Twelve threads: average 39.1 fps
- Twelve threads (second test): average 37.1 fps, probably because the cores can't "cheat" by sharing L3-cached data in this test.
- One core was dangerously close to dropping frames, at an average speed of 32 fps.
- The processor is fast. My code is not optimized.
- Average bitrate on a "random sample" of sports footage (i.e., the sports intro): 31,085 kbps, at a rather low JPEG quality setting. Of course, M-JPEG is VBR, so the bitrate will vary with content.
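The per-thread figures above are easier to compare as total machine throughput, and the bitrate translates directly into storage per camera. The sketch below shows that arithmetic. It assumes the per-thread rates simply add (each thread encodes its own stream) and that "kbps" means 1,000 bits per second; `aggregate_fps` and `bytes_per_hour` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Total frames per second across all threads, assuming each thread
 * encodes an independent stream so per-thread rates simply add. */
double aggregate_fps(const double *per_thread_fps, size_t n_threads)
{
    assert(per_thread_fps != NULL || n_threads == 0);
    double total = 0.0;
    for (size_t i = 0; i < n_threads; i++)
        total += per_thread_fps[i];
    return total;
}

/* Bytes of M-JPEG per hour for one camera at a given average bitrate.
 * Assumes decimal units: 1 kbps = 1000 bits per second. */
double bytes_per_hour(double avg_kbps)
{
    assert(avg_kbps >= 0.0);
    return avg_kbps * 1000.0 / 8.0 * 3600.0;
}
```

For the eight-thread run this gives about 313.8 fps in total (twelve threads at 37.1 fps each would be about 445 fps), and 31,085 kbps works out to roughly 14 GB per camera per hour.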
Using some preliminary openreplay2 code: conversion from packed CbYCrY422 to planar YCbCr422, followed by "raw" libjpeg compression. Obtained 57.9 fps from one core on the Opteron box.
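A scalar version of the packed-to-planar step might look like the following. This is a minimal sketch, not the openreplay2 code: it assumes an even width, tightly packed rows (2 × width bytes per row, no padding), and the CbYCrY byte order Cb0 Y0 Cr0 Y1; `uyvy_to_planar422` is a hypothetical name. Because neither the packed input nor the three output planes are padded, the whole frame can be walked as one run of pixel pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack one frame of packed UYVY (CbYCrY) 4:2:2 into three planes.
 * Each 4-byte group holds two pixels: Cb0 Y0 Cr0 Y1.
 * Assumes an even width and unpadded rows in both input and output:
 * y holds width*height bytes, cb and cr hold (width/2)*height bytes each. */
void uyvy_to_planar422(const uint8_t *src, size_t width, size_t height,
                       uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    assert(width % 2 == 0);
    size_t pairs = (width / 2) * height;
    for (size_t i = 0; i < pairs; i++) {
        const uint8_t *p = src + 4 * i;
        cb[i]        = p[0]; /* shared blue-difference sample */
        y[2 * i]     = p[1]; /* luma, left pixel */
        cr[i]        = p[2]; /* shared red-difference sample */
        y[2 * i + 1] = p[3]; /* luma, right pixel */
    }
}
```

The Cb and Cr planes come out at half the luma width, which matches the 4:2:2 layout libjpeg's raw-data interface expects when the luma component has a horizontal sampling factor of 2 and the chroma components have 1.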
Split the sports intro into 60-frame pieces. Start all encoding jobs simultaneously and measure each job's runtime with `(time tests/mjpeg_422_encode < $i > $i.mjpg) > $i.time 2>&1`. Take the maximum time as the time to encode the entire sports intro. Result: the entire sports intro was encoded to M-JPEG in 3.763 sec. The M-JPEG segments were reassembled and the video was viewed to confirm correct encoding. This corresponds to an average M-JPEG encode rate of just over 525 frames per second (1,978 frames / 3.763 s), which is sufficient to encode over sixteen HD cameras simultaneously. Judging by the CPU time used, the average encode rate seems to be about 60 frames per CPU second. (In theory, across the twelve cores that is about 720 frames per second, meaning nearly twenty-four cameras could be supported.)
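The chunked measurement above boils down to a little arithmetic: how many pieces the clip splits into, the fact that the slowest piece sets the wall-clock time, and how many real-time camera streams a given rate covers. The sketch below assumes a per-camera rate of 29.97 fps in its example (the notes do not state one); all function names are hypothetical.

```c
#include <assert.h>

/* Number of fixed-size pieces needed to cover the clip (the last may be short). */
int chunk_count(int total_frames, int chunk_frames)
{
    assert(total_frames >= 0 && chunk_frames > 0);
    return (total_frames + chunk_frames - 1) / chunk_frames;
}

/* Frames in the final piece. */
int last_chunk_frames(int total_frames, int chunk_frames)
{
    assert(total_frames > 0 && chunk_frames > 0);
    int rem = total_frames % chunk_frames;
    return rem == 0 ? chunk_frames : rem;
}

/* All pieces start together, so the slowest one sets the wall-clock time. */
double slowest_piece(const double *seconds, int n)
{
    assert(n > 0);
    double worst = seconds[0];
    for (int i = 1; i < n; i++)
        if (seconds[i] > worst)
            worst = seconds[i];
    return worst;
}

/* Average encode rate over the whole clip. */
double encode_rate(int total_frames, double wall_seconds)
{
    assert(wall_seconds > 0.0);
    return total_frames / wall_seconds;
}

/* Whole real-time camera streams a given encode rate can keep up with.
 * camera_fps is supplied by the caller (an assumption, e.g. 29.97). */
int cameras_supported(double encode_fps, double camera_fps)
{
    assert(camera_fps > 0.0);
    return (int)(encode_fps / camera_fps);
}
```

With these inputs, 1,978 frames split into 33 pieces (the last one 58 frames), and 3.763 s gives about 525.6 fps, or 17 whole streams at 29.97 fps, in line with the "over sixteen" figure.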
Furthermore, all of this is without any SSE optimization of the unpack routine. SSE could improve performance, but by how much is not yet known.
- For raw (downsampled) input, libjpeg wants planar image data. An SSE routine should be constructed to convert packed UYVY data to planar YUV 4:2:2 data. This should improve speed (no resampling in libjpeg) and also image quality. See the section in the libjpeg manual on "raw (downsampled) data".
- Some thought should be given to preview; a good way to do that remains unknown.
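The SSE unpack routine proposed above could take roughly this shape. It is a sketch under stated assumptions, not a tuned or benchmarked implementation: it uses SSE2 intrinsics only (guarded by `__SSE2__`, with a scalar fallback), processes 16 pixels (32 packed bytes) per iteration, assumes an even pixel count, and `uyvy_to_planar422_sse2` is a hypothetical name. On little-endian x86, luma sits in the high byte of each 16-bit lane and chroma in the low byte, so shifts, masks and saturating packs are enough to deinterleave.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Unpack n_pixels of packed UYVY (Cb Y Cr Y) into planar Y, Cb, Cr.
 * n_pixels must be even; cb and cr each receive n_pixels/2 bytes. */
void uyvy_to_planar422_sse2(const uint8_t *src, size_t n_pixels,
                            uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    assert(n_pixels % 2 == 0);
    size_t i = 0; /* pixel index */
#if defined(__SSE2__)
    const __m128i lo_mask = _mm_set1_epi16(0x00FF);
    const __m128i zero = _mm_setzero_si128();
    for (; i + 16 <= n_pixels; i += 16) {
        const uint8_t *p = src + 2 * i;
        __m128i a = _mm_loadu_si128((const __m128i *)p);        /* pixels i..i+7 */
        __m128i b = _mm_loadu_si128((const __m128i *)(p + 16)); /* pixels i+8..i+15 */

        /* Luma is the high byte of every lane: shift down, pack 16 bytes. */
        __m128i luma = _mm_packus_epi16(_mm_srli_epi16(a, 8), _mm_srli_epi16(b, 8));
        _mm_storeu_si128((__m128i *)(y + i), luma);

        /* Chroma is the low byte of every lane: Cb0 Cr0 Cb1 Cr1 ... */
        __m128i chroma = _mm_packus_epi16(_mm_and_si128(a, lo_mask),
                                          _mm_and_si128(b, lo_mask));
        /* Split interleaved chroma: Cb in even bytes, Cr in odd bytes. */
        __m128i cb8 = _mm_packus_epi16(_mm_and_si128(chroma, lo_mask), zero);
        __m128i cr8 = _mm_packus_epi16(_mm_srli_epi16(chroma, 8), zero);
        _mm_storel_epi64((__m128i *)(cb + i / 2), cb8); /* low 8 bytes */
        _mm_storel_epi64((__m128i *)(cr + i / 2), cr8);
    }
#endif
    /* Scalar tail, or the whole job when SSE2 is unavailable. */
    for (; i < n_pixels; i += 2) {
        const uint8_t *p = src + 2 * i;
        cb[i / 2] = p[0];
        y[i]      = p[1];
        cr[i / 2] = p[2];
        y[i + 1]  = p[3];
    }
}
```

This sticks to SSE2 on purpose: a byte shuffle (`_mm_shuffle_epi8`) would need SSSE3, which the K10-based Opteron 4180 does not support. Whether it beats the scalar loop by a useful margin on that box would still have to be measured.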