De-interlacing

luc

Mar 15, 2012, 7:54:27 PM

to pixfc-sse

Hi,

Ever thought about integrating de-interlacing algorithms into the
library? That would be pretty cool (for me at least).
There are some in ffmpeg, but it's a bit of a pain to integrate them into an
application.
Just a thought...

Luc

Pix FC

Mar 15, 2012, 11:40:16 PM

to pixf...@googlegroups.com

Hi Luc,
Yes I have actually, but I thought I'd complete the set of supported pixel formats / supported CPUs before launching into something else.
I am currently looking into supporting 10-bit RGB and 10-bit YUV, as well as adding support for NEON instructions on ARM CPUs (they are ever more present now, with all the mobile phones, tablets and other BeagleBoards using them...). I also have to admit I am not too familiar with de-interlacing algorithms. Do you know of any good literature on that topic?
Have you had a look at yadif (http://avisynth.org.ru/yadif/yadif.html)?
I only looked at it a while ago, and it didn't seem too complex to reuse.

Frank.

luc

Mar 16, 2012, 10:41:19 AM

to pixfc-sse

That makes sense. I might have a use for 10-bit video too (with an HD video
capture card).
Thanks for the link,

Luc

Merlin Miller

Mar 28, 2012, 8:05:52 PM

to pixf...@googlegroups.com

Frank,

Have you done enough work on the 10-bit conversions to have some feel for what the performance difference would be between 10-bit and 8-bit? I would love to see an optimized chroma conversion from the V210 FourCC format to RGB or the other YUV420 formats.

Also, I was wondering what your "slow" resampler algorithm does for a filter - i.e. what kind of filter and how many taps are you doing?

Also I am curious: when I run the unit tester, I don't see the CPU utilization (Windows platform) increase. Any thoughts? I know that it is consuming CPU resources, because when I run multiple copies of unit-tester in parallel, they all slow down in proportion to the number running.

Thanks,

Merlin

Pix FC

Mar 29, 2012, 2:19:47 AM

to pixf...@googlegroups.com

Hi Merlin,
I haven't done much on the 10-bit side of things yet, but here's my take on it:
Conversion routines are roughly organised into the following blocks: unpacking, upsampling (if required), conversion, downsampling (if required) and packing. The only blocks I need to write to support 10-bit formats are the packing and unpacking ones; I can safely reuse the existing upsampling, conversion and downsampling ones. The major pain with 10-bit formats is that values are not aligned on byte boundaries, so packing / unpacking them into useful vectors requires more bit shuffling. This can be mitigated if SSSE3 instructions are available. In the end, I think the timings of the RGB24 conversions give a rough idea of how 10-bit conversions will perform, because packing / unpacking 8-bit RGB24 formats also requires a lot of shuffling.
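To illustrate the kind of bit shuffling a 10-bit unpacking block has to do, here is a minimal scalar C sketch. It assumes the commonly documented v210 layout: each little-endian 32-bit word carries three 10-bit components in bits 0-9, 10-19 and 20-29 (bits 30-31 unused), and every 16 bytes hold 6 pixels of 4:2:2 data in the order Cb Y Cr / Y Cb Y / Cr Y Cb / Y Cr Y. The function name `v210_unpack_block` is hypothetical; this is not PixFC's SSE code, only the scalar logic such code would vectorise.

```c
#include <stdint.h>

/* Read one little-endian 32-bit word, independent of host endianness. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: unpack one 16-byte v210 block (6 pixels) into
 * 6 luma, 3 Cb and 3 Cr samples, each a 10-bit value in a uint16_t. */
void v210_unpack_block(const uint8_t block[16],
                       uint16_t y[6], uint16_t cb[3], uint16_t cr[3])
{
    uint16_t c[12]; /* the 12 components, in stream order */

    for (int w = 0; w < 4; w++) {
        uint32_t word = read_le32(block + 4 * w);
        c[3 * w + 0] = (uint16_t)(word & 0x3FF);         /* bits 0-9   */
        c[3 * w + 1] = (uint16_t)((word >> 10) & 0x3FF); /* bits 10-19 */
        c[3 * w + 2] = (uint16_t)((word >> 20) & 0x3FF); /* bits 20-29 */
    }

    /* Assumed stream order: Cb0 Y0 Cr0 | Y1 Cb1 Y2 | Cr1 Y3 Cb2 | Y4 Cr2 Y5 */
    cb[0] = c[0];  y[0]  = c[1];  cr[0] = c[2];
    y[1]  = c[3];  cb[1] = c[4];  y[2]  = c[5];
    cr[1] = c[6];  y[3]  = c[7];  cb[2] = c[8];
    y[4]  = c[9];  cr[2] = c[10]; y[5]  = c[11];
}
```

Every component needs a shift and a mask, and none of them starts on a byte boundary, which is why SIMD versions lean on byte-shuffle instructions such as SSSE3's PSHUFB to gather values into vectors.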

Regarding resampling, I followed what Rec. 601 requires. When you use the PixFcFlag_NNbResamplingOnly flag, nearest-neighbour resampling is used and the code is fairly straightforward: downsampling drops values, and upsampling reuses the value of the nearest neighbour. Without that flag, resampling is done like this:
- when upsampling, recreate the missing value by averaging the current and next neighbours,
- when downsampling, use 1/4*previous + 1/2*current + 1/4*next.
There is a good explanation in Charles Poynton's "Merging Computing with Studio Video: Converting Between R'G'B' and 4:2:2" paper (which can be found at http://www.poynton.com/papers/Discreet_Logic/index.html ). Check figure 7. I hope this answers your question.
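The two resampling modes described above can be sketched in scalar C on a single row of 8-bit chroma samples. This is only an illustration of the filters as described in this message, not PixFC's implementation: the function names are hypothetical, and replicating edge samples and rounding to nearest are assumptions of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for one row of 8-bit chroma samples.
 * Edge samples are replicated and results rounded (assumptions). */

/* 4:2:2 -> 4:4:4: keep each sample, recreate the missing one by averaging
 * the current and next neighbours. out must hold 2 * n_in samples. */
void chroma_upsample_avg(const uint8_t *in, size_t n_in, uint8_t *out)
{
    for (size_t i = 0; i < n_in; i++) {
        unsigned cur = in[i];
        unsigned next = (i + 1 < n_in) ? in[i + 1] : in[i];
        out[2 * i] = (uint8_t)cur;
        out[2 * i + 1] = (uint8_t)((cur + next + 1) / 2);
    }
}

/* 4:4:4 -> 4:2:2: 1/4*previous + 1/2*current + 1/4*next, centred on each
 * even-indexed sample. n_in must be even; out holds n_in / 2 samples. */
void chroma_downsample_121(const uint8_t *in, size_t n_in, uint8_t *out)
{
    for (size_t i = 0; i < n_in / 2; i++) {
        size_t c = 2 * i;
        unsigned prev = (c > 0) ? in[c - 1] : in[c];
        unsigned next = (c + 1 < n_in) ? in[c + 1] : in[c];
        out[i] = (uint8_t)((prev + 2u * in[c] + next + 2) / 4);
    }
}

/* Nearest-neighbour variants: downsampling drops every odd sample,
 * upsampling duplicates each sample. */
void chroma_downsample_nn(const uint8_t *in, size_t n_in, uint8_t *out)
{
    for (size_t i = 0; i < n_in / 2; i++)
        out[i] = in[2 * i];
}

void chroma_upsample_nn(const uint8_t *in, size_t n_in, uint8_t *out)
{
    for (size_t i = 0; i < n_in; i++)
        out[2 * i] = out[2 * i + 1] = in[i];
}
```

For example, upsampling the row {10, 20, 30} with `chroma_upsample_avg` gives {10, 15, 20, 25, 30, 30}, where the final value comes from the replicated edge.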

About the CPU usage, I'm kind of thrilled to hear you can't see much, as that's the point of pixfc! Are the conversion timings acceptable, though?

Frank

Pix FC

Apr 2, 2012, 6:47:22 PM

to Merlin Miller, pixf...@googlegroups.com

[please keep mailing list CC'd]

Merlin,

On 3 April 2012 06:26, Merlin Miller <merlin...@gmail.com> wrote:
> Frank,
>
> I haven't gotten a good feel for the performance yet, but from what I
> tested on an Ivy Bridge solution, I see that the "RGB24 to/from
> YUV422P - bt.709 - SSE2 / SSSE3" took approximately 2 seconds on
> average. If I understand the benchmark you are doing a 1920x1080 image
> 100 times, so that means you could do approximately 50 frames/sec real
> time, which would mean that I couldn't handle one stream of 1080p60
> real time input or output. Am I mistaken in my interpretation?

50 frames per second sounds very low for your platform.
If that number "2" comes from the output of "unit-testing", then it is the average time in milliseconds to convert one 1920x1080 frame (over 100 runs).
That means you can convert about 400 - 500 frames per second, which is more like what I would expect you to see.
I just realised this is not obvious at all, and I need to make it more explicit in the output. Also, did you download the 32-bit or the 64-bit version of PixFC?
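The two readings of the benchmark number differ by a factor of ten, and the arithmetic is easy to check. Read as 2 seconds for 100 frames, it means 20 ms per frame, or 50 frames per second, short of the 60 needed for 1080p60. Read as 2 ms per frame, it means 1000 / 2 = 500 frames per second. The helper names below are hypothetical and assume a single conversion running on one thread.

```c
/* Hypothetical helper: turn the per-frame time reported by the benchmark
 * (milliseconds per 1920x1080 frame) into a sustainable frame rate. */
double frames_per_second(double ms_per_frame)
{
    return 1000.0 / ms_per_frame;
}

/* Returns non-zero if one conversion per frame keeps up with target_fps. */
int keeps_up_with(double ms_per_frame, double target_fps)
{
    return frames_per_second(ms_per_frame) >= target_fps;
}
```

At 60 frames per second the budget is 1000 / 60, about 16.7 ms per frame, so 2 ms per frame leaves plenty of headroom while 20 ms per frame would not keep up.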

>
> Also, since one can't look at CPU utilization to see if the SIMD
> engines are being used fully, what kind of tool do you use to
> understand the efficiency of the algorithms? Intel's Parallel Studio?

I wish I had the time and money to learn how to use Parallel Studio, but I have neither right now. Instead, I disassemble and check each conversion routine, with a couple of useful shell scripts to help me out. One of them tells me how many instructions are used for each conversion. With this, the number of cycles per conversion and the CPU speed of my test platforms, I can get a pretty good indication of whether conversion routines are performing as they should, or whether something is wrong. One of the scripts can also count occurrences of a given instruction: I usually check the number of MOVDQA/MOVDQU for unnecessary and unaligned memory loads (a sign that there is probably room for further optimisation). Lastly, I have a shell script that compares the number of instructions between two versions of a conversion routine, to make sure I haven't messed things up.
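As a rough idea of what the "count occurrences of a given instruction" check does, here is a small C stand-in for such a script. It is hypothetical (the actual scripts are not shown in this thread) and assumes an objdump-style disassembly listing in which each mnemonic is separated from its neighbours by whitespace.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: count how many times `mnemonic` appears as a whole
 * word in `disasm`, e.g. the disassembly text of one conversion routine. */
int count_mnemonic(const char *disasm, const char *mnemonic)
{
    size_t len = strlen(mnemonic);
    int count = 0;
    const char *p = disasm;

    if (len == 0)
        return 0; /* avoid looping forever on an empty pattern */

    while ((p = strstr(p, mnemonic)) != NULL) {
        int starts_word = (p == disasm) || isspace((unsigned char)p[-1]);
        int ends_word = (p[len] == '\0') || isspace((unsigned char)p[len]);
        if (starts_word && ends_word)
            count++;
        p += len;
    }
    return count;
}
```

Running it once with "movdqa" and once with "movdqu" over two builds of the same routine gives the kind of before/after comparison described above.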

Thanks
Frank

>
> Thanks,
> Merlin

Merlin Miller

Apr 2, 2012, 8:40:50 PM

to Pix FC, pixf...@googlegroups.com

Thanks Frank for the clarifications. Yes, it was not clear what the benchmark numbers were telling me.