[please keep mailing list CC'd]
Merlin,
> I haven't gotten a good feel for the performance yet, but from what I
> tested on an Ivy Bridge solution, I see that the "RGB24 to/from
> YUV422P - bt.709 - SSE2 / SSSE3" took approximately 2 seconds on
> average. If I understand the benchmark you are doing a 1920x1080 image
> 100 times, so that means you could do approximately 50 frames/sec real
> time, which would mean that I couldn't handle one stream of 1080p60
> real time input or output. Am I mistaken in my interpretation?
50 frames per second sounds very low for your platform.
If that number "2" comes from the output of "unit-testing", then this number is the average time in milliseconds to convert one 1920x1080 frame (over 100 runs).
Which means you can convert about 400 - 500 frames per second, and that's more like what I would expect you to see.
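To spell out the arithmetic, here is a minimal sketch in Python. The function name is made up for illustration and is not part of PixFC; it only assumes the reported figure is an average time in milliseconds per frame.

```python
def frames_per_second(avg_ms_per_frame: float) -> float:
    """Convert an average per-frame time in milliseconds to frames per second."""
    # 1000 ms in one second, so fps = 1000 / (ms per frame).
    return 1000.0 / avg_ms_per_frame


# An average of about 2 ms per 1920x1080 frame, as discussed above.
fps = frames_per_second(2.0)

# How many 1080p60 streams that rate could cover, ignoring all other
# costs (capture, memory copies, encoding), which is an assumption.
streams_at_60 = fps / 60.0

print(f"{fps:.0f} frames/sec, about {streams_at_60:.1f}x a 60 frames/sec stream")
```

Read as seconds per 100 frames, the same "2" would give 50 frames/sec, which is where the two interpretations diverge.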
I just realised this is not obvious at all, and I need to make this more explicit in the output. Also, did you download the 32-bit or 64-bit version of PixFC?
>
> Also, since one can't look at CPU utilization to see if the SIMD
> engines are being used fully, what kind of tool do you use to
> understand the efficiency of the algorithms? Intel's Parallel Studio?
I wish I had the time and money to learn how to use Parallel Studio, but I have neither right now, so instead I disassemble and check each conversion routine. I have a couple of useful shell scripts to help me out. One of them tells me how many instructions are used for each conversion. With this, plus the number of cycles per conversion and the CPU speed on my test platforms, I can get a pretty good indication of whether conversion routines are performing as they should, or if there is something wrong. One of the shell scripts can also count occurrences of a given instruction. I usually check the number of MOVDQA/U for unnecessary and unaligned memory loads (a sign that there is probably room for further optimisation). Last, I have a shell script to compare the number of instructions between two versions of a conversion routine, to make sure I haven't messed things up.
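For anyone who wants to try the same kind of check, here is a rough sketch of the idea in Python. These are not the actual shell scripts: the function names are hypothetical, the sample listing is hand-written for illustration, and the parser assumes tab-separated "address: bytes instruction" lines in the style of GNU objdump -d output.

```python
from collections import Counter


def count_mnemonics(disassembly: str) -> Counter:
    """Count instruction mnemonics in objdump-style disassembly text."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Assumed layout: "  addr:<TAB>hex bytes<TAB>mnemonic operands".
        # Labels, blank lines and byte-only continuation lines are skipped.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        text = fields[2].strip()
        if text:
            # First word is taken as the mnemonic; prefixes such as
            # "lock" or "rep" would be counted as-is.
            counts[text.split()[0].lower()] += 1
    return counts


def compare(old: Counter, new: Counter) -> dict:
    """Return per-mnemonic count differences (new minus old), changes only."""
    return {m: new[m] - old[m]
            for m in sorted(set(old) | set(new)) if new[m] != old[m]}


# Hand-written sample listing, for illustration only.
SAMPLE = (
    "0000000000001000 <convert>:\n"
    "    1000:\t66 0f 6f 07          \tmovdqa (%rdi),%xmm0\n"
    "    1004:\tf3 0f 6f 4f 10       \tmovdqu 0x10(%rdi),%xmm1\n"
    "    1009:\t66 0f 38 00 c2       \tpshufb %xmm2,%xmm0\n"
    "    100e:\t66 0f 7f 06          \tmovdqa %xmm0,(%rsi)\n"
    "    1012:\tc3                   \tretq\n"
)

counts = count_mnemonics(SAMPLE)
print("total instructions:", sum(counts.values()))
print("movdqa:", counts["movdqa"], "movdqu:", counts["movdqu"])
```

Running count_mnemonics on the listings of two builds of the same routine and passing both results to compare gives a quick view of which instructions were added or removed.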
Thanks
Frank
>
> Thanks,
> Merlin
>> found at http://www.poynton.com/papers/Discreet_Logic/index.html). Check
>> figure 7. I hope this answers your question.
>>
>> About the CPU usage, I'm kind of thrilled to hear you can't see much, as
>> that's the point of pixfc! Are the conversion timings acceptable though?
>>
>> Frank
>>