Out of curiosity, what functions are you comparing here?
I appreciate any analysis pointers as we strive to optimize ProjectNe10 and move towards a more C-intrinsics-based solution.
--
Matthew Du Puy | Software Engineer | ARM Strategic Alliances
community.arm.com/people/mattdupuy
community.arm.com/groups/android-community
Dear Matthew,

Details on the benchmarks from the title page will show up if you hover the mouse over a picture. The plot you refer to demonstrates the performance of in-cache single-precision vector addition (function yepCore_Add_V32fV32f_V32f) on Cortex-A8.

In my opinion, the main performance issue with Ne10 is that it doesn't hide the latency of instructions (BTW, Yeppp! is also not perfect here: this kernel could perform at 0.75 CPE if better optimized).

Regards,
Marat
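
To illustrate what "hiding the latency of instructions" can mean for a vector-addition kernel, here is a minimal portable C sketch. It is not code from Yeppp! or Ne10: the names add_v32f_unrolled and cycles_per_element are hypothetical, and reading CPE as "cycles per element" is an assumption. The loop is unrolled by four so that the loads, adds, and stores of independent elements are grouped, which gives an in-order core such as Cortex-A8 a chance to overlap them. A compiler is free to reschedule this, so a real kernel would more likely use NEON intrinsics or assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Yeppp! or Ne10.
 * Computes sum[i] = x[i] + y[i] for i in [0, n). */
void add_v32f_unrolled(const float *x, const float *y, float *sum, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL && sum != NULL));

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Issue all loads first: none of them depends on another. */
        const float x0 = x[i],     y0 = y[i];
        const float x1 = x[i + 1], y1 = y[i + 1];
        const float x2 = x[i + 2], y2 = y[i + 2];
        const float x3 = x[i + 3], y3 = y[i + 3];

        /* Four independent additions: while one is in flight,
         * the next can be issued instead of stalling on its result. */
        const float s0 = x0 + y0;
        const float s1 = x1 + y1;
        const float s2 = x2 + y2;
        const float s3 = x3 + y3;

        /* Stores come last, after the results have had time to complete. */
        sum[i]     = s0;
        sum[i + 1] = s1;
        sum[i + 2] = s2;
        sum[i + 3] = s3;
    }

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; i++) {
        sum[i] = x[i] + y[i];
    }
}

/* Assumed definition of CPE: measured cycles divided by elements processed. */
double cycles_per_element(double cycles, size_t elements)
{
    return cycles / (double)elements;
}
```

Under that assumed definition, a kernel that processes 1024 elements in 768 cycles would run at 0.75 CPE.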