[ARMv8] [VP8] Port libvpx to ARMv8 by rewrite NEON assembly to using NEON intrinsics

327 views
Skip to first unread message

James Yu

unread,
Dec 19, 2013, 2:20:51 AM12/19/13
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
Hi Folks,

I'm working on porting libvpx to running on ARMv8 architecture. 
The way is rewrite NEON assembly to NEON intrinsics C code, it will support ARMv7 and ARMv8 both by compiled by ARMv7 or ARMv8 toolchain, instead of hand write ARMv8 NEON assembly.

I'm using arm gcc 4.8 toolchain maintain by Linaro. but I guess any NEON supported toolchain also can work
gcc-linaro-arm-linux-gnueabihf-4.8-2013.08_linux

By below configuration
CROSS=arm-linux-gnueabihf- ../libvpx/configure --target=armv7-linux-gcc --disable-vp9 --prefix=$HOME/dev/libvpx --enable-shared --enable-unit-tests --enable-decode-perf-tests --as=auto --cpu=cortex-a8

I separate libvpx 2 portion, VP8 and VP9. Starting from VP8 codec and had push my code for review. 
Please anyone who working on ARM arch help to review my patch chain?
"VP8 for ARMv8 by using NEON intrinsics. 01/15"
to
"VP8 for ARMv8 by using NEON intrinsics. 15/15" (15/15 failed in AltRefTest)
and
"VP8 for ARMv8 by using NEON intrinsics. remove file" (Remove "seems" unused asm file. If needed, I'll rewrite it.)

I've test it on my ARM Cortex-A8 platform. It works fine. Only the performance is reduce around 10% compare to hand write assembly code. It should caused by toolchain to generate assembly, but it's acceptable trade off for support ARMv7 and v8 architecture by same code. Or maybe we chose to use asm file in ARMv7, and intrinsics code on ARMv8.

I think I still have lots of work to do for porting libvpx to ARMv8. Please help and guide me to achieve this target.

Thanks and Best Regards,
James

James Zern

unread,
Dec 19, 2013, 2:36:34 AM12/19/13
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
Hi,

On Wed, Dec 18, 2013 at 11:20 PM, James Yu <jame...@linaro.org> wrote:
> Hi Folks,
>
> I'm working on porting libvpx to running on ARMv8 architecture.

Great!

> The way is rewrite NEON assembly to NEON intrinsics C code, it will support
> ARMv7 and ARMv8 both by compiled by ARMv7 or ARMv8 toolchain, instead of
> hand write ARMv8 NEON assembly.
>
> I'm using arm gcc 4.8 toolchain maintain by Linaro. but I guess any NEON
> supported toolchain also can work
> gcc-linaro-arm-linux-gnueabihf-4.8-2013.08_linux
>
> By below configuration
> CROSS=arm-linux-gnueabihf- ../libvpx/configure --target=armv7-linux-gcc
> --disable-vp9 --prefix=$HOME/dev/libvpx --enable-shared --enable-unit-tests
> --enable-decode-perf-tests --as=auto --cpu=cortex-a8
>
> I separate libvpx 2 portion, VP8 and VP9. Starting from VP8 codec and had
> push my code for review.
> [...]

I saw the patchset, but you abandoned it before I could comment. The
initial failures are due to the fact that you haven't submitted code
to our repo before and I don't see an individual or corporate
contributor agreement [1].
The first step would be to sign that, then upload a trivial change.
After that lands we can get into the full set of patches.

[1] http://www.webmproject.org/code/contribute/submitting-patches/#contributor-agreement

James Yu

unread,
Dec 19, 2013, 4:02:54 AM12/19/13
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
Sorry, change reply account.

Hi James,

Yes, this is the first time upload. Thank you.
I've sign the agreement and upload the patchset again. Please have a look. Any more thing I need to do?

Regards,
James 

James Zern

unread,
Dec 19, 2013, 3:28:46 PM12/19/13
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
No the cla went through, thanks for doing that. The next step is to
land a small commit, then afterward the tests will be run for you when
you upload a new patchset.

James Yu

unread,
Dec 19, 2013, 11:01:50 PM12/19/13
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
Is that means the verify failed (build failed) caused by no cla?
How long will cla take usually and how could I know cla process finished?
I just sign it again. 
 

James Zern

unread,
Dec 19, 2013, 11:08:28 PM12/19/13
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
The cla has already cleared. Your first change requires an approved
committer to get it past the tests. After that any patchset you upload
will invoke the tests normally.

James Yu

unread,
Dec 19, 2013, 11:28:45 PM12/19/13
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
Okay, I see. Thanks, James. 

James Yu

unread,
Jan 5, 2014, 8:18:11 AM1/5/14
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
Hi James,

I've upload more patch for VP8 common part and VPX scale. But seems need someone to past the tests like you said.
Please help guide me to send request to reviewer for review. Thanks.

Regards,
James Yu 

James Zern

unread,
Jan 7, 2014, 2:48:14 PM1/7/14
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
Sorry for the delay, we're coming off some holidays in the US. We'll
get to these this week.

James Yu

unread,
Jan 7, 2014, 11:18:15 PM1/7/14
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
Thanks, James. :-)
 

Johann Koenig

unread,
Jan 8, 2014, 2:08:29 PM1/8/14
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
On Tue, Jan 7, 2014 at 8:18 PM, James Yu <jame...@linaro.org> wrote:
> Thanks, James. :-)

I've followed up on your first patch here:
https://gerrit.chromium.org/gerrit/#/c/68165/

In general the code style looks good and I have just some basic nits.
I ran the first patch through the tests and it looks good but I was
wondering if you have any numbers wrt performance change on existing
devices.

Can you give a quick overview of the patches? It looks like:
VP8 for ARMv8 by using NEON intrinsics. [01 - 17] deal with code in
vp8/common/arm/neon

I've seen the issue mentioned here pop up:
https://gerrit.chromium.org/gerrit/#/c/68179/10
but I'm surprised you only see it with this patch. I'll look into it.

I'm not sure what this patch does:
https://gerrit.chromium.org/gerrit/#/c/68280/
What tests did you run to decide that the build_intra_predictors
functions are not needed? It looks like we don't list them in
test/intrapred_test.cc but should.
vp8 push and pop are used for functions which don't store q4-q7 (which
we shouldn't have many of and if they are replaced with intrinsics
then we no longer need these functions)

VPX scale for ARMv8 by using NEON intrinsics. 1 looks to deal with vpx/arm/neon

Again, I'm not sure why these are removed:
https://gerrit.chromium.org/gerrit/#/c/68263/5

VP8 encoder for ARMv8 by using NEON intrinsics. [1 - 3]
vp8/encoder/arm/neon functions
But again, I see vp8_get4x4sse_cs_neon is made into a no-op. How did
you determine it wasn't necessary? I'm willing to admit that our gtest
implementation may be incomplete.

James Yu

unread,
Jan 9, 2014, 2:34:22 AM1/9/14
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com


On Thursday, January 9, 2014 3:08:29 AM UTC+8, Johann wrote:
On Tue, Jan 7, 2014 at 8:18 PM, James Yu <jame...@linaro.org> wrote:
> Thanks, James. :-)

I've followed up on your first patch here:
https://gerrit.chromium.org/gerrit/#/c/68165/

In general the code style looks good and I have just some basic nits.
I ran the first patch through the tests and it looks good but I was
wondering if you have any numbers wrt performance change on existing
devices.

Hi Johann,

Thanks for looking at it. It's welcome any kind of comments :-).
I've not yet to have a precisely performance comparison. But I test roughly before, the performance is like you said seems slight performance loss. 
It's reasonable to have a complete performance compare after all VP8 related neon functions re-witten.
 

Can you give a quick overview of the patches? It looks like:
VP8 for ARMv8 by using NEON intrinsics. [01 - 17] deal with code in
vp8/common/arm/neon

I've seen the issue mentioned here pop up:
https://gerrit.chromium.org/gerrit/#/c/68179/10
but I'm surprised you only see it with this patch. I'll look into it.

I post the test result of master and with my patches branch comparison below. 
It've been fixed this issue somewhere. I don't know the root cause.
 
I'm not sure what this patch does:
https://gerrit.chromium.org/gerrit/#/c/68280/
What tests did you run to decide that the build_intra_predictors
functions are not needed? It looks like we don't list them in
test/intrapred_test.cc but should.
That's why I have no idea how to test my code :p.
If you can add them into test suite. I will add my code later.
 
vp8 push and pop are used for functions which don't store q4-q7 (which
we shouldn't have many of and if they are replaced with intrinsics
then we no longer need these functions)

VPX scale for ARMv8 by using NEON intrinsics. 1 looks to deal with vpx/arm/neon

Again, I'm not sure why these are removed:
https://gerrit.chromium.org/gerrit/#/c/68263/5

VP8 encoder for ARMv8 by using NEON intrinsics. [1 - 3]
vp8/encoder/arm/neon functions
But again, I see vp8_get4x4sse_cs_neon is made into a no-op. How did
you determine it wasn't necessary? I'm willing to admit that our gtest
implementation may be incomplete.
Same reason above.


Here are the test result of VP8 related test case.
master branch --------------------------------------------------------------------------------
[----------] Global test environment tear-down
[==========] 135 tests from 7 test cases ran. (1946167 ms total)
[  PASSED  ] 122 tests.
[  FAILED  ] 13 tests, listed below:
[  FAILED  ] VP8/CQTest.LinearPSNRIsHigherForCQLevel/0, where GetParam() = (0x6ec58, 4)
[  FAILED  ] VP8/CQTest.LinearPSNRIsHigherForCQLevel/1, where GetParam() = (0x6ec58, 12)
[  FAILED  ] VP8/CQTest.LinearPSNRIsHigherForCQLevel/2, where GetParam() = (0x6ec58, 20)
[  FAILED  ] VP8/CQTest.LinearPSNRIsHigherForCQLevel/3, where GetParam() = (0x6ec58, 28)
[  FAILED  ] VP8/CQTest.LinearPSNRIsHigherForCQLevel/4, where GetParam() = (0x6ec58, 36)
[  FAILED  ] VP8/CQTest.LinearPSNRIsHigherForCQLevel/5, where GetParam() = (0x6ec58, 44)
[  FAILED  ] VP8/CQTest.LinearPSNRIsHigherForCQLevel/6, where GetParam() = (0x6ec58, 52)
[  FAILED  ] VP8/DatarateTest.BasicBufferModel/3, where GetParam() = (0x6ec6c, 3)
[  FAILED  ] VP8/DatarateTest.BasicBufferModel/4, where GetParam() = (0x6ec6c, 4)
[  FAILED  ] VP8/KeyframeTest.TestRandomVideoSource/3, where GetParam() = (0x6ecb8, 3)
[  FAILED  ] VP8/KeyframeTest.TestRandomVideoSource/4, where GetParam() = (0x6ecb8, 4)
[  FAILED  ] VP8/KeyframeTest.TestAutoKeyframe/3, where GetParam() = (0x6ecb8, 3)
[  FAILED  ] VP8/KeyframeTest.TestAutoKeyframe/4, where GetParam() = (0x6ecb8, 4)

13 FAILED TESTS

real 32m26.310s
user 32m16.160s
sys 0m5.550s
-------------------------------------------------------------------------------------------------

With my patches
-------------------------------------------------------------------------------------------------
[----------] Global test environment tear-down
[==========] 135 tests from 7 test cases ran. (1833471 ms total)
[  PASSED  ] 132 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] VP8/CQTest.LinearPSNRIsHigherForCQLevel/5, where GetParam() = (0x6ec58, 44)
[  FAILED  ] VP8/KeyframeTest.TestRandomVideoSource/3, where GetParam() = (0x6ecb8, 3)
[  FAILED  ] VP8/KeyframeTest.TestRandomVideoSource/4, where GetParam() = (0x6ecb8, 4)

 3 FAILED TESTS

real 30m33.511s
user 30m24.780s
sys 0m5.640s
-------------------------------------------------------------------------------------------------
It seems have some changes for the result of master branch. But I don't understand the reason.

-James


 

Johann Koenig

unread,
Jan 9, 2014, 12:40:49 PM1/9/14
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
On Wed, Jan 8, 2014 at 11:34 PM, James Yu <jame...@linaro.org> wrote:
>> I'm not sure what this patch does:
>> https://gerrit.chromium.org/gerrit/#/c/68280/
>> What tests did you run to decide that the build_intra_predictors
>> functions are not needed? It looks like we don't list them in
>> test/intrapred_test.cc but should.
>
> That's why I have no idea how to test my code :p.
> If you can add them into test suite. I will add my code later.

Ahh, they were disabled here:
https://gerrit.chromium.org/gerrit/#/c/19084/
https://gerrit.chromium.org/gerrit/#/c/19185/

If you were interested in updating them I would appreciate it :) but
since they're currently disabled they can be skipped.

Those other failures look new. I'll dig into them.

James Yu

unread,
Jan 10, 2014, 1:31:53 AM1/10/14
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
Maybe I can do that after make VP8/VP9 support ARMv8 architecture. :-)
 

Those other failures look new. I'll dig into them.
Thanks. 

- James Yu

James Yu

unread,
Jan 19, 2014, 11:04:14 PM1/19/14
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
Hi Johann and James,

I saw all others NEON intrinsics patches are still stock at there. Does anything I can help?
Please tell me what's next step for it. Thanks.

-James
 
- James Yu

Johann Koenig

unread,
Jan 21, 2014, 2:10:49 PM1/21/14
to codec...@webmproject.org, Tom Gall, Vivien Kuan, Mollie Wu, hiroshi...@tw.fujitsu.com
On Sun, Jan 19, 2014 at 8:04 PM, James Yu <jame...@linaro.org> wrote:
> I saw all others NEON intrinsics patches are still stock at there. Does
> anything I can help?
> Please tell me what's next step for it. Thanks.

There are some unit test failures which need to be fixed before I can
go back to reviewing your code. I was hoping to have them done but
they are taking longer than I wanted. I will let you know when I can
start reviewing the code.
Reply all
Reply to author
Forward
0 new messages