This more closely matches the behavior of the decoder where the above row may refer to the reference frame if available [1].
For context, see some of the recent Arm changes and reverts [2].
Unfortunately this was lost along the way [3][4], and there appear to be some issues in the SSSE3 implementation that don't surface in the test vectors. Note that AV1 still has the original loop, which didn't do extension [5].
commit 100ca0356ddf67e92da35699d92bc180429d0bc1
Author: George Steed <george...@arm.com>
Date: Fri Mar 17 20:00:24 2023
Randomize second half of above_row_ in intrapred tests for Neon
The existing tests duplicate `above_row_[block_size - 1]` after the first `block_size` elements, which can lead to tests incorrectly passing due to differing behaviour when calculating the average for the last elements of the output.
This change adjusts the above array setup to be fully random instead, allowing us to catch such issues here rather than in other larger tests like the external MD5 tests.
It doesn't appear that other architectures are fully clean with this change so restrict it to just Neon for now until they are fixed.
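As a rough illustration of the masking effect described in the commit message above, the sketch below compares two toy row predictors. It is not the libvpx code: `avg3`, `reference_row` and `shortcut_row` are hypothetical names, and the rounded 3-tap average is an assumption made for the example. The "shortcut" variant never reads past `above[bs - 1]` and behaves as if the remaining elements were copies of it, so it only matches the reference when the test pads the array that way.

```python
def avg3(a, b, c):
    # Assumed rounded 3-tap average: (a + 2*b + c + 2) >> 2.
    return (a + 2 * b + c + 2) >> 2


def reference_row(above, bs):
    # Hypothetical reference: output x reads above[x], above[x + 1], above[x + 2],
    # so the last two outputs depend on elements at index bs and bs + 1.
    return [avg3(above[x], above[x + 1], above[x + 2]) for x in range(bs)]


def shortcut_row(above, bs):
    # Hypothetical optimized variant: ignores everything after above[bs - 1]
    # and treats the tail as copies of that last element.
    last = above[bs - 1]
    padded = list(above[:bs]) + [last, last]
    return [avg3(padded[x], padded[x + 1], padded[x + 2]) for x in range(bs)]


bs = 4
first_half = [10, 20, 30, 40]

# Old-style test setup: second half duplicates above[bs - 1].
replicated = first_half + [first_half[-1]] * bs
# New-style test setup: second half holds unrelated values.
non_uniform = first_half + [200, 7, 90, 3]

print("replicated match: ", reference_row(replicated, bs) == shortcut_row(replicated, bs))
print("non-uniform match:", reference_row(non_uniform, bs) == shortcut_row(non_uniform, bs))
```

With the replicated tail the two variants agree, so a test built on that setup cannot tell them apart; with an unrelated tail the last outputs diverge, which is the kind of difference a fully random above array is meant to expose.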
commit 911d6e165eb19e03ec1532fa20098b10ad402e39
Author: George Steed <george...@arm.com>
Date: Fri Mar 17 19:55:17 2023
Allow non-uniform above array in d63 predictor Neon impl
The existing standard bitdepth implementation doesn't appear to manifest as a failure in any of the predictor or MD5 tests, but it does rely on the predictor tests filling the second `bs` elements of the `above` input array with copies of `above[bs - 1]` in order to match the C implementation.
This patch adjusts the Neon implementation to correctly match the C implementation in the case where the elements of the `above` array all differ.
The geomean of performance for the predictor is approximately a 2% slowdown compared to the previous vectorized implementation. This is still considerably faster than the unspecialized naive C implementation.
commit 3eb3781589d30874634cab8952dec4ea883eb82a
Author: George Steed <george...@arm.com>
Date: Fri Mar 17 17:59:26 2023
Allow non-uniform above array in d45 predictor Neon impl
The existing implementation doesn't appear to manifest as a failure in any of the predictor or MD5 tests, but it does rely on the predictor tests filling the second `bs` elements of the `above` input array with copies of `above[bs - 1]` in order to match the C implementation.
This patch adjusts the Neon implementation to correctly match the C implementation in the case where the elements of the `above` array all differ.
Performance of the predictor is mostly unchanged, except for the 32x32 block size where it appears to have gotten about 40% faster when compiled with clang-15.
commit 25825f6a78a267f99c4c6ba7988fc4d79c8cb19d
Author: George Steed <george...@arm.com>
Date: Thu Mar 09 23:46:31 2023
Allow non-uniform above array in highbd d45 predictor Neon impl
The existing implementation doesn't appear to manifest as a failure in any of the predictor or MD5 tests, but it does rely on the predictor tests filling the second `bs` elements of the `above` input array with copies of `above[bs - 1]` in order to match the C implementation.
This patch adjusts the Neon implementation to correctly match the C implementation in the case where the elements of the `above` array all differ.
Performance of the predictor is mostly unchanged, except for the 16x16 block size where it appears to have gotten marginally faster across most compiler/micro-architecture combinations.