Hello AV1 folks,
We just released dav1d 0.7.0, a major release of the fastest AV1 decoder.
- 10% faster on Intel CPUs with 25% less RAM; the assembly is finished for 8-bit
- ARM64 assembly mostly done for 10/12-bit, in addition to 8-bit
- dav1d is, on average, twice as fast as libgav1 on ARM CPUs, and 4 times faster for 10-bit
- 1080p AV1 decodable in real time with the 2 LITTLE cores of a Pixel 1
By rewriting ref_mv (which was the last part coming from libaom), the complete decoder is around 8-12% faster than before (tested on Haswell), while using around 25% less RAM.
In addition to the small optimizations added this time (film grain, scaling, ...), the decoder is now fully optimized for 8-bit on x86 32-bit and 64-bit (AVX2 and SSSE3), and it's unlikely to get much faster. We also added more AVX-512 optimizations.
We're still at 3x-5x the speed of aomdec, the reference decoder.
The decoder now has all the important optimizations for 10-bit/12-bit decoding on ARM64, in addition to the existing 8-bit optimizations.
The only missing optimization is film grain, but one should do that on the GPU. :D
Note: except for SGR and the inverse transforms, all the optimizations were done for both 10-bit and 12-bit.
dav1d is now, on average, more than twice as fast as libgav1 on ARM64 (1.8x-2.5x), including on Android, which is the focus of libgav1.
For 10-bit decoding, dav1d is between 2.5x and 5x faster than libgav1 on the same ARM64 platforms.
What is most worth reading is how the threading of both libgav1 and dav1d scales on Android with big.LITTLE architectures: Nathan tried numerous threading options and core assignments on Android devices (Pixel 1, 2, 3 and Mi 9T Pro).
We also focused on testing the decoding using only the LITTLE cores, to reduce power consumption.
For example, on the Google Pixel 1, whose Snapdragon 821 has 2 LITTLE cores and 2 big cores, dav1d is able to decode Chimera 1080p 8-bit on the 2 LITTLE cores alone!
This is quite impressive.
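
For those who want to try a LITTLE-only run themselves, here is a minimal sketch of restricting a process to the low-power cluster on Linux/Android. It is not the method used for the benchmarks. It assumes that the LITTLE cores are the ones with the lowest maximum frequency reported by the cpufreq sysfs files, and that Python's os.sched_setaffinity is available on the target; the helper names and the example frequencies are hypothetical.

```python
import os


def pick_little_cores(max_freq_khz):
    """Return the CPU ids that share the lowest maximum frequency.

    max_freq_khz maps a CPU id to its maximum frequency in kHz.
    Assumption: the LITTLE cluster is the one with the lowest maximum frequency.
    """
    if not max_freq_khz:
        raise ValueError("no CPU frequencies given")
    lowest = min(max_freq_khz.values())
    return {cpu for cpu, freq in max_freq_khz.items() if freq == lowest}


def read_max_freqs():
    """Read per-CPU maximum frequencies from the Linux cpufreq sysfs files."""
    freqs = {}
    for cpu in range(os.cpu_count() or 1):
        path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/cpuinfo_max_freq"
        try:
            with open(path) as f:
                freqs[cpu] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # cpufreq is not exposed for this CPU
    return freqs


def pin_to_little_cores(pid=0):
    """Restrict a process (0 = the calling one) to the LITTLE cores.

    Child processes started afterwards inherit the affinity mask.
    """
    cores = pick_little_cores(read_max_freqs())
    os.sched_setaffinity(pid, cores)
    return cores


# Illustrative 2+2 layout; the frequencies are made-up example values.
example = {0: 1600000, 1: 1600000, 2: 2150000, 3: 2150000}
print(sorted(pick_little_cores(example)))  # prints [0, 1]
```

On a machine where all cores report the same maximum frequency, the selection simply returns every core, so the pinning has no effect there.
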
All of this is detailed on:
and the data can be found:
In general, AV1 decoding is ready on numerous mobile phones: even mid-range phones from 5 years ago can decode 720p, and more recent ones can decode 1080p in software without using all the CPU cores.
iOS phones are even faster for decoding AV1.
On the consoles, which a few of you care about, dav1d is able to decode 1440p30 on the original Xbox, without doing any specific work, and without using more than half of the CPU.
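
As a rough way to read claims like "without using more than half of the CPU", here is a small sketch that turns a benchmark throughput into a CPU share. It assumes the decoder was measured running flat out on the same cores and that load scales linearly with frame rate, which is a simplification; the function names and the numbers are hypothetical and are not taken from the measurements above.

```python
def cpu_share(decode_fps, playback_fps):
    """Fraction of the measured decoding capacity needed to sustain playback_fps."""
    if decode_fps <= 0 or playback_fps <= 0:
        raise ValueError("frame rates must be positive")
    return playback_fps / decode_fps


def is_real_time(decode_fps, playback_fps, max_share=1.0):
    """True if playback fits within max_share of the measured capacity."""
    return cpu_share(decode_fps, playback_fps) <= max_share


# Hypothetical measurement: 75 fps when decoding unthrottled, for a 30 fps stream.
print(cpu_share(75.0, 30.0))                    # 0.4
print(is_real_time(75.0, 30.0, max_share=0.5))  # True
```
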
The next steps for dav1d will be ARM32 and x86 optimizations for 10-bit decoding, I guess.
And, if people care about some research, finishing the work on GPGPU.
I hope you enjoy that release,
jb, for VideoLAN and dav1d.