Vector math benchmarks


John McCutchan

Jun 17, 2012, 1:38:14 AM
to General Dart Discussion
Hi,

Following up on my earlier mails about Float32Array/List performance, I have done some initial benchmarking of 4x4 matrix multiply, 4x4 matrix inverse, and 4D vector transform by a 4x4 matrix. The benchmark compares DartVectorMath, glmatrix.dart, and a new Dart VM feature I've been working on: Float32List SIMD (SSE) operations.

Each benchmark is averaged over 10 runs of 20,000 iterations. See the raw results below.
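
For readers who want to reproduce the methodology, here is a minimal sketch of the timing loop described above (10 runs of 20,000 iterations, reporting average, minimum, and maximum per run). It is written in Python purely for illustration and is not the harness used for the numbers in this post; `run_benchmark`, `mat4_multiply`, and `IDENTITY` are hypothetical names.

```python
import time


def run_benchmark(fn, runs=10, iterations=20_000):
    """Time `fn` over `runs` runs of `iterations` calls each.

    Returns (avg_ms, min_ms, max_ms), where each sample is the wall-clock
    time of one full run of `iterations` calls.
    """
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(iterations):
            fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return sum(samples_ms) / len(samples_ms), min(samples_ms), max(samples_ms)


def mat4_multiply(a, b):
    """Multiply two 4x4 matrices stored as flat, row-major 16-element lists."""
    out = [0.0] * 16
    for row in range(4):
        for col in range(4):
            out[row * 4 + col] = sum(a[row * 4 + k] * b[k * 4 + col] for k in range(4))
    return out


# 4x4 identity matrix: ones at flat indices 0, 5, 10, 15.
IDENTITY = [1.0 if i % 5 == 0 else 0.0 for i in range(16)]

# Example call (not executed here, since a full run takes a while):
#   avg, lo, hi = run_benchmark(lambda: mat4_multiply(IDENTITY, IDENTITY))
```

Because every run is included in the average, any slow early runs (for example, before a VM has optimized the code) pull the average up and widen the min/max range.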

Executive summary:

glmatrix.dart performance is _really_ bad. I'm guessing that this is largely a result of the Float32List performance issue I pointed out in the other thread.
DartVectorMath performance is okay. It actually beats the SIMD backend for vector transforms.
Compared with DartVectorMath (average times), SIMD is about 1.7x faster for matrix * matrix, about 4x faster for matrix inverse, and about 50% slower for vector transform.
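
The ratios in the summary can be recomputed from the average times in the raw results. A minimal Python sketch follows; the dictionary layout and the `speedup` helper are illustrative and not part of the benchmark itself.

```python
# Average times in milliseconds, copied from the raw results in this post.
AVG_MS = {
    "matrix multiply": {"DartVectorMath": 14.59, "SIMD": 8.702, "glmatrix.dart": 283.353},
    "matrix inverse": {"DartVectorMath": 28.289, "SIMD": 7.107, "glmatrix.dart": 318.909},
    "vector transform": {"DartVectorMath": 4.324, "SIMD": 6.415, "glmatrix.dart": 144.431},
}


def speedup(baseline_ms, candidate_ms):
    """Return how many times faster the candidate is than the baseline.

    Values above 1.0 mean the candidate is faster; below 1.0, slower.
    """
    return baseline_ms / candidate_ms


for name, times in AVG_MS.items():
    ratio = speedup(times["DartVectorMath"], times["SIMD"])
    print(f"{name}: SIMD is {ratio:.2f}x the speed of DartVectorMath")
```

On these averages the ratio is roughly 1.7 for matrix multiply, 4.0 for matrix inverse, and 0.67 for vector transform, meaning the SIMD vector transform takes about 1.5 times as long as the DartVectorMath one.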

Thanks,
John

Starting benchmark
Clock frequency: 1000000
=============================================
Matrix Multiplication
=============================================
Avg: 14.59 ms Min: 10.161 ms Max: 22.927 ms (Avg: 14590 Min: 10161 Max: 22927)
=============================================
Matrix Multiplication SIMD
=============================================
Avg: 8.702 ms Min: 8.475 ms Max: 9.217 ms (Avg: 8702 Min: 8475 Max: 9217)
=============================================
Matrix Multiplication glmatrix.dart
=============================================
Avg: 283.353 ms Min: 272.062 ms Max: 287.988 ms (Avg: 283353 Min: 272062 Max: 287988)
=============================================
mat4x4 inverse
=============================================
Avg: 28.289 ms Min: 21.019 ms Max: 34.891 ms (Avg: 28289 Min: 21019 Max: 34891)
=============================================
mat4x4 inverse SIMD
=============================================
Avg: 7.107 ms Min: 6.89 ms Max: 7.754 ms (Avg: 7107 Min: 6890 Max: 7754)
=============================================
mat4x4 glmatrix.dart
=============================================
Avg: 318.909 ms Min: 315.435 ms Max: 325.831 ms (Avg: 318909 Min: 315435 Max: 325831)
=============================================
vector transform
=============================================
Avg: 4.324 ms Min: 2.811 ms Max: 14.859 ms (Avg: 4324 Min: 2811 Max: 14859)
=============================================
vector transform SIMD
=============================================
Avg: 6.415 ms Min: 6.204 ms Max: 7.006 ms (Avg: 6415 Min: 6204 Max: 7006)
=============================================
vector transform glmatrix.dart
=============================================
Avg: 144.431 ms Min: 138.263 ms Max: 153.798 ms (Avg: 144431 Min: 138263 Max: 153798)



--
John McCutchan <jo...@johnmccutchan.com>

Peter Jakobs

Jun 17, 2012, 5:40:03 AM
to General Dart Discussion
Thanks for the benchmark. The performance of my glmatrix port is
depressingly bad...
I'm curious how well it performs in the Dartium client with
Float32Array instead of Float32List.

Srinivas JONNALAGADDA

Jun 17, 2012, 6:33:16 AM
to mi...@dartlang.org
What do you think, John, is causing the large sigma in the readings for DartVectorMath? Or is it just an outlier effect? Thanks.

Greetings,
JS

John McCutchan

Jun 17, 2012, 1:12:20 PM
to Peter Jakobs, General Dart Discussion
Hi Peter,

Don't get discouraged. The Dart VM is still very young and, if I'm correct, the performance problem can be fixed, bringing your port to parity with (or maybe faster than) DartVectorMath.

John
--
John McCutchan <jo...@johnmccutchan.com>

John McCutchan

Jun 17, 2012, 1:15:41 PM
to Srinivas JONNALAGADDA, mi...@dartlang.org
Hi JS,

I think the sigma (standard deviation) of DartVectorMath performance comes from the VM running the first couple of passes unoptimized. The Dart VM has a call count threshold that a function must reach before it is optimized. If you pass the disassemble option to the VM, you will get both the unoptimized and optimized versions of each function.
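
Since only the average, minimum, and maximum of each benchmark were posted (not the per-run samples), the standard deviation itself cannot be recomputed. As a rough proxy for the spread, the max/min ratio of each benchmark can be compared. A minimal Python sketch, where the `spread` helper and the table layout are illustrative:

```python
# (min_ms, max_ms) pairs copied from the raw results earlier in the thread.
MIN_MAX_MS = {
    "Matrix Multiplication": (10.161, 22.927),
    "Matrix Multiplication SIMD": (8.475, 9.217),
    "mat4x4 inverse": (21.019, 34.891),
    "mat4x4 inverse SIMD": (6.89, 7.754),
    "vector transform": (2.811, 14.859),
    "vector transform SIMD": (6.204, 7.006),
}


def spread(min_ms, max_ms):
    """Ratio of the slowest run to the fastest run (1.0 means no variation)."""
    return max_ms / min_ms


for name, (lo, hi) in MIN_MAX_MS.items():
    print(f"{name}: slowest run is {spread(lo, hi):.2f}x the fastest")
```

On the posted numbers, the DartVectorMath benchmarks have a slowest run between roughly 1.7x and 5.3x the fastest, while the SIMD benchmarks stay under about 1.15x. That is at least consistent with a few slow, unoptimized early runs, though it does not prove the cause.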

John
--
John McCutchan <jo...@johnmccutchan.com>