NumericExtensions has been significantly updated recently.
The major changes:
- Reinstated/introduced reduce, mapreduce, reducedim, and mapreducedim methods using functors (a sketch of the functor pattern is shown after this list)
- All map, reducedim, and mapreducedim functions, and the functions derived from them, now support both contiguous arrays and strided views (those from ArrayViews.jl) with very efficient implementations
- The PDMat part has been separated into a new package, PDMats.jl
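The functor-based methods follow the usual functor pattern: a concrete type plus an evaluation method that the compiler can specialize on, which anonymous functions currently do not allow. Below is a minimal sketch of that pattern; the names (AbsFun, evaluate, my_mapreduce) are only for illustration, not the package's actual API, and it is written in the same Julia-0.3-era syntax as the rest of this post.

immutable AbsFun end
evaluate(::AbsFun, x) = abs(x)    # the "call" of the functor

function my_mapreduce(f, op, a::AbstractArray)
    # apply the functor f to each element and fold the results with op
    s = evaluate(f, a[1])
    for i = 2:length(a)
        s = op(s, evaluate(f, a[i]))
    end
    return s
end

my_mapreduce(AbsFun(), +, [-1.0, 2.0, -3.0])   # => 6.0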
The internal implementation strategy is recursion (computations over N-D arrays are decomposed into computations over (N-1)-D slices) plus highly optimized 1D/2D kernels.
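For example, reducing a matrix over dim 1 bottoms out in a per-column kernel, roughly in the following spirit (my own illustration of the idea, not the actual package code; sum_kernel stands for a hand-optimized kernel):

using ArrayViews   # provides view; later Julia versions have view in Base

sum_kernel(x) = sum(x)   # stand-in for a hand-optimized (SIMD/BLAS) 1D kernel

function sum_cols(a::Matrix{Float64})
    r = zeros(size(a, 2))
    for j = 1:size(a, 2)
        r[j] = sum_kernel(view(a, :, j))   # each column is a contiguous 1D slice
    end
    return r
end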
Here are some benchmarks obtained by running computations over 1000x1000 matrices:
a_sub = sub(a, 1:999, :)
a_view = view(a, 1:999, :) # using ArrayViews
for sum:
dim = 1: sum(a_sub, dim) => 0.1314s sum(a_view, dim) => 0.0374s | gain = 3.5168x
dim = 2: sum(a_sub, dim) => 0.1740s sum(a_view, dim) => 0.0574s | gain = 3.0286x
for sumabs:
dim = 1: sum(abs(a_sub), dim) => 0.6341s sumabs(a_view, dim) => 0.0286s | gain = 22.1343x
dim = 2: sum(abs(a_sub), dim) => 0.6331s sumabs(a_view, dim) => 0.0639s | gain = 9.9013x
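Timings of this kind can be collected with a simple harness along the following lines. This is only a sketch, not the exact script behind the numbers above, and it assumes a random Float64 matrix, which the measurements do not depend on:

using ArrayViews
using NumericExtensions

a = rand(1000, 1000)      # assumption: element type/content not specified above
a_sub  = sub(a, 1:999, :)
a_view = view(a, 1:999, :)

sum(a_sub, 1); sum(a_view, 1)   # warm up, so compilation is not timed

for dim = 1:2
    t_sub  = @elapsed for k = 1:20; sum(a_sub, dim); end
    t_view = @elapsed for k = 1:20; sum(a_view, dim); end
    println("dim = ", dim, ": gain = ", t_sub / t_view, "x")
end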
Note: the ArrayViews package now provides an ``ellipview`` function, as in
ellipview(a, i) # equivalent to view(a, ..., i), e.g. view(a, :,:,i) when ndims(a) == 3
Some thoughts:
Tim's Cartesian machinery is very nice and is now part of Julia Base. It can express generic algorithms in a small number of lines of code while delivering reasonable run-time performance. However, for really high-performance implementations, recursion + highly optimized kernels is still faster at run time, and this strategy allows one to plug in SIMD, BLAS, or other external functions for optimal performance (through multiple dispatch).
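To make the comparison concrete, the Cartesian style looks roughly like this for a fixed N = 3 (a textbook-style example, not code from NumericExtensions):

using Base.Cartesian

# @nloops/@nref expand into the three nested loops and the indexing
# expression at compile time, so the source stays short (here for a 3-D array a).
function sum3d(a::AbstractArray)
    s = zero(eltype(a))
    @nloops 3 i a begin
        s += @nref 3 a i     # a[i_1, i_2, i_3]
    end
    return s
end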
Consider the following code:
sumabs(view(a, 1:500, 1:2:5, :), (1, 2))
The strategy implemented in NumericExtensions will invoke ``BLAS.asum`` when performing the computation along each column. That's why you see the over-20x speed gain above.
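The column-wise use of BLAS is in the following spirit (my own sketch of the idea, not the package's internals):

import Base.LinAlg.BLAS   # the BLAS wrappers live in Base.LinAlg in this Julia generation

function sumabs_cols(a::Matrix{Float64})
    m, n = size(a)
    r = zeros(n)
    for j = 1:n
        # column j is the contiguous block a[(j-1)*m + 1 : j*m], so its
        # absolute-value sum can be handed directly to the BLAS asum kernel
        r[j] = BLAS.asum(m, pointer(a, (j-1)*m + 1), 1)
    end
    return r
end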
- Dahua