NumericExtensions extended

Dahua

Feb 16, 2014, 2:46:45 PM
to juli...@googlegroups.com
NumericExtensions has been significantly updated recently.

The major changes:
- Reinstated/introduced reduce, mapreduce, reducedim, and mapreducedim methods that use functors (a minimal sketch of the functor idea follows this list)
- All map, reducedim, and mapreducedim functions, and the functions derived from them, now support both contiguous arrays and strided views (those from ArrayViews.jl) with very efficient implementations
- The PDMat part has been separated into a new package, PDMats.jl
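
For readers unfamiliar with the functor-based interface, here is a minimal sketch of the idea. The type and function names below (MyAbs2, MyAdd, functor_mapreduce) are hypothetical, chosen only to illustrate the dispatch pattern, and are not the actual functors exported by NumericExtensions:

abstract Functor                       # 0.3-era syntax ("abstract type" in later Julia)

immutable MyAbs2 <: Functor end        # element-wise transform (hypothetical)
immutable MyAdd  <: Functor end        # binary reduction operator (hypothetical)

evaluate(::MyAbs2, x) = abs2(x)
evaluate(::MyAdd, x, y) = x + y

# mapreduce written against functor types: the compiler specializes on the
# concrete functor types, avoiding the overhead of passing anonymous functions
function functor_mapreduce(f::Functor, op::Functor, a::AbstractArray)
    s = evaluate(f, a[1])
    for i = 2:length(a)
        s = evaluate(op, s, evaluate(f, a[i]))
    end
    return s
end

functor_mapreduce(MyAbs2(), MyAdd(), [1.0, 2.0, 3.0])   # => 14.0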

The internal implementation strategy is recursion (computations over N-D arrays are decomposed into computations over (N-1)-D slices) plus highly optimized 1D/2D kernels.
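
To make that concrete, here is a rough sketch (illustrative only, not the actual NumericExtensions code) of a columnwise sum decomposed into a tight 1D kernel plus an outer loop over contiguous column slices:

# tight 1D kernel over a contiguous range of a's storage; for specific
# functors this is the place to substitute SIMD code, BLAS.asum, dot, etc.
function sum_kernel(a, ifirst::Int, ilast::Int)
    s = zero(eltype(a))
    @inbounds for i = ifirst:ilast
        s += a[i]
    end
    return s
end

# columnwise reduction of a matrix: each column is a contiguous 1D slice,
# so the 2-D problem reduces to repeated calls of the 1D kernel
function sum_dim1(a::Matrix)
    m, n = size(a)
    r = zeros(eltype(a), n)
    for j = 1:n
        r[j] = sum_kernel(a, (j - 1) * m + 1, j * m)
    end
    return r
end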

Here are some benchmarks obtained by running computations over 1000x1000 matrices:
a_sub = sub(a, 1:999, :)
a_view = view(a, 1:999, :)   # using ArrayViews

for sum:
  dim = 1:  sum(a_sub, dim) => 0.1314s   sum(a_view, dim) => 0.0374s   |  gain = 3.5168x
  dim = 2:  sum(a_sub, dim) => 0.1740s   sum(a_view, dim) => 0.0574s   |  gain = 3.0286x
for sumabs:
  dim = 1:  sum(abs(a_sub), dim) => 0.6341s   sumabs(a_view, dim) => 0.0286s   |  gain = 22.1343x
  dim = 2:  sum(abs(a_sub), dim) => 0.6331s   sumabs(a_view, dim) => 0.0639s   |  gain = 9.9013x
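
For reference, gains like these can be measured with a small harness along the following lines (a sketch; absolute numbers depend on the machine and Julia build):

using NumericExtensions, ArrayViews

a = rand(1000, 1000)
a_sub  = sub(a, 1:999, :)
a_view = view(a, 1:999, :)

for dim in (1, 2)
    t_sub  = @elapsed for k = 1:100; sum(a_sub, dim); end
    t_view = @elapsed for k = 1:100; sum(a_view, dim); end
    println("dim = $dim: sub => $(t_sub)s  view => $(t_view)s  gain = $(t_sub / t_view)x")
end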

Note: the ArrayViews package now provides an ``ellipview`` function:
ellipview(a, i)  # equivalent to view(a, ..., i), e.g. view(a, :,:,i) when ndims(a) == 3
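
For example, this makes it easy to walk over the trailing dimension of an array of any rank without copying (a small usage sketch, assuming ArrayViews is loaded):

using ArrayViews

a = rand(4, 5, 3)
for i = 1:size(a, 3)
    page = ellipview(a, i)      # same as view(a, :, :, i) for this 3-D array
    println(sum(page))          # operate on the 4x5 slice; no copy is made
end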

Some thoughts:

Tim's Cartesian machinery is very nice and is now part of Julia Base. It can express generic algorithms in a small number of lines of code, while producing reasonable run-time performance. However, for writing really high-performance implementations, recursion plus highly optimized kernels is still faster at run time, and this strategy allows one to plug in SIMD, BLAS, or other external functions for optimal performance (through multiple dispatch).
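
As a rough illustration of the contrast (a sketch written for this note, not code from either package): a Cartesian-style loop for a fixed rank is compact, but its innermost loop is a plain scalar loop rather than a kernel that can be swapped out per functor:

using Base.Cartesian

# sum of absolute values over a rank-3 array via @nloops/@nref:
# concise and reasonably fast, but the inner loop cannot easily be
# replaced by BLAS.asum or a hand-written SIMD kernel
function sumabs_cartesian(a::Array{Float64,3})
    s = 0.0
    @nloops 3 i a begin
        v = @nref 3 a i
        s += abs(v)
    end
    return s
end

sumabs_cartesian(rand(4, 5, 3))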

Consider the following code:
sumabs(view(a, 1:500, 1:2:5, :), (1, 2))
The strategy implemented in NumericExtensions will invoke ``BLAS.asum`` when performing the computation along each column. That is why you see the over-20x speed gain above.
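
A hedged sketch of the kind of kernel dispatch this relies on (illustrative only; the package's actual dispatch rules and contiguous-view types differ):

import Base.LinAlg.BLAS          # 0.3-era path; LinearAlgebra.BLAS in later Julia

# generic fallback kernel for the absolute-sum of a vector
function asum_kernel(x::AbstractVector)
    s = 0.0
    for v in x
        s += abs(v)
    end
    return s
end

# specialized kernel: a contiguous Float64 column can go straight to BLAS;
# the same idea applies to contiguous column views, per the note above
asum_kernel(x::Vector{Float64}) = BLAS.asum(length(x), x, 1)

asum_kernel(rand(1000))          # uses the BLAS-backed method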

- Dahua
