Package NumericFunctors.jl

Dahua

Jun 20, 2013, 12:55:26 AM
to juli...@googlegroups.com
I rewrote the computation part of MLBase and created a new package, NumericFunctors.jl.

This package provides carefully optimized functions for vectorized map, reduction, and map-reduce computation. This is a substantial rewrite. The package has the following features:
  • A large set of pre-defined functors covering most typical mathematical computations, plus the ability for users to define custom functors.
  • Functions for element-wise mapping, with different syntaxes supporting different behaviors: creating a result array, writing to a preallocated array, or updating an argument array.
  • Functions for reduction, map-reduction, and reduction/map-reduction along specific dimension(s).
  • Map, reduction, and map-reduction all support arrays with an arbitrary number of dimensions (vector, matrix, cube, ...).
  • Efficient inplace broadcasting computation.
Julia already provides much of this functionality, but it has not yet been optimized for performance. At the moment, the functions in this package often yield a 2x - 30x speed-up over their Julia counterparts (the README provides detailed benchmarks).

These improvements come from extensive use of performance-optimization techniques, such as:
  • performing computation in a cache-friendly order
  • completing the computation in a single pass with tight loops, without creating temporary arrays
  • inlining computational kernels through typed functors (using typed functors instead of functions triggers specialized compilation of methods that inline the functor evaluation)
  • using linear indexing with pre-computed offsets in inner loops
  • exploiting opportunities to use BLAS
Overall, this package offers high-performance computational support for many practical applications.

Note: once function arguments can be typed and inlined, we won't have to introduce functors to get performance. For now, however, this is a feasible way to get both performance and extensibility.
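To illustrate the typed-functor idea, here is a minimal sketch in present-day Julia syntax (not the package's actual code, which used the era's `type`/`evaluate` forms; the names `Abs2Fun` and `vmap_sketch` are illustrative):

```julia
# Typed-functor pattern: dispatch on a concrete functor type lets the
# compiler specialize the mapping method and inline the kernel call.
abstract type Functor end

struct Abs2Fun <: Functor end
evaluate(::Abs2Fun, x) = abs2(x)

# Because `f` has a concrete type, this method is compiled specifically
# for Abs2Fun, and `evaluate` can be inlined into the inner loop.
function vmap_sketch(f::Functor, a::AbstractArray)
    r = similar(a)
    for i in eachindex(a)
        r[i] = evaluate(f, a[i])
    end
    return r
end

vmap_sketch(Abs2Fun(), [1.0, 2.0, 3.0])  # -> [1.0, 4.0, 9.0]
```

The key design point is that the functor's type, not a function value, carries the kernel, so specialization happens at method-compilation time.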

Viral Shah

Jun 20, 2013, 1:53:47 AM
to juli...@googlegroups.com
This is fantastic! As already discussed in another thread, many of these things should eventually migrate to Base.

-viral

Tim Holy

Jun 20, 2013, 6:38:54 AM
to juli...@googlegroups.com
You're not joking when you say it's a big rewrite. This is much more general.
Very impressive!

--Tim

Dahua

Jun 20, 2013, 9:11:23 AM
to juli...@googlegroups.com
One of the reasons I created this is to prepare for migration to Base.

Before I invest effort in making a pull request, I would like to have some discussion here. One reason I hesitate is that the functors will no longer be needed once inlining of function arguments is implemented in the compiler and we can obtain the return type of a function programmatically (both are on the list in #3440). However, I am not sure how soon this may happen.

Currently, there are four plans:

(A) Start the migration now, which involves:
  • Introducing the Functor class into Base, so that users can define their own functors
  • Extending the map and map! functions to accept functors
  • Rewriting the reduction-related functions to do the computation in a cache-friendly way and accept functors as arguments
(B) Start the migration, except that functors are used as an internal facility to support the implementation rather than being exposed for general use.

(C) Wait until the aforementioned improvements happen, then do the migration on a smaller scale:
  • functions like map and broadcast automatically get much better performance when functions are passed in
  • the reduction functions still have to be rewritten so that they perform the computation in a cache-friendly order
  • it is still useful to add methods such as sum!(output, x, dim) that write the results of reduction along specific dimensions to pre-allocated storage
(D) Currently, the functions in NumericFunctors.jl use the prefix ``v`` to avoid clashes with base functions (e.g. vsum, vmax, vmap, vreduce, etc.). I tried simply importing the base functions and extending them to accept functors, but ran into various kinds of ambiguities (the major reason being that the function argument is declared as Any instead of Function). To address this, I can:
  • Clean up the declarations of the relevant base functions (e.g. ``map``, ``reduce``, ``sum``), making them easier to extend
  • Rename the functions in NumericFunctors to drop the v-prefix (e.g. ``vmap`` ==> ``map``), so that they become extensions of the base functions
This way, people can get the performance transparently:

using NumericFunctors

x = sum(x, 2)  # the using statement above triggers a 5x speed-up

Any thoughts on which plan would work better?
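To make the "cache-friendly order" point in plans (A) and (C) concrete, here is a rough sketch (illustrative only, not the package's code; `sum_dim2!` is a hypothetical name) of summing along dimension 2 of a column-major matrix by accumulating column by column, so memory is always read sequentially:

```julia
# Cache-friendly sum along dimension 2: the outer loop walks columns
# (contiguous in Julia's column-major layout) and accumulates into a
# preallocated result vector, instead of reducing each row separately.
function sum_dim2!(r::AbstractVector, x::AbstractMatrix)
    fill!(r, zero(eltype(r)))
    m, n = size(x)
    for j in 1:n        # columns: sequential memory access
        for i in 1:m
            r[i] += x[i, j]
        end
    end
    return r
end

x = [1.0 2.0; 3.0 4.0]
sum_dim2!(zeros(2), x)  # -> [3.0, 7.0]
```

The preallocated-output form also shows the motivation for methods like sum!(output, x, dim) mentioned in plan (C).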

Dahua

Jun 20, 2013, 11:23:13 AM
to juli...@googlegroups.com
Please do not depend on the v-prefixed function names such as vmap, vreduce, and vsum yet.

I am working on Plan (D): I am going to use the Base names such as map, reduce, and sum, and define these functions as extensions/specialized versions of the base functions. This makes them much easier and cleaner to use.

I expect to finish this change by today.

Tim Holy

Jun 20, 2013, 11:33:07 AM
to juli...@googlegroups.com
I like that choice, personally.

--Tim

Dahua

Jun 20, 2013, 9:07:28 PM
to juli...@googlegroups.com
The modification has been implemented.

Now the package provides extended/specialized versions of map, map!, reduce, mapreduce, sum, max, min, etc. that accept functor arguments, and additionally provides mapreduce!, reduce!, sum!, max!, min!, asum!, sqsum!, amax!, amin!, and a bunch of other functions.

These changes make usage much easier. For example, going from

sum(abs2(x), 2)

to

using NumericFunctors

sum(abs2(x), 2)  # the sum method called here has been replaced
                 # by a new specialized version

You get about a 5x speed-up. Furthermore, if you write

using NumericFunctors

sqsum(x, 2)

the speed-up becomes 10x.

The documentation has been updated to reflect these changes.
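The reason a dedicated sqsum can beat even a specialized sum(abs2(x), ...) is the fused single-pass loop: no temporary array for the squared values is ever materialized. Roughly (an illustrative whole-array sketch, not the package's implementation; `sqsum_sketch` is a hypothetical name):

```julia
# Fused map-reduce: square and accumulate in one tight loop,
# avoiding the temporary array that sum(abs2(x)) would create.
function sqsum_sketch(x::AbstractArray)
    s = zero(eltype(x))
    for i in eachindex(x)   # single pass over the data
        s += abs2(x[i])
    end
    return s
end

sqsum_sketch([1.0, 2.0, 3.0])  # -> 14.0
```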

Dahua

Jun 21, 2013, 6:51:14 PM
to juli...@googlegroups.com
A lot of updates lately:

- Added a bunch of improved statistics functions: mean, var, std, entropy, logsumexp, softmax, ...  All of these functions have a version (e.g. mean!, var!, ...) that allows writing results to pre-allocated arrays when computing statistics along specific dimensions.

  It is worth noting that with this package, var(x, 1) and var(x, 2) are about 30 - 50 times faster than the builtin versions.

- Added a view function, which creates a shared-memory array view of a contiguous part of an array. Unlike the sub function, this view function creates a shared-memory array using pointer_to_array, which is much faster than a SubArray instance for element indexing.

Should I start migrating some of these functions into Base? The first step might be to migrate the improved versions of the reduction/statistics functions. Is that a good plan?
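The shared-memory view idea could, in spirit, look like the following sketch (illustrative only; the 2013 package used pointer_to_array, while this sketch uses the modern unsafe_wrap equivalent, and `contiguous_column_view` is a hypothetical name):

```julia
# Wrap a pointer into a contiguous block of the parent array as a plain
# Array, so element indexing is as fast as for an ordinary array.
# Caveats (as discussed in the thread): the parent must stay alive and
# indices must be in bounds -- nothing here checks either.
function contiguous_column_view(a::Matrix{Float64}, j::Int)
    m = size(a, 1)
    p = pointer(a, (j - 1) * m + 1)   # linear offset of column j's start
    return unsafe_wrap(Array, p, m)   # shares memory with `a`
end

a = [1.0 2.0; 3.0 4.0]
v = contiguous_column_view(a, 2)  # -> [2.0, 4.0]
v[1] = 9.0
a[1, 2]                           # -> 9.0 (memory is shared)
```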

Viral Shah

Jun 21, 2013, 9:47:44 PM
to juli...@googlegroups.com

Yes, please start migrating. A PR will help sort things out.

-viral

Viral Shah

Jun 22, 2013, 5:19:15 AM
to juli...@googlegroups.com
I see that for views, you are directly returning a Ptr.

There has been some discussion on views, and on making getindex() operations return views by default. One thought I have is to update the SubArray implementation so that it internally stores pointers to array offsets.

Subarrays implemented this way need not be restricted to contiguous data, but could also be extended to strided data. For getindex operations that return strided data, we return subarrays; in other cases, we allocate a new array. Basically, indexing operations such as A[x,y], where x is a contiguous set of rows and y is a contiguous set of columns, can be made to return subarrays. This also maps well to calling BLAS and LAPACK.

-viral

Dahua

Jun 22, 2013, 1:43:43 PM
to juli...@googlegroups.com
An improved SubArray implementation is being discussed in https://github.com/JuliaLang/julia/issues/3496

Dahua

Jun 22, 2013, 1:47:40 PM
to juli...@googlegroups.com
The scope of this package has been extended substantially beyond what I had initially planned; it now provides high-performance computational support in a variety of areas.

Hence, the name ``NumericFunctors`` no longer reflects this appropriately, and I am considering renaming the package. Here are some names that come to mind:

FastComputation.jl
FastNumerics.jl

NumericSupport.jl
ComputationSupport.jl

NumericExtensions.jl

Any ideas?

John Myles White

Jun 22, 2013, 1:51:36 PM
to juli...@googlegroups.com
NumericExtensions.jl seems best to me since it suggests that much of this material will eventually belong in Base.

 -- John

Stefan Karpinski

Jun 22, 2013, 2:04:04 PM
to Julia Dev
Yes, that seems like the best name to me as well.

Dahua

Jun 22, 2013, 10:23:48 PM
to juli...@googlegroups.com
The package has been renamed to NumericExtensions as of version 0.2.5.

Dahua

Jun 23, 2013, 1:24:05 AM
to juli...@googlegroups.com
New updates introduce an unsafe_view function to get contiguous slices/blocks of an array. These views are immutable instances that do no bounds checking and use low-level functions and pointer arithmetic to access elements. Benchmarks show that they outperform Array in element indexing:

Here are results:
1D indexing throughput:
   builtin array:  8.1082 Gbyte/s
   unsafe_view:   13.6659 Gbyte/s  |  gain = 1.685x
2D indexing throughput:
   builtin array:  4.9298 Gbyte/s
   unsafe_view:    6.5563 Gbyte/s  |  gain = 1.330x
3D indexing throughput:
   builtin array:  3.2948 Gbyte/s
   unsafe_view:    4.6223 Gbyte/s  |  gain = 1.403x

In performance-critical functions, one may use the unsafe_view function in NumericExtensions.jl for efficiency. The usage is just like sub, but it is also about 200x faster. Examples:

unsafe_view(a, :, j)
unsafe_view(a, :, j0:j1)
unsafe_view(a, i0:i1, j)
unsafe_view(a, :, :, k0:k1)
...

This function should be used with caution, preferably only in a local context where you are sure that the parent array will not go away and that the indices do not go out of bounds.

In the meantime, potential approaches to improving the sub function are being discussed in issue #3496.

Dahua

Jul 28, 2013, 11:12:45 PM
to juli...@googlegroups.com
Lately, I added/extended functions for vector norms, normalization, and scanning (cumsum, cummax, cummin). 

In particular, the new scanning functions use a highly optimized implementation and support inplace computation.

Please refer to the following pages for details: