in-place vector/matrix operations


S Wade

Dec 1, 2013, 10:41:29 AM
to julia-users
Hi all,

For some performance sensitive neural network computations that I'm doing, I'm finding the need to manually define `exp!(vector)` and `log!(vector)` to avoid allocations.
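
An in-place `exp!` of the kind described here is just an elementwise loop that overwrites its argument. A minimal sketch in current Julia (the name `exp!` follows the post; it is not a Base function):

```julia
# In-place elementwise exp: overwrites v instead of allocating a new array.
function exp!(v::AbstractVector{<:AbstractFloat})
    @inbounds for i in eachindex(v)
        v[i] = exp(v[i])
    end
    return v
end

v = [0.0, 1.0]
exp!(v)    # v is now [1.0, exp(1.0)]
```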

Would it make sense to change the `@vectorize_1arg` macro in operators.jl to automatically define these, or does this cause unforeseen-by-me consequences that make it a bad idea?

It looks like it would define `!` variants of:

iround, itrunc, ifloor, iceil, abs, abs2, angle, isnan, isinf, isfinite, sinpi, cospi, sinc, cosc, radians2degrees, degrees2radians, sind, cosd, tand, exp10, round, floor, gamma, lfact, exponent, airy, airyprime, airyai, airyaiprime, airybi, airybiprime, digamma, trigamma, invdigamma, eta, zeta, erfinv, erfcinv, asin*, acos*, atan*, acsc*, acot*, log*, bessel*, ceil, trunc, significand, hypot, etc.

thanks,
wade

Viral Shah

Dec 1, 2013, 11:21:56 AM
to julia...@googlegroups.com
IIRC, Jeff had reservations about this, as it would significantly increase the number of functions in Base. I think it may be unavoidable until we have a better solution.

-viral

Steven G. Johnson

Dec 1, 2013, 2:22:44 PM
to julia...@googlegroups.com
I have reservations about this as well. If you find yourself needing to define `exp!` and `log!` for performance reasons, you probably want to devectorize completely and just use loops. A performance-critical routine rarely performs exp alone; you are usually doing other computations as well, and you should combine all of those computations into a single loop if you care about performance.

Matters are different with matrix operations, fft, sort, etcetera, since those are complicated and expensive enough that they cannot easily be inlined and combined with other computations.  So in those cases it makes sense to provide in-place variants.

John Myles White

Dec 1, 2013, 3:25:49 PM
to julia...@googlegroups.com
I think we need to make it a priority to agree on a general abstraction for handling these kinds of in-place operations.

— John

Diego Javier Zea

Dec 1, 2013, 6:36:23 PM
to julia...@googlegroups.com
Maybe it's easier to make the mutating in-place functions the default, and use copy, or a list comprehension, explicitly when a non-mutating operation is required(?)

S Wade

Dec 1, 2013, 6:43:47 PM
to julia-users
Hi Steve,

That makes sense.  I was using loops, but for clarity I've factored out `exp!()` functions, especially when computing softmax operations like this:

exp!(vec)
total = sum(vec)
vec /= total

thanks,
wade

Steven G. Johnson

Dec 1, 2013, 8:16:43 PM
to julia...@googlegroups.com
If it is performance-critical, the separate sum and exp functions are certainly suboptimal (they double the cache pressure) compared to a single fused loop.

Milan Bouchet-Valat

Dec 2, 2013, 8:16:29 AM
to julia...@googlegroups.com
Isn't this a perfect case for Dahua Lin's Devectorize package?

https://github.com/lindahua/Devectorize.jl

S Wade

Dec 2, 2013, 2:18:03 PM
to julia-users
So, I thought I'd give that a try to see how much difference it would make in the simple case of softmax().  I'm getting really weird results that I don't understand: combining the sum and exp() in the same loop is ~4x worse in my test.  I must be doing something wrong; any help would be appreciated.  Here's the code:

# separate total
function softmax!{T <: Real}(vec :: Vector{T})
  for i = 1:length(vec)
    vec[i] = exp(vec[i])
  end
  total = sum(vec)
  vec /= total
  return vec
end

# total + exp in same loop
function sm2!{T <: Real}(vec :: Vector{T})
  total = 0
  for i = 1:length(vec)
    vec[i] = exp(vec[i])
    total += vec[i]
  end
  vec /= total
  return vec
end

function test_softmax()
  println("-----------------------------------------------------------------------")
  vecs = [ randn(52) for x = 1:1_000_000 ]
  tmp = deepcopy(vecs)
  
  tic()
  for vec in vecs
    softmax!(vec)
  end
  toc()

  tic()
  for vec in tmp
    sm2!(vec)
  end
  toc()
end

test_softmax()

Here's what I see in terms of output:

-----------------------------------------------------------------------
elapsed time: 1.290004495 seconds
elapsed time: 7.173124821 seconds

Billou Bielour

Dec 2, 2013, 2:31:13 PM
to julia...@googlegroups.com
Try to replace:

total = 0

by

total = 0.0
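
The reason this one-character change matters: with `total = 0` the accumulator starts as an `Int` and becomes a `Float64` on the first `+=`, so the loop body is type-unstable and the compiler must box the accumulator. A minimal sketch of the two patterns (function names are illustrative, not from the thread):

```julia
# Type-unstable: `total` starts as an Int and becomes a Float64 inside the
# loop, so the compiler has to handle a union of types in the hot loop.
function sum_exp_unstable(v)
    total = 0
    for x in v
        total += exp(x)
    end
    return total
end

# Type-stable: the accumulator has the element type from the start.
function sum_exp_stable(v)
    total = zero(eltype(v))
    for x in v
        total += exp(x)
    end
    return total
end
```

Both compute the same value; `@code_warntype` flags the first, and the second is what Billou's `total = 0.0` achieves for `Float64` input.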

S Wade

Dec 2, 2013, 2:40:44 PM
to julia-users
Wow, thanks.  I wouldn't have guessed that. Now the times look like this:

-----------------------------------------------------------------------
elapsed time: 1.285840271 seconds
elapsed time: 1.531912627 seconds

Still slower when combining, but much closer.

thanks,
wade

Andreas Noack Jensen

Dec 2, 2013, 2:46:03 PM
to julia...@googlegroups.com
This one avoids copying `vec`:

function sm3!(vec::Vector)
  total = zero(exp(vec[1]))
  lv = length(vec)
  for i = 1:lv
    tmp = exp(vec[i])
    total += tmp
    vec[i] = tmp
  end
  for i = 1:lv
    vec[i] /= total
  end
  return vec
end


--
Best regards,

Andreas Noack Jensen

Kevin Squire

Dec 2, 2013, 2:48:17 PM
to julia...@googlegroups.com
And as Milan pointed out above, Devectorize.jl can probably do most of this for you automatically.

Cheers, Kevin

S Wade

Dec 2, 2013, 2:52:52 PM
to julia-users
Thanks Andreas,

This is about 10% faster.  Is the purpose of "total = zero(exp(vec[1]))" just to ensure that the types are correct?

BTW, which statements in sm2! caused copying?

Andreas Noack Jensen

Dec 2, 2013, 3:22:22 PM
to julia...@googlegroups.com
Yes, it was to ensure the correct type. The line `vec /= total` makes a copy of `vec`.
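
For reference, in current Julia the copy can also be avoided without a manual loop, using broadcast assignment (this relies on in-place broadcast semantics added after this thread; in 2013 the explicit loop was the way to do it):

```julia
v = [2.0, 4.0]
total = 2.0
v ./= total    # in-place elementwise division: mutates v, allocates no new array
# v is now [1.0, 2.0]
```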


Steven G. Johnson

Dec 2, 2013, 4:33:31 PM
to julia...@googlegroups.com


On Monday, December 2, 2013 2:46:03 PM UTC-5, Andreas Noack Jensen wrote: 
  for i = 1:lv
    vec[i] /= total
  end


Probably better to do

    scale = 1/total
    for i = 1:lv
        vec[i] *= scale
    end

since multiplications are faster than divisions.
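
Putting the thread's suggestions together (fused exp-and-sum loop, a type-stable accumulator, and one division followed by cheap multiplies), a sketch; the name `softmax_fused!` is illustrative, not from the thread:

```julia
# In-place softmax combining the optimizations discussed above.
function softmax_fused!(v::AbstractVector{<:AbstractFloat})
    total = zero(eltype(v))        # type-stable accumulator
    @inbounds for i in eachindex(v)
        e = exp(v[i])              # exp and sum fused in one pass
        v[i] = e
        total += e
    end
    s = inv(total)                 # one division, then multiplications
    @inbounds for i in eachindex(v)
        v[i] *= s
    end
    return v
end
```

After the call the vector sums to 1 and the ordering of the inputs is preserved.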

Simon Kornblith

Dec 2, 2013, 5:35:49 PM
to julia...@googlegroups.com
If you want to cheat, there is a softmax! function in NumericExtensions (https://github.com/lindahua/NumericExtensions.jl) that has all of these optimizations.

Simon