in-place vector/matrix operations


S Wade

Dec 1, 2013, 10:41:29 AM
to julia-users
Hi all,

For some performance sensitive neural network computations that I'm doing, I'm finding the need to manually define `exp!(vector)` and `log!(vector)` to avoid allocations.
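
An in-place `exp!` of the kind described here is just an elementwise loop that overwrites its argument. A minimal sketch in current Julia (the name `exp!` follows the post; it is not a Base function):

```julia
# In-place elementwise exp: overwrites v instead of allocating a new array.
function exp!(v::AbstractVector{<:AbstractFloat})
    @inbounds for i in eachindex(v)
        v[i] = exp(v[i])
    end
    return v
end

v = [0.0, 1.0]
exp!(v)    # v is now [1.0, exp(1.0)]
```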

Would it make sense to change the `@vectorize_1arg` macro in operators.jl to automatically define these, or does this cause unforeseen-by-me consequences that make it a bad idea?

It looks like it would define `!` variants of:

iround, itrunc, ifloor, iceil, abs, abs2, angle, isnan, isinf, isfinite, sinpi, cospi, sinc, cosc, radians2degrees, degrees2radians, sind, cosd, tand, exp10, round, floor, gamma, lfact, exponent, airy, airyprime, airyai, airyaiprime, airybi, airybiprime, digamma, trigamma, invdigamma, eta, zeta, erfinv, erfcinv, asin*, acos*, atan*, acsc*, acot*, log*, bessel*, ceil, trunc, significand, hypot, etc.

thanks,
wade

Viral Shah

Dec 1, 2013, 11:21:56 AM
to julia...@googlegroups.com
IIRC, Jeff had reservations about this, as it would significantly increase the number of functions in Base. I think it may be unavoidable until we have a better solution.

-viral

Steven G. Johnson

Dec 1, 2013, 2:22:44 PM
to julia...@googlegroups.com
I have reservations about this as well. If you find yourself needing to define `exp!` and `log!` for performance reasons, you probably want to devectorize completely and just use loops. A performance-critical routine rarely performs exp alone; you are usually doing other computations as well, and you should combine all of those computations into a single loop if you care about performance.

Matters are different with matrix operations, fft, sort, etcetera, since those are complicated and expensive enough that they cannot easily be inlined and combined with other computations.  So in those cases it makes sense to provide in-place variants.

John Myles White

Dec 1, 2013, 3:25:49 PM
to julia...@googlegroups.com
I think we need to make it a priority to agree on a general abstraction for handling these kinds of in-place operations.

— John

Diego Javier Zea

Dec 1, 2013, 6:36:23 PM
to julia...@googlegroups.com
Maybe it's easier to make the mutating in-place functions the default, and use copy, or a list comprehension, explicitly when a non-mutating operation is required(?)

S Wade

Dec 1, 2013, 6:43:47 PM
to julia-users
Hi Steve,

That makes sense.  I was using loops, but for clarity I've factored out `exp!()` functions, especially when computing softmax operations like this:

exp!(vec)
total = sum(vec)
vec /= total

thanks,
wade

Steven G. Johnson

Dec 1, 2013, 8:16:43 PM
to julia...@googlegroups.com
If it is performance-critical, the separate sum and exp functions are certainly suboptimal (they double the cache pressure) compared to a single fused loop.

Milan Bouchet-Valat

Dec 2, 2013, 8:16:29 AM
to julia...@googlegroups.com
Isn't this a perfect case for Dahua Lin's Devectorize package?

https://github.com/lindahua/Devectorize.jl

S Wade

Dec 2, 2013, 2:18:03 PM
to julia-users
So, I thought I'd give that a try to see how much difference it would make in the simple case of softmax().  I'm getting really weird results that I don't understand: combining the sum and exp() in the same loop is ~4x worse in my test.  I must be doing something wrong; any help would be appreciated.  Here's the code:

# separate total
function softmax!{T <: Real}(vec :: Vector{T})
  for i = 1:length(vec)
    vec[i] = exp(vec[i])
  end
  total = sum(vec)
  vec /= total
  return vec
end

# total + exp in same loop
function sm2!{T <: Real}(vec :: Vector{T})
  total = 0
  for i = 1:length(vec)
    vec[i] = exp(vec[i])
    total += vec[i]
  end
  vec /= total
  return vec
end

function test_softmax()
  println("-----------------------------------------------------------------------")
  vecs = [ randn(52) for x = 1:1_000_000 ]
  tmp = deepcopy(vecs)
  
  tic()
  for vec in vecs
    softmax!(vec)
  end
  toc()

  tic()
  for vec in tmp
    sm2!(vec)
  end
  toc()
end

test_softmax()

Here's what I see in terms of output:

-----------------------------------------------------------------------
elapsed time: 1.290004495 seconds
elapsed time: 7.173124821 seconds

Billou Bielour

Dec 2, 2013, 2:31:13 PM
to julia...@googlegroups.com
Try to replace:

total = 0

by

total = 0.0
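
The reason this one-character change matters: with `total = 0` the accumulator starts as an `Int` and becomes a `Float64` on the first `+=`, so the loop body is type-unstable and the compiler must box the accumulator. A minimal sketch of the two patterns (function names are illustrative, not from the thread):

```julia
# Type-unstable: `total` starts as an Int and becomes a Float64 inside the
# loop, so the compiler has to handle a union of types in the hot loop.
function sum_exp_unstable(v)
    total = 0
    for x in v
        total += exp(x)
    end
    return total
end

# Type-stable: the accumulator has the element type from the start.
function sum_exp_stable(v)
    total = zero(eltype(v))
    for x in v
        total += exp(x)
    end
    return total
end
```

Both compute the same value; `@code_warntype` flags the first, and the second is what Billou's `total = 0.0` achieves for `Float64` input.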

S Wade

Dec 2, 2013, 2:40:44 PM
to julia-users
Wow, thanks.  I wouldn't have guessed that. Now the times look like this:

-----------------------------------------------------------------------
elapsed time: 1.285840271 seconds
elapsed time: 1.531912627 seconds

Still slower when combining, but much closer.

thanks,
wade

Andreas Noack Jensen

Dec 2, 2013, 2:46:03 PM
to julia...@googlegroups.com
This one avoids copying `vec`:

function sm3!(vec::Vector)
  total = zero(exp(vec[1]))
  lv = length(vec)
  for i = 1:lv
    tmp = exp(vec[i])
    total += tmp
    vec[i] = tmp
  end
  for i = 1:lv
    vec[i] /= total
  end
  return vec
end


--
Best regards,

Andreas Noack Jensen

Kevin Squire

Dec 2, 2013, 2:48:17 PM
to julia...@googlegroups.com
And as Milan pointed out above, Devectorize.jl can probably do most of this for you automatically.

Cheers, Kevin

S Wade

Dec 2, 2013, 2:52:52 PM
to julia-users
Thanks Andreas,

This is about 10% faster.  Is the purpose of "total = zero(exp(vec[1]))" just to ensure that the types are correct?

BTW, which statements in sm2! caused copying?

Andreas Noack Jensen

Dec 2, 2013, 3:22:22 PM
to julia...@googlegroups.com
Yes, it was to ensure the correct type. The line `vec /= total` makes a copy of `vec`.
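
For reference, in current Julia the copy can also be avoided without a manual loop, using broadcast assignment (this relies on in-place broadcast semantics added after this thread; in 2013 the explicit loop was the way to do it):

```julia
v = [2.0, 4.0]
total = 2.0
v ./= total    # in-place elementwise division: mutates v, allocates no new array
# v is now [1.0, 2.0]
```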


Steven G. Johnson

Dec 2, 2013, 4:33:31 PM
to julia...@googlegroups.com


On Monday, December 2, 2013 2:46:03 PM UTC-5, Andreas Noack Jensen wrote: 
  for i = 1:lv
    vec[i] /= total
  end


Probably better to do

    scale = 1/total
    for i = 1:lv
        vec[i] *= scale
    end

since multiplications are faster than divisions.
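
Putting the thread's suggestions together (fused exp-and-sum loop, a type-stable accumulator, and one division followed by cheap multiplies), a sketch; the name `softmax_fused!` is illustrative, not from the thread:

```julia
# In-place softmax combining the optimizations discussed above.
function softmax_fused!(v::AbstractVector{<:AbstractFloat})
    total = zero(eltype(v))        # type-stable accumulator
    @inbounds for i in eachindex(v)
        e = exp(v[i])              # exp and sum fused in one pass
        v[i] = e
        total += e
    end
    s = inv(total)                 # one division, then multiplications
    @inbounds for i in eachindex(v)
        v[i] *= s
    end
    return v
end
```

After the call the vector sums to 1 and the ordering of the inputs is preserved.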

Simon Kornblith

Dec 2, 2013, 5:35:49 PM
to julia...@googlegroups.com
If you want to cheat, there is a softmax! function in NumericExtensions (https://github.com/lindahua/NumericExtensions.jl) that has all of these optimizations.

Simon