Skip to first unread message

Sep 6, 2012, 10:46:01 PM9/6/12

to juli...@googlegroups.com

Greetings Julia Gurus:

I've just started using Julia. Nice language. I'm really impressed with the thought that has been put into it.

I am having a problem, and I was hoping someone could offer some guidance. I can't figure out why my Julia code

is not running faster. I am probably overlooking something simple, but I don't know what.

Here is my code:

% cat llst.jl

# log likelihood of a student's t distribution with arbitrary location and scale

function llst(A::Array{Float64}, mu::Float64, sigma::Float64, df::Float64)

n = numel(A)

c = n * (lgamma((df+1.)/2.) - 0.5 * log(df * pi) - lgamma(df/2.) - log(sigma))

# No difference found in execution time between reduce(f...) and sum(map(f...))

# s = -((df + 1.) / 2.) * reduce((ll, x) -> ll+log1p(((x - mu)/sigma)^2/df), 0., A)

s = -((df + 1.) / 2.) * sum(map(x -> log1p(((x - mu)/sigma)^2/df), A))

s + c

end

const vec = [1:1000]*.002

println("llst(vec,1.,2.,3.) = ", llst(vec,1.,2.,3.))

@time for i = 1:1000 llst(vec,1.,2.,3.) end

##

When I run this code on my computer, which is an ancient Mac Pro with 2.66 GHz Dual-core

Xeon processors running Mac OS 10.7.4, I get the following:

% julia

_

_ _ _(_)_ |

(_) | (_) (_) |

_ _ _| |_ __ _ | A fresh approach to technical computing

| | | | | | |/ _` | |

| | |_| | | | (_| | | Version 0.0.0+95599289.r17c9

_/ |\__'_|_|_|\__'_| | Commit 17c9d27eaa (2012-09-02 11:00:35)

|__/ |

julia> load("llst.jl")

llst(vec,1.,2.,3.) = -1748.2553729185386

elapsed time: 0.5310850143432617 seconds

##

For comparison, here is the original R code:

llst <- function(X, mu, sigma, df) {

# Computes the log likelihood that (X - mu)/sigma is distributed

# according to a t-distribution with df degrees of freedom.

N <- length(X);

C <- N * (lgamma((df+1)/2) - 0.5*log(df * pi) - lgamma(df/2) - log(sigma));

S <- -((df + 1)/2) * sum(log(1 + ((X-mu)/sigma)^2/df));

S+C

}

vec <- (1:1000)*0.002

cat(sprintf("llst(vec,1,2,3) = %f\n", llst(vec,1,2,3)))

print(system.time(for (i in 1:1000) llst(vec,1,2,3)))

##

When I run the R code, this is the output that I receive:

% R

R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"

Copyright (C) 2012 The R Foundation for Statistical Computing

ISBN 3-900051-07-0

Platform: i386-apple-darwin9.8.0/i386 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.

Type 'contributors()' for more information and

'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.

Type 'q()' to quit R.

> source("llst.R")

llst(vec,1,2,3) = -1748.255373

user system elapsed

0.086 0.011 0.096

##

As you can see from the above, the Julia code runs in 0.531

elapsed seconds, whereas the equivalent R code runs in 0.096

elapsed seconds. Why is that?

To try to get a better handle on this, I constructed the following

simplified example. The simplified code computes a random

vector of length N, and then sums it N times (for a total of N^2

additions). This is done for n=256,512,1024,..., 65,536. See

below.

% cat simple.jl

function simple(n::Int64)

vec = [rand() for i=1:n]

s = [sum(vec) for i=1:n]

end

for i=8:16 @time simple(2^i) end

% julia

_

_ _ _(_)_ |

(_) | (_) (_) |

_ _ _| |_ __ _ | A fresh approach to technical computing

| | | | | | |/ _` | |

| | |_| | | | (_| | | Version 0.0.0+95599289.r17c9

_/ |\__'_|_|_|\__'_| | Commit 17c9d27eaa (2012-09-02 11:00:35)

|__/ |

julia> load("simple.jl")

elapsed time: 0.015284061431884766 seconds

elapsed time: 0.0011169910430908203 seconds

elapsed time: 0.004406929016113281 seconds

elapsed time: 0.0424809455871582 seconds

elapsed time: 0.06986188888549805 seconds

elapsed time: 0.2790408134460449 seconds

elapsed time: 1.1160101890563965 seconds

elapsed time: 4.4562668800354 seconds

elapsed time: 17.841228008270264 seconds

##

Now, here is the same code in R:

% cat simple.R

simple <- function(n) {

vec <- as.double(runif(n))

s <- sapply(1:n, function(k) sum(vec))

}

for (i in 8:16) print(system.time(simple(2^i)))

% R

R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"

Copyright (C) 2012 The R Foundation for Statistical Computing

ISBN 3-900051-07-0

Platform: i386-apple-darwin9.8.0/i386 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.

Type 'contributors()' for more information and

'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.

Type 'q()' to quit R.

> source("simple.R")

user system elapsed

0.001 0.000 0.001

user system elapsed

0.002 0.000 0.002

user system elapsed

0.004 0.000 0.004

user system elapsed

0.012 0.000 0.013

user system elapsed

0.042 0.000 0.042

user system elapsed

0.151 0.000 0.150

user system elapsed

0.562 0.002 0.565

user system elapsed

2.165 0.002 2.169

user system elapsed

8.500 0.006 8.507

##

In this example, the advantage of R over Julia has decreased, but it is

still winning by a factor of 2.

Any ideas about what may explain this? More importantly, what can I

do about it?

Thanks in advance for your insight.

Matthew Clegg

I've just started using Julia. Nice language. I'm really impressed with the thought that has been put into it.

I am having a problem, and I was hoping someone could offer some guidance. I can't figure out why my Julia code

is not running faster. I am probably overlooking something simple, but I don't know what.

Here is my code:

% cat llst.jl

# log likelihood of a student's t distribution with arbitrary location and scale

function llst(A::Array{Float64}, mu::Float64, sigma::Float64, df::Float64)

n = numel(A)

c = n * (lgamma((df+1.)/2.) - 0.5 * log(df * pi) - lgamma(df/2.) - log(sigma))

# No difference found in execution time between reduce(f...) and sum(map(f...))

# s = -((df + 1.) / 2.) * reduce((ll, x) -> ll+log1p(((x - mu)/sigma)^2/df), 0., A)

s = -((df + 1.) / 2.) * sum(map(x -> log1p(((x - mu)/sigma)^2/df), A))

s + c

end

const vec = [1:1000]*.002

println("llst(vec,1.,2.,3.) = ", llst(vec,1.,2.,3.))

@time for i = 1:1000 llst(vec,1.,2.,3.) end

##

When I run this code on my computer, which is an ancient Mac Pro with 2.66 GHz Dual-core

Xeon processors running Mac OS 10.7.4, I get the following:

% julia

_

_ _ _(_)_ |

(_) | (_) (_) |

_ _ _| |_ __ _ | A fresh approach to technical computing

| | | | | | |/ _` | |

| | |_| | | | (_| | | Version 0.0.0+95599289.r17c9

_/ |\__'_|_|_|\__'_| | Commit 17c9d27eaa (2012-09-02 11:00:35)

|__/ |

julia> load("llst.jl")

llst(vec,1.,2.,3.) = -1748.2553729185386

elapsed time: 0.5310850143432617 seconds

##

For comparison, here is the original R code:

llst <- function(X, mu, sigma, df) {

# Computes the log likelihood that (X - mu)/sigma is distributed

# according to a t-distribution with df degrees of freedom.

N <- length(X);

C <- N * (lgamma((df+1)/2) - 0.5*log(df * pi) - lgamma(df/2) - log(sigma));

S <- -((df + 1)/2) * sum(log(1 + ((X-mu)/sigma)^2/df));

S+C

}

vec <- (1:1000)*0.002

cat(sprintf("llst(vec,1,2,3) = %f\n", llst(vec,1,2,3)))

print(system.time(for (i in 1:1000) llst(vec,1,2,3)))

##

When I run the R code, this is the output that I receive:

% R

R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"

Copyright (C) 2012 The R Foundation for Statistical Computing

ISBN 3-900051-07-0

Platform: i386-apple-darwin9.8.0/i386 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.

Type 'contributors()' for more information and

'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.

Type 'q()' to quit R.

> source("llst.R")

llst(vec,1,2,3) = -1748.255373

user system elapsed

0.086 0.011 0.096

##

As you can see from the above, the Julia code runs in 0.531

elapsed seconds, whereas the equivalent R code runs in 0.096

elapsed seconds. Why is that?

To try to get a better handle on this, I constructed the following

simplified example. The simplified code computes a random

vector of length N, and then sums it N times (for a total of N^2

additions). This is done for n=256,512,1024,..., 65,536. See

below.

% cat simple.jl

function simple(n::Int64)

vec = [rand() for i=1:n]

s = [sum(vec) for i=1:n]

end

for i=8:16 @time simple(2^i) end

% julia

_

_ _ _(_)_ |

(_) | (_) (_) |

_ _ _| |_ __ _ | A fresh approach to technical computing

| | | | | | |/ _` | |

| | |_| | | | (_| | | Version 0.0.0+95599289.r17c9

_/ |\__'_|_|_|\__'_| | Commit 17c9d27eaa (2012-09-02 11:00:35)

|__/ |

julia> load("simple.jl")

elapsed time: 0.015284061431884766 seconds

elapsed time: 0.0011169910430908203 seconds

elapsed time: 0.004406929016113281 seconds

elapsed time: 0.0424809455871582 seconds

elapsed time: 0.06986188888549805 seconds

elapsed time: 0.2790408134460449 seconds

elapsed time: 1.1160101890563965 seconds

elapsed time: 4.4562668800354 seconds

elapsed time: 17.841228008270264 seconds

##

Now, here is the same code in R:

% cat simple.R

simple <- function(n) {

vec <- as.double(runif(n))

s <- sapply(1:n, function(k) sum(vec))

}

for (i in 8:16) print(system.time(simple(2^i)))

% R

R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"

Copyright (C) 2012 The R Foundation for Statistical Computing

ISBN 3-900051-07-0

Platform: i386-apple-darwin9.8.0/i386 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.

Type 'contributors()' for more information and

'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.

Type 'q()' to quit R.

> source("simple.R")

user system elapsed

0.001 0.000 0.001

user system elapsed

0.002 0.000 0.002

user system elapsed

0.004 0.000 0.004

user system elapsed

0.012 0.000 0.013

user system elapsed

0.042 0.000 0.042

user system elapsed

0.151 0.000 0.150

user system elapsed

0.562 0.002 0.565

user system elapsed

2.165 0.002 2.169

user system elapsed

8.500 0.006 8.507

##

In this example, the advantage of R over Julia has decreased, but it is

still winning by a factor of 2.

Any ideas about what may explain this? More importantly, what can I

do about it?

Thanks in advance for your insight.

Matthew Clegg

Sep 7, 2012, 5:50:10 AM9/7/12

to juli...@googlegroups.com

Hi Matthew,

Welcome to Julia!

For your original problem, try this instead:

s = -((df + 1.) / 2.) * sum(log1p(((A-mu)/sigma).^2/df))

I think you'll be a bit happier (I get a 4x performance boost).

For your second example, I noticed that the time includes random number

generation. I can't speak to whether R and Julia use the same algorithm, but

that seems a bit unlikely. Because different rngs differ in their quality and

speed, and there is sometimes a tradeoff, I would generally not include rng

timing in performance evaluations (unless that was the topic I was focusing

on).

Best,

--Tim

Welcome to Julia!

For your original problem, try this instead:

s = -((df + 1.) / 2.) * sum(log1p(((A-mu)/sigma).^2/df))

I think you'll be a bit happier (I get a 4x performance boost).

For your second example, I noticed that the time includes random number

generation. I can't speak to whether R and Julia use the same algorithm, but

that seems a bit unlikely. Because different rngs differ in their quality and

speed, and there is sometimes a tradeoff, I would generally not include rng

timing in performance evaluations (unless that was the topic I was focusing

on).

Best,

--Tim

Sep 7, 2012, 10:09:48 AM9/7/12

to juli...@googlegroups.com

On Fri, Sep 7, 2012 at 5:50 AM, Tim Holy <tim....@gmail.com> wrote:

> For your original problem, try this instead:

>

> s = -((df + 1.) / 2.) * sum(log1p(((A-mu)/sigma).^2/df))

When I run Tim's version, it's a little faster than the R version (by
> For your original problem, try this instead:

>

> s = -((df + 1.) / 2.) * sum(log1p(((A-mu)/sigma).^2/df))

about 7%) if I gc() both before running. If you unroll the loop, Julia

is more than twice as fast as the R version. Here's the unrolled

version:

# log likelihood of a student's t distribution with arbitrary location and scale

n = numel(A)

c = n * (lgamma((df+1.)/2.) - 0.5 * log(df * pi) - lgamma(df/2.) -

log(sigma))

z = 0.0
c = n * (lgamma((df+1.)/2.) - 0.5 * log(df * pi) - lgamma(df/2.) -

log(sigma))

for i in 1:n

z += log1p(((A[i]-mu)/sigma).^2/df)

end

s = -((df + 1.) / 2.) * z

s + c

end

println("llst3(vec,1.,2.,3.) = ", llst3(vec,1.,2.,3.))

gc(); @time for i = 1:1000 llst3(vec,1.,2.,3.) end

Sep 7, 2012, 11:22:15 AM9/7/12

to juli...@googlegroups.com

Yes unfortunately map() is still slow. This will be addressed.

Our sum() may also be slower because we use a more accurate algorithm.

In fact we're currently debating whether to keep it or sell out for

performance like everybody else.

> --

>

>

>

Our sum() may also be slower because we use a more accurate algorithm.

In fact we're currently debating whether to keep it or sell out for

performance like everybody else.

>

>

>

Sep 7, 2012, 11:30:03 AM9/7/12

to juli...@googlegroups.com

On Fri, Sep 7, 2012 at 11:22 AM, Jeff Bezanson <jeff.b...@gmail.com> wrote:

Our sum() may also be slower because we use a more accurate algorithm.

In fact we're currently debating whether to keep it or sell out for

performance like everybody else.

For those not following every development conversation:

It's been a rather long-running conversation — summation is pretty fundamental, don't want to get it "wrong" (sometimes getting it right does not mean "correct").

Sep 10, 2012, 11:08:33 AM9/10/12

to juli...@googlegroups.com

Thank you for your suggestions. Your comments have been very illuminating.

I thought I would summarize the results that I obtained when trying the various approaches. As you recall, the original code was as follows:

# log likelihood of a student's t distribution with arbitrary location and scale

function llst(A::Array{Float64}, mu::Float64, sigma::Float64, df::Float64)

n = numel(A)

c = n * (lgamma((df+1.)/2.) - 0.5 * log(df * pi) - lgamma(df/2.) - log(sigma))

s = -((df + 1.) / 2.) * sum(map(x -> log1p(((x - mu)/sigma)^2/df), A)) # **

s + c

end

Let's call this the "map" version. Any hope for a performance gain will require changing the line that I have marked with "# **". The following alternatives were considered:

"reduce" version:

s = -((df + 1.) / 2.) * reduce((ll, x) -> ll+log1p(((x - mu)/sigma)^2/df), 0., A)

"vectorized" version (suggested by Tim):

s = -((df + 1.) / 2.) * sum(log1p(((A - mu)/sigma).^2/df))

"list comprehension" version:

s = -((df + 1.) / 2.) * sum([log1p(((A[i] - mu)/sigma)^2/df) for i=1:length(A)])

"loop unrolling" version (suggested by Tom Short):

z = 0.0

for i in 1:n

z += log1p(((A[i] - mu)/sigma)^2/df)

end

s = -((df + 1.) / 2.) * z

Here are the execution times of each of these implementations on my machine (recall, I previously measured the execution time of the equivalent R code at 0.096 seconds):

map elapsed time: 0.5343310832977295 seconds

reduce elapsed time: 0.48698997497558594 seconds

vectorized elapsed time: 0.08964395523071289 seconds

list comprehension elapsed time: 0.061167001724243164 seconds

loop unrolling elapsed time: 0.056635141372680664 seconds

Thanks to Tom Short's suggestion, I was able to achieve a significant improvement over my initial implementation.

It would certainly be nice if map and reduce could be sped up, as these are two of my favorite idioms.

It was also very interesting to read the discussion about the implementation of sum(). I would vote for making the default implementation fast, while giving the user the option of choosing the slower but more precise implementation.

My own situation is that I have been doing a lot of R programming for the last several years. Two of my chief complaints about R are that the language was poorly designed and it is very inefficient. The design of Julia does a great job of addressing many of my complaints about R's language design, but this by itself is not enough to justify the daunting task of porting all of my code to Julia, as well as having to re-invent a few R packages along the way. I would need to see a substantial performance improvement as well. I would not be surprised if a lot of other R programmers feel the same way. So, in my view, a (zealous) focus on performance by the Julia team would not be misplaced.

Please do not interpret the above as a criticism. I think you have done a great job, and I hope you will keep up the good work.

Thanks again for the help with my code.

I have attached the code that was used to generate the above performance benchmarks.

Sep 10, 2012, 1:15:55 PM9/10/12

to juli...@googlegroups.com

Thanks for the nice summary. This progression will make a nice addition to our performance benchmarks.

We also hope that effortless parallelism will also help make julia more compelling as we go ahead, even as the compiler gets better, and library functions become faster over time.

-viral

We also hope that effortless parallelism will also help make julia more compelling as we go ahead, even as the compiler gets better, and library functions become faster over time.

-viral

Sep 11, 2012, 11:18:30 AM9/11/12

to juli...@googlegroups.com

Is there a reason that you want to evaluate the log-likelihood from from the definition instead of a function like qt in R? I would have written the log-likelihood function as

In Julia I would have written

lls1 <- function(X, mu, sigma, df) sum(dt((X-mu)/sigma, df, log=TRUE))

but some experimentation shows that it gives a different answer than your code and also takes longer, which I find surprising.

load("distributions.jl")

function llst(X, mu::Real, sigma::Real, df::Real)

dd = TDist(df)

ss = 0.

for x in X ss += logpdf(dd, (x-mu)/sigma) end

ss

end

which, on my machine, is roughly equivalent to the R version in timing and produces the same answer, -1055.1081923585919.

This equivalence is not surprising because the logpdf function for a T distribution uses the same code as does R for the low-level evaluation.

It is not easy to see how the calculation is actually being done julia/deps/Rmath/src/dt.c because that code is a twisty maze of mulit-line C macros. One thing you can notice is that it uses different code according to whether ((x-mu)/sigma)^2 > 0.2* df.

Sep 11, 2012, 3:58:02 PM9/11/12

to juli...@googlegroups.com

On Tuesday, September 11, 2012 11:18:30 AM UTC-4, Douglas Bates wrote:

Is there a reason that you want to evaluate the log-likelihood from from the definition instead of a function like qt in R? I would have written the log-likelihood function aslls1 <- function(X, mu, sigma, df) sum(dt((X-mu)/sigma, df, log=TRUE))

but some experimentation shows that it gives a different answer than your code and also takes longer, which I find surprising.

Please correct me if I am wrong, but I think you may have left out a 1/sigma constant. If p(x) is a probability density on x, and y = f(x), then shouldn't the probability density on y be p(f(y))f'(y)? In our case, f'(y) = 1/sigma. So I believe the proper function definition should be:

lls2 <- function(X, mu, sigma, df) sum(dt((X-mu)/sigma, df, log=TRUE)) - length(X)*log(sigma)

After making this change, you'll find that the functions agree:

> llst(vec, 1, 2, 3)

[1] -1748.255

> lls1(vec, 1, 2, 3)

[1] -1055.108

> lls2(vec, 1, 2, 3)

[1] -1748.255

You can check that this is the right density function (or at least reasonable) by integrating:

> dst <- function(X, mu, sigma, df) dt((X-mu)/sigma, df)/sigma

> integrate(dst, -20, 20, 1, 2, 3)

0.9978412 with absolute error < 5.7e-07

In Julia I would have writtenload("distributions.jl")function llst(X, mu::Real, sigma::Real, df::Real)dd = TDist(df)ss = 0.for x in X ss += logpdf(dd, (x-mu)/sigma) endssendwhich, on my machine, is roughly equivalent to the R version in timing and produces the same answer, -1055.1081923585919.This equivalence is not surprising because the logpdf function for a T distribution uses the same code as does R for the low-level evaluation.It is not easy to see how the calculation is actually being done julia/deps/Rmath/src/dt.c because that code is a twisty maze of mulit-line C macros. One thing you can notice is that it uses different code according to whether ((x-mu)/sigma)^2 > 0.2* df.

Why is my code faster? Certainly in Julia, one part of the answer is that I am computing the normalization constant only once, whereas repeatedly calling logpdf causes it to be recomputed for each element of X, and this is the major cost of evaluating the t density. (This is another advertisement for the benefits of memoization, although the memoization would need to occur deep in the internals of the code.)

Your message caused me to look for the first time at the actual R source code for the implementation of the t-distribution, and I was surprised to find that it also seems to repeatedly recompute the normalization constant for each of the elements of the vector passed to dt(X, df). (In other words, dt() is not itself vectorized.) Perhaps this is the entire explanation for the speedup that I get.

However, it's also the case that the manner in which dt is computed in R certainly does not match up with a straightforward translation of the t-distribution from a college textbook. In fact, my first reaction to this code was ... She did what!?! :)

I hope this clears things up a bit ...

Sep 12, 2012, 10:23:48 AM9/12/12

to juli...@googlegroups.com

On Tuesday, September 11, 2012 2:58:02 PM UTC-5, Matthew Clegg wrote:

On Tuesday, September 11, 2012 11:18:30 AM UTC-4, Douglas Bates wrote:Is there a reason that you want to evaluate the log-likelihood from from the definition instead of a function like qt in R? I would have written the log-likelihood function aslls1 <- function(X, mu, sigma, df) sum(dt((X-mu)/sigma, df, log=TRUE))but some experimentation shows that it gives a different answer than your code and also takes longer, which I find surprising.Please correct me if I am wrong, but I think you may have left out a 1/sigma constant. If p(x) is a probability density on x, and y = f(x), then shouldn't the probability density on y be p(f(y))f'(y)? In our case, f'(y) = 1/sigma. So I believe the proper function definition should be:lls2 <- function(X, mu, sigma, df) sum(dt((X-mu)/sigma, df, log=TRUE)) - length(X)*log(sigma)

You're correct, of course. Rather embarrassing for a Statistics professor not to realize that. I'm retired now so perhaps when I first learned probability densities behaved differently :-)

After making this change, you'll find that the functions agree:> llst(vec, 1, 2, 3)[1] -1748.255> lls1(vec, 1, 2, 3)[1] -1055.108> lls2(vec, 1, 2, 3)[1] -1748.255You can check that this is the right density function (or at least reasonable) by integrating:> dst <- function(X, mu, sigma, df) dt((X-mu)/sigma, df)/sigma> integrate(dst, -20, 20, 1, 2, 3)0.9978412 with absolute error < 5.7e-07In Julia I would have writtenload("distributions.jl")function llst(X, mu::Real, sigma::Real, df::Real)dd = TDist(df)ss = 0.for x in X ss += logpdf(dd, (x-mu)/sigma) endssendwhich, on my machine, is roughly equivalent to the R version in timing and produces the same answer, -1055.1081923585919.This equivalence is not surprising because the logpdf function for a T distribution uses the same code as does R for the low-level evaluation.It is not easy to see how the calculation is actually being done julia/deps/Rmath/src/dt.c because that code is a twisty maze of mulit-line C macros. One thing you can notice is that it uses different code according to whether ((x-mu)/sigma)^2 > 0.2* df.Why is my code faster? Certainly in Julia, one part of the answer is that I am computing the normalization constant only once, whereas repeatedly calling logpdf causes it to be recomputed for each element of X, and this is the major cost of evaluating the t density. (This is another advertisement for the benefits of memoization, although the memoization would need to occur deep in the internals of the code.)

Exactly.

Your message caused me to look for the first time at the actual R source code for the implementation of the t-distribution, and I was surprised to find that it also seems to repeatedly recompute the normalization constant for each of the elements of the vector passed to dt(X, df). (In other words, dt() is not itself vectorized.) Perhaps this is the entire explanation for the speedup that I get.

Well it would certainly help. Good work on seeing that. My analysis was too superficial. I like the julia-dev list because there are so many smart and knowledgeable people contributing and I get to learn a lot.

However, it's also the case that the manner in which dt is computed in R certainly does not match up with a straightforward translation of the t-distribution from a college textbook. In fact, my first reaction to this code was ... She did what!?! :)

As I said, a twisty maze of multi-line C macros - a style of programming that I find challenging to understand. All this is for the purpose of squeezing the last ounce of performance out of code that, as you show, is a bad design in the first place because the bulk of the computation for each value in a vector is needlessly repeated.

We could perhaps consider additional methods for the logpdf generic from extras/distributions.jl (another file that I will convert to a module if no one gets there before me) that vectorizes more intelligently when warranted.

Sep 13, 2012, 9:03:57 AM9/13/12

to juli...@googlegroups.com

On Wednesday, September 12, 2012 10:23:48 AM UTC-4, Douglas Bates wrote:

On Tuesday, September 11, 2012 2:58:02 PM UTC-5, Matthew Clegg wrote:Your message caused me to look for the first time at the actual R source code for the implementation of the t-distribution, and I was surprised to find that it also seems to repeatedly recompute the normalization constant for each of the elements of the vector passed to dt(X, df). (In other words, dt() is not itself vectorized.) Perhaps this is the entire explanation for the speedup that I get.Well it would certainly help. Good work on seeing that. My analysis was too superficial. I like the julia-dev list because there are so many smart and knowledgeable people contributing and I get to learn a lot.

I too am impressed by what I have seen so far of the Julia community. There seems to be a genuine willingness to help and an openness to new ideas. Also, it seems like there might be a lot of scope for interesting user contributions.

However, it's also the case that the manner in which dt is computed in R certainly does not match up with a straightforward translation of the t-distribution from a college textbook. In fact, my first reaction to this code was ... She did what!?! :)As I said, a twisty maze of multi-line C macros - a style of programming that I find challenging to understand. All this is for the purpose of squeezing the last ounce of performance out of code that, as you show, is a bad design in the first place because the bulk of the computation for each value in a vector is needlessly repeated.We could perhaps consider additional methods for the logpdf generic from extras/distributions.jl (another file that I will convert to a module if no one gets there before me) that vectorizes more intelligently when warranted.

In terms of additional methods for distributions.jl, I guess I would vote for a log likelihood method. I have seen many problems that eventually boil down to defining the proper likelihood function and then maximizing it. Because computing a maximum likelihood fit involves a sum, there is a potential to eliminate the creation of an intermediate vector of (log) density values and to compute the sum in a tight inner loop.

However, I think an even better approach would be to go down to the C code and, where appropriate, explicitly vectorize the various d..., p..., q..., and r... functions, and perhaps add a family of ll... functions. It's conceivable that the trick of factoring out the normalization constant could be used in a few other cases, and vectorizing the C code would greatly reduce the number of calls from Julia to C, which are probably more costly than calling from C to C. Perhaps other optimizations would reveal themselves as well.

By my count, there are 21 distributions in Rmath, so a total of 84 functions would potentially need to be vectorized. Although I would be happy to contribute, I think this is too big a project for me to take on by myself at this point, and my background is not in numerical analysis, nor in statistics.

I think I might approach the problem as follows. I would create a new C library, say VRmath. For each of the functions d..., p..., q..., r... in Rmath, there would be a vectorized version, vd..., vp..., vq..., vr.... There would be a standardized calling interfaces

v{d,p,q}...(ResultVec, VectorLen, ArgumentVec, Param1, Param2, ...), and

vr...(ResultVec, VectorLen, Param1, Param2, ...)

The initial implementation for each would simply be a macro that repeatedly calls the corresponding routine from Rmath. This would quickly give a functioning library, which might immediately result in a performance improvement, due to the reduced number of calls between Julia and C. Over time, individual routines from Rmath could be replaced with customized vectorized versions. In a best case scenario, many functions pertaining to the core statistical distributions could end up running significantly faster in Julia than in R itself.

If you like the idea and you want to move forward on it, send me an email and we can discuss it further.

Sep 13, 2012, 10:02:02 AM9/13/12

to juli...@googlegroups.com

On Thu, Sep 13, 2012 at 9:03 AM, Matthew Clegg <matthew...@gmail.com> wrote:

I too am impressed by what I have seen so far of the Julia community. There seems to be a genuine willingness to help and an openness to new ideas.

It's a really great community :-)

Also, it seems like there might be a lot of scope for interesting user contributions.

The biggest "surprise" benefit of addressing the two language problem — i.e. the standard performance/usability compromise of using a slow high-level language for the logic and a fast low-level language for the heavy lifting — is that having both high-level and low-level code in one language reduces the social barrier to contributing to the internals of the language. All users become potential contributors, which ends up accelerating system development by a really unexpected amount.

However, I think an even better approach would be to go down to the C code and, where appropriate, explicitly vectorize the various d..., p..., q..., and r... functions, and perhaps add a family of ll... functions. It's conceivable that the trick of factoring out the normalization constant could be used in a few other cases, and vectorizing the C code would greatly reduce the number of calls from Julia to C, which are probably more costly than calling from C to C. Perhaps other optimizations would reveal themselves as well.

There is actually no overhead to calling C function from Julia — it's identical to calling a C function from C, except, of course, that the compiler doesn't have the opportunity to inline the function body. The inlining issue is one of the reasons it's good to have as much functionality as possible implemented in Julia: that way it can get inlined for even better performance (it's not just eliminating the function call overhead that's good, this also often exposes additional optimization opportunities like dead code elimination, etc.).

Sep 13, 2012, 11:57:18 AM9/13/12

to juli...@googlegroups.com

I think that trying to modify the C code could get horribly complicated. As I said, it is a twisty maze of multi-line C macros. Vectorizing the C calls could be trivial but it could also get very difficult. Most of those functions do not return the computed value directly - instead they call a macro with some terse name that you must then look up in dpq.h and, of course, that macro calls other macros several layers deep. And this also applies to the returns from error conditions. Trying to modify that code, even for something as simple as vectorizing one of the arguments, is an invitation to terminal frustration.

In my opinion it would be better to bite the bullet and translate the algorithms into Julia. The code was written in the mid-90's to take into account many systems that don't even support the IEEE floating point standard. It could be made much cleaner in Julia. If you look at extras/distributions.jl you will see that some of these functions for common distributions have already been translated into Julia, usually with much savings in complexity and time.

I have already added some functions to evaluate deviance, which is a generalization of log-likelihood, and the deviance residuals in src/GLMDistributions.jl from the repository git://github.com/dmbates/glm.jl.git or (http://github.com/dmbates/glm.jl)

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu