Our sum() may also be slower because we use a more accurate algorithm.
In fact we're currently debating whether to keep it or sell out for
performance like everybody else.
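For context, one example of such an accuracy/speed trade-off is compensated (Kahan) summation, sketched below in Julia. Whether this is the algorithm Julia's sum() actually uses is not stated in the thread, so treat it purely as an illustration:

    # Compensated (Kahan) summation: a more accurate algorithm that pays a few
    # extra floating-point operations per element. Illustration only; not claimed
    # to be what Julia's sum() does.
    function kahan_sum(xs)
        s = 0.0
        c = 0.0                # running compensation for lost low-order bits
        for x in xs
            y = x - c
            t = s + y
            c = (t - s) - y
            s = t
        end
        s
    end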
Is there a reason that you want to evaluate the log-likelihood from the definition instead of using a function like dt in R? I would have written the log-likelihood function as

    lls1 <- function(X, mu, sigma, df) sum(dt((X-mu)/sigma, df, log=TRUE))

but some experimentation shows that it gives a different answer than your code and also takes longer, which I find surprising.
In Julia I would have written

    load("distributions.jl")

    function llst(X, mu::Real, sigma::Real, df::Real)
        dd = TDist(df)
        ss = 0.
        for x in X
            ss += logpdf(dd, (x-mu)/sigma)
        end
        ss
    end

which, on my machine, is roughly equivalent to the R version in timing and produces the same answer, -1055.1081923585919. This equivalence is not surprising because the logpdf function for a T distribution uses the same code as R does for the low-level evaluation.

It is not easy to see how the calculation is actually being done in julia/deps/Rmath/src/dt.c because that code is a twisty maze of multi-line C macros. One thing you can notice is that it uses different code according to whether ((x-mu)/sigma)^2 > 0.2*df.
On Tuesday, September 11, 2012 11:18:30 AM UTC-4, Douglas Bates wrote:

> Is there a reason that you want to evaluate the log-likelihood from the definition instead of using a function like dt in R? I would have written the log-likelihood function as
>
>     lls1 <- function(X, mu, sigma, df) sum(dt((X-mu)/sigma, df, log=TRUE))
>
> but some experimentation shows that it gives a different answer than your code and also takes longer, which I find surprising.

Please correct me if I am wrong, but I think you may have left out a factor of 1/sigma. If p(z) is a probability density on z and we set z = f(x) = (x-mu)/sigma, then shouldn't the probability density on x be p(f(x))*|f'(x)|? In our case, f'(x) = 1/sigma. So I believe the proper function definition should be:

    lls2 <- function(X, mu, sigma, df) sum(dt((X-mu)/sigma, df, log=TRUE)) - length(X)*log(sigma)
After making this change, you'll find that the functions agree:

    > llst(vec, 1, 2, 3)
    [1] -1748.255
    > lls1(vec, 1, 2, 3)
    [1] -1055.108
    > lls2(vec, 1, 2, 3)
    [1] -1748.255

You can check that this is the right density function (or at least a reasonable one) by integrating:

    > dst <- function(X, mu, sigma, df) dt((X-mu)/sigma, df)/sigma
    > integrate(dst, -20, 20, 1, 2, 3)
    0.9978412 with absolute error < 5.7e-07

> In Julia I would have written
>
>     load("distributions.jl")
>
>     function llst(X, mu::Real, sigma::Real, df::Real)
>         dd = TDist(df)
>         ss = 0.
>         for x in X
>             ss += logpdf(dd, (x-mu)/sigma)
>         end
>         ss
>     end
>
> which, on my machine, is roughly equivalent to the R version in timing and produces the same answer, -1055.1081923585919. This equivalence is not surprising because the logpdf function for a T distribution uses the same code as R does for the low-level evaluation.
>
> It is not easy to see how the calculation is actually being done in julia/deps/Rmath/src/dt.c because that code is a twisty maze of multi-line C macros. One thing you can notice is that it uses different code according to whether ((x-mu)/sigma)^2 > 0.2*df.

Why is my code faster? Certainly in Julia, one part of the answer is that I am computing the normalization constant only once, whereas repeatedly calling logpdf causes it to be recomputed for each element of X, and this is the major cost of evaluating the t density. (This is another advertisement for the benefits of memoization, although the memoization would need to occur deep in the internals of the code.)
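To make that concrete, here is a minimal sketch of a location-scale t log-likelihood that hoists the normalization constant out of the loop. It is written in present-day Julia, so the name llst_hoisted and the SpecialFunctions dependency are assumptions for illustration, not code from the thread:

    using SpecialFunctions: loggamma   # assumed stand-in for the log-gamma function

    # Illustrative sketch: the df- and sigma-dependent constant is evaluated once
    # per call rather than once per observation.
    function llst_hoisted(X, mu::Real, sigma::Real, df::Real)
        c = loggamma((df + 1) / 2) - loggamma(df / 2) -
            0.5 * log(df * pi) - log(sigma)        # per-observation constant
        ss = length(X) * c
        for x in X
            z = (x - mu) / sigma
            ss -= (df + 1) / 2 * log1p(z^2 / df)   # only this term depends on the data
        end
        ss
    end

Only the log1p term involves the data; all of the gamma-function work is paid once per call, which is exactly the saving being described.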
Your message caused me to look for the first time at the actual R source code for the implementation of the t-distribution, and I was surprised to find that it also seems to repeatedly recompute the normalization constant for each of the elements of the vector passed to dt(X, df). (In other words, even though dt() accepts a vector, its C code shares no work across the elements.) Perhaps this is the entire explanation for the speedup that I get.
However, it's also the case that the manner in which dt is computed in R certainly does not match up with a straightforward translation of the t-distribution from a college textbook. In fact, my first reaction to this code was ... She did what!?! :)
On Tuesday, September 11, 2012 2:58:02 PM UTC-5, Matthew Clegg wrote:

> Your message caused me to look for the first time at the actual R source code for the implementation of the t-distribution, and I was surprised to find that it also seems to repeatedly recompute the normalization constant for each of the elements of the vector passed to dt(X, df). (In other words, even though dt() accepts a vector, its C code shares no work across the elements.) Perhaps this is the entire explanation for the speedup that I get.

Well, it would certainly help. Good work on seeing that. My analysis was too superficial. I like the julia-dev list because there are so many smart and knowledgeable people contributing, and I get to learn a lot.
> However, it's also the case that the manner in which dt is computed in R certainly does not match up with a straightforward translation of the t-distribution from a college textbook. In fact, my first reaction to this code was ... She did what!?! :)

As I said, a twisty maze of multi-line C macros - a style of programming that I find challenging to understand. All this is for the purpose of squeezing the last ounce of performance out of code that, as you show, is a bad design in the first place, because the bulk of the computation for each value in a vector is needlessly repeated.

We could perhaps consider additional methods for the logpdf generic from extras/distributions.jl (another file that I will convert to a module if no one gets there before me) that vectorize more intelligently when warranted.
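As a rough illustration of the kind of method being suggested, here is a hedged sketch in present-day Julia; the name logpdf_all and the Distributions/SpecialFunctions packages used here are assumptions for illustration, not part of extras/distributions.jl:

    using Distributions
    using SpecialFunctions: loggamma

    # Sketch of a vectorized log-density for TDist: the df-dependent constant is
    # computed once for the whole vector instead of inside each element's call.
    function logpdf_all(d::TDist, xs::AbstractVector{<:Real})
        df = params(d)[1]
        c = loggamma((df + 1) / 2) - loggamma(df / 2) - 0.5 * log(df * pi)
        [c - (df + 1) / 2 * log1p(x^2 / df) for x in xs]
    end

Summing the result gives the same log-likelihood as the element-by-element loop, but the gamma-function evaluations no longer scale with the length of xs.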
I too am impressed by what I have seen so far of the Julia community. There seems to be a genuine willingness to help and an openness to new ideas.
Also, it seems like there might be a lot of scope for interesting user contributions.
However, I think an even better approach would be to go down to the C code and, where appropriate, explicitly vectorize the various d..., p..., q..., and r... functions, and perhaps add a family of ll... functions. It's conceivable that the trick of factoring out the normalization constant could be used in a few other cases, and vectorizing the C code would greatly reduce the number of calls from Julia to C, which are probably more costly than calling from C to C. Perhaps other optimizations would reveal themselves as well.
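As one hedged illustration of that factoring trick in another case (my own example, not from the thread): for the normal distribution the additive constant depends only on sigma, and the data enter only through a single sum.

    # Illustration only: normal log-likelihood with the constant factored out.
    function llnorm(X, mu::Real, sigma::Real)
        n = length(X)
        -n / 2 * log(2 * pi * sigma^2) - sum(abs2, (X .- mu) ./ sigma) / 2
    end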