Getting the best performance in Julia - Bayesian Inference with Julia


Adham Beyki

Sep 9, 2015, 9:17:34 AM
to julia-users

Julia newbie here! I intend to implement a number of Bayesian hierarchical clustering models (more specifically, topic models) in Julia. Here is my implementation of Latent Dirichlet Allocation as a gist: https://gist.github.com/odinay/3e49d50ba580a9bff8e3


I should say my Julia implementation is almost 100 times faster than my Python (NumPy) implementation. For instance, for a simulated dataset from 5 clusters with 1000 groups, each containing 100 points:


true_kk   = 5
n_groups  = 1000
n_group_j = 100 * ones(Int64, n_groups)


Julia spends nearly 0.1 sec on each LDA Gibbs sampling iteration, while Python takes almost 9.5 sec on my machine. But the code is still slow for real datasets. I know that Gibbs inference for these models is expensive by nature, but how can I make sure I have optimised the performance of my code as far as possible? For example, for a slightly bigger dataset such as


true_kk   = 20
n_groups  = 1000
n_group_j = 1000 * ones(Int64, n_groups)


the output is:


iteration: 98, number of components: 20, elapsed time: 3.209459973                    
iteration: 99, number of components: 20, elapsed time: 3.265090272                    
iteration: 100, number of components: 20, elapsed time: 3.204902689                   
elapsed time: 332.600401208 seconds (20800255280 bytes allocated, 12.87% gc time)     


As I move to more complex models, optimizing the code becomes an even bigger concern. How can I make sure that, without changing the algorithm (I don't want to use other Bayesian approaches like variational methods), this is the best performance I can get? Parallelization is not the answer either: although efficient parallel Gibbs sampling has been proposed for LDA (e.g. here), that is not the case for more complex statistical models. So I want to know whether I am writing the loops and passing variables and types correctly, or whether it can be done more efficiently.


What made me unsure of my work is the huge amount of memory that is allocated: almost 20 GB. I am aware that, since numbers are immutable types, Julia has to copy them for manipulation and calculation. But considering the complexity of my problem (3 nested loops) and the size of my data, maybe based on your experience you can tell me whether moving around 20 GB is normal, or whether I am doing something wrong?
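To make the question concrete, here is a toy sketch of the kind of inner loop I mean (made-up names, not my actual gist code): the same normalisation step written with per-iteration temporaries versus a preallocated buffer that is reused across iterations.

```julia
# Toy pattern only: per-iteration temporaries vs. one reused buffer.

function normalise_sweep_slow(weights::Matrix{Float64}, iters::Int)
    s = 0.0
    for it in 1:iters, j in 1:size(weights, 2)
        p = weights[:, j]        # allocates a fresh vector on every pass
        p = p / sum(p)           # allocates another one
        s += p[1]
    end
    return s
end

function normalise_sweep_fast(weights::Matrix{Float64}, iters::Int)
    n = size(weights, 1)
    p = zeros(n)                 # one buffer, allocated once
    s = 0.0
    for it in 1:iters, j in 1:size(weights, 2)
        total = 0.0
        for i in 1:n
            total += weights[i, j]
        end
        for i in 1:n
            p[i] = weights[i, j] / total   # written in place, no allocation
        end
        s += p[1]
    end
    return s
end
```

Both return the same value, but `@time` shows the second version allocating a constant amount no matter how large `iters` is. If my sampler's 20 GB comes from temporaries like the slow version, that would also explain the GC time.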


Best, 

Adham


julia> versioninfo()
Julia Version 0.3.11
Commit 483dbf5* (2015-07-27 06:18 UTC)
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Cedric St-Jean

Sep 9, 2015, 9:30:30 AM
to julia-users
Cool to see more Bayesian inference in Julia! Here are the generic tips, in case you haven't gone through them yet:

http://julia.readthedocs.org/en/latest/manual/performance-tips/

I particularly recommend profiling your code with

Profile.clear()
@profile ...some_function_call...
ProfileView.view()   # You'll have to Pkg.add it

The red boxes will show memory allocation. Also, if you @time your code, it'll tell you what fraction of the time is spent in GC (most likely a lot if it's 20 GB).
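Putting that together as a self-contained sketch (the workload function here is a dummy stand-in, not your LDA code):

```julia
using Profile   # stdlib module on recent Julia; lived in Base on 0.3

# Dummy stand-in workload; substitute your Gibbs sweep here.
function dummy_work(n)
    s = 0.0
    for i in 1:n
        s += sin(i) * rand()
    end
    return s
end

dummy_work(10)            # warm up so compilation isn't measured

@time dummy_work(10^7)    # prints elapsed time, bytes allocated, and % GC

Profile.clear()
@profile dummy_work(10^7)
Profile.print()           # text report; ProfileView.view() shows it graphically
```

The flat text report from `Profile.print()` already points at the hot lines; the flame graph from ProfileView is the same data, just easier to read.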

That's quite a bit of code; if you can tell us which part is the bottleneck, it'll be easier to help out.

Best,

Cédric