Well, Julia newbie here! I intend to implement a number of Bayesian hierarchical clustering models (more specifically, topic models) in Julia, and here is my implementation of Latent Dirichlet Allocation as a gist: https://gist.github.com/odinay/3e49d50ba580a9bff8e3
I should say my Julia implementation is almost 100 times faster than my Python (NumPy) implementation. For instance, for a simulated dataset drawn from 5 clusters, with 1000 groups each containing 100 points:
true_kk = 5
n_groups = 1000
n_group_j = 100 * ones(Int64, n_groups)
Julia spends nearly 0.1 sec per LDA Gibbs sampling iteration, while Python takes almost 9.5 sec on my machine. But the code is still slow for real datasets. I know that Gibbs inference for these models is expensive by nature, but how can I make sure I have optimised the performance of my code as far as possible? For example, for a slightly bigger dataset such as
true_kk = 20
n_groups = 1000
n_group_j = 1000 * ones(Int64, n_groups)

the output towards the end of the run is:

iteration: 98, number of components: 20, elapsed time: 3.209459973
iteration: 99, number of components: 20, elapsed time: 3.265090272
iteration: 100, number of components: 20, elapsed time: 3.204902689
elapsed time: 332.600401208 seconds (20800255280 bytes allocated, 12.87% gc time)
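The final summary line has the format of Julia 0.3's @time output; for reference, this is roughly how I produce the timings (run_sampler is a stand-in name for the gist's driver function, not its actual name):

gc()                           # collect old garbage first so it is not billed to this call
@time run_sampler(docs, 100)   # hypothetical driver: 100 Gibbs sweeps over the corpus

which is where the "bytes allocated" and "% gc time" figures come from.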
As I move to more complex models, optimizing the code becomes an even bigger concern. How can I make sure that, without changing the algorithm (I don't want to switch to other Bayesian approaches such as variational methods), this is the best performance I can get? Parallelization is not the answer either: although efficient parallel Gibbs sampling for LDA has been proposed (e.g. here), it does not carry over to more complex statistical models. So I want to know whether I am writing the loops and passing variables and types correctly, or whether it can be done more efficiently; a sketch of what I mean follows.
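To make the question concrete, here is a minimal sketch of the kind of allocation-free inner loop I am aiming for. All the names here (sample_topic!, prob, doc_topic, word_topic, topic_tot, alpha, beta, V) are illustrative, not the ones in my gist; the point is that the probability vector is preallocated once, filled in place, and the categorical draw walks the unnormalised weights directly:

# Illustrative collapsed-Gibbs draw of a new topic for one token.
# prob is preallocated once, outside all loops.
function sample_topic!(prob::Vector{Float64},
                       doc_topic::Matrix{Int},   # n_docs x K topic counts
                       word_topic::Matrix{Int},  # V x K topic counts
                       topic_tot::Vector{Int},   # length-K topic totals
                       d::Int, w::Int,
                       alpha::Float64, beta::Float64, V::Int)
    K = length(prob)
    total = 0.0
    @inbounds for k in 1:K
        # unnormalised full conditional p(z = k | everything else)
        prob[k] = (doc_topic[d, k] + alpha) *
                  (word_topic[w, k] + beta) / (topic_tot[k] + V * beta)
        total += prob[k]
    end
    u = rand() * total   # inverse-CDF draw on the unnormalised weights,
    acc = 0.0            # so no normalisation pass and no temporary arrays
    @inbounds for k in 1:K
        acc += prob[k]
        if u <= acc
            return k
        end
    end
    return K
end

My understanding is that this style (everything inside a function, concrete element types, no temporaries in the hot loop) is what lets the compiler specialise the code; is that right, or is there a cheaper idiom I am missing?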
What makes me unsure of my work is the huge amount of memory that gets allocated: almost 20 GB. I am aware that, since numbers are immutable types, Julia has to copy them for manipulation and calculations. But considering the complexity of my problem (3 nested loops) and the size of my data, maybe based on your experience you can tell me whether churning through 20 GB is normal, or whether I am doing something wrong?
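In case it helps, here is how I have been trying to locate the allocations myself. My understanding from the docs is that starting Julia with --track-allocation=user writes per-line byte counts to a .mem file next to each source file (lda.jl and run_gibbs are stand-in names for my gist's file and driver function; if Base.clear_malloc_data() is not available on 0.3, measuring from a fresh session after a warm-up run should work too):

julia --track-allocation=user

julia> include("lda.jl")          # stand-in name for the gist's source file
julia> run_gibbs(docs, 10)        # warm-up run so compilation is not counted
julia> Base.clear_malloc_data()   # discard allocation counts from the warm-up
julia> run_gibbs(docs, 10)        # the run that actually gets measured
# after quitting, inspect lda.jl.mem for the per-line allocation counts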
Best,
Adham
julia> versioninfo()
Julia Version 0.3.11
Commit 483dbf5* (2015-07-27 06:18 UTC)
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Profile.clear()
@profile ...some_function_call...
ProfileView.view() # You'll have to Pkg.add it