Hi, that sounds like interesting work! Some comments:
1. How are you storing a 4D dataset in a dataframe? One row per data point? Could you store it in a 4D array instead? If dataframes work, that's great, but they are not optimized for every kind of use. You might get much better performance by writing the join explicitly using another structure.
2. Do you have public code? In particular, have you profiled your code to find the bottleneck? If you post a fragment that is slow, it'll be easier to provide concrete advice.
3. Re. Chi^2, I'm not familiar with that algorithm, but FYI there seems to be an R implementation, and you can call it with RCall.
4. If you want advice about choice of algorithm, you might have better luck asking in a forum that specializes in your field. Also, consider posting on julia-users, which gets more traffic.
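
To make point 1 a bit more concrete, here is a minimal sketch of the two layouts. I'm assuming your data sits on a regular grid, which may not be the case; the sizes and variable names are made up for illustration, and it uses only Base Julia, no dataframe package.

```julia
# Hypothetical grid sizes; the real dimensions of your dataset are unknown to me.
nx, ny, nz, nt = 4, 3, 2, 5

# Dense 4D array: one value per (x, y, z, t) grid point.
A = rand(nx, ny, nz, nt)

# Equivalent "long" layout: one row per data point, stored as (x, y, z, t, value).
rows = vec([(x, y, z, t, A[x, y, z, t]) for x in 1:nx, y in 1:ny, z in 1:nz, t in 1:nt])

# With the array, a lookup is direct indexing.
v_array = A[2, 3, 1, 4]

# With one row per point, the same lookup needs a scan (or a join, or an index).
i = findfirst(r -> r[1:4] == (2, 3, 1, 4), rows)
v_rows = rows[i][5]
```

Both lookups return the same value, but the second one walks the rows until it finds a match. If your coordinates are irregular or sparse, a dense array will not fit as neatly, and a dictionary keyed on the coordinates might be a better alternative.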
Best,
Cédric