Hi Chris,
Thanks for updating us. It's useful to have this here, because I'm sure this will be a common problem.
In case there are others who are stuck on this issue, here are three potential solutions if you are having dependency problems:
1. Do as Chris did and uninstall any anaconda/miniconda installations, then re-install the correct version of anaconda
3. Install your dependencies by hand, like this: conda install numpy pandas pytables pyparsing scipy scikit-learn
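If you're not sure which of those dependencies are actually missing, a small check like the one below can tell you before you reinstall anything. It is only a sketch: the function name missing_packages is made up for this example, and the mapping reflects the fact that two of the conda packages are imported under a different name (pytables as tables, scikit-learn as sklearn). Run it with the same Python interpreter you use to run the program.

```python
import importlib

# conda package name -> module name used in an "import" statement.
# Two of these differ: pytables -> tables, scikit-learn -> sklearn.
DEPENDENCIES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "pytables": "tables",
    "pyparsing": "pyparsing",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
}


def missing_packages(deps=DEPENDENCIES):
    """Return the conda package names whose module fails to import."""
    missing = []
    for package, module in deps.items():
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + " ".join(missing))
        print("Try: conda install " + " ".join(missing))
    else:
        print("All dependencies found.")
```

Anything it lists as missing can be passed straight to conda install.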
Cheers,
Rob
P.S. To answer your question, Chris, my advice for a really large dataset is:
1. Do as you are doing now: try rcluster with default settings first.
2. Look at the improvements in the AICc score: as the algorithm progresses, these will tend closer and closer to zero. Most datasets have a long tail of small improvements, and you can use the changes to guesstimate how long it might take to finish.
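To make the guesstimate in step 2 a bit more concrete, here is a rough sketch of one way to do it. Everything in it is an assumption on my part: you copy the AICc improvements out of the log by hand into a list, the function name estimate_remaining_steps and the threshold value are made up, and it treats the improvements as shrinking roughly exponentially. That last assumption is crude, and with a long tail it will tend to underestimate, so treat the answer as a lower bound rather than a prediction.

```python
import math


def estimate_remaining_steps(improvements, threshold=0.1):
    """Guess how many more steps until improvements shrink below threshold.

    improvements: AICc change at each step, oldest first (sign is ignored).
    threshold: hypothetical cut-off for "close enough to zero"; this is
        not the program's own stopping rule.
    Returns None if there are too few points or no shrinking trend.
    """
    # Work on log(|improvement|) so exponential decay becomes a straight line.
    points = [(i, math.log(abs(d))) for i, d in enumerate(improvements) if d != 0]
    if len(points) < 2:
        return None

    n = float(len(points))
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    slope = sxy / sxx  # least-squares slope of the fitted line
    if slope >= 0:
        return None  # improvements are not getting smaller yet
    intercept = mean_y - slope * mean_x

    # Step index at which the fitted line crosses log(threshold).
    crossing = (math.log(threshold) - intercept) / slope
    last_step = len(improvements) - 1
    return max(0, int(math.ceil(crossing - last_step)))
```

Multiply the result by the time each step is currently taking to get a rough wall-clock figure, and re-run it every so often as more steps complete.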
If it's looking too slow, either of these will help. I'd try them one at a time, in this order:
1. Try reducing --rcluster-max (see the manual)
2. Try using the rclusterf algorithm (search = rclusterf; see the manual for details).
Cheers,
Rob