Hi George,
> How many physical CPU cores do you have?
> I have access to a maximum of 64 physical CPU cores.
OK it's fine then.
> 25K patterns are not much at all, and I'd rather use fewer threads (8 to 24).
> I have the tendency to think that more threads is always better, but I see that this is not always the case. I will
> adjust the number of threads when running this command next time.
Unfortunately not: we usually recommend >1000 DNA patterns per thread. You can go below that threshold, but at some
point you will stop seeing runtime improvements from adding more threads.
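
To make the rule of thumb concrete, here is a minimal Python sketch. The function name and its defaults are made up for illustration (this is not part of RAxML or RAxML-NG); it only encodes the ">1000 patterns per thread" guideline and the 64-core limit mentioned above.

```python
def max_useful_threads(n_patterns, min_patterns_per_thread=1000, n_cores=64):
    """Largest thread count that still leaves MORE than
    `min_patterns_per_thread` patterns per thread (hypothetical helper)."""
    # threads < n_patterns / min_patterns_per_thread  (strict inequality)
    by_patterns = (n_patterns - 1) // min_patterns_per_thread
    # Never go below one thread or above the number of physical cores.
    return max(1, min(by_patterns, n_cores))


# For an alignment with 25,000 patterns the upper bound is 24 threads.
print(max_useful_threads(25_000))
```

This is only an upper bound; as noted, going somewhat below the threshold is possible, and the actual optimum is best found by timing a short test run.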
> RAxML: no rate heterogeneity, empirical base freqs, rapid bootstrapping
> RAxML-NG: GAMMA model of rate het., ML estimate of base freqs., slow bootstrapping
> Yeap, I see that. I tried to run RAxML with the *ASC_GTRGAMMA* model, removing the *-V* option, but I also experienced
> an extended computation time and decided to try the *ASC_GTRCAT* model with the *-V* option.
OK I see.
> Currently, RAxML-NG doesn't support rapid bootstrapping, otherwise you can get the same settings as in (old) RAxML
> command line by using "--model GTR+F+ASC_LEWIS".
> Thanks, I will try to include the empirical base freqs. next time. However, I would rather prefer to have the ML
> estimates if possible.
That's all right, you can keep ML freqs, I just wanted to make sure you're aware of that (e.g. in case you want to
compare likelihood scores), since in RAxML-NG the default was changed from empirical to ML freqs.
> However, my main concern was actually running time. Maybe I did not make myself completely clear before, but, if we
> take the RAxML-NG run as an example, right now I am using the *GTR* model with ML freqs. and applying the *ASC_LEWIS* correction.
> So, the program first generates 20 random starting trees (*default*) and subsequently performs 1,000 (*my choice*)
> slow bootstrap analyses on each of these trees. Using 46 threads, each of my *ML Tree Search* runs is taking approx. 42min.
> Would I be wrong if I assumed that each BS would also take approx. 42min and that 1,000 BSs will be performed for each
> tree plus the best tree? If so, the total time for my job to finish would be roughly: 42min*(20*1,001) = 584 days? Or
> are the random 20 trees only for the best ML tree? In this case, it would be: 42min*(20+1,000) = 30 days?
The latter calculation is correct (30 days). But usually BS replicate searches run faster, so this would be a
conservative/pessimistic estimate. Furthermore, you can parallelize across bootstraps to make better use of all 64 cores:
- with old RAxML, you can use raxmlHPC-HYBRID (please search in this Google group for detailed explanations/recommendations)
- with RAxML-NG, you start multiple instances with the "--bootstrap" command and different random seeds, then simply
concatenate all *.raxml.bootstraps files and draw branch support with the "--support" command
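
For reference, the two estimates can be checked with a few lines of Python. The 42 min per search, 20 starting trees and 1,000 replicates are the numbers from your message; the last function is a rough sketch that ASSUMES each bootstrap replicate still takes about 42 min when several instances run side by side with fewer threads each, which may not hold in practice.

```python
MIN_PER_SEARCH = 42      # minutes per tree search (from the log, 46 threads)
N_START_TREES = 20       # default number of starting trees
N_BOOTSTRAPS = 1000      # requested bootstrap replicates
MIN_PER_DAY = 60 * 24


def to_days(minutes):
    return minutes / MIN_PER_DAY


# Starting trees are used only for the best ML tree; each bootstrap
# replicate is a single additional search.
correct_minutes = MIN_PER_SEARCH * (N_START_TREES + N_BOOTSTRAPS)

# The pessimistic reading: 20 searches for every replicate plus the ML tree.
pessimistic_minutes = MIN_PER_SEARCH * N_START_TREES * (N_BOOTSTRAPS + 1)

print(f"{to_days(correct_minutes):.2f} days")      # 29.75 days
print(f"{to_days(pessimistic_minutes):.1f} days")  # 583.9 days


def bootstrap_days(n_instances):
    """Wall-clock days for the bootstraps alone when split evenly across
    `n_instances` parallel runs (assumes unchanged time per replicate)."""
    reps_per_instance = -(-N_BOOTSTRAPS // n_instances)  # ceiling division
    return to_days(reps_per_instance * MIN_PER_SEARCH)


print(f"{bootstrap_days(4):.2f} days with 4 instances")
```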
> If that is the case, could I ask you if you would have any recommendation for how to decrease this computation time?
>
> * Probably, decreasing the number of BSs from 1,000 to 100 is a good start?;
That's one option, you can also use bootstopping to find the appropriate number of replicates. With old RAxML, you can
specify "-N autoMRE", with RAxML-NG you can - for the time being - run bootstraps in batches of say 100, and then check
for convergence using the "-f B" option of the old RAxML.
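
The batch approach could be scripted roughly as follows. This is a sketch under assumptions, not a tested pipeline: the file names (aln.phy, the bs_batch prefixes, best.raxml.bestTree, all.bootstraps) and the model string "GTR+G+ASC_LEWIS" are placeholders I made up for illustration, so please substitute your own. The script only builds and prints the command lines; it does not launch RAxML-NG itself.

```python
import shlex
from pathlib import Path


def bootstrap_commands(msa, model, n_batches, reps_per_batch, threads, first_seed=1):
    """One raxml-ng --bootstrap command per batch, each with its own seed."""
    cmds = []
    for i in range(n_batches):
        cmds.append([
            "raxml-ng", "--bootstrap",
            "--msa", msa,
            "--model", model,
            "--bs-trees", str(reps_per_batch),
            "--seed", str(first_seed + i),    # different seed per batch
            "--threads", str(threads),
            "--prefix", f"bs_batch{i}",       # hypothetical output prefix
        ])
    return cmds


def concatenate(files, out_path):
    """Append all per-batch bootstrap tree files into a single file."""
    with open(out_path, "w") as out:
        for name in files:
            text = Path(name).read_text()
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)


if __name__ == "__main__":
    # Placeholder inputs: 4 batches of 100 replicates, 16 threads each.
    for cmd in bootstrap_commands("aln.phy", "GTR+G+ASC_LEWIS", 4, 100, 16):
        print(shlex.join(cmd))
    # After the batches finish and their *.raxml.bootstraps files are
    # concatenated into all.bootstraps, branch support can be drawn with:
    print(shlex.join(["raxml-ng", "--support",
                      "--tree", "best.raxml.bestTree",
                      "--bs-trees", "all.bootstraps",
                      "--prefix", "support"]))
```

Each batch writes its replicates to a file ending in .raxml.bootstraps under its own prefix, so concatenate() can be pointed at those files once the runs are done.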
> * Maybe also decrease the number of random starting trees from 20 to say 10 or 5?;
I would not recommend this, since most runtime will be spent on bootstrapping anyway.
> * Work with empirical freqs. instead of ML freqs. estimates? Does it affect running time much?
Usually it doesn't, so please use whatever you find more appropriate.
> * Maybe I would use a less complex model than GTR? However, I am not quite sure if there is any other compatible with
> the *ASC_LEWIS* correction.
In RAxML-NG, you can use any DNA matrix with any correction model. However, GTR is a good default choice, so I'd stick
to it.
> Finally, how expensive is the *ASC_LEWIS* correction? Seeing that I have SNP data (and so have to employ the
> *ASC_LEWIS* correction), would it be faster if I worked with the full dataset (SNPs + non variable sites)? It is not
> clear to me if by doing so I would be increasing or decreasing the computation time considering that the amount of total
> sites would be rather expanded.
Lewis (or any other asc. correction) is rather cheap. Full dataset will surely contain more sites, but most of them will
be compressed into just 4 patterns (it depends how much missing data you have). In any case, I'd expect the full dataset
to run slower.
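
To illustrate what "compressed into patterns" means, here is a small Python sketch with a made-up toy alignment. It simply counts identical alignment columns; it is an illustration of the idea, not how RAxML-NG implements it.

```python
from collections import Counter


def site_patterns(alignment):
    """Count distinct alignment columns (site patterns) and how often
    each one occurs. `alignment` is a list of equal-length sequences."""
    columns = zip(*alignment)  # one tuple of characters per site
    return Counter("".join(col) for col in columns)


# Toy data (invented): 9 invariant sites and 1 variable site.
toy = [
    "AAACCGGTTA",
    "AAACCGGTTC",
    "AAACCGGTTA",
]

patterns = site_patterns(toy)
print(len(toy[0]), "sites ->", len(patterns), "patterns")
for pattern, count in patterns.items():
    print(pattern, count)
```

All invariant sites collapse into at most four patterns (all-A, all-C, all-G, all-T), each stored once with a weight. With missing data, a column such as "AA-" counts as a separate pattern, which is why the compression depends on how many gaps or undetermined characters the alignment contains.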
Finally, could you please send the full RAxML-NG log file and - if possible - the alignment file to my e-mail address?
Just in case, I will check if anything is going wrong on this dataset.
Best,
Alexey