What is the typical running time for BUCKy?

Noppol Kobmoo

Oct 12, 2018, 1:27:48 AM

to BUCKy users

Hi,

I'm starting to use BUCKy with my data. I'm just wondering how long a BUCKy run takes in general.

I know that it should depend on the number of loci and taxa.

Let's say I have around 300 loci of about 100 kb each (a whole-genome alignment split into chunks) for 60 taxa. How long would this take?

Thank you very much in advance.

Noppol.

Cécile Ané

Oct 12, 2018, 10:04:44 AM

to BUCKy users

How many taxa? With 4 taxa, it will take a few seconds, even with 300 loci.

In any case, the most time-consuming step is to run each locus with MrBayes.

Noppol Kobmoo

Oct 15, 2018, 10:23:57 PM

to BUCKy users

I have 60 taxa.

The run started 5 days ago and it still has not finished. This seems suspiciously long...

The only output file I have is *.out. At the end of this file I found the following messages:

Read 292 genes with a total of 2744382 different sampled tree topologies
Writing input file names to file NK_BCF.input....done.
Sorting trees by average posterior probability....done.
Initializing random number generator....done.
Initializing gene information....done.

And it has stayed like this for 5 days...

Could there be a bug in my BUCKy installation?

On Friday, October 12, 2018 at 21:04:44 UTC+7, Cécile Ané wrote:

Cécile Ané

Oct 17, 2018, 11:07:33 PM

to BUCKy users

BUCKy takes a long time when the genes have a diffuse posterior distribution of trees: this happens when there are many taxa and the gene trees are not well resolved. Here, the output says that there are "2744382 different sampled tree topologies" in total. It's this large number of possible gene trees that makes BUCKy slow.

To make things run faster, one option could be to use only the loci whose trees are most resolved, judged by the number of distinct sampled trees for each locus, and to drop loci with very many distinct trees. You can tell the number of distinct sampled trees for each locus by the size of the summarized file for that locus: there is exactly 1 line per distinct topology in that summary created by mbsum (not 1 line per generation as in the tree files created by MrBayes). This approach would be useless if all of your loci have similar numbers of distinct trees. But the approach might be useful if only a fraction of the loci drive the high number of distinct trees.

To answer your question: it's not a bug. It's just slow...
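The line-counting idea above can be sketched in a few lines of Python. This is only an illustration, not part of BUCKy or mbsum: the function names, the `*.in` file pattern, and the cutoff value are all assumptions, and the count relies on each non-empty line of an mbsum summary being one distinct topology, as described above.

```python
from pathlib import Path


def count_distinct_topologies(summary_path):
    """Count non-empty lines in one mbsum summary file.

    Assumes one line per distinct sampled topology.
    """
    with open(summary_path) as handle:
        return sum(1 for line in handle if line.strip())


def rank_loci(summary_dir, pattern="*.in"):
    """Return (file name, topology count) pairs, most resolved locus first.

    The "*.in" pattern is a hypothetical naming choice; adjust it to
    whatever names your mbsum output files actually have.
    """
    counts = {
        path.name: count_distinct_topologies(path)
        for path in Path(summary_dir).glob(pattern)
    }
    return sorted(counts.items(), key=lambda item: item[1])


def select_resolved(ranked, max_topologies):
    """Keep loci with at most max_topologies distinct trees (cutoff is arbitrary)."""
    return [name for name, count in ranked if count <= max_topologies]
```

Calling `rank_loci("path/to/summaries")` and looking at the resulting counts should show whether a minority of loci account for most of the distinct topologies. If so, `select_resolved` with a cutoff chosen from that distribution gives a list of files to pass to BUCKy.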

Noppol Kobmoo

Oct 22, 2018, 4:51:54 AM

to BUCKy users

Thank you very much for your suggestion. I have removed the genes with less resolved trees and it has significantly improved the performance; the run finished within a couple of hours! I think this is a good idea because by removing loci with too many trees we also increase the phylogenetic signal.

Best regards,

Noppol.

On Thursday, October 18, 2018 at 10:07:33 UTC+7, Cécile Ané wrote:
