bucky memory


Helene Chiapello

Jun 26, 2015, 5:40:21 AM
to bucky...@googlegroups.com
Hi,

I ran BUCKy 1.4.4 on a dataset of 10 taxa and 6878 genes on a cluster node with 60 GB of RAM (+ 60 GB of virtual memory).

I first get this message: "The grid size to get genome-wide CFs is too small compared to the number of sampled genes... changing this grid size to 20634 (3 * # genes)"

And then the analysis ends with:
Initializing gene information....terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)


I also tried the --opt-space option, without success.

Thanks in advance for your help,

Helene

Cécile Ané

Jun 26, 2015, 10:05:28 AM
to bucky...@googlegroups.com
The 'bad_alloc' error is most likely from a lack of memory. It occurs
before the MCMC starts, so it's not related to the grid size.

You can try subsampling the genes that are most informative, or in other
words ignore the genes with very little information. That way, you would
not lose much information but would save memory by not having to track
the trees for the less informative genes. One way to measure a gene's
'informativeness' is to see how many distinct trees it has in its
posterior distribution from MrBayes. If it has few distinct trees, its
posterior is well concentrated. If it has as many distinct trees as the
number of trees saved during the MCMC, then its posterior is all over
the place and the gene is not informative. A quick way to know the
number of distinct trees for a gene and to rank the genes is to use the
word count 'wc' command in Linux or Mac, to get the number of lines in
the mbsum (.in) file for the gene. With just 10 taxa, I think this
approach should work.
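
The ranking described above can be sketched in a few lines of Python, as an alternative to running 'wc' by hand. This is only a minimal sketch: the function names, the "*.in" file pattern, and the `keep` cutoff are hypothetical choices for illustration, not part of BUCKy or mbsum. It assumes the line count of each mbsum file tracks the number of distinct trees; any fixed header lines would shift every gene's count by the same amount and leave the ranking unchanged when all genes share the same taxa.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines in one mbsum output file, roughly like `wc -l`."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def rank_genes(directory, pattern="*.in"):
    """Return (line_count, filename) pairs, fewest lines first.

    Fewer lines means fewer distinct trees, i.e. a more concentrated
    posterior and a more informative gene.
    """
    counts = [(count_lines(p), p.name) for p in Path(directory).glob(pattern)]
    return sorted(counts)


def most_informative(directory, keep, pattern="*.in"):
    """Keep the `keep` genes with the fewest lines (hypothetical cutoff)."""
    return [name for _, name in rank_genes(directory, pattern)[:keep]]
```

For example, `most_informative("mbsum_out", 2000)` would return the names of the 2000 files with the fewest lines in a directory called `mbsum_out` (both values are placeholders). How many genes to keep depends on how much memory is available, so the cutoff would need some trial and error.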

Hope that helps,
Cecile.