CAFE5 running for a week now with no output. Occasional lambda scores every now and then.

Rijan Dhakal

Jun 24, 2022, 2:25:50 PM

to hahnlab-cafe

Hi there,

The issue at hand: CAFE5 runs for a long time without output. I fired up CAFE5 on my dataset a couple of weeks back and since the dataset was large, I thought it was not weird for it to run for 3-4 days. For whatever reason, the run ended without warning on my server and left no output at all. I chalked it up to being some kind of OS/system error and fired it up again. It has been going on for almost a week now with an empty result directory.

Currently the shell screen reads the following information:

```

Command line: cafe5 -i Orthogroups.GeneCount.csv -t SpeciesTree_rooted.txt

Filtering families not present at the root from: 21748 to 9378

No root family size distribution specified, using uniform distribution

Optimizer strategy: Nelder-Mead with similarity cutoff
Iterations: 300
Expansion: 2
Reflection: 1

Starting Search for Initial Parameter Values
Lambda: 0.12116138704594
Score (-lnL): inf
Lambda: 2.0477577151642
Score (-lnL): nan
Lambda: 0.27461601752791
Score (-lnL): inf
Lambda: 0.08182515244939
Score (-lnL): inf
Lambda: 0.56652024022676
Score (-lnL): inf
Lambda: 2.0000950212729
Score (-lnL): nan
Lambda: 0.71135955015171
Score (-lnL): inf
Lambda: 0.26599710513942
Score (-lnL): inf
Lambda: 0.08100296671008
Score (-lnL): inf
Lambda: 1.866496869993
Score (-lnL): nan
Lambda: 0.10015139598787
Score (-lnL): inf
Lambda: 0.23642539287965
Score (-lnL): inf
Lambda: 0.44962880745106
Score (-lnL): inf
Lambda: 1.0580599215515
Score (-lnL): inf
Lambda: 0.35825164591241
Score (-lnL): inf
Lambda: 0.44415546904817
Score (-lnL): inf
Lambda: 0.27909670509144
Score (-lnL): inf
Lambda: 2.3103388278262
Score (-lnL): nan
Lambda: 0.34373836180637
Score (-lnL): inf
Lambda: 0.40776601052855
Score (-lnL): inf
Lambda: 1.5624418591584
Score (-lnL): nan
Lambda: 0.83380103912642
Score (-lnL): inf
Lambda: 1.1229503586454
Score (-lnL): inf
Lambda: 1.0458433234547
Score (-lnL): inf
Lambda: 0.58452264183749
Score (-lnL): inf
Lambda: 0.87656598560601
Score (-lnL): inf
Lambda: 1.5437407505061
Score (-lnL): nan
Lambda: 0.7874973166333
Score (-lnL): inf
Lambda: 0.51657814670635
Score (-lnL): inf
Lambda: 0.22078470471926
Score (-lnL): inf
Lambda: 0.054962929042812
Score (-lnL): inf
Lambda: 0.5309833353734
Score (-lnL): inf
Lambda: 0.8962395924528
Score (-lnL): inf
Lambda: 1.2945849963068
Score (-lnL): inf
Lambda: 0.5411051503877
Score (-lnL): inf
Lambda: 1.0654852342631
Score (-lnL): inf
Lambda: 0.081482317436751
Score (-lnL): inf
Lambda: 1.3964298131689
Score (-lnL): inf
Lambda: 2.3541287182919
Score (-lnL): nan
Lambda: 1.2232750538566
Score (-lnL): inf
Lambda: 0.30033573529425
Score (-lnL): inf
Lambda: 0.1949229946072
Score (-lnL): inf
Lambda: 1.6042103087856
Score (-lnL): nan
Lambda: 0.6330509271401

```
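With this many lines, it can help to tally the scores instead of reading them one by one. The following is a minimal sketch, assuming the log keeps the `Lambda:` / `Score (-lnL):` line shapes shown above; the function name `summarize_scores` is hypothetical and is not part of CAFE5. The sample text is a few lines copied from the log above, including the final lambda that has no score yet.

```python
import math
import re

# A few lines copied from the log above; the last Lambda has no score yet.
SAMPLE_LOG = """\
Lambda: 0.12116138704594
Score (-lnL): inf
Lambda: 2.0477577151642
Score (-lnL): nan
Lambda: 0.27461601752791
Score (-lnL): inf
Lambda: 0.6330509271401
"""

LAMBDA_RE = re.compile(r"^Lambda:\s*(\S+)")
SCORE_RE = re.compile(r"^Score \(-lnL\):\s*(\S+)")


def summarize_scores(log_text):
    """Pair each Lambda with the Score that follows it and count scores by kind.

    Hypothetical helper: it only assumes the two line shapes seen in the log.
    """
    counts = {"finite": 0, "inf": 0, "nan": 0, "unscored": 0}
    pairs = []
    pending = None  # lambda value still waiting for its score line
    for raw in log_text.splitlines():
        line = raw.strip()
        lam = LAMBDA_RE.match(line)
        if lam:
            if pending is not None:
                counts["unscored"] += 1
            pending = float(lam.group(1))
            continue
        sc = SCORE_RE.match(line)
        if sc and pending is not None:
            score = float(sc.group(1))  # float() parses "inf" and "nan"
            if math.isnan(score):
                counts["nan"] += 1
            elif math.isinf(score):
                counts["inf"] += 1
            else:
                counts["finite"] += 1
            pairs.append((pending, score))
            pending = None
    if pending is not None:
        counts["unscored"] += 1
    return counts, pairs


counts, pairs = summarize_scores(SAMPLE_LOG)
print(counts)
```

If the `finite` count stays at zero across the whole log, the optimizer has not yet found a lambda for which the likelihood could be evaluated.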

I thought it might be something similar to this discussed issue on GitHub, but I noticed on my shell screen that families with no presence at the root were automatically removed. Every now and then a new lambda score pops up, and running `top` on my Linux shell shows that CAFE is in fact running, but after a week I am wondering whether something is wrong.

The files I am using are:

The csv file

The newick file

CAFE5 seems to be using multiple cores and a fairly substantial amount, but not all, of the memory on my office workstation, which has 128 GB of RAM and 8 cores. Compute power could be an issue, but most likely is not.

Am I doing something wrong? Does the raw data need cleanup or wrangling?

Any help is much appreciated.

Sincerely,

Rijan

Hahn, Matthew

Jun 24, 2022, 3:34:28 PM

to Rijan Dhakal, hahnlab-cafe

Hi,

It doesn’t seem like CAFE is giving any sensible results on the dataset from the beginning: that’s why it’s giving back “inf” on every iteration. You appear to be trying to analyze all of green plants; this may simply be too big a tree (and too deep) for CAFE to handle. You might have to try a smaller subset of the data.
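One way to build such a subset is to keep only the count columns for a chosen group of species. The sketch below is illustrative only: it assumes a tab-delimited table whose first two columns are `Desc` and `Family ID`, and the species names (`spA` to `spD`), family IDs, and the function name `subset_counts` are all made up. It also drops families that have no copies in any kept species, which is a choice made here and not something taken from the thread. The species tree would need to be pruned to the same species separately, which is not shown.

```python
import csv
import io

# Hypothetical tab-delimited count table; species and family names are made up.
SAMPLE_TABLE = (
    "Desc\tFamily ID\tspA\tspB\tspC\tspD\n"
    "(null)\tOG0000001\t3\t0\t5\t2\n"
    "(null)\tOG0000002\t0\t0\t4\t1\n"
    "(null)\tOG0000003\t1\t2\t0\t0\n"
)


def subset_counts(table_text, keep_species, id_columns=("Desc", "Family ID")):
    """Return a new table holding only the id columns and the kept species.

    Assumption: the first two columns are named as in id_columns. Families
    with zero copies in every kept species are dropped.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    missing = [s for s in keep_species if s not in reader.fieldnames]
    if missing:
        raise KeyError(f"species not in table: {missing}")

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(list(id_columns) + list(keep_species))
    for row in reader:
        counts = [int(row[s]) for s in keep_species]
        if sum(counts) == 0:
            continue  # family absent from every kept species
        writer.writerow([row[c] for c in id_columns] + counts)
    return out.getvalue()


subset = subset_counts(SAMPLE_TABLE, ["spA", "spB"])
print(subset)
```

For real data, the same function could be fed the contents of the count file and the result written to a new file, once the actual column names have been checked against the assumption above.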

cheers

Matt

--
You received this message because you are subscribed to the Google Groups "hahnlab-cafe" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hahnlabcafe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hahnlabcafe/026003b2-2e28-412a-bdf7-1bd70c4fb5cbn%40googlegroups.com.

Rijan Dhakal

Jun 25, 2022, 10:51:46 AM

to hahnlab-cafe

Gotcha. Would it be okay to make subsets of the data and compare the CAFE output from the different subsets to get a somewhat holistic picture? Say I split the tree into parts small enough for CAFE to work on; could I then compare the numbers between the nodes of two different subsets?

Sincerely,

Rijan

Hahn, Matthew

Jun 25, 2022, 4:54:04 PM

to Rijan Dhakal, hahnlab-cafe

Certainly you can compare different trees, and the rates inferred between them. Hopefully that’s good enough.

Matt

Rijan Dhakal

Jun 25, 2022, 5:01:34 PM

to hahnlab-cafe

Awesome, thank you!

I see the list of papers on Google Scholar that have cited CAFE, and I can look through them to see how different published projects have used CAFE for their workloads. But is there anything, off the top of your head, that you can point me to in particular? Something that used CAFE for markedly large trees?