CAFE5: how to choose a proper K?


张宁

Jun 30, 2021, 1:48:28 AM
to hahnl...@googlegroups.com
With real data, do we need to try different values of K and choose the one with the largest likelihood value?

When I check the results from the gamma model, like below:
```
Model Gamma Final Likelihood (-lnL): 322698
Lambda: 0.0036119107181668
Maximum possible lambda for this topology: 0.00552835
125 values were attempted (0% rejected)
The following families had failure rates >20% of the time:
OG0000004 had 38 failures
OG0000006 had 38 failures
OG0000007 had 38 failures
```
Which lambda should we use? And how should we handle the failures?

Best Regards!

Ning


Dan Vanderpool

Jun 30, 2021, 4:13:30 PM
to 张宁, hahnl...@googlegroups.com
Hello Ning,

We have considered adding automatic K selection but have not yet implemented it. Until then, you will want to try running it with several values of K, starting at 1 and going up until you no longer see an improvement or you get a worse log likelihood value. The failures are telling you that the family sizes may be too large in those clusters: if the variance among families is too high, CAFE has a hard time converging on a reasonable value for lambda. You may want to exclude some of these families from the analysis (we added the failure reports to make it easier to find the families that are causing problems).
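Dan's procedure (one run per K, starting at K = 1 and stopping once the likelihood stops improving) can be sketched in Python. This is a minimal illustration, not part of CAFE5: it assumes you have collected the text of each run's results, containing a line like the `Final Likelihood (-lnL)` line quoted above. The helper names and the `min_improvement` threshold are hypothetical. Because CAFE5 reports the negative log likelihood, a smaller -lnL is the better fit.

```python
from __future__ import annotations

import re

# Matches the summary line quoted in this thread, e.g.
# "Model Gamma Final Likelihood (-lnL): 322698"
LNL_PATTERN = re.compile(r"Final Likelihood \(-lnL\):\s*([0-9.]+(?:[eE][+-]?[0-9]+)?)")


def parse_neg_lnl(results_text: str) -> float:
    """Return the -lnL value found in one run's results text."""
    match = LNL_PATTERN.search(results_text)
    if match is None:
        raise ValueError("no 'Final Likelihood (-lnL)' line found")
    return float(match.group(1))


def choose_k(neg_lnl_by_k: dict[int, float], min_improvement: float = 0.0) -> int:
    """Walk K upward from the smallest value and stop once -lnL stops dropping.

    Lower -lnL means a higher likelihood. The walk stops at the first K whose
    -lnL does not beat the current best by more than min_improvement
    (a hypothetical tolerance, 0.0 by default) and returns the current best K.
    """
    ks = sorted(neg_lnl_by_k)
    best_k = ks[0]
    for k in ks[1:]:
        if neg_lnl_by_k[best_k] - neg_lnl_by_k[k] > min_improvement:
            best_k = k  # still improving: keep going
        else:
            break  # no improvement, or a worse value: stop here
    return best_k
```

With toy values `{1: 100.0, 2: 90.0, 3: 85.0, 4: 86.0}`, `choose_k` returns 3, because moving to K = 4 makes -lnL worse. Stopping at the first non-improvement mirrors the advice above rather than searching every K exhaustively.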

Dan


Dan Vanderpool, PhD
Post Doctoral Researcher
Forest Service
National Genomics Center for Wildlife and Fish Conservation
p: 406-396-4998
Daniel.V...@usda.gov
800 E Beckwith
Missoula, MT 59812
www.fs.fed.us

  
Caring for the land and serving people

张宁

Jul 1, 2021, 10:46:00 AM
to hahnlab-cafe
Hello Dan

Looking forward to seeing an updated CAFE5! Thanks for your great patience! Thanks!

Best regards

Ning

Anezka Santolikova

Jan 23, 2025, 4:52:38 AM
to hahnlab-cafe
Hello,
I would like to add to this question.
I have the same problem: more than a third of my families are problematic (10,052 orthogroups out of 27,333 had failure rates >20% of the time).
Excluding that much of the data seems a bit drastic to me.
On the other hand, the likelihood is better with the multiple-category gamma model (-lnL with k = 3: 189,429 vs. -lnL with the basic model: 212,164).
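To check how much of a dataset the failure report covers, and to build a filtered input if you do decide to exclude families, the report lines (e.g. `OG0000004 had 38 failures`) can be parsed. This is a minimal sketch, not a CAFE5 feature: it assumes the report lines have exactly that shape and that the gene-family counts table is tab-separated with a `Family ID` column (adjust `id_column` if yours differs). All function names are hypothetical.

```python
from __future__ import annotations

import csv
import io
import re

# Matches failure-report lines such as "OG0000004 had 38 failures"
FAILURE_PATTERN = re.compile(r"^(\S+) had (\d+) failures[ \t\r]*$", re.MULTILINE)


def failing_families(report_text: str) -> dict[str, int]:
    """Map family ID -> failure count for every report line found."""
    return {fam: int(n) for fam, n in FAILURE_PATTERN.findall(report_text)}


def fraction_failing(n_failing: int, n_total: int) -> float:
    """Share of all families that appear in the failure report."""
    return n_failing / n_total


def drop_families(counts_tsv: str, exclude: set[str], id_column: str = "Family ID") -> str:
    """Return the tab-separated counts table without the excluded families."""
    reader = csv.DictReader(io.StringIO(counts_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row[id_column] not in exclude:  # keep only families not flagged
            writer.writerow(row)
    return out.getvalue()
```

For the counts in this post, `fraction_failing(10052, 27333)` is about 0.368, so roughly 37% of the orthogroups would be dropped by excluding every flagged family.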

Do you have a recommendation on whether it is better to choose the model with the worse likelihood or to ignore the note about failure rates?

Best regards,
Anezka

On Thursday, July 1, 2021 at 16:46:00 UTC+2, ningzh...@gmail.com wrote:

Hahn, Matthew

Jan 23, 2025, 4:54:34 AM
to Anezka Santolikova, hahnlab-cafe
I would not ignore the fact that you have many failures. These are telling you about the appropriateness of applying CAFE to your dataset.



Matt

Anezka Santolikova

Jan 23, 2025, 5:16:51 AM
to Hahn, Matthew, hahnlab-cafe
Thank you for the quick reply!
So the right thing to do here is either to choose a different dataset with less extreme orthogroups, or to find a different type of analysis for the current dataset?
Best regards,
Anezka


On Thu, Jan 23, 2025 at 10:54, Hahn, Matthew <m...@iu.edu> wrote:

Hahn, Matthew

Jan 23, 2025, 5:20:28 AM
to Anezka Santolikova, hahnlab-cafe
Yes, that is what I would recommend. You might try downsizing your dataset to a more closely related set of organisms, if possible.


matt
