families with the largest difference

Elisa Ramos

Nov 17, 2021, 6:27:46 AM
to hahnlab-cafe

Hello,

I am trying to run an expansion/contraction analysis on a vertebrate genome dataset of 20 species. I am interested in some families that I believe are expanded in my focal group, but results are not being calculated for these families because they have large size differentials. But if I am searching for expansions, this is exactly what I am looking for, right? Many genes in a family in a single species. If I want to see the expansions and contractions in these families, what is the point of removing them by filtering? How should I proceed? Is there a parameter I should set to include these families in the analysis? I have already tried different numbers of gamma categories, without success.

Thank you in advance!

Families with largest size differentials:

OG0000000: 234
OG0000002: 217
OG0000001: 189
OG0000003: 138
OG0000062: 136
OG0000067: 135
OG0000008: 100
OG0000005: 92
OG0000006: 86
OG0000009: 86
OG0000125: 84
OG0000044: 81
OG0000011: 77
OG0000010: 73
OG0000007: 67
OG0000068: 67
OG0000013: 61
OG0000021: 61
OG0000012: 61
OG0000015: 59

You may want to try removing the top few families with the largest difference between the max and min counts and then re-run the analysis.
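
For readers following along, the suggestion in the message above (set aside the families with the largest difference between max and min counts, then re-run) can be sketched as below. This is only an illustration, not CAFE's own filtering code: it assumes a tab-separated count table whose first two columns are `Desc` and `Family ID`, and the function names, the toy species `spA`/`spB`/`spC`, the toy counts, and the cutoff of 50 are all made up for the example.

```python
import csv
import io


def size_differential(row, species):
    """Return max minus min copy number across the given species columns."""
    counts = [int(row[name]) for name in species]
    return max(counts) - min(counts)


def split_families(table_text, max_diff):
    """Split family IDs into (kept, set_aside) by size differential.

    Assumes columns: Desc, Family ID, then one column per species.
    Families whose differential exceeds max_diff are set aside.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    species = reader.fieldnames[2:]  # everything after Desc and Family ID
    kept, set_aside = [], []
    for row in reader:
        target = set_aside if size_differential(row, species) > max_diff else kept
        target.append(row["Family ID"])
    return kept, set_aside


# Toy table with hypothetical counts (not taken from the thread's data).
toy_table = (
    "Desc\tFamily ID\tspA\tspB\tspC\n"
    "(null)\tOG0000000\t240\t6\t12\n"
    "(null)\tOG0000015\t3\t4\t2\n"
)

kept, set_aside = split_families(toy_table, max_diff=50)
print("kept:", kept)
print("set aside:", set_aside)
```

The cutoff is a judgment call; the families that are set aside are not discarded and can be analysed separately.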


Hahn, Matthew

Nov 17, 2021, 9:59:05 AM
to Elisa Ramos, hahnlab-cafe
Hi Elisa,

If families are changing too rapidly, it becomes hard to infer their most likely ancestral state, and therefore whether there have been gains or losses on a specific branch. But if you'd like to get a rough idea of what's going on, you could try analyzing these families with lambda fixed at a very low value (for instance, 0.0001); do not search for lambda. This will tell you what the ancestral sizes were under a model where the families are evolving slowly, which should still be an approximation of their true sizes.


cheers,
Matt
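
As a rough sketch of the advice above, the snippet below assembles a `cafe5` command line that fixes lambda instead of searching for it. It assumes CAFE5's `-l` (`--fixed_lambda`) option alongside the `-i` and `-t` options used elsewhere in this thread; the helper name and the input file `large_families.txt` are hypothetical, and the command is only printed, not run.

```python
import shlex


def cafe5_fixed_lambda_command(counts_path, tree_path, fixed_lambda=0.0001):
    """Build a cafe5 argument list with a user-supplied (fixed) lambda.

    Assumes -l is CAFE5's fixed-lambda option; check `cafe5 --help`
    for your installed version before relying on it.
    """
    return ["cafe5", "-i", counts_path, "-t", tree_path, "-l", str(fixed_lambda)]


# large_families.txt is a hypothetical file holding only the large families.
cmd = cafe5_fixed_lambda_command("large_families.txt", "data/SpeciesTree_rooted.txt")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the analysis, provided `cafe5` is on the `PATH`.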

biotek...@gmail.com

Apr 25, 2023, 12:13:19 PM
to hahnlab-cafe
Hi All,

I'm having the same issue. How did you fix it, and what steps would you recommend for tackling it? Thanks.
FYI, I'm using the direct outputs from OrthoFinder (Orthogroups.GeneCount.tsv and SpeciesTree_rooted.txt).
I also tried:
$ clade_and_size_filter.py -i data/v2null_Orthogroups.GeneCount.txt -o filter_v2null_og_gc.txt -s

Any leads will be appreciated.
Thanks
sam

dos2unix Orthogroups.GeneCount.tsv

cafe5 -i data/v2null_Orthogroups.GeneCount.txt -t data/SpeciesTree_rooted.txt
