Orthofinder results with CAFE


Michał T. Lorenc

Feb 21, 2021, 9:00:54 PM
to hahnlab-cafe
Hi,
I found a tutorial on how to run CAFE with MCL. Is it possible to use Orthofinder results instead?

Thank you in advance.

Best wishes,

Michal

Hahn, Matthew

Feb 21, 2021, 9:37:18 PM
to Michał T. Lorenc, hahnlab-cafe
Hi Michal,

Sure, that should be fine, as long as it’s not identifying only one-to-one orthologs.




Matt


Michał T. Lorenc

Feb 21, 2021, 9:47:24 PM
to Hahn, Matthew, hahnlab-cafe
Hi Matt,
Thank you for your reply. Do you know which file from Orthofinder I would need to use?
And how can I check whether the Orthofinder results identify only one-to-one orthologs?

Thank you in advance,

Michal

Hahn, Matthew

Feb 22, 2021, 7:48:46 AM
to Michał T. Lorenc, hahnlab-cafe
Hi,

Unfortunately, I don’t know which file. But you should check that each species can at least sometimes have more than one gene within a family/orthogroup. 
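
One way to run that check is sketched below. It assumes a tab-delimited count table laid out like Orthofinder's Orthogroups.GeneCount.tsv, with an `Orthogroup` column, one column per species and a trailing `Total` column; the function names are made up for illustration. It reports the largest count each species reaches in any orthogroup, so a species that never exceeds 1 stands out.

```python
import csv
import io


def max_count_per_species(table_text, non_species_cols=("Orthogroup", "Total")):
    """Return {species: largest gene count seen in any orthogroup}.

    Assumes a tab-delimited table with a header row; every column not
    listed in non_species_cols is treated as a species column.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    species = [c for c in reader.fieldnames if c not in non_species_cols]
    best = {sp: 0 for sp in species}
    for row in reader:
        for sp in species:
            best[sp] = max(best[sp], int(row[sp]))
    return best


def species_never_above_one(table_text):
    """Species whose count is 0 or 1 in every orthogroup (a warning sign)."""
    maxima = max_count_per_species(table_text)
    return sorted(sp for sp, m in maxima.items() if m <= 1)
```

To use it on a real file, read the file into a string first (for example with `pathlib.Path(path).read_text()`) and pass that string in. If every species shows up in the `species_never_above_one` list, the table probably contains only one-to-one orthologs.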



Matt


Dan Vanderpool

Feb 22, 2021, 11:39:41 AM
to Matthew William Hahn, "Michał T. Lorenc", hahnlab-cafe
Hello Michal,  

I might weigh in a little here as well. As Matt said, you need to make sure Orthofinder isn’t splitting gene families too much. I have seen this happen with Orthofinder: it creates many small clusters among taxa, splitting gene families into sub-families. This can result in a data matrix with a lot of zeros, meaning the families are then thrown out when we filter for families that “don’t exist” at the root. I use FastOrtho, as it uses the MCL algorithm and one can adjust the coarseness of the clustering. This may be possible in Orthofinder as well, but I don’t know the software, so I can’t offer advice on the best way to implement it.
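
A rough way to see how sparse a count table is can be sketched as follows. This is only a proxy and not CAFE's own root filter: it counts zero cells and the families that are missing from at least one species. The function name and the list of non-species columns are assumptions for illustration.

```python
import csv
import io


def zero_summary(table_text,
                 non_species_cols=("Desc", "Family ID", "Orthogroup", "Total")):
    """Summarise how many zeros a tab-delimited gene count table contains."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    species = [c for c in reader.fieldnames if c not in non_species_cols]
    n_families = 0
    n_with_zero = 0
    zero_cells = 0
    for row in reader:
        counts = [int(row[sp]) for sp in species]
        zeros = counts.count(0)
        n_families += 1
        zero_cells += zeros
        if zeros > 0:
            n_with_zero += 1
    total_cells = n_families * len(species)
    return {
        "families": n_families,
        "families_with_a_zero": n_with_zero,
        "zero_fraction": zero_cells / total_cells if total_cells else 0.0,
    }
```

Comparing this summary between two clustering runs (for example at different inflation values) gives a quick sense of whether one of them is splitting families more finely.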


Dan


__________________________
Dan Vanderpool
Postdoctoral Scholar
Department of Biology
Indiana University
Hahn Lab, Jordan Hall 249B
1001 East Third Street
Bloomington, IN 47405
Email: ddvand...@gmail.com

Seth Barribeau

Feb 22, 2021, 2:58:33 PM
to hahnlab-cafe

Hi Michal,

I have been using ```OrthoFinder/Results_Jan21/Orthogroups/Orthogroups.GeneCount.tsv``` as input for CAFE, but I may have run into the issue Dan describes: CAFE reports ```was not found in gene family```.
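
For anyone wanting to try the same file, here is a minimal sketch of reshaping it into the layout shown in the CAFE5 table later in this thread (`Desc`, `Family ID`, then one column per species). It assumes the Orthofinder file has an `Orthogroup` column first and a `Total` column last; the function name and the `null` placeholder are illustrative choices, not something described in the thread.

```python
import csv
import io


def genecount_to_cafe(genecount_text, desc="null"):
    """Convert an Orthofinder-style gene count table to a CAFE-style table.

    Assumed input layout: Orthogroup, one column per species, Total.
    Output layout: Desc, Family ID, one column per species.
    """
    reader = csv.reader(io.StringIO(genecount_text), delimiter="\t")
    header = next(reader)
    # Keep every column except the first (family ID) and any "Total" column.
    keep = [i for i, name in enumerate(header) if i > 0 and name != "Total"]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Desc", "Family ID"] + [header[i] for i in keep])
    for row in reader:
        if not row:
            continue  # skip blank lines
        writer.writerow([desc, row[0]] + [row[i] for i in keep])
    return out.getvalue()
```

The species column names are copied through unchanged, so they still have to match the tip labels in the species tree exactly.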

@Dan, was your solution simply to switch to a different orthology-finding tool, or is there a workaround?

Best,

Seth

Dan Vanderpool

Feb 23, 2021, 10:20:51 AM
to Seth Barribeau, hahnlab-cafe
Hello Seth,

As I mentioned, I don’t use Orthofinder, so I do not know if one can adjust the clustering to be more or less inclusive. The inflation parameter does this in FastOrtho as well as in the OrthoMCL implementation, though FastOrtho is much simpler to use. I believe the tutorial also has directions for using the BLAST scores directly with the MCL algorithm.

Dan

Seth Barribeau

Feb 24, 2021, 10:20:29 PM
to hahnlab-cafe
Hi Dan, 

Thanks as ever for your feedback. I have since run FastOrtho on my datasets and hit the same problem I had with Orthofinder. When I feed CAFE5 my table of gene family counts, it says that 'speciesX is not in gene family Y'. I thought this might be caused by some wonky cases where all the genes were from a single species, but reducing the table to families with genes from at least 2 (of the 11) species didn't fix the issue.

Can you please advise how to get around this problem? If there is something that I missed in the documentation, please do let me know!

Best,

Seth

Seth Barribeau

Feb 24, 2021, 11:29:06 PM
to hahnlab-cafe
In case this snippet makes life any clearer:


Command line: cafe5 -i /home/barribeau/FastOrtho_countsTable_twoplus.tsv -t /home/barribeau/SpeciesTree_rooted_shortnames.txt
Mrot was not found in gene family ORTHOMCL1

head ../FastOrtho_countsTable_twoplus.tsv 
Desc    Family ID       Aflo    Amel    Bimp    Bter    Ccal    Dnov    Emex    Hlab    Lalb    Mqua    Mrot
null    ORTHOMCL1       18      14      21      21      16      15      14      15      17      19      18
null    ORTHOMCL2       16      11      12      16      20      15      12      11      10      12      13
null    ORTHOMCL3       11      18      10      13      13      11      20      7       7       11      22
null    ORTHOMCL4       0       0       8       9       18      21      7       47      2       1       27
null    ORTHOMCL5       4       4       17      27      13      11      9       6       10      8       7
null    ORTHOMCL6       9       7       8       9       29      6       9       8       7       5       6
null    ORTHOMCL7       15      13      12      9       10      1       17      9       2       4       8
null    ORTHOMCL8       12      5       11      14      2       7       8       7       7       6       8
null    ORTHOMCL9       7       7       8       8       7       6       7       8       7       7       8


head ../SpeciesTree_rooted_shortnames.txt 
(((Dnov:0.0990623,Lalb:0.169629)0.744153:0.0479222,Mrot:0.117755)0.418429:0.00960115,(Ccal:0.145036,(Hlab:0.0987793,(Emex:0.0839156,((Mqua:0.161695,(Bter:0.0195347,Bimp:0.0190552)0.905335:0.0546258)0.504831:0.0230274,(Aflo:0.0363988,Amel:0.0348821)0.871307:0.0674352)0.236521:0.011043)0.436774:0.0174657)0.243803:0.0129161)0.418429:0.00960115);

Fulton, Ben

Feb 25, 2021, 8:21:01 AM
to Seth Barribeau, hahnlab-cafe

This may indicate that there are extra control characters (such as carriage returns) at the end of each line in the file. Try running the dos2unix command on the file to clear those out.
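
A small sketch of why this would produce the error above, assuming the table was saved with Windows (CRLF) line endings: the carriage return sticks to the last column name, so the header holds `Mrot\r` rather than `Mrot`, which would fit the error naming the last species in the table. The function names and the two-species table are hypothetical, and this illustrates the idea rather than replacing dos2unix.

```python
def normalise_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def header_species(data: bytes, n_leading_cols: int = 2) -> list:
    """Species names from the header row, skipping Desc and Family ID."""
    first_line = data.split(b"\n", 1)[0].decode("utf-8")
    return first_line.split("\t")[n_leading_cols:]


# Hypothetical two-species table saved with Windows line endings.
raw = b"Desc\tFamily ID\tAflo\tMrot\r\nnull\tORTHOMCL1\t18\t18\r\n"

print(header_species(raw))                          # last name ends in a hidden "\r"
print(header_species(normalise_line_endings(raw)))  # names are clean
```

To apply it to a file, read the bytes with `pathlib.Path(path).read_bytes()`, pass them through `normalise_line_endings`, and write the result to a new file.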

 

--

Ben Fulton

Research Applications and Deep Learning

Research Technologies

Indiana University

E-Mail: befu...@iu.edu

Dan Vanderpool

Feb 25, 2021, 1:53:06 PM
to Seth Barribeau, hahnlab-cafe
Hello Seth,

This sounds more like a parsing issue due to line breaks. If you saved the input data file in Excel or similar, make sure that the line breaks are saved as “Unix”. I often open the file in Excel to manipulate the columns and make a visual inspection of the data, then save it as a tab-delimited text file. I then open it in BBEdit or some other text editor and re-save it with the correct line breaks. Alternatively, you can open it in BBEdit and select “View>Text Display>Show Invisibles” to see if you have an extra tab character in one of the columns. Also, your species tree does not look like it is ultrametric.
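
The ultrametric point can be checked with a short script. The sketch below sums branch lengths from the root to every tip of a Newick string and tests whether they agree within a tolerance. It is a minimal parser written for this illustration: it ignores internal labels such as support values and does not handle quoted names or comments. The function names and the default tolerance are assumptions.

```python
import re


def tip_depths(newick):
    """Return {tip name: summed branch length from the root to that tip}."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", newick)
    depths = {}
    stack = []    # one list of tip names per clade that is still open
    current = []  # tips under the node that was read most recently
    prev = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append([])
        elif tok == ",":
            stack[-1].extend(current)   # finished child joins its parent
        elif tok == ")":
            stack[-1].extend(current)   # last child joins its parent
            current = stack.pop()       # the closed clade is now the current node
        elif tok == ":":
            i += 1                      # the next token is the branch length
            length = float(tokens[i])
            for tip in current:
                depths[tip] += length
        elif tok == ";":
            break
        elif prev in ("(", ","):
            current = [tok]             # a tip name
            depths[tok] = 0.0
        # any other token follows ")" and is an internal label; ignore it
        prev = tokens[i]
        i += 1
    return depths


def is_ultrametric(newick, tol=1e-6):
    """True if all root-to-tip distances agree within tol."""
    values = list(tip_depths(newick).values())
    return max(values) - min(values) <= tol
```

In the tree posted above, the sister tips Dnov and Lalb have terminal branch lengths of 0.0990623 and 0.169629, so they cannot be the same distance from the root, which is what this check would flag.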

Dan

Message has been deleted

Dan Vanderpool

Mar 5, 2021, 12:24:34 PM
to Seth Barribeau, hahnlab-cafe
I shudder to think of the number of hours of my life spent trying to figure out line break issues, especially early on.  

On Mar 4, 2021, at 3:27 PM, Seth Barribeau <seth.ba...@gmail.com> wrote:

Hi Dan, 

Thanks, that was indeed the issue. dos2unix fixed it for the FastOrtho file. It still doesn't work for the Orthofinder file, but I didn't try too hard after getting it working with FastOrtho.

Rookie mistake 🙄

S
