Dear Clifton,
This is a bug in GeneRax. MiniNJ is run before the gene tree inference step, which makes no sense because MiniNJ needs the gene trees as input. The same issue occurs if you start from a random species tree, for the exact same reason. Thanks a lot for the report :-)
I will fix the bug, but I can't promise it will be fixed by tomorrow, and after that I will be away for one week. Even if it's fixed by tomorrow, you will need to wait until the fix reaches the bioconda package (I think you use the conda installation), which typically takes a few days.
After having looked at your data, here are two personal recommendations, not directly related to your issue:
- You have quite a large dataset, and just inferring the gene trees will be quite challenging. I would precompute the gene trees first and save them somewhere safe before running any other tool. This is exactly the case where ParGenes (see my post above) makes a lot of sense, if you have access to a cluster or a big machine. But even with ParGenes and a cluster, it's hard to tell how long this is going to take, because you have a few very big trees (at least one with more than 5000 taxa).
- You are using the GTR model for proteins. Unless you are doing this for a particular reason and you know what you are doing, we usually discourage this. The GTR model is fine for DNA alignments, but it is very problematic for protein alignments (in short, the protein GTR model has almost 200 free parameters, which is huge and makes the model hard to estimate reliably). You can either pick another model, if you know which one makes sense for your data, or run model testing on each alignment. Note that ParGenes can run model testing before each gene tree inference automatically (sorry for the repeated advertisement, but ParGenes was developed for this exact purpose).
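To make the parameter count for GTR concrete, here is a small sketch of the arithmetic in Python. The function name is just for illustration, and it assumes the stationary frequencies are estimated as free parameters (with empirical frequencies, only the rate count applies). Branch lengths and rate heterogeneity parameters are not counted.

```python
def gtr_free_parameters(n_states: int) -> dict:
    """Count the free parameters of a GTR model with n_states states."""
    # One exchangeability rate per unordered pair of states;
    # one of them is fixed to set the overall scale.
    rates = n_states * (n_states - 1) // 2 - 1
    # Stationary frequencies sum to 1, so the last one is determined
    # by the others (assumption: frequencies are estimated, not empirical).
    frequencies = n_states - 1
    return {"rates": rates, "frequencies": frequencies, "total": rates + frequencies}


dna = gtr_free_parameters(4)       # nucleotides
protein = gtr_free_parameters(20)  # amino acids

print("DNA GTR:    ", dna)
print("Protein GTR:", protein)
```

For DNA this gives 5 free rates, which is easy to estimate from a typical alignment. For proteins it gives 189 free rates, and all of them have to be estimated from a single gene family alignment.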
I will keep you posted on the GeneRax bug fix.
Best,
Benoit