Hi Clifton,
The question you are asking is a topic I am currently exploring :-)
Missing genes (whose absence is not due to gene loss) might indeed be detrimental to species tree inference. The --pruned-species-tree option is supposed to partly correct this problem, but this solution is not perfect.
What do you mean by collapsing most of the nodes? Could you maybe send me your resulting species trees?
Here are some ideas:
- try filtering out gene trees that cover too few species and see if it improves the results
- try competing methods, such as Astral-Pro, to see if they are more robust to missing data (I would expect quartet-based methods to behave slightly better if missing data is the problem).
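For the first idea, here is a minimal sketch of such a filter in plain Python. It is not part of any tool. It assumes that each gene tree is a single Newick string, that leaf labels are unquoted, and that each label starts with the species name followed by an underscore (for example A_gene1). The function names and the min_species threshold are hypothetical, so adapt them to your own naming scheme or mapping files.

```python
import re

# A leaf label directly follows "(" or "," in a Newick string.
# Internal node labels follow ")" and branch lengths follow ":",
# so neither is captured. Quoted labels are NOT handled (assumption).
LEAF_RE = re.compile(r"[(,]\s*([^():,;\s]+)")


def species_in_tree(newick, sep="_"):
    """Return the set of species found among the leaves of one gene tree.

    Assumption: the species name is the part of the leaf label before
    the first `sep`.
    """
    return {label.split(sep)[0] for label in LEAF_RE.findall(newick)}


def filter_gene_trees(newicks, min_species):
    """Keep only the gene trees that cover at least `min_species` species."""
    return [nwk for nwk in newicks if len(species_in_tree(nwk)) >= min_species]


if __name__ == "__main__":
    # Toy gene trees, for illustration only.
    trees = [
        "((A_g1:0.1,B_g1:0.2):0.3,C_g1:0.4);",
        "(A_g2:0.1,A_g3:0.2);",
    ]
    for nwk in filter_gene_trees(trees, min_species=3):
        print(nwk)
```

You could then rerun the species tree inference with a few different thresholds and compare the resulting trees.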
Best,
Benoit