Dear Jacopo,
Sorry for the late reply; I am currently on vacation.
I don't think that having duplications before the species tree root is an issue. The position of the gene tree root matters, but its exact position among the internal nodes that correspond to duplications predating the species tree root should not matter (and cannot be inferred).
I don't think that you need to split gene families, unless:
- the inference is too slow (it should get faster with more, but smaller, gene families)
- you expect that splitting the gene families could improve the sequence alignment quality
In general, I think that both approaches (splitting or not) are correct, and I don't expect one to be better than the other. You could try both and see if you find consistent results.
I hope this helps!
Benoit