Hello,
I have experience running RAxML on partitioned DNA datasets, but this is the first time I've tried to analyze a partitioned AA dataset. I'm hoping someone can help me set up the analysis correctly.
I have an 80-gene AA matrix. The best partitioning scheme found by PartitionFinderProtein has 19 partitions:
LG, p1 = 1-338
JTT, p2 = 339-562, 1471-2221, 2222-2808, 4129-4575, 5221-5664, 7964-8729, 9459-9994, 9995-10542, 14073-14842
JTTF, p3 = 563-782, 1222-1470, 6664-7052, 9008-9196, 10543-11007, 15229-15711, 15712-16606, 18951-19843, 20292-20673, 21947-22316, 22623-23099, 23632-24113, 28902-29302, 31129-31522
JTT, p4 = 783-1049, 2809-3313, 4792-5220, 7554-7963, 12159-12566, 12875-13578, 14843-15228, 18445-18950, 24268-24743, 25884-26363, 30224-31128, 32114-33338
LG, p5 = 1050-1221
JTTF, p6 = 3314-3727, 11008-11343, 11803-12158, 12567-12874, 19844-20291, 25106-25883, 26364-26829, 27704-28463, 35718-36403
LG, p7 = 3728-4128, 6046-6454, 22317-22622
LG, p8 = 4576-4791
LGF, p9 = 5665-6045
LG, p10 = 6455-6663, 7394-7553, 9197-9458, 11344-11802, 17405-17676, 17980-18302, 21079-21447, 33339-34497
LG, p11 = 7053-7393
LG, p12 = 8730-9007
LG, p13 = 13579-14072, 17677-17979, 18303-18444, 23100-23631, 28464-28901, 29303-29666, 29667-30014, 30015-30223, 34683-35119
JTTF, p14 = 16607-17404
LGF, p15 = 20674-21078, 21715-21946, 24114-24267, 24744-25105, 26830-27007, 35120-35717
LG, p16 = 21448-21714
JTTF, p17 = 27008-27703
LG, p18 = 31523-32113
JTT, p19 = 34498-34682
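Before running anything, a short script along these lines could confirm that the ranges above cover every site exactly once. This is only a minimal sketch in Python: the function names are hypothetical (not part of RAxML or PartitionFinder), and it assumes every line follows the "MODEL, name = start-end, start-end, ..." layout shown above.

```python
def parse_partition_line(line):
    """Split 'MODEL, name = a-b, c-d' into (model, name, [(a, b), (c, d)])."""
    head, ranges = line.split("=", 1)
    model, name = (part.strip() for part in head.split(",", 1))
    spans = []
    for chunk in ranges.split(","):
        start, end = chunk.strip().split("-")
        spans.append((int(start), int(end)))
    return model, name, spans


def check_coverage(lines):
    """Return (gaps, overlaps, last_site) across all partitions.

    gaps and overlaps are lists of (start, end) site ranges;
    last_site is the highest site number assigned to any partition.
    """
    spans = sorted(
        span
        for line in lines
        if line.strip()
        for span in parse_partition_line(line)[2]
    )
    gaps, overlaps = [], []
    expected = 1  # next site that should be assigned
    for start, end in spans:
        if start > expected:
            gaps.append((expected, start - 1))
        elif start < expected:
            overlaps.append((start, min(end, expected - 1)))
        expected = max(expected, end + 1)
    return gaps, overlaps, expected - 1


if __name__ == "__main__":
    # Toy example only; replace with the lines of the real partition file.
    toy = ["LG, p1 = 1-10", "JTT, p2 = 11-20, 31-40", "JTTF, p3 = 21-30"]
    print(check_coverage(toy))
```

If both lists come back empty and last_site matches the alignment length, the partition file accounts for every column of the matrix.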
Furthermore, the best model for every partition includes G, and some also include I. As far as I can tell, RAxML doesn't allow I to be applied to specific partitions; if selected, it would apply to all partitions.
I'd like to check that I'm setting up the analysis correctly:
1) "Use a mixed/partitioned model? (-q)" - this would be a partition file as formatted above.
2) "Estimate proportion of invariable sites (GTRGAMMA + I)" - keep the default (no) for the reason mentioned above.
3) "Choose GAMMA or CAT model:" - select "Protein GAMMA".
4) I get confused at "Protein Analysis Option". Should I ignore the "Protein Substitution Matrix" option and instead select "Use a Partition file that specifies AA Matrices"? How is the latter different from the -q option? If I select the latter, the help section indicates that the filenames must be specified as firstpartition, secondpartition, thirdpartition, fourthpartition, and fifthpartition, in order. Are these separate files, and what exactly is supposed to be in them?
5) "Use empirical frequencies" - some partitions include F, but if I select this option, would it apply F to all partitions or only to those marked with F in the partition file?
I appreciate any assistance and feedback! Cheers, Michael