I'm a graduate student and I want to estimate divergence times from a whole-genome SNP dataset (about 220,000 SNPs per individual, 100 individuals in total), but I have run into some questions while performing this analysis:
1) The analysis seems to be extremely computationally expensive, even on a small subset of 10 individuals taken from the full dataset. When I ran SNAPP on CIPRES, the log file showed only 59,000 MCMC iterations after 240 CPU hours. How can I reduce the computing cost to an acceptable level while retaining as much phylogenetic information as possible? Should I reduce the number of individuals or the number of SNPs? I tried randomly resampling a smaller dataset (22,000 SNPs) with a Python script (first generating 22,000 random, non-repeating indices, then using the sorted list to subsample every sequence), but when I built a tree from it with IQ-TREE, the topology was completely different from the tree based on the original dataset. I suspect my resampling is not valid, or that I did it the wrong way.
2) I have read various papers and tutorials. Some suggest that SNAPP is currently the only solution for divergence time estimation based on SNPs, yet many studies still treat SNPs as ordinary sequence data and run the analysis in BEAST in the usual way. Is that methodologically correct? If I do the same, will it reduce my computing cost?
3) I read the SNAPP tutorial on GitHub (tutorials/README.md at main · ForBioPhylogenomics/tutorials · GitHub) and noticed the word "Saga" in it. It seems to be a phylogenetic analysis platform like CIPRES, but I can't find it with Google. Does anyone know its website, or any similar platforms? I'm in a small research group, so I have to find computing resources by myself.
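
For reference, the resampling described in 1) can be sketched roughly as below. This is only a minimal illustration, not the original script: it assumes the alignment is already loaded as a dict mapping sample names to equal-length sequence strings, and the function name `subsample_sites` and its `seed` parameter are made up for the example.

```python
import random


def subsample_sites(alignment, n_sites, seed=None):
    """Keep the same randomly chosen columns (SNP sites) in every sequence.

    alignment: dict of sample name -> sequence string (hypothetical input format).
    n_sites:   number of columns to keep, drawn without replacement.
    seed:      optional seed so the same subset can be reproduced.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be present and have the same length")
    total = lengths.pop()
    if n_sites > total:
        raise ValueError("cannot keep more sites than the alignment contains")

    rng = random.Random(seed)
    # Draw unique column indices, then sort them to preserve the original site order.
    keep = sorted(rng.sample(range(total), n_sites))

    # Apply the identical index list to every individual.
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"ind1": "ACGTACGT", "ind2": "TGCATGCA"}
    print(subsample_sites(toy, 4, seed=1))
```

The same sorted index list is applied to every individual, so the columns stay aligned across samples, and fixing `seed` lets the exact subset be regenerated when comparing trees.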