Hi Remco/David/Others,
SNAPP is really great - thanks for your hard work! Remco - your BEAST blog posts are awesome! Regarding SNAPP, I have a quick question pertaining to missing data. Currently, we've coded sites where individuals lack a SNP call using "?", as suggested on the mailing list, e.g.:
#NEXUS
begin data;
dimensions ntax=28 nchar=4430;
format datatype=standard missing=? gap=-;
matrix
103200 0000000000000200100001
144749 ??020???000???00?????0
67235 ?000000210002000020000
;
end;
This works fine in SNAPP, as expected given posts on the mailing list. My question is whether "support" for missing data simply means that BEAUti/SNAPP will accept a NEXUS file with missing data coded as "?", while the sites having missing data are excluded prior to the analysis. That seems to be what is happening, given the output when I load the XML generated from the above file (in its entirety) with `beast -threads 12 TIPS.xml` and see the following in the log (to stdout):
> WARNING: removed 3037 sites becaues they have one or more branches without data.
So, this suggests I can enter sites having missing data without a problem, and that such an analysis runs just fine (the results from the run above look sensible so far). However, the warning message also suggests that the data going into the analysis are only those sites having SNP calls across 100% of individuals (this is also reflected in the number of sites/patterns reported after the warning above). I'm using BEAST v2.1.3 and SNAPP v1.1.5 on Linux (via the CLI).
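In case it helps anyone reproduce the comparison, here is a minimal Python sketch for counting how many sites in a NEXUS matrix have at least one "?" across individuals, so that the count can be set against the number in the warning. It is not part of BEAST or SNAPP; the function names `parse_matrix` and `count_sites_with_missing` are hypothetical, and it assumes a non-interleaved matrix with one taxon per line, as in the excerpt above. It counts missing calls per individual, so if its total differs from the number SNAPP reports, the filter may be applied per branch rather than per individual.

```python
# The three-row excerpt from the post (not the full 28 x 4430 matrix).
EXCERPT = """#NEXUS
begin data;
dimensions ntax=28 nchar=4430;
format datatype=standard missing=? gap=-;
matrix
103200 0000000000000200100001
144749 ??020???000???00?????0
67235 ?000000210002000020000
;
end;
"""


def parse_matrix(text):
    """Return {taxon: sequence} from a simple, non-interleaved NEXUS matrix."""
    rows = {}
    in_matrix = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower() == "matrix":
            in_matrix = True
            continue
        if in_matrix:
            if stripped.startswith(";"):
                break  # end of the matrix block
            if stripped:
                name, seq = stripped.split(None, 1)
                rows[name] = seq.replace(" ", "")
    return rows


def count_sites_with_missing(rows, missing="?"):
    """Return (total_sites, sites with >= 1 missing call in any individual)."""
    lengths = {len(seq) for seq in rows.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length")
    n_sites = lengths.pop()
    n_missing = sum(
        any(seq[i] == missing for seq in rows.values())
        for i in range(n_sites)
    )
    return n_sites, n_missing


rows = parse_matrix(EXCERPT)
total, with_missing = count_sites_with_missing(rows)
print(f"{with_missing} of {total} sites have at least one missing call")
```

To run it on the full alignment, read the NEXUS file's contents into a string and pass that to `parse_matrix` in place of `EXCERPT`.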
Thanks very much for your thoughts - we're looking forward to using SNAPP a lot more!
Sincerely,
Brant