Hi Jun,
the only principal difference between using complete sequences and using only SNPs is that you add invariable sites, so you would not need to correct for the so-called ascertainment bias.
"Too few patterns" indicates that the overall divergence in your data is low. Depending on the hierarchical (taxonomic) level you're working at and the gene set you extracted from the GBS run, the divergence between the tips in your data set, the OTUs, may simply be too low to infer a probabilistic tree.
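To get a feel for what "too few patterns" means for a given alignment, one can count the distinct site patterns (alignment columns) directly. The following is only a minimal sketch: the function name `site_pattern_summary` and the toy alignment are made up for illustration, and gaps or ambiguity codes are treated as ordinary characters, which is an assumption and not necessarily how a given tree-inference programme counts its patterns.

```python
from collections import Counter


def site_pattern_summary(alignment):
    """Summarise the columns of an alignment given as {tip name: aligned sequence}.

    Hypothetical helper: gaps and ambiguity codes are counted as ordinary
    characters, which may differ from how inference programmes compress patterns.
    """
    seqs = [s.upper() for s in alignment.values()]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same aligned length")

    # One string per alignment column, e.g. "AAAT" for four tips.
    columns = ["".join(col) for col in zip(*seqs)]
    patterns = Counter(columns)

    # A column is invariant when every tip shows the same character.
    invariant = sum(n for col, n in patterns.items() if len(set(col)) == 1)

    return {
        "sites": len(columns),
        "distinct_patterns": len(patterns),
        "invariant_sites": invariant,
    }


if __name__ == "__main__":
    toy = {
        "tip_A": "ACGTACGT",
        "tip_B": "ACGTACGA",
        "tip_C": "ACGTACGA",
        "tip_D": "ACGAACGA",
    }
    print(site_pattern_summary(toy))
```

In this toy case, most of the eight columns are invariant and only two of them differ between tips, which is the kind of situation that leaves a probabilistic method with very little to work on.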
Try to establish the Phytia score: is your data fit for tree-ing?
The main reason for poor Phytia scores in phylogenomic data is the generally low divergence between the genes that can be identified as homologues and, more importantly, the fact that we have a lot of nearly identical or only randomly differing tips, especially at the genus level. That is, we often feed a lot of topology-indifferent data into our tree-inference programmes.
If your data ends up in the unfortunate to impossible score range:
Calculate the pairwise distances (simple Hamming distances will do) and make a heat-map and a neighbour-net. Use the circular arrangement in the neighbour-net to sort the heat-map, so you can see which groups of tips are trivial (high ingroup coherence, distinct from any other set of tips), which ones are pointless to tree (generating very flat terminal subtrees, spider-cocoon-like graph portions in the neighbour-net), and where there is something an ML tree inference can work with. Then reduce the tip set to a set of placeholders that are sufficiently divergent to tree (i.e. a set producing a better Phytia score).
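The distance and tip-reduction steps above can be sketched in plain Python. Everything named here is hypothetical: `hamming`, `distance_matrix`, `reorder` and `pick_placeholders` are illustrative helpers, the greedy threshold rule is just one simple way to pick placeholders (not a recommendation from the post), and the circular ordering passed to `reorder` is assumed to be read off a neighbour-net produced elsewhere.

```python
def hamming(a, b, missing="-N?"):
    """Proportion of differing sites; sites missing in either sequence are skipped."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in missing or y in missing:
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return diffs / compared


def distance_matrix(alignment):
    """Return (tip names, square matrix of pairwise Hamming distances)."""
    names = list(alignment)
    dist = [[hamming(alignment[p], alignment[q]) for q in names] for p in names]
    return names, dist


def reorder(names, dist, order):
    """Re-sort the matrix by a tip order, e.g. the neighbour-net circular ordering."""
    idx = [names.index(name) for name in order]
    return [[dist[i][j] for j in idx] for i in idx]


def pick_placeholders(names, dist, min_dist):
    """Greedy reduction (an assumption): keep a tip only if it is at least
    min_dist away from every tip that has already been kept."""
    kept = []
    for i in range(len(names)):
        if all(dist[i][j] >= min_dist for j in kept):
            kept.append(i)
    return [names[i] for i in kept]


if __name__ == "__main__":
    toy = {
        "sp1_a": "ACGTACGTAC",
        "sp1_b": "ACGTACGTAC",  # identical to sp1_a, adds nothing to the tree
        "sp2": "ACGTTCGTAA",
        "sp3": "TCGAACGTCC",
    }
    names, dist = distance_matrix(toy)
    print(pick_placeholders(names, dist, min_dist=0.1))
```

The reordered matrix from `reorder` can be handed to any plotting tool for the heat-map. The threshold `min_dist` is arbitrary here and would have to be tuned against the resulting Phytia score.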
Cheers, Guido
Here is an example of a very quick genomic similarity assessment. The sections of maples are signal-wise trivial, but within the sections (coloured; note the very lush green areas in the heat-map) it's often random noise that ends up building the tree (the tips, where each bubble represents an individual of a distinct species, have long terminal edges). The centre part is spider-web-like but fully resolved in an ML tree; this is where probabilistic methods go beyond the trivial in their inference. Note that this is not a worst-case data scenario: I haven't calculated the Phytia score, but I suppose it's well in the possible-to-tree range. Pics are from the related Res.I.P. posts [Big Data = No Brain? #1] and [Big Data = No Brain? #2].