Stacey and Collapse height parameter


Marie

Oct 8, 2019, 4:42:41 AM
to beast-users
Hi all,

I ran STACEY on my dataset and tried different collapse height values (0.001, 0.0005, 0.0001, ...), and every time I get a different result:
For 0.001 I got 7 clusters, fraction = 0.72
For 0.0005 I got 9 clusters, fraction = 0.11
For 0.0001 I got 13 clusters, fraction = 0.56
I can pretty much 'decide' how many clusters I'm going to get by changing the collapse height :s

Normally, the results should be stable over a wide range of values... So does that mean my dataset is too weak to show anything? I do not have a big dataset, just 3 genes, two mitochondrial (COI, CytB) and one nuclear (ITS2).
I tried 'sample from the prior only' and compared the trace with the posterior, to check whether the results are overly influenced by the prior because the dataset is not informative enough. That does not seem to be the case: for most parameters the posterior is more precise than the prior, or does not completely overlap with it.

Thanks a lot for your insights and advice,

Bests,
Marie

Juan Carlos Zamora Señoret

Oct 8, 2019, 10:16:53 AM
to beast...@googlegroups.com
Dear Marie,

You in fact have only two "loci": you would not expect recombination among the mitochondrial genes, so they most likely share the same tree (and you should probably link those trees). If you are working with closely related species and you expect, e.g., incomplete lineage sorting, two loci may not be enough to recover the true species history.
Also, how many samples per putative species do you have? When species are very closely related, or when you have a single species with considerable variation, you may expect species delimitation analyses to fail to some extent, particularly when the sampling is uneven and when you have a low number of specimens, of loci, or of both.

Anyway, if changing epsilon gives you such different results, it is probably because your clusters have heights right at the edge of the collapse height values you are using. That is, you have variation within some clades, but that variation is comparatively small, and so it sits at the limit of what STACEY collapses into a single species. Note that this was already discussed in the DISSECT paper: "If epsilon is too large it will not be possible to distinguish very recent divergences". Your largest epsilon value, 0.001, looks quite large to me, as the most usual values are around 1E-4 to 1E-5. With that value you are therefore perhaps underestimating the number of species. But, again, it all depends on the number of specimens you have, and on whether only two loci provide enough information for the analysis. Also, do not forget to consider other sources of information when deciding among putative species numbers, such as ecology, phenotypic characters, etc. STACEY will give you putative species, but you are the one who decides whether those should be recognized as true species or not.
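
To illustrate why divergences close to the threshold make the cluster count so sensitive, here is a minimal Python sketch. It is not STACEY or its post-processing tools: the tip labels, the node heights and the function count_clusters are all made up for illustration, and it looks at one fixed toy tree rather than a posterior sample. Two tips are put in the same cluster when the height of their most recent common ancestor is below epsilon.

```python
from itertools import combinations

# Hypothetical toy tree (heights invented for illustration):
# A and B join at 0.00005, C and D at 0.0003,
# (A,B) and (C,D) join at 0.0008, and E joins everything at 0.01.
labels = ["A", "B", "C", "D", "E"]
mrca_height = {frozenset("AB"): 0.00005, frozenset("CD"): 0.0003}
for a in "AB":
    for c in "CD":
        mrca_height[frozenset((a, c))] = 0.0008
for x in "ABCD":
    mrca_height[frozenset((x, "E"))] = 0.01


def count_clusters(mrca_height, labels, epsilon):
    """Count clusters after merging tips whose MRCA height is below epsilon."""
    parent = {x: x for x in labels}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(labels, 2):
        if mrca_height[frozenset((a, b))] < epsilon:
            parent[find(a)] = find(b)  # merge the two clusters
    return len({find(x) for x in labels})


for eps in (0.001, 0.0005, 0.0001):
    print(eps, count_clusters(mrca_height, labels, eps))
```

With these invented heights the same tree gives 2, 3 and 4 clusters for epsilon = 0.001, 0.0005 and 0.0001, because each threshold falls between two of the shallow node heights. If all shallow divergences were well below or well above the tested range, the count would not change.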

I hope to see Graham's answer on this topic.

Best wishes,

Juan Carlos Zamora


Graham

Oct 9, 2019, 3:37:32 AM
to beast-users
Juan gave a good answer. Please also see the advice in section 2.2.3 of the manual. In particular, note that the value "is simply a trade-off between speed and accuracy".

Graham