--
You received this message because you are subscribed to the Google Groups "structure-software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to structure-softw...@googlegroups.com.
To post to this group, send email to structure...@googlegroups.com.
Visit this group at http://groups.google.com/group/structure-software.
For more options, visit https://groups.google.com/d/optout.
"(page 20) When population structure is difficult to resolve, imposing a logistic prior and estimating its parameters using the data is likely to increase the power to detect weak structure. However, estimation of the hierarchical prior parameters by maximizing the approximate marginal likelihood also makes the model susceptible to overfitting by encouraging a small set of samples to be randomly, and often confidently, assigned to unnecessary components of the model. To correct for this, when using the logistic prior, we suggest estimating the variational parameters with multiple random restarts and using the mean of the parameters corresponding to the top 5 values of LLBO. In order to ensure consistent population labels when computing the mean, we permuted the labels for each set of variational parameter estimates to find the permutation with the lowest pairwise Jensen-Shannon divergence between admixture proportions among pairs of restarts."
As I said in another post ("poor chain mixing"), I seem to be having trouble with "overfitting" as described above for the dataset with weak structure. Running 100 reps and selecting the top 25 for plotting helps in this particular case, but it still feels like a "seat of the pants" solution. I would greatly appreciate your opinions on that.
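For anyone who wants to try the label-matching step from the quoted passage, here is a minimal sketch. It is not the fastSTRUCTURE authors' code: it simplifies "lowest pairwise divergence among pairs of restarts" to aligning every run against the first run, it brute-forces all K! column permutations (so it is only practical for small K), and the function names `js_divergence`, `align_to_reference` and `average_aligned` are made up for illustration. Each run is assumed to be an individuals-by-K NumPy array of admixture proportions.

```python
import itertools
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Mean Jensen-Shannon divergence (natural log) between matching rows of p and q."""
    p = np.clip(p, eps, None)  # avoid log(0)
    q = np.clip(q, eps, None)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m), axis=1)
    kl_qm = np.sum(q * np.log(q / m), axis=1)
    return float(np.mean(0.5 * (kl_pm + kl_qm)))

def align_to_reference(ref, q):
    """Permute the columns of q to minimize JS divergence against ref.

    Assumption: brute force over all K! permutations, fine for small K only.
    """
    k = ref.shape[1]
    best = min(
        itertools.permutations(range(k)),
        key=lambda perm: js_divergence(ref, q[:, list(perm)]),
    )
    return q[:, list(best)]

def average_aligned(runs):
    """Align every run to the first one (a simplification), then average."""
    ref = runs[0]
    aligned = [ref] + [align_to_reference(ref, q) for q in runs[1:]]
    return np.mean(aligned, axis=0)

if __name__ == "__main__":
    run_a = np.array([[0.9, 0.1], [0.2, 0.8]])
    run_b = run_a[:, ::-1]  # same solution with the cluster labels swapped
    print(average_aligned([run_a, run_b]))
```

Averaging without this step can blur clusters whenever two runs found the same solution under different label orders, which may be part of what looks like "overfitting" in the averaged plot.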
Hi Mikhail,

Excellent work! This should be rather useful.

I have one question. Given that fastSTRUCTURE automatically performs iterations, is it necessary to replicate the runs as your script is set up to do? I am also copying Anil here, who may have more input for us.

Thanks again,
Vikram
On Tue, Jul 15, 2014 at 3:55 PM, Mikhail Matz <cea.m...@gmail.com> wrote:
Hello,

I have a few simple scripts that I cobbled together that might be useful; take a look at the _walkthrough.txt file for details.

The idea is to run 100 replicates of fastStructure, select the 25 with the best likelihood, average the assignment probabilities, and plot using ggplot2.

These should work on a Mac or Linux/Unix.

cheers
Mikhail
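The "select the best replicates and average" step can be sketched as below. This is not the code from the scripts mentioned above (those plot with ggplot2); it is a hypothetical Python illustration that assumes the likelihood of each replicate and its assignment matrix have already been read into memory, and that cluster labels are already consistent across replicates. The name `mean_of_top_runs` is invented for this example.

```python
import numpy as np

def mean_of_top_runs(likelihoods, q_matrices, n_top=25):
    """Average the assignment matrices of the n_top highest-likelihood replicates.

    likelihoods: one value per replicate (higher is better).
    q_matrices:  one individuals-by-K array per replicate, labels assumed aligned.
    Returns the indices of the selected replicates and their mean matrix.
    """
    order = np.argsort(likelihoods)[::-1][:n_top]  # best replicates first
    mean_q = np.mean([q_matrices[i] for i in order], axis=0)
    return order, mean_q

if __name__ == "__main__":
    # Toy data: three replicates, two individuals, K = 2.
    liks = [-3.0, -1.0, -2.0]
    qs = [
        np.array([[0.5, 0.5], [0.5, 0.5]]),
        np.array([[0.9, 0.1], [0.2, 0.8]]),
        np.array([[0.7, 0.3], [0.4, 0.6]]),
    ]
    order, mean_q = mean_of_top_runs(liks, qs, n_top=2)
    print(order, mean_q)
```

The cutoff (25 of 100 here, 5 in the paper's suggestion) is a judgment call; rerunning with a few different values of `n_top` is a cheap way to see how sensitive the averaged plot is to it.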
On a loosely-related note, PGDSpider has just (today, I think) been updated to produce a fastSTRUCTURE-formatted variant of the STRUCTURE file it was already outputting. I think this would get around a couple of the steps in your script? Even if not, it might be useful info for the wider STRUCTURE-using community.