Dear Peter
I do hope that you will get to read about our buffalo story. But first, do we have a story?
Our hypothesised 'reduced' models have 14 or 15 parameters - 7 theta and 7 or 8 migration paths - but most posterior distributions in these runs are like those for the full model that you have seen, i.e. flat distributions.
Does this mean that our mtDNA sequence data cannot recover useful information, or is this information still useful to some degree - see below?
Would increasing the number of replicates, or any other modification of the run specifications, improve the situation?
Given the results, is it still valid to compare the gene flow models using Bayes factors? As noted before, this was the object of the exercise, and log-probabilities do differ. For example, for the Dloop data, the estimates of the log-probability of the data (Bezier approx.) are -
Full model          -2301.10
reduced models 1    -2256.31
               2    -2258.18
               3    -2259.13
               4    -2256.20
(2 other models still running). For twoloci and Cytb, differences among the log-probs for different models are even greater.
Again - many thanks in advance!!!
Stuart
On 21/11/2013 11:45 PM, Peter Beerli wrote:
Stuart,
Given that the full model is not able to give meaningful results, I suggest looking at a ‘reasonable’ model (I am sure you have a prior belief about which models are better than others [sometimes it helps to write them down before you do the experiment/comparison]) and checking whether that is similarly problematic. I am sure models with fewer parameters will give fewer problems. This opens the discussion about what the correct model is. The model order (or probability) will depend on the different models that are tested AND how well the data can fit them. As a result, one is only able to say model x is the best (most plausible) model given the model set and data; but then, this is true even when not stated that explicitly.
I would love to read more about your buffalo story and how the models turn out.
Peter
Stuart,
I believe that you are still able to compare the outcomes, and I would order your models like this:
4       0.47908772251881709
1       0.42918313563377292
2       0.066147276453184523
3       0.025581865394225429
Full    1.5156286423254215e-20
Models 4 and 1 seem the most plausible. I would actually combine the dloop and cyt loci and use site rate variation (estimate the parameters for F84 + Gamma, for example in paup*, then use the alpha for all the analyses); the combination of cyt and dloop should sharpen your answer.
Perhaps you could cut down on the population size parameters and assume that all the populations are similar -> using one averaged size?
Or combine regions? For example north versus south, or east/west.
Peter
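The ordering values quoted in this reply appear to be model probabilities derived from the log-probabilities of the data: each log marginal likelihood is exponentiated relative to the best model and normalised over the model set. The sketch below is an assumption about how such numbers can be obtained, not a description of migrate itself; the function name `model_probabilities` is hypothetical, equal prior weight on every model is assumed, and the inputs are the Dloop Bezier estimates given earlier in the thread.

```python
import math


def model_probabilities(log_mls):
    """Turn log marginal likelihoods into model probabilities.

    Assumes equal prior probability for every model in the set.
    """
    best = max(log_mls.values())
    # Subtract the best value before exponentiating to avoid underflow.
    weights = {name: math.exp(lml - best) for name, lml in log_mls.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Dloop log-probabilities of the data (Bezier approximation) from the thread.
dloop = {"Full": -2301.10, "1": -2256.31, "2": -2258.18, "3": -2259.13, "4": -2256.20}

probs = model_probabilities(dloop)
best_lml = max(dloop.values())
for name in sorted(probs, key=probs.get, reverse=True):
    # Log Bayes factor of each model against the best model in the set.
    log_bf = dloop[name] - best_lml
    print(f"{name:>4}  prob={probs[name]:.6f}  lnBF vs best={log_bf:.2f}")
```

With these inputs the models come out in the order 4, 1, 2, 3, Full, matching the ordering in the message. The probabilities only rank models within the set that was tested.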
On Nov 20, 2013, at 9:19 PM, Stuart Barker (Prof) <sba...@une.edu.au> wrote:
Dear Peter
I do not know how you do it - keeping up with everyone's queries re migrate. But I for one certainly appreciate your help and advice.
Our species is the swamp buffalo throughout SE Asia and China - some 30 populations grouped (on genetic data and geography) into 7 regional groups. We have full sequence on Cytb and Dloop for 913 animals, but use a sample of 20 from each region.
We hypothesise six different models of migration that we think may give clues to the region of domestication of this species, and its subsequent spread to its current distribution, and have run the full model as a baseline. Thus our interest is not in the full model per se, but in comparing gene flow models using Bayes factors.
I am running the models for the Dloop data alone, for Cytb alone, and for the combined sequences - curiosity to see how the results compare.
Thanks
Stuart
On 21/11/2013 10:24 AM, Peter Beerli wrote:
Stuart,
Your run shows clearly that your single-locus data will not support a consistent estimation of 49 parameters; essentially you recover the uniform prior. Longer runs will make the posterior even more flat. I suggest that you use the custom migration matrix and reduce the number of migration parameters considerably (anecdotally, I have rarely seen examples from a single DNA locus that can estimate more than 16 parameters). If you tell me a little more about the geographic organization and what type of species the data represents, then I can help in more detail.
Peter
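The figure of 49 parameters follows from 7 population sizes plus 7 x 6 directional migration rates. The sketch below counts parameters for a matrix written with custom-migration style codes. It assumes a simplified reading of those codes (`*` is a free parameter, `0` is not estimated, and all `m` entries of one kind share a single averaged parameter); the function name `count_parameters` and the ring-shaped reduced model are hypothetical illustrations, not the models tested in this thread.

```python
def count_parameters(matrix_rows):
    """Count free parameters in a square custom-migration style matrix.

    Simplified reading (an assumption): '*' is a free parameter, '0' is
    not estimated, and all 'm' entries of one kind (population sizes on
    the diagonal, migration rates off it) share one averaged parameter.
    Any other code is rejected.
    """
    n = len(matrix_rows)
    free = 0
    shared = set()
    for i, row in enumerate(matrix_rows):
        if len(row) != n:
            raise ValueError("matrix must be square")
        for j, code in enumerate(row):
            kind = "theta" if i == j else "migration"
            if code == "*":
                free += 1
            elif code == "m":
                shared.add(kind)
            elif code != "0":
                raise ValueError(f"unsupported code: {code!r}")
    return free + len(shared)


n_pops = 7

# Full model: every population size and every migration rate is free.
full = ["*" * n_pops for _ in range(n_pops)]

# Same migration structure, but one averaged population size ('m' diagonal).
averaged = [
    "".join("m" if i == j else "*" for j in range(n_pops)) for i in range(n_pops)
]

# Hypothetical reduced model: each population is linked to one neighbour in a ring.
ring = [
    "".join("*" if j == i or j == (i - 1) % n_pops else "0" for j in range(n_pops))
    for i in range(n_pops)
]

print(count_parameters(full))      # 49
print(count_parameters(averaged))  # 43
print(count_parameters(ring))      # 14
```

Averaging the population sizes alone removes only six parameters; most of the reduction has to come from switching off migration paths.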
On Nov 20, 2013, at 5:09 PM, Stuart Barker (Prof) <sba...@une.edu.au> wrote:
Hi
I have a migrate run of one mtDNA sequence, 7 populations, full
migration model and the pdf outfile is attached.
In terms of general criteria for convergence, modes are within the 50%
credibility interval, reasonable agreement of mean and median,
acceptance ratio near 10% (maybe should be higher), and ESS all
parameters > 1000.
--
J.S.F. (Stuart) Barker HonDSc FTSE
Emeritus Professor
School of Environmental and Rural Science
University of New England
Armidale NSW 2351
Australia
Honorary Professor
Faculty of Veterinary Science
University of Sydney
Sydney NSW
Home: 5/19-23 Oaklands St.
Mittagong
NSW 2575
Ph. HOME ++ 612 48722490
email - sba...@une.edu.au
--
You received this message because you are subscribed to the Google Groups "migrate-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to migrate-suppo...@googlegroups.com.
To post to this group, send email to migrate...@googlegroups.com.
Visit this group at http://groups.google.com/group/migrate-support.
For more options, visit https://groups.google.com/groups/opt_out.
Dear Peter
Great to know that I can still compare the outcomes for the different migration models. Below you order the models in the same way as I would using the log-probability of the data estimates that I gave in my email, but what are these values that you present??
As noted, I have been running the data for Cytb alone, Dloop alone and the combined two loci primarily for curiosity as to how they would compare. I expected that the two loci analyses would be most useful. You suggest estimating the parameters and using the alpha (gamma setting) - but I understand that this is only applicable to ML, and I am using Bayesian. Is this OK?
You also suggest using one averaged size - to reduce the number of parameters to be estimated. I assume this means using 'm' for all diagonal entries in the custom migration set-up - correct? Would you expect this to improve the estimation of the migration parameters?
As always - many thanks.
Stuart
Stuart,
(2) Setting the thetas to 'm' reduces the number of parameters at the cost that all population sizes are the same (in situations where, say, the carrying capacity of a population is defined by similar habitats/outside constraints, this should be fine). Reducing the number of parameters will increase the confidence in the other parameters; whether the variances improve depends.
(1) Site rate variation is not equivalent to the estimation of mutation rate differences among loci (your ML reference). Put your data into a phylogenetic program (e.g. paup*), set the mutation model to F84, allow for relative rates among sites, and then estimate the alpha parameter.
In paup* this usually boils down to:
  fix the migrate infile to a phylip infile (remove the pop headers and make a few other changes)
  start paup* (on the command line: paup*)
  tonex from=infile.phy to=infile.nex
  execute infile.nex
  nj
  lset variant=f84 rates=gamma shape=estimate
  lscore
  read the alpha value from the output
*************
Then in migrate, under the sequence data model in the data model section, adjust
  5   Site rate variation?                      YES
and then it will show
  Regional rates:
  Number of categories (1-9)?
I suggest entering 3 (or 4). You will then be asked to enter the alpha value or rates; now enter the value that was calculated in paup*. This is the text (I used 0.2 for the example):
  Either enter the Shape parameter alpha for Gamma deviated rates
  *OR* enter the rates for each category (use a space to separate)
  ===>0.2
This will use relative rates like this:
  Region type     Rate of change    Probability
  ---------------------------------------------
       1              0.428            0.680
       2              2.081            0.308
       3              5.491            0.012
This allows for the mutation differences in d-loop and cyt and also differences in 1st, 2nd and 3rd positions [the same way phylogenetic analyses do].
All clear? [Section 1 may need more explanation.]
Peter
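One way to read the rate table in this reply: the categories are relative rates, so weighting each rate by its probability should give a mean close to 1. The check below is only a sanity sketch using the three printed values (which are rounded to three decimals); it does not reproduce how migrate derives the categories from alpha.

```python
# Relative rates and probabilities as printed in the example for alpha = 0.2.
rates = [0.428, 2.081, 5.491]
probabilities = [0.680, 0.308, 0.012]

# The category probabilities should sum to 1.
total_probability = sum(probabilities)

# Probability-weighted mean of the relative rates.
mean_rate = sum(r * p for r, p in zip(rates, probabilities))

print(f"sum of probabilities = {total_probability:.3f}")
print(f"mean relative rate   = {mean_rate:.3f}")
```

Most sites (68%) fall in the slow category, while about 1% of sites change more than five times faster than average, which is the kind of heterogeneity a small alpha implies.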
On Nov 20, 2013, at 5:09 PM, Stuart Barker (Prof) <sba...@une.edu.au> wrote:
Hi
I have a migrate run of one mtDNA sequence, 7 populations, full
migration model and the pdf outfile is attached.
In terms of general criteria for convergence, modes are within the 50%
credibility interval, reasonable agreement of mean and median,
acceptance ratio near 10% (maybe should be higher), and ESS all
parameters > 1000.
But none of the posterior distributions look very nice. What does this
mean in terms of interpretation?
Comment/advice will be much appreciated.
Stuart
--