interpretation of posterior distributions

Stuart Barker (Prof)

Nov 20, 2013, 5:09:03 PM

to migrate...@googlegroups.com

Hi

I have a migrate run of one mtDNA sequence, 7 populations, full
migration model and the pdf outfile is attached.
In terms of general criteria for convergence: modes are within the 50%
credibility interval, mean and median agree reasonably well, the
acceptance ratio is near 10% (maybe it should be higher), and the ESS
for all parameters is > 1000.

But none of the posterior distributions look very nice. What does this
mean in terms of interpretation?
Comment/advice will be much appreciated.

Stuart

--
J.S.F. (Stuart) Barker HonDSc FTSE
Emeritus Professor
School of Environmental and Rural Science
University of New England
Armidale NSW 2351
Australia

Honorary Professor
Faculty of Veterinary Science
University of Sydney
Sydney NSW

Home: 5/19-23 Oaklands St.
Mittagong
NSW 2575

Ph. HOME ++ 612 48722490
email - sba...@une.edu.au

outfile_Dloop1_2.zip

Peter Beerli

Nov 20, 2013, 6:24:12 PM

to migrate...@googlegroups.com

Stuart,

Your run shows clearly that your single-locus data will not support a consistent estimation of 49 parameters;
essentially you recover the uniform prior. Longer runs will make the posterior even flatter. I suggest that you
use the custom migration matrix and reduce the number of migration parameters considerably (anecdotally, I have rarely seen examples from a single DNA locus that can estimate more than 16 parameters). If you tell me a little more about the geographic organization and what type of species the data represent, I can help in more detail.

Peter
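
A note on the parameter count: with 7 populations, the full model has 7 thetas plus 7 x 6 = 42 directional migration rates, i.e. 49 parameters. The sketch below counts parameters for a connection matrix. The symbol meanings are a simplifying assumption for illustration ('x' or '*' = estimated, '0' = not estimated, 'm' = all entries so marked share one averaged value), and `count_parameters` is a hypothetical helper, not part of migrate.

```python
def count_parameters(matrix):
    """Count estimated parameters in a square connection matrix.

    Assumed symbols (a simplification for illustration):
      'x' or '*' : estimated freely
      '0'        : not estimated
      'm'        : shares one averaged value (one for all 'm' thetas on the
                   diagonal, one for all 'm' migration rates off the diagonal)
    """
    n = len(matrix)
    free_thetas = 0
    free_migrations = 0
    shared_theta = False
    shared_migration = False
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError("matrix must be square")
        for j, symbol in enumerate(row):
            on_diagonal = i == j
            if symbol in ("x", "*"):
                if on_diagonal:
                    free_thetas += 1
                else:
                    free_migrations += 1
            elif symbol == "m":
                if on_diagonal:
                    shared_theta = True
                else:
                    shared_migration = True
            elif symbol != "0":
                raise ValueError(f"unknown symbol {symbol!r}")
    return (free_thetas + int(shared_theta)
            + free_migrations + int(shared_migration))


# Full model: every theta and every migration rate is estimated.
full_model = ["x" * 7 for _ in range(7)]

# Linear stepping-stone layout (hypothetical): only neighbours exchange migrants.
stepping_stone = [
    "".join("x" if abs(i - j) <= 1 else "0" for j in range(7))
    for i in range(7)
]

# Full migration, but one averaged theta for all populations.
equal_theta = [row[:i] + "m" + row[i + 1:] for i, row in enumerate(full_model)]

print(count_parameters(full_model))
print(count_parameters(stepping_stone))
print(count_parameters(equal_theta))
```

Under these assumptions, restricting migration to neighbouring regions removes far more parameters than averaging the thetas does.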

Stuart Barker (Prof)

Nov 20, 2013, 9:19:12 PM

to migrate...@googlegroups.com

Dear Peter
I do not know how you do it - keeping up with everyone's queries re
migrate. But I for one certainly appreciate your help and advice.

Our species is the swamp buffalo throughout SE Asia and China - some 30
populations grouped (on genetic data and geography) into 7 regional
groups. We have full sequence on Cytb and Dloop for 913 animals, but use
a sample of 20 from each region.

We hypothesise six different models of migration that we think may give
clues to the region of domestication of this species, and its subsequent
spread to its current distribution, and have run the full model as a
baseline. Thus our interest is not in the full model per se, but in
comparing gene flow models using Bayes factors.

I am running the models for the Dloop data alone, for Cytb alone, and
for the combined sequences - curiosity to see how the results compare.

Thanks
Stuart

Peter Beerli

Nov 21, 2013, 7:45:58 AM

to migrate...@googlegroups.com

Stuart,

Given that the full model is not able to give meaningful results, I suggest looking at a ‘reasonable’ model (I am sure you have a prior belief about which models are better than others; sometimes it helps to write them down before you do the experiment/comparison) and checking whether that is similarly problematic. I am sure models with fewer parameters will give fewer problems. This opens the discussion about what the correct model is. The model order (or probability) will depend on the different models that are tested AND on how well the data fit them. As a result, one is only able to say that model x is the best (most plausible) model given the model set and the data; but then, this is true even when it is not stated that explicitly.
I would love to read more about your buffalo story and how the models turn out.

Peter

Stuart Barker (Prof)

Nov 21, 2013, 9:52:31 PM

to migrate...@googlegroups.com

Dear Peter

I do hope that you will get to read about our buffalo story. But first,
do we have a story?
Our hypothesised 'reduced' models have 14 or 15 parameters - 7 theta and
7 or 8 migration paths, but most posterior distributions in these runs
are like those for the full model that you have seen - i.e. flat
distributions.

Does this mean that our mtDNA sequence data cannot recover useful
information, or is this information still useful to some degree - see below?

Would increasing the number of replicates, or any other modification of
the run specifications, improve the situation?

Given the results, is it still valid to compare the gene flow models
using Bayes factors? As noted before, this was the object of the
exercise, and log-probabilities do differ. For example, for the Dloop
data, the estimates of the log-probability of the data (Bezier approx.)
are -

Model              ln Prob(data), Bezier approx.
Full model         -2301.10
Reduced model 1    -2256.31
Reduced model 2    -2258.18
Reduced model 3    -2259.13
Reduced model 4    -2256.20

(2 other models still running). For twoloci and Cytb, differences among
the log-probs for different models are even greater.

Again - many thanks in advance!!!

Stuart

Peter Beerli

Nov 22, 2013, 7:32:54 AM

to migrate...@googlegroups.com

Stuart,

I believe that you are still able to compare the outcomes, and I would order your models like this:

4     0.47908772251881709
1     0.42918313563377292
2     0.066147276453184523
3     0.025581865394225429
Full  1.5156286423254215e-20

Models 4 and 1 seem the most plausible. I would actually combine the dloop and cyt loci and use site rate variation (estimate the parameters for F84 + Gamma, for example in paup*, then use the alpha for all the analyses); the combination of cyt and dloop should sharpen your answer.

Perhaps you could cut down on the population size parameters and assume that all the populations are similar, i.e. use one averaged size? Or combine regions? For example north versus south, or east/west.

Peter
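
A note on the numbers in this ordering: they appear to be model probabilities obtained by exponentiating the Bezier log marginal likelihoods quoted for the Dloop data and normalising them over the five models compared. The sketch below reproduces that arithmetic; the function name `model_probabilities` is made up for illustration, and the calculation assumes equal prior weight on every model.

```python
import math

# Bezier-approximated log marginal likelihoods for the Dloop data,
# as quoted in the thread.
log_ml = {
    "Full": -2301.10,
    "1": -2256.31,
    "2": -2258.18,
    "3": -2259.13,
    "4": -2256.20,
}


def model_probabilities(log_marginal_likelihoods):
    """Normalise exp(lnML) over the model set (equal model priors assumed)."""
    best = max(log_marginal_likelihoods.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {m: math.exp(v - best) for m, v in log_marginal_likelihoods.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


probs = model_probabilities(log_ml)
for model, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
    print(f"{model:>4}  {p:.6g}")
```

These probabilities are relative to the set of models compared; adding or removing a model changes every value.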

Stuart Barker (Prof)

Nov 22, 2013, 7:10:05 PM

to migrate...@googlegroups.com

Dear Peter

Great to know that I can still compare the outcomes for the different migration models. Below you order the models in the same way as I would using the log-probability of the data estimates that I gave in my email, but what are these values that you present??

As noted, I have been running the data for Cytb alone, Dloop alone and the combined two loci primarily for curiosity as to how they would compare. I expected that the two loci analyses would be most useful. You suggest estimating the parameters and using the alpha (gamma setting) - but I understand that this is only applicable to ML, and I am using Bayesian. Is this OK?

You also suggest using one averaged size - to reduce the number of parameters to be estimated. I assume this means using 'm' for all diagonal entries in the custom migration set-up - correct? Would you expect this to improve the estimation of the migration parameters?

As always - many thanks.
Stuart

Peter Beerli

Nov 22, 2013, 10:42:24 PM

to migrate...@googlegroups.com

Stuart,

(2) Setting the thetas to ‘m’ reduces the number of parameters, at the cost that all population sizes are the same (in situations where, say, the carrying capacity of a population is defined by similar habitats/outside constraints, this should be fine). Reducing the number of parameters will increase the confidence in the other parameters; whether the variances improve depends on the data set.

(1) Site rate variation is not equivalent to the estimation of mutation rate differences among loci (your ML reference). Put your data into a phylogenetic program (e.g. paup*), set the mutation model to F84, allow for relative rates among sites, and then estimate the alpha parameter.

In paup* this usually boils down to the following:

fix the migrate infile to a phylip infile (removing pop headers and making a few other changes)
start paup* (on the command line: paup*)
tonex from=infile.phy to=infile.nex
execute infile.nex
nj
lset variant=f84 rates=gamma shape=estimate
lscore
read the alpha value from the output

*************

Then in migrate, under the sequence data model in the data model section, adjust

  5   Site rate variation?                                        YES

and then it will show

  Regional rates:
Number of categories (1-9)?

I suggest entering 3 (or 4).

Then you will be asked to enter the alpha value or rates; now enter the value that was calculated in paup*. This is the text (I used 0.2 for the example):

Either enter the Shape parameter alpha for Gamma deviated rates
*OR* enter the rates for each category (use a space to separate)
===>0.2

This will use relative rates like this:

Region type     Rate of change    Probability
---------------------------------------------
        1           0.428            0.680
        2           2.081            0.308
        3           5.491            0.012

This allows for the mutation differences in d-loop and cyt and also differences in 1st, 2nd, and 3rd positions [the same way phylogenetic analyses do].

All clear? [Section 1 may need more explanation.]

Peter
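
A quick check on the example rate table: the category rates are relative rates, so weighting each rate by its probability should give an average close to 1. The short sketch below does that arithmetic with the three categories shown for alpha = 0.2; it only verifies the printed table and does not reproduce how migrate derives the categories.

```python
# Rate categories from the example table (alpha = 0.2, 3 categories).
rates = [0.428, 2.081, 5.491]
probabilities = [0.680, 0.308, 0.012]

# The category probabilities should sum to 1.
total_probability = sum(probabilities)

# The probability-weighted mean of relative rates should be close to 1
# (small deviations come from rounding in the printed table).
mean_rate = sum(r * p for r, p in zip(rates, probabilities))

print(f"sum of probabilities: {total_probability:.3f}")
print(f"weighted mean rate:   {mean_rate:.3f}")
```

Most sites fall in the slow category, while a small fraction evolves more than five times faster than average, which is the pattern a small alpha describes.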

Stuart Barker (Prof)

Nov 23, 2013, 5:01:14 AM

to migrate...@googlegroups.com

Hi Peter

Thanks again for continuing responses.

Re (2), your reply confirms my expectation. I have started a couple of runs where each replicates an earlier one, except that now all thetas are specified as 'm', rather than 'x'. So we will see what difference that makes.

Re (1), I have many problems. I have never used PAUP and am not sure if I will be able to find anyone locally (in Sydney) who has used it and who might help me. You may not realise that I am long retired and 'out of the loop' on much of this stuff.
But if I do, I still have a problem in relating info in the MIGRATE manual to your description about inserting the alpha value into the parmfile.

How much difference do you think it might make to allow for the mutation differences in d-loop and cytb and also differences in 1,2,3rd positions? That is, is it likely to make a big difference and to be really important? Current runs for the twoloci data have the following Bezier ln ML -
Model    Bezier ln ML
M3       -4191.55
M5       -4197.05
M6       -4203.00
M2       -4203.69
M1       -4206.62
M4       -4207.91

clear differences among the models.

Cheers
Stuart
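
To put the twoloci differences on a comparable scale, each model can be expressed as a natural-log Bayes factor relative to the best model, which is simply the difference of the two Bezier ln ML values. The sketch below does this for the six values listed above; the variable names are illustrative only.

```python
import math

# Bezier ln ML values for the twoloci data, as listed in the thread.
log_ml = {
    "M3": -4191.55,
    "M5": -4197.05,
    "M6": -4203.00,
    "M2": -4203.69,
    "M1": -4206.62,
    "M4": -4207.91,
}

# Best model = highest log marginal likelihood.
best = max(log_ml, key=log_ml.get)

# Natural-log Bayes factor of each model against the best one (<= 0).
ln_bf = {model: value - log_ml[best] for model, value in log_ml.items()}

for model in sorted(ln_bf, key=ln_bf.get, reverse=True):
    # exp(-ln_bf) tells how many times more probable the data are
    # under the best model than under this one.
    ratio = math.exp(-ln_bf[model])
    print(f"{model}: ln BF vs {best} = {ln_bf[model]:7.2f}  (ratio {ratio:.3g})")
```

A difference of 5.5 log units, as between M3 and M5, already corresponds to a ratio of a few hundred to one, so differences of this size are large provided the marginal likelihood estimates themselves are stable across runs.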
