the usage of model_choice.r

154 views
Skip to first unread message

shu-ping

unread,
Mar 16, 2011, 1:09:55 PM3/16/11
to popABC
Hi all:

I tried to test two models-with and without gene flow- with
popABC.

And I find the model_choice.r on popABC website. Because I have no
knowledge in R -script,I don't know which command-line I should type.

Or can anyone provide me the other way to do the model choice?

The second question is about the .rej file. I'm not real sure
the meaning of reject file.

Does each number in the column indicate the distance between real and
simulation data?


Thanks!

shu-ping

Joao Sollari Lopes

unread,
Mar 24, 2011, 2:21:25 PM3/24/11
to popABC
Hi Shu-ping,

For the R script I am afraid you will have to install the R run-time
environment (http://www.r-project.org/).
However you will only need R to run the ABC-regression. Instead you
can run the ABC-rejection algorithm provided by the popABC package.
Just treat the parameter indicating the model used to simulate data
(e.g. with- and without-migration) has a parameter to be estimated.

The proportion of points accepted by the rejection-step are
approximations of the posterior probability of each model.

The rejection file contains informations on the simulations that were
accepted by the rejection-step (i.e. the simulation data that were
closest to the data in study). The meaning of each column is detailed
in the report file created when generating the simulated data (.txt
files):

e.g.

OUTPUT FILES:

-------------------------------

Data file: examples/toydata.dat

11 parameters [top|avMutS|sdMutS|avRecS|sdRecS|t1|Ne1|Ne2|NeA1|mig1|
mig2]

9 summstats [pi1|pi2|S1|S2|k_S1|k_S2|pi1-2|S1-2||k_S1-2]


Each line represents one simulated data. The first values correspond
to the parameter values used to simulate the data, and the second set
of values correspond to the summary statistics that summarize that
data.

Hope that helps,
Joao

shu-ping

unread,
Mar 29, 2011, 6:38:06 AM3/29/11
to popABC
Dear Joao:

I apologize for my unclear description about my question.
I know I should have used R-package, but I am not pretty sure how to
adapt the script.

After reading the content in model_chioce.r, I think I should combine
two .dat files which I want to compare, and then do the rejection
step.
Then I can use model_chioce.r, and type the command-line --
model_choice("...rej","...tar").

#Bar plot for model-choice (rejection)
top1<-length(which(abc.rej[,1]==1))/length(abc.rej[,1])
top2<-length(which(abc.rej[,1]==2))/length(abc.rej[,1])
(Original scrip is ...[,2]==1))/length(abc.rej[,2]) I have changed 2--
>1 because [,2] correspond the number of column which contain the
model labels.)

print(c(top1,top2))
write(c(top1,top2),"modelprob_rej.txt")
barplot(c(top1,top2),names=c("Model1","Model2"))
abline(h=1.0/2,col="red")

This part has worked well.

#Bar plot for model-choice (regression)
res.topol<-calmod(target,abc.rej[,1],abc.rej[,17:55],
1,rej=F)
(My model has 16 parameters and 39 summary statistics.)

print(res.topol$x2)
write(res.topol$x2,"modelprob_reg.txt")
barplot(res.topol$x2,names=c("Model1","Model2"))
abline(h=1.0/2,col="red")
print("model-choice done.")

}


Then system returned the error code as the following:
error in calmod target, abc.rej[, 1], abc.rej[, 17:55], 1, rej = F) :
subscript out of bounds

I noticed that the function in calmod should have 7 elements ,but
there were only 6 in model_choice.r.
calmod <- function(target,x,sumstat,tol,gwt,rejmethod=T)
=====>calmod.r
res.topol<-calmod(target,abc.rej[,1],abc.rej[,17:55],1,rej=F)
=====>model_choice.r

Is this the cause of the problem? Did I miss any important steps or
type the wrong command-line?


Another question about rejector step.
I combined two .dat file and did the rejection step, and I got 80%
data point belong to model 1 and 20% belong to model 2.

I'd like to calculate the bayes factor, B=P(model1|D)/P(model2|D), of
these two models.

I think the equation is B= 80%/20% = 4.

Is that right?

I really appreciate your patience and kindly help!

Shu-ping

Joao Sollari Lopes

unread,
Mar 29, 2011, 7:39:09 AM3/29/11
to popABC
Hi Shu-ping,

You seem to be doing it fine.

The only thing you need to assure is that both .dat files have the
same number of parameters and summary statistics so that when you join
the files you end up comparing like with like.

Also, the .trg file should have the same number of summary statistics
as the .dat files.

The use of the function should be as following:

>model_choice("file.trg", "file.rej")

which is the opposite that you wrote in the email (I take that was a
spelling mistake, since the rejection step run fine).

Your modifications to the script seems fine as well. But the error you
are getting seems to indicate that there is a disagreement between the
vector sizes of either target, x or sumstat.

The parameter that is missing in the calmod function call is the gwt.
But that is fine, since the missing of this parameter is contemplated
in the function.

I'm not sure what the problem ise, but my advice is for you to play
around with the R script and you might find it.

As for the calculation of the Bayes factor, it seems fine for me.

Let me know if you find out the problem.
Best,
Joao

Vishnupriya

unread,
Mar 29, 2011, 9:51:16 AM3/29/11
to popABC
Dear Joao,

I have a completely different question pertaining to the use of the
model choice file. What if I want to compare one population and two
population models, and these two models have differing number of
summary statistics?

Say, one has 11 summary statistics and the other has 6.

Looking forward to hearing from you

Vishnupriya

On Mar 29, 1:39 pm, Joao Sollari Lopes <j.sollari.lo...@gmail.com>
wrote:

shu-ping

unread,
Mar 29, 2011, 9:09:48 AM3/29/11
to popABC

Oh!! These are really useful!
According to your advices your advices, I typed
"model_choice("file.trg", "file.rej")",and it did work!
Thanks you very much!

Shu-ping

Joao Sollari Lopes

unread,
Mar 29, 2011, 10:55:59 AM3/29/11
to popABC
Hi Vishnupriya,

The use of the summary statistics is to be able to compare your data
set in study with simulated data. But you need to ensure that you are
comparing values that correspond to the same thing.

If you have a data set that is comprised of samples of two populations
you will calculate summary statistics that reflect these two
populations.

If you which you can simulate data belonging to two related
populations by considering a three-population model. But when you are
comparing summary statistics make sure you pick the correct ones from
the simulated data.

So, you can join data sets belonging to different models, but you have
to make sure that the columns correspond to the same summary
statistics. Also, when comparing the simulated data with your data you
have to make sure you are comparing the correct summary statistics.

In summary, you can perform a model-choice test to models with
different parameters and different summary statistics as long as you
have a set of summary statistics that are exactly the same in both
real data and simulated data.
The only caveat that I can think of of the top of my head is that you
may require different number of simulations to properly cover the
joint distribution of the different models (but when in doubt of the
required number of simulations, just use loads!).

Now, for parameter estimation it is a different, and more complex
story (that is worth a different topic).

Joao

Vishnupriya Kolipakam

unread,
Mar 31, 2011, 3:17:11 AM3/31/11
to pop...@googlegroups.com
Dear Joao, 

Thank you :) That makes a lot of sense. 

Vishnupriya

--
You received this message because you are subscribed to the Google Groups "popABC" group.
To post to this group, send an email to pop...@googlegroups.com.
To unsubscribe from this group, send email to popabc+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/popabc?hl=en-GB.




--
All you touch and all you see.. is all your life will ever be..
Reply all
Reply to author
Forward
0 new messages