very different model fits using DAF and MAF from the same dataset running the same model

643 views
Skip to first unread message

hola

unread,
Mar 1, 2019, 10:05:45 PM3/1/19
to fastsimcoal
Hi,

I've been having trouble finding a fitting model for my observed DAF, so decided to check its corresponding observed MAF just for curiosity. Funnily enough, using the same model that I was testing on my DAF, I found out it is almost a perfect fit for my MAF. Why is that, what does that mean and how can I improve my DAF results? Thanks!

----DAF-----

1 observation
d0_0        d0_1        d0_2  d0_3        d0_4        d0_5        d0_6        d0_7
5065349.430963481 27363.39537684538 18036.28234265734 19494.35712898213 9652.913558663558 6284.895104895105 6641.73545066045 15248.99007381508

MaxEstLhood    MaxObsLhood
-405131.977    -300453.498


-----its corresponding MAF-----

1 observation
d0_0    d0_1    d0_2    d0_3    d0_4    d0_5    d0_6    d0_7
5080598.421037296 34005.13082750583 24321.17744755245 29147.27068764569 0 0 0 0

MaxEstLhood    MaxObsLhood
-234234.722    -234005.925


-----tpl----- (MUTRATE$ is also on place of recombination cause on my species mutation and recombination rates are the same)

//Number of population samples (demes)
1
//Population effective sizes (number of genes)
NCUR
//Sample sizes
7
//Growth rates        : negative growth implies population expansion
0
//Number of migration matrices : 0 implies no migration between demes
0
//historical event: time, source, sink, migrants, new size, new growth rate, migr. matrix
1 historical event
GENSANC$ 0 0 0 RESIZE$ 0 0
//Number of independent loci [chromosome]
1 0
//Per chromosome: Number of linkage blocks
1
//per Block: data type, num loci, rec. rate and mut rate + optional parameters
FREQ 1 MUTRATE$ MUTRATE$ OUTEXP


-----est-----

// Search ranges and rules file
// ****************************

[PARAMETERS]
//#isInt? #name   #dist.#min  #max
//all Ns are in number of haploid individuals
1 NCUR unif 10 1000000 output
1 NANC unif 10 1000000 output  
0 MUTRATE$ unif 9e-11 9e-7 output
1 GENSANC$ unif 1000 100000 output

[RULES]

[COMPLEX PARAMETERS]
0  RESIZE =  NANC/NCUR hide

hola

unread,
Mar 2, 2019, 4:23:38 PM3/2/19
to fastsimcoal
here's my command btw

fsc26 -t 1Pop.tpl -e 1Pop.est -d -x -M -q -c2 -B2 -n 1000000 -L 150 (-m for MAF)

likelihood stagnates at 30-50 iterations btw

Laurent Excoffier

unread,
Mar 3, 2019, 3:08:29 PM3/3/19
to fastsimcoal
Hi,
in your daf file, the last entry shoudl be zero and not 15248.99007381508 
How did you compute this SFS?
I note that the fit to your obs DSFS is really bad 
MaxEstLhood    MaxObsLhood
-405131.977    -300453.498

This indicate serious problem with your model for the daf estimation
laurent

Message has been deleted

hola

unread,
Mar 3, 2019, 8:52:58 PM3/3/19
to fastsimcoal
Thanks for your quick answer!

Yeah, I noticed that in all example files the last category is 0, so I tried manually changing the last value to 0 alongside the -0 option on the command line and it's a whole different story:
MaxEstLhood    MaxObsLhood
-65840.425    -65373.264

Why should the last category be 0, though? And how can I get it to be 0?


I did my SNP calling with VarScan2.3.9 and computed my DSFS from a vcf file with easySFS (python easySFS.py -i VCF.FILE -p pops -a --ploidy 1 --unfolded --proj=7 -o OUTPUT_DSFS).

the vcf file is version version4.1  and pops file file looks like this (first column is the name of each individual and second column is the population to which they belong):
Sample1    pop1
Sample2    pop1
Sample3    pop1
Sample4    pop1
Sample5    pop1
Sample6    pop1
Sample7    pop1

stella...@gmail.com

unread,
Mar 4, 2021, 6:46:14 AM3/4/21
to fastsimcoal
Hi,

Sorry for bringing this conversation up.
Why the last value of DAF has to be zero ?

I have an unfolded SFS generated from ANGSD and ran my fastsimcoal analyses with it (I just added the corresponding headers). I didn't know that observed DAF should have 0 as last value (I didn't notice it was the case in all example files). I looked again thoroughly in the manual and in the google-group, but it is nowhere found but here about that rule. This may explain the huge difference I got between the MaxObsLhood and MaxEstLhood values (more than 2x lower) for my models... 

Should I just simply replace the last value in my observed DAF by 0 ? Or should I add it to the first value (assuming the first value is the total count of monomorphic sites, including derived and ancestral states) ? 

Although not all, but I also notice now that some jointDAF example files has 0 entry value in the bottom-right of the matrix. Is there any rule as well ? 
I got better estimates of MaxEstLhood (~ -14050500) and MaxObsLhood (-13558614.521) when running with my observed jointDAF (also generated with ANGSD) so I suppose there are no specific changes to make to the 2D-SFS, but I want to be sure.


Thanks for your help!

Best, Stella

Laurent Excoffier

unread,
Mar 4, 2021, 9:49:25 AM3/4/21
to fastsimcoal
Hi,

derived alleles are alleles having occurred since the TMRCA. A DAF of 1 implies that the mutation has occurred above the TMRCA, and that this allele is therefore ancestral, and it should be counted as such (you shoudl add these alleles to the zero class)
best
Laurent

Reply all
Reply to author
Forward
0 new messages