Dirichlet distribution for initializing nucleotide-based models

9 views
Skip to first unread message

David Cerny

unread,
6:30 AM (16 hours ago) 6:30 AM
to slim-discuss
Hi everyone,

In the "default" / WF model, we can create two alleles with predefined properties, and use two successive draws from the Uniform(0, 1) distribution to randomly assign them to all individuals in the population, without having to hardcode particular allele frequencies into the simulation:

threshold = runif(1);

effect = rnorm( 1, 0.0, 0.05 )[0];
wildtype_allele = p2.haplosomes[0].addNewMutation(mt, 0.0, pos);
mutant_allele = p2.haplosomes[1].addNewMutation(mt, effect, pos); 

for (ind in inds) {
    // Assign to haploid genome 1
    if (runif(1) < threshold) {
        ind.haploidGenome1.addMutations(allele_A_mut);
    } else {
        ind.haploidGenome1.addMutations(allele_B_mut);
    }

    // Assign to haploid genome 2
    if (runif(1) < threshold) {
        ind.haploidGenome2.addMutations(allele_A_mut);
    } else {
        ind.haploidGenome2.addMutations(allele_B_mut);
    }
}

To extend this initialization setup to nucleotide-based models, it would be nice to have a way of randomly drawing the frequencies of all four possible alleles (A, C, G, T) under the constraint that they sum to 1. A natural solution would be to use a Dirichlet distribution – indeed, this is a common prior on base equilibrium frequencies in phylogenetic studies. However, it looks like this distribution is not available out-of-the-box in SLiM, as there is no mention of it in the manual or on this forum.

I was wondering if anyone has perhaps already come up with an Eidos-level solution, as this looks like a reasonably common problem to run into. If not, my understanding is that it should be possible to simulate a draw from a Dirichlet distribution using normalized draws from an appropriate gamma distribution, and this shouldn't be difficult to script, since rgamma() is available in SLiM.

Thank you,
--
David Černý

Ben Haller

unread,
9:48 AM (13 hours ago) 9:48 AM
to slim-d...@googlegroups.com
Hi David!  I'm not familiar with the Dirichlet distribution, but it looks like it is supported by the GNU Scientific Library, which Eidos uses under the hood.  So it could probably be added fairly easily.  But for drawing random nucleotides, couldn't you just use sample() or rdunif()?  Either of those could easily allow you to draw an integer in [0, 3] or a nucleotide in ACGT.  Am I misunderstanding what you're trying to do?

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University
--
SLiM forward genetic simulation: http://messerlab.org/slim/
Before posting, please read http://benhaller.com/slim/bugs_and_questions.html
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/slim-discuss/2f055ff5-5d2c-49b6-9f5d-f9af0a359597n%40googlegroups.com.

Peter Ralph

unread,
10:59 AM (12 hours ago) 10:59 AM
to Ben Haller, slim-discuss
To draw from the Dirichlet(a1, ..., an), draw independent Exponential random variables with rates a1, ..., an, and then divide them all by their sum (so the result adds up to one):

P

David Cerny

unread,
12:16 PM (11 hours ago) 12:16 PM
to slim-discuss
Hi Ben, Peter,

Many thanks to both of you for your prompt replies!

But for drawing random nucleotides, couldn't you just use sample() or rdunif()?  Either of those could easily allow you to draw an integer in [0, 3] or a nucleotide in ACGT.  Am I misunderstanding what you're trying to do?

My understanding is that by doing this, I would either commit to equal base probabilities, or I would have to specify *particular* non-equal probabilities (using the weights arguments to sample()) to avoid that outcome. In contrast, if I draw base probability vectors for individual sites from a Dirichlet distribution, this allows every site to have (1) its own base probabilities that (2) may be non-equal without being explicitly set by the user.
 
 To draw from the Dirichlet(a1, ..., an), draw independent Exponential random variables with rates a1, ..., an, and then divide them all by their sum (so the result adds up to one):

 Thank you! Yes, that's exactly the kind of solution I had in mind as well. So if I understand correctly, something like the following should work?

function (float) getDirichletBaseFrequencies(float alpha) {
    if (size(alpha) != 4) {
        stop("The alpha vector has to be of size 4.");
    }

    freqs = float(4);
    sum = 0.0;
    
    for (i in 0:3) {
        freqs[i] = rexp(1, 1/alpha[i]);
        sum = sum + freqs[i];
    }

    for (i in 0:3) {
        freqs[i] = freqs[i] / sum;
    }

    return freqs;
}

// need to put this inside a loop iterating over sites
site_probs = getDirichletBaseFrequencies( c(1, 1, 1, 1) ); // flat Dirichlet
// not sure yet what function to use to assign these, but the general idea should be correct
sample(c("A", "C", "G", "T"), size(inds), replace = T, weights = site_probs);

Ben Haller

unread,
12:37 PM (10 hours ago) 12:37 PM
to slim-d...@googlegroups.com
Ah, I think I see.  The sample() function can indeed be used to actually draw the nucleotides in the end, but you want to derive the weights to be passed to sample() using the Dirichlet distribution.  Well, that seems useful, so I'm happy to simply add support to it in Eidos.  I've opened an issue at https://github.com/MesserLab/SLiM/issues/602.  If there's any further discussion, let's move it to there.  Thanks!


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


--
SLiM forward genetic simulation: http://messerlab.org/slim/
Before posting, please read http://benhaller.com/slim/bugs_and_questions.html
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages