Given that we are now starting a new part, it is worthwhile to repeat the actual mathematics that Kleinman claims to be responding to.
The following is my analysis of the probability mathematics that Kleinman has so utterly botched. He is, of course, quite free to critique it. I will start by defining terms, both in the language of probability (‘event’, ‘trial’, ‘probability of the event’, ‘binomial mass probability’) and in the language of biology (‘examined population’, ‘mutant state’, ‘not-mutant-state’, ‘population frequency’).
Mutation – a change in genetic or inherent state from an original state to a new and different state.
n -- is the population size of ‘trials’ being examined for the binomial event; in mutation analysis the examined population size is a stand-in for the number of individual generations of time being examined for the binomial event. If one examines subpopulations over several generations, one is still really only interested in the total number of individuals examined. If N is the mean number of individuals examined per generation and g is the number of generations examined, then the total number of individuals/trials examined, n, is N*g. But in most cases discussed here only one generation is examined for the presence of the event/mutation-of-interest.
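Here is a minimal Python sketch of the point about n = N*g. It assumes a constant per-trial probability p and independent trials, which is the same simplification the binomial formulas below use. Under those assumptions, the chance of seeing no event in g generations of N individuals equals the chance in a single sample of n = N*g individuals. The values of p, N and g are hypothetical and only illustrate the arithmetic.

```python
import math


def prob_none(p: float, n: int) -> float:
    """Probability of zero events in n independent trials, each with probability p."""
    return (1.0 - p) ** n


def prob_none_over_generations(p: float, N: int, g: int) -> float:
    """Multiply the per-generation 'no event' probabilities for g generations of N."""
    total = 1.0
    for _ in range(g):
        total *= prob_none(p, N)
    return total


# Hypothetical illustrative values, not data.
p, N, g = 1e-6, 10_000, 5
n = N * g
print(n)  # 50000
print(math.isclose(prob_none(p, n), prob_none_over_generations(p, N, g)))  # True
```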
p – is the probability or frequency of the relevant binomial event in a population. That is, it is the frequency of the binomial event per trial regardless of whether the mutant is new or old. In mutation analysis, the binomial distinction of interest is always mutation(s)-of-interest or not-mutation(s)-of-interest.
If one is interested in mutations that can be selected for or against, one is interested in “functional mutation”; that is, one is interested in a mutation that has a distinctive phenotypic effect, detectable by the environment as well as by the observer. A phenotypic difference alone does not indicate that there will be selection, so the relevant phenotypic difference is the one observed in some specified environment. Such mutations are typically described by their functional qualities, which also indicate the environments in which they are selectable: for example, penicillin-resistant, able-to-use-citrate, lac+. Under the “functional” definition of mutation, any change with a significant beneficial or detrimental effect relative to the alternatives in a specified environment counts as a mutation.
Typically, one uses the selective conditions to identify such mutants regardless of whether they are due to mutations at a single nt site, due to a specific base change, or even due to change in a specified gene (although one often subsequently categorizes them by the affected gene and type of mutation).
If one is uninterested in selection for phenotype and wants to look instead at “structural mutation”, then one chooses a binomial state on that basis. Either the ‘not-mutant state’ is the original gene sequence and the ‘mutant state’ includes all variants from that state or only one of those sequence changes is considered the ‘mutant state’ and any sequence other than that one is the ‘not-mutant’ state.
If one is interested in the binomial discrimination of ‘new-double-mutants’ and ‘not-new-double-mutants’, then, for each way of generating a new double-mutant, p is the joint probability, i.e., the product of the probabilities of the two individual mutant states. As there are several alternative ways of generating new double-mutants, one must identify all the mutually exclusive ways of generating new double-mutant states and add them together to get the net probability of new double-mutants.
m – the rate or frequency of “new” mutation to the defined binomial ‘mutant state’ from the ‘not-mutant state’ in a population, measured as the frequency of (new events or mutants) per (trial or individual examined) in a population.
The difference between p and m is that m is the lowest frequency of the mutant state and occurs only in a population that initially lacks the mutant state, either because the mutant state is selected against or because the population started out with no individuals in the mutant state. p can, in theory, take any value between m and 1, and it can move between these two limits by selection for the mutant state, by neutral drift of one of the states toward fixation by chance alone, or by both.
Unlike the dear Dr. Dr., I will make a clear distinction between the probability of an ‘event’ per trial, p, and the probability of having at least one (one or more) ‘events’ in a population of n trials (the binomial mass probability), which I will call P to distinguish it from the probability of the ‘event’.
The argument the dear Dr. Dr. makes, and the equation he tries (badly) to ‘derive’, is not, although he appears to be ignorant of the fact, a solution for the binomial probability of an event per trial (the probability of new mutation, aka the mutation rate). It is, instead, an attempt to solve the binomial mass probability equation for one or more new double-mutants in a population of size n.
The best analogy, with one major caveat, to what the dear Dr. Dr. is actually trying to solve is the binomial mass probability of rolling two dice (red and blue) at one time and determining the probability of rolling a sum of 4. The major caveat is that for rolling dice, each trial or roll produces results that are independent of any previous roll of the die. That is not the case for mutation. The probability of any observed individual having a mutation is NOT independent of what its parent had; it is, in fact, highly dependent on what its parent had. If the parent was not-mutant at a particular locus, the probability of its progeny having a mutant state is quite low. OTOH, if the parent was mutant at that locus, the probability of its progeny having the mutant state would be quite high.
To analyze correctly the mass probability of one or more dice rolls that produce a sum of 4, one needs to determine the number of different mutually exclusive ways one can generate that sum with two dice and the probability per trial (roll) of generating each particular result, that is, p. One then uses those values to determine the total probability of rolling a 4 with two dice. For a sum of 4, there are 3 different ways to generate that sum:
1) Roll a 1 on the red die (r1) and a 3 on the blue (b3)
2) Roll a 3 on the red die (r3) and a 1 on the blue (b1)
3) Roll a 2 on the red die (r2) and a 2 on the blue (b2)
In each case, we consider the ‘event’ to be the stated type of face (e.g., r1 or r2) and all other possibilities to be the ‘not-event’, making this analyzable as a binomial analysis.
The mass probability for generating a 4 by mechanism 1) is 1-[1-(pr1*pb3)]^n, where pr1 and pb3 are the respective probability per trial of rolling a r1 and b3 and n is the number of trials (a trial is a single roll of both dice). pr1*pb3 = the joint probability of rolling both a r1 and b3. This is where the joint probability of events is determined and used in the analysis. Obviously with dice, pr1 = pb3 = 1/6. Thus the probability of the double-event, pr1 and pb3, is 1/36, the product of the two individual probabilities.
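The counting behind the three ways can be checked by brute force. This Python sketch assumes fair dice. It enumerates all 36 equally likely (red, blue) outcomes of one roll, lists the ways to get a sum of 4, and confirms that the joint probability of one specific pair such as r1 and b3 is the product 1/6 * 1/6 = 1/36.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (red, blue) outcomes of one roll of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
ways_to_4 = [(r, b) for r, b in outcomes if r + b == 4]
print(ways_to_4)  # [(1, 3), (2, 2), (3, 1)]

# Joint probability of r1 and b3 as the product of the per-die probabilities...
p_face = Fraction(1, 6)
p_r1_b3 = p_face * p_face
# ...and by direct counting over the 36 outcomes.
p_r1_b3_counted = Fraction(outcomes.count((1, 3)), len(outcomes))
print(p_r1_b3, p_r1_b3_counted)  # 1/36 1/36
```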
Similarly, the probability per roll of getting r3 and b1 is also 1/36, as is the probability per roll of getting r2 and b2. Thus the binomial mass probability of getting a sum of 4 in n rolls of the two dice, P(a+b = 4), is:
1-[1-(pr1*pb3)]^n + 1-[1-(pr3*pb1)]^n + 1-[1-(pr2*pb2)]^n
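To see how this expression behaves numerically, the Python sketch below evaluates it alongside the standard formula for at least one success when the per-roll outcomes are mutually exclusive, 1-[1-(pr1*pb3 + pr3*pb1 + pr2*pb2)]^n. For n = 1 the two coincide (3/36 = 1/12). They stay close while n times the per-roll probability is small. For large n the summed expression grows past 1, so it is best read as the rare-event approximation. The function names are illustrative only.

```python
from fractions import Fraction

# Per-roll probabilities of the three mutually exclusive ways: r1&b3, r3&b1, r2&b2.
ways = [Fraction(1, 36)] * 3


def mass_prob(p, n):
    """Probability of one or more events in n independent rolls: 1-(1-p)^n."""
    return 1 - (1 - p) ** n


def summed_terms(ps, n):
    """The expression above: one mass-probability term per way, added together."""
    return sum(mass_prob(p, n) for p in ps)


def at_least_one(ps, n):
    """Standard formula: combine the mutually exclusive ways per roll first."""
    return mass_prob(sum(ps), n)


for n in (1, 2, 10, 50):
    print(n, round(float(summed_terms(ways, n)), 4), round(float(at_least_one(ways, n)), 4))
```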
In this example, of course all three joint probabilities of a double-event producing a 4 are identical. However, that is not the case for an analysis of the different mutually exclusive ways of generating a new double-mutant. There are four different ways to generate a new double-mutant:
1) New mutation to the mutant-of-interest-state (A’ at an A locus which was A in the parent) in a cell whose parent already had a B’ mutation, rather than the B not-mutant.
2) New mutation to the mutant-of-interest-state (B’ at a B locus which was B in the parent) in a cell whose parent already had an A’ mutation, rather than the A not-mutant.
3) Simultaneous mutation to the mutant-of-interest-state (B’ at a B locus and A’ at the A locus) in a cell whose parent was not-mutant at both sites.
4) Recombination between an A’;B parent cell and an A;B’ parent cell producing an A’;B’ progeny from these pre-existing variants.
I am intentionally assuming that the parent generation had no A’;B’ individuals (as does Kleinman). If it did, the selective advantage or disadvantage of the double-mutant state would be more important than new mutation.
The mass probability of one or more new double-mutants in a population of trials of size n from mechanism 1) is 1-[1-(pB’*mA)]^n. pB’ is the frequency, in the current generation, of B’ inherited by descent from older mutants in the parent generation; that is, it is the frequency of the allele above the level produced each generation by new mutation. It doesn’t matter whether the allele reached this higher level because it was selected for in past generations or by chance alone (neutral drift). Nor does it matter whether the selection, when there has been selection, occurred in the immediately preceding generation or several generations earlier. mA is the frequency of new mutation to A’ from cells that were in the not-mutant A state in the parent generation. pB’*mA is the joint probability of a double-mutant via this mechanism.
Anyone who wants to derive the binomial mass probability for one or more events-of-interest in n trials can do so by going to
http://en.wikipedia.org/wiki/Binomial_distribution
and solving the first equation there for k = 0 (the mass probability that one will not see any events in a population of size n) and subtracting that result from 1.
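Here is that recipe followed in Python. The binomial probability mass function at k = 0 reduces to (1-p)^n, so 1 minus it gives the one-or-more probability used throughout this post. The function names are illustrative, and n and p are hypothetical values.

```python
from math import comb, isclose


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k events in n trials: C(n,k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_one_or_more(n: int, p: float) -> float:
    """Mass probability of one or more events: 1 - P(k = 0) = 1 - (1-p)^n."""
    return 1 - binom_pmf(0, n, p)


n, p = 1000, 1e-3  # hypothetical values for illustration
print(prob_one_or_more(n, p))
# Cross-check: 1 - P(k=0) equals the sum of P(k) over k = 1..n.
print(isclose(prob_one_or_more(n, p), sum(binom_pmf(k, n, p) for k in range(1, n + 1))))
```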
Similarly, the mass probability of one or more new double-mutants in a population of trials of size n from mechanism 2) is 1-[1-(mB*pA’)]^n.
The mass probability of one or more new double-mutants in a population of size n from mechanism 3) is 1-[1-(mB*mA)]^n.
The probability of generating a double-mutant by recombination, mechanism 4), depends on the frequency of recombination as well as the frequency of the two single-mutants in the parent population. So it will be some frequency of recombination, fr, times the probabilities or frequencies, pA’ and pB’, of the two single mutant alleles in the parent population. Thus fr*pA’*pB’. The exact value of fr depends on the mechanism of recombination.
The sum of these possible mechanisms represents all the mutually exclusive ways one can get new double-mutants. The only other qualification in these equations is that pA’ + pB’ must be less than or equal to 1. This is just a mathematical way of saying that we have assumed that the population has no double-mutants in the parent population. Again, if the parent population has double-mutants, selection will play a more important role than new mutation to double-mutants.
Final equation: Prob of new double-mutant, P(A’B’) in n trials =
1-[1-(pB’*mA)]^n + 1-[1-(mB*pA’)]^n + 1-[1-(mB*mA)]^n + fr*pA’*pB’
assuming that pA’ + pB’ ≤ 1.
NOTE: Not all of these terms will be significant in every real case. If there is little or no recombination, that term may drop out or be insignificant; that is often the case in experimental bacterial and some viral analyses. If pA’ is close to mA, then terms with pA’ (the observed frequency of A’ minus new mutation) will drop out. Similarly, so will terms with pB’ when that is close to mB.
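The Python sketch below evaluates the final equation term by term, so one can see which terms matter for a given set of parameters. The parameter names (pA and pB for pA’ and pB’, fr for the recombination factor) and all the numbers are hypothetical and only illustrate the arithmetic. The recombination term is added exactly as written in the equation. The one-or-more terms are computed as -expm1(n*log1p(-p)). That equals 1-(1-p)^n, but it avoids floating-point round-off when p is as tiny as a mutation frequency.

```python
import math


def one_or_more(p: float, n: int) -> float:
    """1-(1-p)^n, computed stably for very small p."""
    return -math.expm1(n * math.log1p(-p))


def new_double_mutant_terms(n, pA, pB, mA, mB, fr):
    """Return each term of P(A'B') in n trials, following the final equation.

    pA, pB: frequencies of A' and B' by descent (hypothetical names for pA', pB')
    mA, mB: frequencies of new mutation to A' and to B'
    fr:     recombination factor; its value depends on the mechanism
    """
    if pA + pB > 1:
        raise ValueError("the equation assumes pA' + pB' <= 1")
    return {
        "1) new A' on B' background": one_or_more(pB * mA, n),
        "2) new B' on A' background": one_or_more(mB * pA, n),
        "3) simultaneous A' and B'": one_or_more(mB * mA, n),
        "4) recombination": fr * pA * pB,
    }


# Hypothetical values with no recombination (fr = 0).
terms = new_double_mutant_terms(n=10**9, pA=1e-4, pB=1e-4, mA=1e-8, mB=1e-8, fr=0.0)
for name, value in terms.items():
    print(f"{name}: {value:.3e}")
print(f"P(A'B') = {sum(terms.values()):.3e}")
```

With these made-up numbers the recombination term is zero and the simultaneous-mutation term is several orders of magnitude below the other two, which is the kind of dropping out the NOTE describes.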
In a subsequent post I will go through the errors I remember in Kleinman’s analysis (there may be some that I miss because he has been inconsistent in his errors and what he claims his equation represents).
Kleinman apparently no longer responds to my mathematical analyses or to any of my posts. That is certainly because he can’t without making it obvious that he is an ignoramus. He must think it is better to bluster and repeat nonsense mantras that are lies. That won’t stop me from continuing to post my analysis of what Kleinman claims he “derived”, nor, as will be presented shortly, from pointing out all the flaws in his ‘equations’ and his faulty understanding of even his own equations.