rossum
Nov 18, 2011, 10:57:56 AM
It is some time since I last posted this, so I thought it was time for
another outing. So, once again, we have ...
rossum
The Evolution of Boojumase
==========================
Creationists often talk about the impossible odds of evolution
producing a working protein. When they show their calculations, which
is rare, they always make the same error. Evolution is a
process including both random mutation and natural selection. The
creationists' models invariably omit the natural selection part and
include only the random mutation part, which they often describe as a
"tornado in a junkyard", following Hoyle. That model is incorrect.
Using a correct model gives very different results.
Their naive "protein probability" calculation uses a very crude model
of evolution: it assumes the evolution of the whole protein
in a single large step. This gives a chance of one in 20^100 for a
protein with 100 amino acids. 20^100 comes to 1.27 x 10^130 so the
chance of the protein appearing in one step is 1 in 1.27 x 10^130.
The usual calculation is then to halve this number and say that in a
species that reproduces annually the protein is only likely to appear
after 6.34 x 10^129 years at 50% probability, which is far longer than
the age of the earth.
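The naive figure is easy to check with exact integer arithmetic. This is only a minimal sketch; the variable names are mine and are not part of the original calculation.

```python
# Naive single-step model: one specific sequence out of 20^100 possibilities.
AMINO_ACIDS = 20   # choices per position
LENGTH = 100       # positions in the protein

sequences = AMINO_ACIDS ** LENGTH   # exact integer, Python ints do not overflow
half = sequences // 2               # trials needed under the "halve it" rule

print(f"20^100     = {sequences:.2e}")
print(f"half of it = {half:.2e}")
```

With one trial per year, the second figure is the number of years the naive model asks for.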
The model implied in the naive calculation is not the model used by
evolution. Under evolution small changes arise randomly, and may be
beneficial or deleterious. Once arisen they are selected very
non-randomly with the deleterious changes disappearing and the
beneficial changes spreading through the population. This has the
effect of spreading out the development of the protein so that instead
of one very large and unlikely jump there are a lot of smaller jumps
which are individually more likely. It also allows for the ratcheting
effect of natural selection, whereby any deleterious mutations tend to
disappear from the population and any beneficial mutations tend to
spread through the population and still be around when the next
mutation comes along.
Using a better model gives a time to evolve a protein with 100 amino
acids of just over two million years. The evolutionary model given
below for the evolution of boojumase is more complex than the naive
model, so it will need more complex calculations.
1 The Scenario
==============
Momerathius vulgaris, the common Mome Rath, is a sessile marine filter
feeder. It lives for one year, reproduces and dies. Its normal food
is a protist, Snarkius snarkius, the snark. However a proportion of
snarks are actually boojums, Snarkius boojum. Boojums have a
different cell wall, so are indigestible, most passing through the
Mome Rath's gut undigested. If there are too many boojums in the Mome
Rath's diet then it will softly and suddenly vanish away.
Mome Raths have an enzyme to digest the cell walls of snarks:
snarkase. The gene for snarkase is duplicated in the Mome Rath's
genome. This means that the second copy of the snarkase gene, called
"snk2", is available to evolve into a new gene to code for boojumase,
which will allow the Mome Rath to digest boojums as well. Any Mome
Rath possessing even a partially effective boojumase will have an
advantage in that it will have more food available and will have a
reduced chance of softly and suddenly vanishing away.
I will calculate the likely time to evolve a gene for boojumase from
the second copy of the snarkase gene. The answer turns out to be
2,096,000 years, a long time but certainly not impossible. This
figure is confirmed to within 2% by computer modelling.
2 Assumptions
=============
1 The population of Mome Raths is stable at about 10 million
individuals. Predation limits the population so the effectiveness of
boojumase is not a factor.
2 Both snarkase and boojumase contain 100 amino acids.
3 In a boojumase there is only one effective amino acid allowed in
each of the 100 positions; any of the nineteen other amino acids is
ineffective in that position.
4 Snarkase and boojumase are very different so their initial match is
just 5% (1 in 20). This is the level of matching expected from any
two random series of 100 amino acids.
5 I will only deal with mutations that have a real effect on the snk2
gene, called "significant mutations". This effect may be good or bad,
but there must be a real effect. Neutral mutations are ignored,
including replacing one ineffective amino acid with a different, but
still ineffective, amino acid in a given position. Mutations
affecting other genes are also ignored.
6 Every four thousand years there is a significant mutation in the
snk2 gene of one Mome Rath in the population of ten million. The
resulting change is random and may be good or bad. In each of the 100
positions replacing an ineffective amino acid with an effective amino
acid is good. Replacing an effective amino acid with an ineffective
one is bad. For the purposes of this model any beneficial mutation
that appears and then gets eaten by a predator or otherwise fails to
reproduce is ignored.
7 The effectiveness of a boojumase at digesting boojums is equal to
the percentage of effective amino acids it contains. Thus since
normal snarkase has 5% effective amino acids it can digest 5% of
boojums during their passage through the Mome Rath's gut. A boojumase
with 20 effective amino acids would be 20% effective and would digest
20% of boojums and so forth.
8 Each 1% of increased effectiveness of boojumase gives a Mome Rath a
1% advantage in reproduction. Similarly a 1% decrease in
effectiveness will give a 1% decrease in the effectiveness of
reproduction. This includes the effect of the changed probability of
the Mome Rath softly and suddenly vanishing away.
3 Preliminary Calculations
==========================
3.1 Beneficial Mutations
------------------------
First I will look at the spread of beneficial mutations. On average
each Mome Rath will reproduce exactly one Mome Rath that survives to
maturity and reproduces - this keeps the population stable. A Mome
Rath with a single mutation for a better boojumase will reproduce 1.01
Mome Raths (1% better than average), while one with a mutation for a
worse boojumase will only reproduce 0.99 Mome Raths (1% worse than
average). Better and worse will be relative, since the population
will stay at 10 million Mome Raths as the improved boojumase evolves.
How long will it take a single beneficial mutation to spread through
the population?
Initially the number of Mome Raths with the beneficial mutation will
increase from the original single individual as the powers of 1.01.
After 1551 years they will form half of the population, about 5
million (1.01 ^ 1551 = 5,040,234). From this point they will be the
normal population while those without the mutation will be the
minority. The minority will decrease as the powers of 0.99. After a
further 1604 years those without the mutation will be extinct, with less
than half an individual left in the population (5,000,000 x 0.99 ^ 1604 =
0.4986802).
This means that it will take about 1551 + 1604 = 3155 years to replace
a population with the old snk2 gene with a new population with the new
improved snk2 gene.
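The two phases of the spread can be reproduced with a short loop. This is a sketch of the simple deterministic growth model described above (1% a year, no chance effects); the helper name years_to_reach is my own.

```python
POPULATION = 10_000_000
HALF = POPULATION // 2

def years_to_reach(start, factor, stop):
    """Count yearly generations until stop(count) becomes true."""
    count, years = start, 0
    while not stop(count):
        count *= factor
        years += 1
    return years

# Phase 1: carriers grow from one individual by 1% a year to half the population.
growth_years = years_to_reach(1.0, 1.01, lambda n: n >= HALF)
# Phase 2: non-carriers shrink from half the population by 1% a year to under 0.5.
decline_years = years_to_reach(float(HALF), 0.99, lambda n: n < 0.5)

print(growth_years, decline_years, growth_years + decline_years)
```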
3.2 Deleterious Mutations
-------------------------
If a deleterious mutation occurs then the number of Mome Raths carrying
it will decrease as the powers of 0.99, since each carrier will be 1% less
efficient at reproducing. From an initial population of one single
individual it will fall below 0.5 in 69 years (1 x 0.99 ^ 69 = 0.499837).
The deleterious mutation will be eliminated by natural selection in less
than 100 years.
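The 69-year figure can also be read off in closed form, by solving for the first whole number of generations n at which the expected number of carriers drops below one half:

```latex
0.99^{n} < 0.5
\;\Longleftrightarrow\;
n > \frac{\ln 0.5}{\ln 0.99} \approx 68.97,
\qquad \text{so } n = 69 .
```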
3.3 Mutation Rates
------------------
I have assumed one significant mutation every four thousand years. Is
this a reasonable assumption? The mutation rate for mammals is about
3 x 10 ^ -8 mutations per base pair per generation. Applying this
rate to M. vulgaris we would expect about 3 x 10 ^ -6 mutations in the
100 base pair snk2 gene per individual. With 10 million individuals
in the population this is 3 x 10 ^ -6 x 10 ^ 7 = 30 mutations in the
snk2 gene over the whole population in each generation. Naturally,
many of these mutations will be neutral. Over 4000 years there will
be 4000 x 30 = 120000 mutations in the snk2 gene over the whole
population.
From this it would appear that the rate of one significant mutation
every 4000 years is probably an underestimate. Nevertheless I will
retain it in order to avoid overlapping mutations and so simplify the
calculations. This means that the time calculated will be longer than
it would be if a more realistic rate for significant mutations was
used.
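The arithmetic of this section, laid out step by step. The constants are the figures quoted in the text (including the 100 base pair gene length); only the variable names are mine.

```python
RATE_PER_BP = 3e-8         # mutations per base pair per generation, as quoted
GENE_LENGTH_BP = 100       # gene length used in the text
POPULATION = 10_000_000    # Mome Raths
INTERVAL_YEARS = 4000      # one generation per year

per_individual = RATE_PER_BP * GENE_LENGTH_BP      # mutations in snk2 per individual
per_generation = per_individual * POPULATION       # across the whole population
per_interval = per_generation * INTERVAL_YEARS     # between significant mutations

print(f"{per_individual:.1e} per individual")
print(f"{per_generation:.0f} per generation")
print(f"{per_interval:.0f} per {INTERVAL_YEARS} years")
```

Changing GENE_LENGTH_BP or RATE_PER_BP shows how sensitive the comparison with the assumed one-in-4000-years rate is to those inputs.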
4 Mutate!
=========
4.1 The First Three Mutations
-----------------------------
Every four thousand years there is a significant mutation in the snk2
gene of one individual Mome Rath. Since the initial snarkase has 5%
effective amino acids, the first significant mutation will have a 95%
chance of being beneficial; switching an ineffective amino acid to an
effective one. It will have a 5% chance of being deleterious;
switching an effective amino acid to an ineffective one.
Deleterious mutations will disappear in 69 years, so they will be gone
before the next significant mutation in 4,000 years. Beneficial
mutations spread through the entire population in 3,155 years, so the
entire population will have the improved boojumase before the next
significant mutation. This means that significant mutations will not
overlap.
Deleterious mutations will disappear and so will not change the
probabilities for the next mutation in 4,000 years; the boojumase will
be unchanged. Beneficial mutations will be preserved and so will
increase the probability of a subsequent deleterious mutation by one
percentage point, and reduce the probability of a beneficial mutation
by the same amount, in 4,000 years' time. Boojumase will now be 1%
more effective than before.
Drawing up the first three mutations in tables (best in a monospaced
font like Courier):
Mutation 1 (Year 4000)   Mutation 2 (8000)    Mutation 3 (12000)
D = 5%                   DD = 5% x 5%         DDD = 5% x 5% x 5%
B = 95%                  DB = 5% x 95%        DDB = 5% x 5% x 95%
                         BD = 95% x 6%        DBD = 5% x 95% x 6%
                         BB = 95% x 94%       DBB = 5% x 95% x 94%
                                              BDD = 95% x 6% x 6%
                                              BDB = 95% x 6% x 94%
                                              BBD = 95% x 94% x 7%
                                              BBB = 95% x 94% x 93%
Here D is a deleterious mutation and B is a beneficial mutation.
Taking an example, BBD in the seventh row of the third table, there is
a beneficial mutation followed by another beneficial mutation followed
by a deleterious mutation. The first beneficial mutation has a
probability of 95%, the second beneficial mutation only has a
probability of 94% since the boojumase now has 6 effective amino acids
and 94 ineffective ones as it was improved by the first beneficial
mutation. The probability of the final deleterious mutation is 7%
since there are seven effective amino acids in the boojumase after two
beneficial mutations.
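The rule used to fill in each row can be written as a small function. This is a sketch under the assumptions of the model above (a deleterious mutation is lost and leaves the enzyme unchanged); the function name path_probability is mine.

```python
def path_probability(path, start=5):
    """Probability (as a fraction) of a sequence of 'B'/'D' significant mutations.

    start is the initial number of effective amino acids out of 100.
    """
    effective = start
    prob = 1.0
    for step in path:
        if step == "B":
            prob *= (100 - effective) / 100   # chance the mutation is beneficial
            effective += 1                    # beneficial change is kept
        elif step == "D":
            prob *= effective / 100           # chance the mutation is deleterious
            # the change is eliminated, so the count of effective positions stays put
        else:
            raise ValueError(f"unexpected step {step!r}")
    return prob

for path in ("BBD", "DBD", "BBB"):
    print(path, f"{path_probability(path):.4%}")
```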
4.2 Average Expected Effectiveness
----------------------------------
Tracing this through many mutations will result in huge tables: 2 ^
100 rows after 100 mutations. In order to proceed I am going to
simplify the calculation by working out a single "Average Expected
Effectiveness" (AEE) for the effectiveness of the boojumase. Doing
some more calculations on the table for the third mutation gives:
        Prob.      Effect.   P x E
DDD      0.0125%     5%      0.000625%
DDB      0.2375%     6%      0.014250%
DBD      0.2850%     6%      0.017100%
DBB      4.4650%     7%      0.312550%
BDD      0.3420%     6%      0.020520%
BDB      5.3580%     7%      0.375060%
BBD      6.2510%     7%      0.437570%
BBB     83.0490%     8%      6.643920%
       ---------             ---------
       100.0000%             7.821595% = AEE
Here the "Prob." column is the probability of that particular outcome
for the three mutations; for example the probability of DBD is 5% x
95% x 6% = 0.2850%. The sum of the probabilities is 100% as a check
on the calculation. The "Effect." column is the effectiveness of the
boojumase after the mutations; start at 5% and add 1% for each B, so
DDD is still at 5% effectiveness while BBB is at the maximum possible
8% effectiveness after three beneficial mutations. The "P x E" column
is the previous two columns multiplied together and adjusted to a
percentage. Each entry is the proportion of the effectiveness that
this row contributes to the overall expected effectiveness of the
boojumase. The sum of this column is the "average expected
effectiveness" (AEE) that I wish to calculate: 7.82% to two decimal
places.
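The 2 ^ 100 rows are not actually needed to get an exact answer, because rows with the same number of B's end at the same effectiveness and can be merged. The sketch below is my own illustration, not part of the original calculation: it tracks the probability of each possible count of effective amino acids and takes the mean.

```python
def distribution_after(mutations, start=5, positions=100):
    """Exact distribution of the number of effective amino acids.

    Returns a list dist where dist[k] is the probability of having k
    effective positions after the given number of significant mutations.
    """
    dist = [0.0] * (positions + 1)
    dist[start] = 1.0
    for _ in range(mutations):
        new = [0.0] * (positions + 1)
        for k, p in enumerate(dist):
            if p == 0.0:
                continue
            new[k] += p * k / positions                        # deleterious: lost
            if k < positions:
                new[k + 1] += p * (positions - k) / positions  # beneficial: kept
        dist = new
    return dist

def expected_effectiveness(dist):
    """Mean count of effective positions (equals the AEE in % for 100 positions)."""
    return sum(k * p for k, p in enumerate(dist))

dist = distribution_after(3)
print({k: round(p, 6) for k, p in enumerate(dist) if p > 0})
print(f"AEE after 3 mutations: {expected_effectiveness(dist):.6f}%")
```

For three mutations this merges the eight rows of the table into four outcomes (5%, 6%, 7% and 8% effective) and gives the same mean.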
4.3 The Fourth Mutation
-----------------------
Coming into the fourth mutation the average expected effectiveness
(AEE) is 7.82%. This gives a 7.82% chance of a deleterious mutation
and a (100.00 - 7.82) = 92.18% chance of a beneficial mutation. The
table looks like:
Mutation 4 (16000)    Initial AEE = 7.82%
       Prob.     Effect.   P x E
D       7.82%     7.82%    0.61%
B      92.18%     8.82%    8.13%
      -------              -----
      100.00%              8.74% = new AEE
The probability of a deleterious mutation, D, is the AEE, 7.82%. The
probability of a beneficial mutation, B, is (100 - AEE), 92.18%. The
effectiveness of the boojumase after a deleterious mutation is
unchanged, the AEE, 7.82%. The effectiveness of the boojumase after a
beneficial mutation is increased by 1%, (AEE + 1), 8.82%. The P x E
column has AEE x AEE / 100 in the D row and (100 - AEE) x (AEE + 1) /
100 in the B row. In each case the "/ 100" is to get the P x E column
back into a percentage. The new AEE is the sum of these two values:
(AEE x AEE / 100) + ((100 - AEE) x (AEE + 1) / 100). This is the new
value of the AEE to go forward to the next mutation.
4.4 The Fifth Mutation
----------------------
From the discussion of the fourth mutation there is a formula for
calculating the AEE after the fifth mutation. The formula is:
New AEE = (AEE x AEE / 100) + ((100 - AEE) x (AEE + 1) / 100)
This can be simplified to:
New AEE = ((99 x AEE) + 100) / 100
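The simplification follows by expanding the second product and cancelling the squared terms. Writing A for the AEE going in and A' for the new AEE, both in percent:

```latex
\begin{aligned}
A' &= \frac{A \cdot A}{100} + \frac{(100 - A)(A + 1)}{100} \\
   &= \frac{A^{2} + 100A + 100 - A^{2} - A}{100} \\
   &= \frac{99A + 100}{100}
\end{aligned}
```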
Putting the AEE of 8.74 coming into the fifth mutation into the
formula gives 9.65; carrying the unrounded value of 8.7434 forward
instead gives 9.66 to two decimal places for the AEE after the fifth
mutation.
4.5 And so on...
----------------
The simplified formula from section 4.4 can be used to step from
mutation to mutation. The calculation is best shown in a table. Rows
are missed out purely for reasons of space. It is simple to set up
the whole thing on a spreadsheet.
   Year  Mutation  AEE after
  12000         3      7.82%
  16000         4      8.74%
  20000         5      9.66%
  40000        10     14.08%
  80000        20     22.30%
 100000        25     26.11%
 200000        50     42.52%
 500000       125     72.95%
 800000       200     87.27%
1000000       250     92.30%
2000000       500     99.38%
2096000       524     99.51%
This shows that after a million years of evolution and 250 significant
mutations M. vulgaris has a snk2 gene that codes for a boojumase that
is on average 92% effective: 92 of the hundred amino acids in the
boojumase are effective and, on average, only eight are ineffective.
After two million years 99 of the hundred amino acids are effective with
an average of one ineffective amino acid.
The table also shows that as the boojumase becomes more effective,
random mutations are more likely to be deleterious, so it takes longer
between beneficial mutations.
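For anyone without a spreadsheet to hand, the stepping can be sketched in a few lines. The function name aee_after is mine. Values are carried forward unrounded, so a row may differ in the last digit from a hand calculation that rounds at each step.

```python
def aee_after(mutations, start=5.0):
    """Apply New AEE = ((99 x AEE) + 100) / 100 once per significant mutation."""
    aee = start
    for _ in range(mutations):
        aee = (99 * aee + 100) / 100
    return aee

YEARS_PER_MUTATION = 4000

print("   Year  Mutation  AEE after")
for n in (3, 4, 5, 10, 20, 25, 50, 125, 200, 250, 500, 524):
    print(f"{n * YEARS_PER_MUTATION:>7}  {n:>8}  {aee_after(n):>8.2f}%")
```

Each step leaves 99% of the remaining gap to 100% in place, which is why progress slows as the enzyme improves.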
5 Result
========
After an average of 2,096,000 years and 524 significant mutations the
Mome Raths have evolved the most effective boojumase possible as there
is less than half an amino acid that is ineffective on average. The
Mome Raths have adapted to an environment containing boojums and will
not softly and suddenly vanish away.
This average figure of 2,096,000 years to evolve a protein with 100
amino acids compares with the 6.34 x 10^129 years calculated from the
less realistic naive model that failed to account for the non-random
element of natural selection.
6 Computer Modelling
====================
Putting this model into a computer program and running it through to
the evolution of a 99.5% effective boojumase a million times gave the
results:
Mean Mutations Std Deviation
513.74 125.97
Running the program three more times, each with a million repetitions
gave:
Mean Mutations Std Deviation
513.65 125.89
513.71 125.79
513.70 125.86
This seems to indicate that the calculations above are a little
pessimistic, and the average should be 514 mutations, taking 2,056,000
years instead of 524 mutations taking 2,096,000 years. The error is
less than two percent. No doubt a better mathematician or
statistician than me could explain the discrepancy.
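One possible reading of the discrepancy, offered as a sketch and not a definitive account: the table in section 4.5 gives the mutation at which the average effectiveness passes 99.5%, whereas a program that runs each trial until the enzyme itself is 99.5% effective measures the average waiting time, and those are different quantities. Assuming the program counted whole amino acids (so that 99.5% means all 100 positions effective) and treated a deleterious mutation as leaving the enzyme unchanged, the expected wait is the sum of 100 / (100 - k) for k = 5 to 99, which is about 513.6. The code below is my own reconstruction under those assumptions, not the original program; the function names, seed and number of runs are arbitrary.

```python
import random
from statistics import mean, stdev

POSITIONS = 100
START = 5

def mutations_until_complete(rng):
    """Count significant mutations until all positions are effective."""
    effective, count = START, 0
    while effective < POSITIONS:
        count += 1
        # Beneficial with probability (ineffective positions / 100); kept if so.
        if rng.random() < (POSITIONS - effective) / POSITIONS:
            effective += 1
        # Otherwise deleterious: eliminated, enzyme unchanged.
    return count

def exact_mean():
    """Expected wait: going from k to k+1 effective takes 100/(100-k) mutations on average."""
    return sum(POSITIONS / (POSITIONS - k) for k in range(START, POSITIONS))

if __name__ == "__main__":
    rng = random.Random(2011)   # fixed seed so runs are repeatable
    runs = [mutations_until_complete(rng) for _ in range(5000)]
    print(f"simulated mean {mean(runs):.2f}, std dev {stdev(runs):.2f}")
    print(f"exact mean     {exact_mean():.2f}")
```

On this reading the step-by-step table and the simulation answer slightly different questions, so a gap of a couple of percent between them is not surprising.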
7 The Boojumase Model
=====================
This is a simple model, deliberately so in order to simplify the
calculations. However it is more complex and closer to the real
situation than the model implied by the naive probability calculation.
The naive model covers the random nature of mutations but it does not
include either the highly non-random process of natural selection or
the ratcheting effect of small changes over the generations in a
population and so gives a misleading result. The boojumase model, by
including random mutations, the non-random element of natural
selection and the ratchet effect, gives a less misleading result.
The boojumase model is intended as a learning aid. For that reason it
is simplified to remove all calculus and more advanced mathematics.
It is intended for an interested lay audience, not for publication in
Science or Nature.
I have deliberately made life difficult for the model by starting with
the 5% random match between snarkase and boojumase, by allowing only
one amino acid to be effective at each position and by picking a long
interval between significant mutations. This is to avoid criticism
that the model is biased in favour of a short time to evolve the
protein; if anything the model is biased towards a long time to evolve
the protein.
The model is by no means perfect. Possible improvements to it are:
- Improve the calculation of the time taken to spread a beneficial
mutation through the whole population. I tried this myself and got a
figure of 3293 years; not different enough to warrant the extra
complexity and with no effect on the overall result as it is still
less than 4,000 years.
- Take into account sexual reproduction in the spreading of
beneficial mutations.
- Run the exact calculation of tables for more than three mutations
before switching to the AEE.
- Explain the transition from the exact tables to the AEE better than
I have in section 4.2.
- Allow mutations to overlap so a second significant mutation might
occur before the previous mutation has spread through the whole
population. This would allow a more realistic rate of significant
mutations.
- Look at mutation rates in real life and make a better assumption
for the interval between significant mutations. I picked 4000 years
purely to avoid complications with overlapping mutations.
Feel free to take up this model, clean it up and make it a better
reflection of reality. If you do so please bear in mind its purpose
and do not complicate it too much; remember the target audience.
8 Bibliography
==============
Lewis Carroll: Jabberwocky
Lewis Carroll: The Hunting of the Snark